...
If you cannot find any error message in the model listing, check the listing whose name ends in *.s. If you submitted the simulation with Chunk_lance, look at the listing 'cjob_*.s'. If everything went well, this listing is empty. Sometimes, however, it contains messages like the following:
Excerpt from cjob_*.s |
---|
slurmstepd: error: *** JOB 17891032 ON nc20539 CANCELLED AT 2023-06-14T04:50:44 DUE TO NODE FAILURE, SEE SLURMCTLD LOG FOR DETAILS *** |
=> The job was cancelled because a compute node failed, not because of a problem in your simulation. Simply resubmit (continue) your simulation.
Excerpt from cjob_*.s |
---|
slurmstepd: error: *** JOB 13690472 ON nc30342 CANCELLED AT 2023-02-12T00:50:04 DUE TO TIME LIMIT *** |
...
=> Possible reason: the MPI tiles are too small.
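The two cases above can be distinguished automatically. The following Python sketch assumes that the cjob_*.s listings are in the current directory and that Slurm writes the cancellation line in the format shown in the excerpts. It lists every non-empty cjob_*.s file and extracts the job number, node, time and reason from it. The function names are hypothetical and are not part of Chunk_lance.

```python
import glob
import os
import re

# Assumption: Slurm's cancellation line looks like the excerpts above, e.g.
# "slurmstepd: error: *** JOB 17891032 ON nc20539 CANCELLED AT ... DUE TO NODE FAILURE, ..."
CANCEL_RE = re.compile(
    r"\*\*\* JOB (\d+) ON (\S+) CANCELLED AT (\S+) DUE TO ([A-Z ]+?)(?:,| \*\*\*)"
)


def classify_listing(text):
    """Return (job_id, node, time, reason) for every cancellation line in text."""
    return [m.groups() for m in CANCEL_RE.finditer(text)]


def scan_listings(directory="."):
    """Map each non-empty cjob_*.s listing in directory to its cancellation lines."""
    report = {}
    for path in sorted(glob.glob(os.path.join(directory, "cjob_*.s"))):
        with open(path, errors="replace") as f:
            text = f.read()
        if text.strip():  # an empty listing means everything went well
            report[path] = classify_listing(text)
    return report


if __name__ == "__main__":
    for path, hits in scan_listings().items():
        print(path)
        for job_id, node, when, reason in hits:
            print(f"  job {job_id} on {node} at {when}: {reason}")
        if not hits:
            print("  (non-empty, but no 'CANCELLED ... DUE TO' line recognized)")
```

A reason of NODE FAILURE means you can simply resubmit. A reason of TIME LIMIT points to the MPI tile question above.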
2) CLASS
a) BAD CANOPY ITERATION TEMPERATURE
Excerpt from the model listing |
---|
oe-00000-00071: 0BAD CANOPY ITERATION TEMPERATURE 4 51 373.24 6 1 |
oe-00000-00071: 5301.31 384.41 315.63 1100.35 0.00 4224.56 234.27 13.73 0.00 |
oe-00000-00071: 373.24 281.25 273.15 |
oe-00000-00071: 0******** END TSOLVC ************************************************************************ -2 |
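In a long model listing it helps to pull out each complete message. The minimal Python sketch below rests on two assumptions taken from the excerpt above. First, every listing line starts with a tag like oe-00000-00071:. Second, each message ends with the END TSOLVC line. Lines are grouped by tag, so output from different tags that is interleaved stays separate. The tag pattern and the function name are hypothetical, and the sketch does not interpret the printed numbers.

```python
import re
import sys

START = "BAD CANOPY ITERATION TEMPERATURE"
END = "END TSOLVC"
# Assumption: listing lines are prefixed by a tag such as "oe-00000-00071:".
PREFIX_RE = re.compile(r"^(oe-\d+-\d+):")


def extract_blocks(lines, start=START, end=END):
    """Return (tag, lines) for every block running from `start` to `end`, per tag."""
    open_blocks = {}
    finished = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = PREFIX_RE.match(line)
        tag = m.group(1) if m else None
        if start in line:
            open_blocks[tag] = [line]  # a new message begins for this tag
        elif tag in open_blocks:
            open_blocks[tag].append(line)  # continuation of an open message
        if end in line and tag in open_blocks:
            finished.append((tag, open_blocks.pop(tag)))
    return finished


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python extract_canopy.py MODEL_LISTING
    with open(sys.argv[1], errors="replace") as listing:
        for tag, block in extract_blocks(listing):
            print(f"--- {tag} ---")
            print("\n".join(block))
```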
b) Crash in aprep.f
If the job was restarted from a restart file, make sure that the permanent bus is still the same as before. To check this, compare the listing of the crashed job with the previous one (which should be archived in ${CLIMAT_archdir}/Listings/listings_....zip) using 'xxdiff'. On Narval, you first have to run 'module add difftools' to get access to xxdiff.
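If xxdiff is not available, for example without an X display, a textual comparison can stand in for it. The Python sketch below reads the previous listing directly from the archived zip file and prints a unified diff against the current listing. The exact way the permanent bus is printed in the listing is not described here. The optional KEYWORD argument is therefore a hypothetical filter: pass a string that appears on the lines you want to compare. The script name and the member name inside the zip are assumptions. Use `zipfile.ZipFile(path).namelist()` to see which members the archive contains.

```python
import difflib
import sys
import zipfile


def read_zip_member(zip_path, member):
    """Return the lines of one file stored inside a .zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as f:
            return f.read().decode("utf-8", errors="replace").splitlines()


def diff_lines(old, new, keyword=None, old_name="previous", new_name="current"):
    """Unified diff of two listings, optionally restricted to lines containing keyword."""
    if keyword is not None:
        old = [line for line in old if keyword in line]
        new = [line for line in new if keyword in line]
    return list(difflib.unified_diff(old, new, fromfile=old_name, tofile=new_name, lineterm=""))


if __name__ == "__main__" and len(sys.argv) >= 4:
    # usage: python compare_listings.py ARCHIVE.zip MEMBER CURRENT_LISTING [KEYWORD]
    zip_path, member, current_path = sys.argv[1:4]
    keyword = sys.argv[4] if len(sys.argv) > 4 else None
    with open(current_path, errors="replace") as f:
        current = f.read().splitlines()
    previous = read_zip_member(zip_path, member)
    print("\n".join(diff_lines(previous, current, keyword, member, current_path)) or "no differences")
```

Without a keyword the diff also shows run-dependent lines such as dates and timings. Filtering on a keyword keeps the output focused on the lines that matter.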