...

If the Fortran executable finished successfully (see the section above), you can ignore this "FAILED" message.

Problem with memory, time or node

If you cannot find any error message in the model listing, check the listing whose name ends in *.s. If you submitted the simulation with Chunk_lance, look at the listing 'cjob_*.s'. When everything went well, this listing is empty. Sometimes, however, these files contain messages like the following:

slurmstepd: error: *** JOB 17891032 ON nc20539 CANCELLED AT 2023-06-14T04:50:44 DUE TO NODE FAILURE, SEE SLURMCTLD LOG FOR DETAILS ***

=> The node your job was running on failed. Just resubmit (continue) your simulation.

slurmstepd: error: *** JOB 13690472 ON nc30342 CANCELLED AT 2023-02-12T00:50:04 DUE TO TIME LIMIT ***

=> Your job ran out of time. If your jobs usually fit within the wall time you asked for, this might be due to slow access to the filesystems. In that case you can wait until the filesystem problems have been solved, or just resubmit and hope for the best. You can also ask for more walltime (BACKEND_time_mod) and/or run fewer days per job (Fcst_rstrt_S); if you have only just started your simulations, you should do one or both of these.

slurmstepd: error: Detected 3 oom-kill event(s) in StepId=13861528.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

=> Your job ran out of memory. Ask for more MPI tiles (GEM_ptopo). You could also ask for more memory (BACKEND_cm), but this usually means that your jobs will be queued for much longer.
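The three cases above can be checked automatically. The following is only a minimal sketch in Python, not part of the model tools: the function names `classify_line` and `scan_listings` are made up for illustration, the listings are assumed to sit in the current directory, and only the three message types shown above are recognised.

```python
import re
from pathlib import Path

# Substrings taken from the three example messages above. The advice
# strings paraphrase the notes that follow each example.
SLURM_PATTERNS = [
    (re.compile(r"DUE TO NODE FAILURE"),
     "node failure: resubmit (continue) the simulation"),
    (re.compile(r"DUE TO TIME LIMIT"),
     "time limit: resubmit, ask for more walltime (BACKEND_time_mod) "
     "or run fewer days per job (Fcst_rstrt_S)"),
    (re.compile(r"oom-kill event"),
     "out of memory: ask for more MPI tiles (GEM_ptopo) "
     "or more memory (BACKEND_cm)"),
]


def classify_line(line):
    """Return advice for a recognised slurmstepd error line, else None."""
    if "slurmstepd: error:" not in line:
        return None
    for pattern, advice in SLURM_PATTERNS:
        if pattern.search(line):
            return advice
    return None


def scan_listings(directory="."):
    """Yield (file name, advice) for each recognised message in cjob_*.s files.

    Assumption: the listings are in `directory`; adjust to your setup.
    Empty listings simply yield nothing.
    """
    for path in sorted(Path(directory).glob("cjob_*.s")):
        text = path.read_text(errors="replace")
        for line in text.splitlines():
            advice = classify_line(line)
            if advice is not None:
                yield path.name, advice


if __name__ == "__main__":
    for name, advice in scan_listings():
        print(f"{name}: {advice}")
```

Run from the directory that holds the listings, this prints one line per recognised message. Messages it does not recognise are silently skipped, so an empty result does not prove that the listings are clean.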

Common error messages and their meanings

1) Message:

oe-00000-00000:  size(pp,           1 )=       71280  high=       73062  low=           1
oe-00000-00000:  ERROR ERROR: gmm_create, requested dimensions differ from previous specification (res
oe-00000-00000:  tart/create)
oe-00000-00000:  ERROR: gmm_create, variable name ="XTH                             "

=> Possible reason: MPI-tiles too small
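To see which variables are affected in a long model listing, the variable names can be pulled out of the gmm_create error lines. This is a rough Python sketch, assuming the error lines have exactly the form shown above; the function name `gmm_create_variables` is made up for illustration.

```python
import re

# Matches lines such as:
#   ERROR: gmm_create, variable name ="XTH                             "
# Assumption: the name is always enclosed in double quotes and padded
# with blanks, as in the example above.
GMM_NAME = re.compile(r'gmm_create, variable name\s*=\s*"([^"]*)"')


def gmm_create_variables(listing_text):
    """Return the distinct variable names from gmm_create errors, in order."""
    names = []
    for match in GMM_NAME.finditer(listing_text):
        name = match.group(1).strip()  # drop the blank padding
        if name not in names:
            names.append(name)
    return names
```

Pass it the full text of the model listing, for example `gmm_create_variables(open(listing_path).read())`, where `listing_path` stands for the path to your own listing.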

2) CLASS

oe-00000-00071: 0BAD CANOPY ITERATION TEMPERATURE     4 51          373.24   6   1
oe-00000-00071:      5301.31    384.41    315.63   1100.35      0.00   4224.56    234.27     13.73      0.00
oe-00000-00071:       373.24    281.25    273.15
oe-00000-00071: 0********  END  TSOLVC  ************************************************************************      -2