Version comparison


...

     ~/modeles/GEMDM/version/bin


2) Fortran executable

In the Fortran part, every MPI process (there is one MPI process per "tile") writes its own listing. Once the executable (Fortran part) has finished running, MPI collects the listings from all the processes and adds them to the main model listing, ${GEM_exp}_M*. To show which process wrote which line, every line is preceded by the number of the process, for example:

...
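
As a rough illustration of how the process prefix can be used, the following Python sketch splits a merged listing into one list of lines per process. It assumes the prefix has the shape `oe-00000-00000:` seen in the sample lines further down this page; the regular expression, the function name `split_by_process` and the sample lines are hypothetical and should be adapted to the actual listing.

```python
import re
from collections import defaultdict

# Assumed prefix shape, e.g. "oe-00000-00000:" (taken from the sample lines
# on this page; adjust the pattern if your listing looks different).
PREFIX = re.compile(r"^(oe-\d{5}-\d{5}):\s?(.*)$")

def split_by_process(lines):
    """Group listing lines by their process prefix."""
    per_process = defaultdict(list)
    for line in lines:
        match = PREFIX.match(line.strip())
        if match:  # lines without a prefix are skipped (e.g. shell output)
            per_process[match.group(1)].append(match.group(2))
    return dict(per_process)

# Made-up sample lines, for illustration only.
sample = [
    "    oe-00000-00000: THE TIME STEP  1 IS COMPLETED",
    "    oe-00000-00001: THE TIME STEP  1 IS COMPLETED",
    "    oe-00000-00000: OUT_PHY- WRITING ...",
    "INFO: END of listing processing :",
]
listings = split_by_process(sample)
print(sorted(listings))
```

Once the lines are grouped this way, the listing of each process can be inspected separately instead of scrolling through the merged file.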

If the model stopped in the Fortran part, most of the time you can find an error message at the end of the listing of process 0. To get there, jump to the end of the listing and then search backwards for the end of the listing of the main process. (In 'vi', 'vim' or 'less' you can jump to the end by pressing 'G' and then search upward with '?00000:'.) Even from the end of the main process listing you might still have to look several lines up to find an error. However, once you reach a line saying:
    THE TIME STEP  n IS COMPLETED
there is probably no error further up, and you will have to look into the listings of all the other processes.
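
The backward search described above could be sketched in Python roughly as follows. The function takes the lines of the main process listing and returns the number of the last completed time step together with everything written after it, which is where an error message would be expected. The function name `tail_after_last_step` and the sample lines are made up for illustration; only the "THE TIME STEP  n IS COMPLETED" text comes from this page.

```python
import re

# Matches the completion message quoted on this page.
STEP_DONE = re.compile(r"THE TIME STEP\s+(\d+)\s+IS COMPLETED")

def tail_after_last_step(lines):
    """Return (last completed step or None, lines written after it)."""
    last_step, last_index = None, -1
    for index, line in enumerate(lines):
        match = STEP_DONE.search(line)
        if match:
            last_step, last_index = int(match.group(1)), index
    # If no step was completed, the whole listing is returned.
    return last_step, lines[last_index + 1:]

# Hypothetical main process listing.
main_listing = [
    "THE TIME STEP  1 IS COMPLETED",
    "THE TIME STEP  2 IS COMPLETED",
    "some message",
    "another message",
]
step, tail = tail_after_last_step(main_listing)
print(step, tail)
```

A step of `None` means that no time step was completed at all. An empty tail means nothing was written after the last completed step, so the listings of the other processes are the next place to look.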


If the executable started running but was not able to finish the first timestep, that is, if there is not at least one line saying:

    THE TIME STEP  n IS COMPLETED

it is possible that there was a machine problem and not all MPI processes could get started, or that your restart files are corrupted. If this happens for the first time for a given month, you can just restart the simulation. But if it happens more than once, I would restart the simulation from the previous month, assuming there is a problem with the restart files.


If the executable stopped somewhere in the middle and you cannot find an error message but the last line of the listing of the main process says:

    OUT_PHY- WRITING ...

chances are the model got stuck while writing the output. In that case it might be enough to restart the simulation.
However, it is still a good idea to check the listings of ALL processes.
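
A minimal Python sketch of this check, assuming the lines of the main process listing are already available as a list of strings (the function name `stuck_while_writing` and the sample lines are hypothetical):

```python
def stuck_while_writing(main_lines):
    """True if the last non-empty line starts with 'OUT_PHY- WRITING'."""
    for line in reversed(main_lines):
        text = line.strip()
        if text:
            return text.startswith("OUT_PHY- WRITING")
    return False  # empty listing

# Hypothetical end of a main process listing.
example = ["THE TIME STEP  12 IS COMPLETED", "    OUT_PHY- WRITING ...", ""]
print(stuck_while_writing(example))
```

This only flags the situation described above; it does not replace looking at the listings of the other processes.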


If the model stops more than once at the same timestep, have a look at the listings of ALL processes to see what went wrong.


When the Fortran part finishes fine, you will see the following messages at the end of the main process listing:

    oe-00000-00000: Memory==> ...
      :  
    oe-00000-00000:  __________________TIMINGS ON PE #0_________________________
      :  
    oe-00000-00000:  .........RESTART
And then a big '****' box with an "END EXECUTION" inside.




3) Second shell part

The second shell part starts with the lines:

   :
==============       end of parallel run       ==============
INFO: END of listing processing :

...

date & time

However, sometimes there can be an error message like:

==============       end of parallel run       ==============
INFO: END of listing processing :

...

date & time
INFO: RUN FAILED
INFO: first 10 failing processes :
fail.00000-

...

...

...

fail.00000-...

If the Fortran executable finished well (see the section above), you can ignore this "FAILED" message.
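
The rule above can be expressed as a small Python check. This is only a sketch: it assumes that a successful Fortran run leaves the text "END EXECUTION" in the listing (as described in the previous section) and that the failure message contains "INFO: RUN FAILED" exactly as shown here. The function name `run_failed_can_be_ignored` and the sample text are made up for illustration.

```python
def run_failed_can_be_ignored(listing_text):
    """True if 'RUN FAILED' is reported although the Fortran part ended normally."""
    fortran_ok = "END EXECUTION" in listing_text      # assumed success marker
    shell_failed = "INFO: RUN FAILED" in listing_text
    return fortran_ok and shell_failed

# Hypothetical listing fragments.
finished_then_failed = "****\n* END EXECUTION *\n****\nINFO: RUN FAILED\n"
failed_only = "INFO: RUN FAILED\n"
print(run_failed_can_be_ignored(finished_then_failed))
print(run_failed_can_be_ignored(failed_only))
```

If the function returns False while "RUN FAILED" is present, the Fortran part itself did not finish and the process listings should be checked as described in the section above.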