...
For example on Beluga/Narval:
cd ~/listings/Beluga
or, respectively:
cd ~/listings/Narval
List all the listings of the month that failed
...
ls -lrt ${GEM_exp}_[MS][_.]*
2) Open the last listing in your editor or with 'less'
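As a minimal sketch, the newest listing can also be picked automatically instead of reading it off the 'ls -lrt' output. The helper name newest_listing is hypothetical and not part of the model scripts; it assumes you are already in the listings directory and that listing names contain no spaces.

```shell
# Print the most recently modified listing of the experiment named in $1.
# newest_listing is a hypothetical helper, not part of the model scripts.
newest_listing() {
    # ls -t sorts newest first; the glob is the same one used with 'ls -lrt'
    ls -t "$1"_[MS][_.]* 2>/dev/null | head -n 1
}
```

It could then be used as: less "$(newest_listing "${GEM_exp}")"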
...
If you find any of the above error messages in your model listing, there was most likely a problem with the machine and you can simply restart the simulation.
2) Fortran part
In the Fortran part, every MPI process (there is one MPI process per "tile") writes its own listing. Once the executable (Fortran part) has finished running, MPI collects the listings from all the processes and adds them to the main model listing, ${GEM_exp}_M*. So that you can see which line was written by which process, every line is preceded by the number of the process, for example:
...
Sometimes the model crashes so badly that MPI is not able to gather the listings from all of the processes. If this happens, you can find the listings of the processes in the directory named in the following line:
INFO: temporary listings for all members in directory_name
You can find the line above in your model listing!
The directory 'directory_name' contains one subdirectory per process, ?????, which holds the listing of that process.
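The lookup described above can be sketched in the shell. Both function names are hypothetical. The sketch assumes that the directory name is the last word on the "INFO: temporary listings" line and that each ????? directory holds plain text files; the names of those files are not known here, so all regular files are shown.

```shell
# Print the directory named on the "INFO: temporary listings ..." line of the
# model listing $1. Assumption: the directory is the last word on that line.
temp_listing_dir() {
    grep 'INFO: temporary listings for all members in' "$1" | tail -n 1 | awk '{print $NF}'
}

# Show the last lines of every per-process listing below directory $1.
# $2 is the number of lines to show per file (default: 20).
tail_process_listings() {
    for f in "$1"/?????/*; do
        [ -f "$f" ] || continue      # skip anything that is not a regular file
        echo "==> $f"
        tail -n "${2:-20}" "$f"
    done
}
```

A possible use would be: tail_process_listings "$(temp_listing_dir my_model_listing)" 30, where my_model_listing stands for the main model listing.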
If the model stopped in the Fortran part, you can usually find an error message at the end of the listing of process 0. To get there, jump to the end of the model listing and then search backwards for the end of the listing of the main process. (In 'vi', 'vim' or 'less' you can jump to the end by pressing 'G' and then search upwards with '?00000:'.) Even from the end of the main process listing you might still have to look several lines up to find an error. However, once you reach a line saying:
THE TIME STEP n IS COMPLETED
there is probably no error further up, and you will have to look into the listings of all the other processes.
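The search described above can be approximated without an editor. This is only a sketch: the function names are hypothetical, and it assumes that every line written by process 0 contains the string '00000:' (the same string used in the search above) and that the "TIME STEP ... IS COMPLETED" message sits on such a line.

```shell
# Print the line number of the last "THE TIME STEP n IS COMPLETED" message
# written by process 0 in listing $1 (prints nothing if there is none).
last_completed_step_line() {
    grep -n '00000:.*THE TIME STEP.*IS COMPLETED' "$1" | tail -n 1 | cut -d: -f1
}

# Print everything process 0 wrote after that message. If no time step was
# completed, all lines of process 0 are printed.
process0_after_last_step() {
    start=$(last_completed_step_line "$1")
    tail -n +"$(( ${start:-0} + 1 ))" "$1" | grep '00000:'
}
```

If process0_after_last_step prints no error message, the listings of the other processes are the next place to look.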
3) Second shell part