How to find where/why a simulation crashed/stopped
1) Check whether the simulation stopped in the scripts job or in the model job
Go into your listings directory:
cd ~/listings/${TRUE_HOST}
For example, on Beluga:
cd ~/listings/Beluga
or on Narval:
cd ~/listings/Narval
List all listings of the month that failed, sorted by time with the most recent one last:
ls -lrt ${GEM_exp}_[MS]*
Open the last listing in that list (the most recent one) in your editor or with 'less'.
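As a possible shortcut, the most recent listing can be picked automatically instead of reading the 'ls' output by eye. The following is only a sketch: 'latest_listing' is a hypothetical helper name and not part of the GEM scripts, and it assumes the listing names follow the ${GEM_exp}_[MS]* pattern used above.

```shell
# Print the most recently modified listing of an experiment.
# NOTE: latest_listing is a hypothetical helper, not part of the GEM scripts.
latest_listing() {
    exp=$1          # experiment name, e.g. the value of ${GEM_exp}
    dir=${2:-.}     # listings directory (defaults to the current directory)
    # 'ls -t' sorts newest first; 'head -n 1' keeps only the first entry
    ls -t "$dir"/"$exp"_[MS]* 2>/dev/null | head -n 1
}

# Usage (from inside the listings directory):
#   less "$(latest_listing "${GEM_exp}")"
```

If no listing matches, the function prints nothing, so it is worth checking that the result is not empty before opening it.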
If the model stopped in the ...
a) Scripts listing ${GEM_exp}_S*
- Jump to the end of the listing (when using 'vi', 'vim' or 'less' you can jump to the end by pressing 'G')
- Search upwards until you find an error message
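The upward search described above can also be approximated from the command line. This is a minimal sketch: 'last_errors' is a hypothetical helper name, and the keywords it searches for (error, abort, fatal, segmentation) are assumptions about what a failure message might contain, not a list taken from the GEM scripts. Adapt the pattern to the messages you actually see.

```shell
# Show the last few lines of a listing that look like error messages.
# NOTE: last_errors and its keyword list are illustrative assumptions.
last_errors() {
    listing=$1        # path to the listing file
    count=${2:-5}     # how many matches to show (default: 5)
    # -n prints line numbers, -i ignores case, -E enables alternation with '|'
    grep -n -i -E 'error|abort|fatal|segmentation' "$listing" | tail -n "$count"
}

# Usage:
#   last_errors ${GEM_exp}_S<...>      # replace <...> with the rest of the file name
```

The printed line numbers can then be used in 'less' or 'vi' to jump straight to the message (for example by typing the number followed by 'G').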
b) Model listing ${GEM_exp}_M*
Jump to the end of the listing (when using 'vi', 'vim' or 'less' you can jump to the end by pressing 'G')
Each model job consists of 3 main parts:
- It starts with a shell part,
- followed by the Fortran executable,
- followed by another shell part.
If all goes well, the first shell part ends with:
INFO: MPI launch after 4 second(s)
INFO: START of listing processing : Mon Aug 30 12:33:56 EDT 2021
============== start of parallel run ==============
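A quick way to test whether a model listing got past the first shell part is to look for the "start of parallel run" line shown above. The sketch below assumes that this marker text appears literally in the listing; 'reached_parallel_run' is a hypothetical helper name.

```shell
# Succeed (exit status 0) if the listing contains the marker line that
# ends the first shell part.
# NOTE: reached_parallel_run is a hypothetical helper name.
reached_parallel_run() {
    # -q suppresses output; only the exit status is used
    grep -q 'start of parallel run' "$1"
}

# Usage:
#   if reached_parallel_run "$listing"; then
#       echo "First shell part finished, check the Fortran part"
#   else
#       echo "Stopped in the first shell part"
#   fi
```

If the marker is missing, the job most likely stopped before the executable was launched, so the error should be near the end of the first shell part.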
2. Fortran part
Jump to the end of the listing (when using 'vi', 'vim' or 'less' you can jump to the end by pressing 'G')