General description
Chunk_lance lets you run a sequence of monthly (or sub-monthly) model jobs in one big job.
...
The wallclock time for which one chunk_job will be running can be set in the file 'configexp.dot.cfg' with the parameter 'BACKEND_time_mod'.
Chunk_lance offers the option of re-executing a model job multiple times in case of a crash. The key '-attempts' tells Chunk_lance how many times to try a model job:
'-attempts 1' means just one attempt,
'-attempts 2' means two attempts, etc.
Depending on the version you are using the default is 1 or 2. You can check the default with the command:
Chunk_lance -h
*** SEQUENCE D'APPEL ***
Chunk_lance [positionnels]
IN -attempts [1:1] Restart attempts
IN -start [0:1] Restart simulation from beginning
IN -nosoumet [0:1] Do not submit but execute chunk_job
[-- positionnels]
...
Since the running job is now always called "cjob_${exp}_...", the job name no longer shows how far the simulation has progressed. You can instead look at the listings directory, or at the log file 'chunk_job.log', which is kept in the config file directory.
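To check progress quickly, one option is to print the most recent "finished at" entry of the log file. This is a minimal sketch: the function name last_finished is hypothetical, and it assumes that completed jobs are logged on lines containing the text "finished at".

```shell
# Print the most recent "finished at" entry of a chunk_job.log.
# Usage: last_finished [logfile]   (default: ./chunk_job.log)
# Assumption: finished jobs are logged on lines containing "finished at".
last_finished() {
    grep 'finished at' "${1:-chunk_job.log}" | tail -n 1
}
```

Run in the config file directory, 'last_finished' with no argument reads 'chunk_job.log' from the current directory.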
Restart using Chunk_lance
In case a simulation stops and you want to find out which job (scripts, entry or model) crashed, you have to look in the listings directory (~/listings/${TRUE_HOST}). Check which of the following jobs has crashed:
...
In any case, you can restart your simulation by simply executing
Chunk_lance
again in the config file directory.
Of course, do this only AFTER you have fixed the problem, unless it was a machine problem. In the latter case, just restart the simulation with 'Chunk_lance'.
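When looking for the job that crashed, it can help to list the most recently written files in the listings directory first. The following is only a sketch: the function name newest_listings is hypothetical, and it assumes that the newest listing belongs to the job that stopped last.

```shell
# List the N most recently modified files in a listings directory.
# Usage: newest_listings [dir] [N]
# Defaults: dir = ~/listings/${TRUE_HOST}, N = 10
newest_listings() {
    dir=${1:-$HOME/listings/${TRUE_HOST}}
    ls -t "$dir" | head -n "${2:-10}"
}
```

For example, 'newest_listings ~/listings/${TRUE_HOST} 5' prints the five newest file names, newest first.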
Restart from previous restart file
To continue a simulation from a restart file:
Note: in the description below, replace all text in cursive/italic and all '...' with the actual names!
- Copy the restart file of the previous month (if there is more than 1 part, copy all parts!) back from the archive. You will find all previous restart files under ${CLIMAT_archdir}/Restarts.
- Go into the execution directory:
cd ~/MODEL_EXEC_RUN/$TRUE_HOST
- Gunzip and unarchive (cmcarc -x -f ...) the restart file(s) in ~/MODEL_EXEC_RUN/$TRUE_HOST
The cmcarc command will create a new directory, but the *.ca file will remain in the directory. You can remove it again.
- Untar the restart file from which you want to restart your simulation, with something like:
tar xvf ${CLIMAT_archdir}/Restarts/....tarz
- Go into the config file directory
- Edit the log file 'chunk_job.log':
(I suggest making a backup copy of the log file first, just in case.)
Then remove all lines concerning the month you want to rerun, as well as all following lines. The last line should then contain something like:
... previous_month_M finished at ...
Make sure the last line is not an empty line!
- Still in your config file directory, you also need the script called:
${GEM_exp%_*}_month-to-rerun_S
If you do not have this script anymore in your config file directory you can find it in the archive in the file:
${CLIMAT_archdir}/Listings/jobs_....zip
- If you are running the entry in parallel, you will also have to remove all the ${exp}_entry_finished flags in your config file directory for all the months you want to rerun. Otherwise the entries for these months will not get rerun!
- Execute "Chunk_lance" again (without the '-start'!!!)
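The log-editing step in the list above can also be scripted. The sketch below is not part of Chunk_lance: the function name trim_chunk_log is hypothetical, and it assumes that every log line for the month to rerun contains a string you pass in (for example the month's job name as it appears in your own 'chunk_job.log'). It makes a backup copy, drops the first matching line and everything after it, and makes sure the file does not end with an empty line.

```shell
# Back up a chunk_job.log, then drop the first line that mentions the
# given month string and every line after it. Trailing empty lines are
# dropped as well, so the last line is never empty.
# Usage: trim_chunk_log <logfile> <month-string>
trim_chunk_log() {
    log=$1
    month=$2
    # Refuse to run without both arguments.
    [ -n "$log" ] && [ -n "$month" ] || return 1
    # Keep a backup copy next to the log, just in case.
    cp -p "$log" "$log.bak" || return 1
    awk -v m="$month" '
        index($0, m)     { exit }                       # month to rerun reached: stop
        /^[[:space:]]*$/ { held = held $0 "\n"; next }  # hold back empty lines
                         { printf "%s", held; held = ""; print }
    ' "$log.bak" > "$log"
}
```

After running it, compare 'chunk_job.log' with 'chunk_job.log.bak' and check that the last line is the "... finished at ..." entry of the previous month before executing Chunk_lance again.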
...