...

The command to submit a job/script with soumet could look like this:

    soumet jobname [ -t time_in_seconds  -listing listing_directory  -jn listing_name  -cpus number_of_cpus  -mpi ]

...

  • 'jobname' is the name of the shell script, also called job, you want to submit.
  • '-t  time_in_seconds' specifies the wallclock time, in seconds, that you request for the job. Even if the job is not finished when this time expires, the job will be terminated, so it is better to always ask for enough time. However, on larger clusters, such as Compute Canada / Alliance clusters, jobs asking for more time will be queued longer.
    On our UQAM systems the default wallclock time for single CPU/core jobs is 10 days. For multi-core jobs the default time is 1 minute.
    When running on clusters of The Alliance, check out their wiki: Time limits on clusters of The Alliance
  • '-jn  listing_name'  specifies the name of the listing or log file of the job. Everything that would appear on the screen when running the job interactively will instead be written to this file. The default listing name is the basename of the job.
  • '-listing  listing_directory'  specifies the directory in which the listing will get written. The default directory is:
           ~/listings/${TRUE_HOST}
    If you want to use the default directory, you should first create the directory ~/listings and then, inside that directory, create a symbolic link pointing to a place where you have more space, for example a directory (that you have to create!) under your data space:
            mkdir -p /dataspace/Listings
            mkdir ~/listings
            ln -s /dataspace/Listings ~/listings/${TRUE_HOST}
    Replace 'dataspace' with the full name of your data directory.
  • '-cpus  number_of_cpus'  specifies the number of CPUs you want to use when running the job in parallel with MPI and/or OpenMP. The syntax is the following:  MPIxOpenMP
    For example, if you want to use 4 MPI processes with 2 OpenMP threads each you would write:  -cpus 4x2
    If you want to use pure MPI with 4 MPI processes you would write:  -cpus 4x1  or simply  -cpus 4
    If you want to use pure OpenMP with 2 threads you would write:  -cpus 1x2
    The default is 1x1.
  • '-mpi'  must be added when the script (or the executable that will get executed within the script) runs with MPI.
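Putting the options above together: since 'soumet' is only available on the clusters themselves, the sketch below just prints the command it would submit. The script name 'run_model.sh', the listing name, and the 8-hour wallclock are illustrative assumptions, not values from this page.

```shell
#!/bin/sh
# Hypothetical example: assemble a soumet call for a script 'run_model.sh'
# with an 8-hour wallclock and 4 MPI processes x 2 OpenMP threads each.
WALLCLOCK=$((8 * 60 * 60))   # 8 hours expressed in seconds

# Printed instead of executed so the sketch stays runnable anywhere:
echo "soumet run_model.sh -t ${WALLCLOCK} -cpus 4x2 -mpi -jn run_model"
```

Remove the 'echo' (and the quotes) on a machine where 'soumet' exists to actually submit the job.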

...

To get more information about the command simply execute:

    soumet -h

Check on jobs

As said above, the way to check (and kill) jobs also depends on the queueing system. Therefore, we created a script called 'qs', which you can find in my ovbin:

...

On our UQAM servers, as well as on the Beluga cluster of The Alliance (when using the RPN environment), you already have an alias pointing to the above command, called:

    qs

The column 'ST' or 'state' shows the status of the job:
    PD : pending
    R  : running
    C  : cancelled

The column 'TMAX' or 'wallclock' shows the wallclock time the job requested. Once this time runs out, the job will be terminated.

Kill a job

Check the job-ID number with 'qs' or '~winger/ovbin/qs' and then use 'qdel' to kill your job:
  qdel job-ID

Depending on the machine, this might be an alias again.
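If several jobs need to be killed, the same 'qdel' call can be looped over a list of job-IDs. A minimal sketch, where the IDs are placeholders and the 'echo' keeps it a dry run:

```shell
#!/bin/sh
# Hypothetical sketch: kill several jobs with qdel.
# Replace the placeholder IDs with the ones reported by 'qs'.
for job_id in 1001 1002 1003 ; do
    echo "qdel ${job_id}"   # drop the 'echo' to actually kill the jobs
done
```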

...