When you type commands in a login shell (window/terminal) and see the response displayed, you are working interactively. This is the most common way of working. The main downsides are that you cannot disconnect while a process is running (it would be terminated), and that if many users run many jobs interactively the computer can become overloaded, slowing down all processes. It is therefore sometimes more practical to send jobs to the background or to submit batch jobs.

Background processes

Send a job/process to the background

Processes that open windows, such as emacs, Matlab, xrec, xxdiff and others, block further use of the terminal (window) from which they were started. To keep using the terminal for other things, you can send such processes to the background. You can do this right away by adding an '&' at the end of the command. For example:

    emacs filename &
    matlab &


If you forgot to add the '&' but would still like to continue using the terminal, you can send the process to the background with the commands:

    Ctrl-Z
(which suspends the process), followed by
    bg
(which resumes it in the background)


It is also possible to send commands like rsync to the background with:

    rsync [keys] source destination  > logfile 2>&1 &

The logfile will contain the rsync output, including error messages, that would normally appear on the screen.
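
The same pattern works for any long-running command. The following is a minimal sketch, assuming a POSIX-style shell: a short stand-in command replaces rsync, and the log file name is hypothetical. It shows that the shell variable `$!` holds the process ID of the background command and that `wait` can be used to block until it has finished.

```shell
# Hypothetical log file name; any writable path works.
logfile="${TMPDIR:-/tmp}/background_demo.$$.log"

# Stand-in for a long-running command such as rsync.
( echo "transfer started"; sleep 1; echo "transfer finished" ) > "$logfile" 2>&1 &

# $! holds the process ID of the most recent background command.
bg_pid=$!
echo "Running in the background with PID $bg_pid"

# The terminal stays free for other work; 'wait' blocks until the job ends.
bg_status=0
wait "$bg_pid" || bg_status=$?

echo "Exit status: $bg_status"
cat "$logfile"
```

Keeping the process ID in a variable makes it easy to check on the process later or to terminate it.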

Check running processes

Once a process that did not open its own window is running in the background, you can no longer see it in the terminal from which you started it. To see processes running in the background (as well as all other processes) you can use the command 'ps'. For example:

    ps -fu username

To get more information about 'ps', run the command: man ps

Another way to see all running processes is with 'top'. For example:

    top -u username

Kill a background process

Once a process is running in the background, you can no longer terminate it with Ctrl-C or Ctrl-D. If the process has its own window, you can close the window, which usually ends the process as well. If it does not have a window, you will not see it in the terminal from which you started it, but you can still find it with 'ps' or 'top' (see above).
Once you have found the process you want to terminate, you can kill it using its process ID (the PID column in the output of 'ps' and 'top'):

    kill -9 PID
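
Signal 9 (SIGKILL) cannot be caught, so the process gets no chance to clean up. A gentler approach is sketched below, assuming a POSIX-style shell and using a `sleep` command as a stand-in for the real process: it first sends the default signal (SIGTERM) and falls back to `-9` only if the process is still alive. `kill -0` sends no signal and only tests whether the process exists.

```shell
# Stand-in for a background process that has no window.
sleep 60 &
target_pid=$!

# Ask the process to terminate (SIGTERM is the default signal of 'kill').
kill "$target_pid"

# Give it a moment, then force it only if it is still alive.
sleep 1
if kill -0 "$target_pid" 2>/dev/null; then
    kill -9 "$target_pid"
fi

# Collect the exit status; a non-zero value shows the process did not end normally.
kill_status=0
wait "$target_pid" 2>/dev/null || kill_status=$?
echo "Process $target_pid ended with status $kill_status"
```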


Batch processes

Usually, the way to submit, check, and kill a job depends on the scheduling system and on how it is installed.
To make your life easier, the RPN environment contains a set of tools that make all these "adjustments" for you and that you can always use the same way. They basically do a "translation" for you.

Submit a batch job

To submit any job on our UQAM servers, always use the command 'soumet'.
On the Compute Canada clusters it is up to you if you want to use soumet or not.

'soumet' can only be used to submit shell scripts! If you want to submit anything else, for example an executable, you first need to create a script that runs the executable.
The submitted scripts do not know from which directory the job was submitted. Therefore, if you want a script to run in a certain directory, make sure it changes into that directory with 'cd' first!
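
A wrapper script of this kind could look roughly like the sketch below. All names in it (the temporary directory, `my_model.exe`, `run_model.sh`) are hypothetical and only serve the demonstration. The sketch creates a stand-in "executable" and a wrapper that changes into the working directory before running it, then calls the wrapper from a different directory to show that it still runs in the right place.

```shell
# Hypothetical locations, used only for this demonstration.
workdir="${TMPDIR:-/tmp}/soumet_demo.$$"
mkdir -p "$workdir"

# Stand-in for a compiled executable: it just reports where it runs.
cat > "$workdir/my_model.exe" <<'EOF'
#!/bin/sh
echo "running in $(pwd)"
EOF
chmod +x "$workdir/my_model.exe"

# The wrapper script that would be handed to soumet.
cat > "$workdir/run_model.sh" <<EOF
#!/bin/sh
# The job does not start in the submission directory: change into it first.
cd "$workdir" || exit 1
./my_model.exe
EOF
chmod +x "$workdir/run_model.sh"

# Run the wrapper from a different directory to confirm it finds its way.
wrapper_output=$(cd / && "$workdir/run_model.sh")
echo "$wrapper_output"
```

On a system where soumet is available, the wrapper (not the executable) is what would be submitted, for example with `soumet run_model.sh`.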

The command to submit a job/script could look like this:

    soumet jobname [ -t time_in_seconds  -listing listing-directory  -jn listing_name  -cpus number_of_cpus  -mpi ]


Where:

'jobname' is the name of the shell script, also called job, you want to submit.

'-t  time_in_seconds' specifies the wallclock time, in seconds, that you request for the job. The job will be terminated once this time has expired, even if it has not finished, so always ask for enough time. However, on larger clusters such as the Compute Canada clusters, jobs asking for more time will wait longer in the queue.
On our UQAM systems the default wallclock time for single CPU/core jobs is 10 days. For multi core jobs the default time is 1 minute.

'-jn  listing_name'  specifies the name of the listing or log file of the job. Everything that would appear on the screen when running the job interactively will now get written into this file. The default listing name is the basename of the job.

'-listing  listing-directory'  specifies the directory in which the listing will get written. The default directory is:
       ~/listings/${TRUE_HOST}
If you want to use the default directory, you should first create the directory ~/listings and then create a symbolic link inside that directory, pointing to a place where you have more space, for example to a directory (that you have to create!) under your data space:

        mkdir -p /dataspace/Listings
        mkdir ~/listings
        ln -s /dataspace/Listings ~/listings/${TRUE_HOST}

Replace 'dataspace' with the full name of your data directory.

'-cpus  number_of_cpus'  specifies the number of CPUs you want to use when running the job in parallel with MPI and/or OpenMP. The syntax is:  MPIxOpenMP
For example, to use 4 MPI processes with 2 OpenMP threads each, you would write:  -cpus 4x2
To use pure MPI with 4 processes, you would write:  -cpus 4x1  or simply  -cpus 4
To use pure OpenMP with 2 threads, you would write:  -cpus 1x2
The default is 1x1.

'-mpi'  must be added when the script (or the executable run within the script) uses MPI.
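
To illustrate how these options fit together, here is a small sketch that converts a requested time into seconds, splits a '-cpus' value into its MPI and OpenMP parts, and prints the resulting soumet command instead of running it. The script name 'run_model.sh', the listing name 'run_model_test' and the example values are hypothetical; only options described above are used.

```shell
# Example request: 4 MPI processes x 2 OpenMP, for 2 hours and 30 minutes.
cpus="4x2"
hours=2
minutes=30

# Split the MPIxOpenMP string; a missing OpenMP part means 1.
mpi_count=${cpus%%x*}
case "$cpus" in
    *x*) omp_count=${cpus#*x} ;;
    *)   omp_count=1 ;;
esac
total_cores=$(( mpi_count * omp_count ))

# The -t option takes seconds.
wallclock_seconds=$(( hours * 3600 + minutes * 60 ))

# Print the command instead of running it (hypothetical script and listing names).
soumet_cmd="soumet run_model.sh -t $wallclock_seconds -jn run_model_test -cpus $cpus -mpi"
echo "$soumet_cmd"
echo "Cores requested: $total_cores"
```

Printing the command first is a simple way to double-check the requested time and core count before actually submitting the job.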


To get more information about the command simply execute:

    soumet -h


Check on jobs

As mentioned above, the way to check (and kill) jobs also depends on the queueing system. We therefore created a script called 'qs', which you can find in my ovbin:

    ~winger/ovbin/qs

On our UQAM servers as well as on Beluga (when using the RPN environment) you already have an alias pointing to the above command, called:

    qs

Kill a job

Check the job ID with 'qs' or '~winger/ovbin/qs' and then use 'qdel' to kill your job:

    qdel job-ID

Depending on the machine, this might be an alias again.

