top

To check the CPU and memory usage of active jobs on a server, use the command 'top':
    top

...

  • PID: Process ID.
  • USER: The owner of the process.
  • PR: Process priority.
  • NI: The nice value of the process.
  • VIRT: Amount of virtual memory used by the process. On our servers, the maximum virtual memory a job can currently use is 25% of the total memory, which means 64 GB on most of our servers.
  • RES: Amount of resident memory used by the process. This is the physical memory your process is actually using!
  • SHR: Amount of shared memory used by the process.
  • S: Status of the process. (See the list below for the values this field can take).
  • %CPU: The share of CPU time used by the process since the last update. It can go up to a little more than 100%. With shared-memory parallelism (OpenMP), this value can go up to 100% times the number of shared-memory threads.
  • %MEM: The share of physical memory used.
  • TIME+: Total CPU time used by the task, shown as minutes:seconds with a resolution of hundredths of a second.
  • COMMAND: The command name or command line (name + options).
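To look at these columns without the interactive screen, 'top' can be run in batch mode. The following is a minimal sketch, assuming the procps version of 'top' found on most Linux servers; the function name 'top_snapshot' is made up for this example.

```shell
# Print one non-interactive snapshot of the current user's processes.
# -b: batch mode, -n 1: a single update, -u: restrict to one user.
top_snapshot() {
    top -b -n 1 -u "$(id -un)"
}

# Show the summary area and the first few processes.
top_snapshot | head -n 20
```

The output can be redirected to a file if you want to compare the memory usage of a job at different times.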

In theory, all users together, including the system, can use up to almost 100% of the total memory before things start to get really slow. But since one user usually does not know what all the others are doing, we ask each user not to use more than 25% of the total memory for all of their processes together.
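A rough way to check yourself against the 25% guideline is to add up the %MEM values of all your processes. This is only a sketch: the function name 'my_mem_percent' is made up for this example, and because memory shared between processes is counted once per process, the sum tends to overestimate the real usage.

```shell
# Sum the %MEM column over all processes owned by the current user.
my_mem_percent() {
    ps -u "$(id -un)" -o pmem= | awk '{ sum += $1 } END { printf "%.1f\n", sum }'
}

total=$(my_mem_percent)
echo "My processes use about ${total}% of physical memory."

# 25 is the per-user guideline quoted in the text above.
if awk -v t="$total" 'BEGIN { exit !(t > 25) }'; then
    echo "Above the 25% guideline: consider stopping some jobs."
fi
```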

As long as the CPU time (column 'TIME+') keeps increasing, you do not have to worry about jobs with a status (column 'S') of 'R', 'S' or 'D'. But if the CPU time stops increasing for a while, you should check whether the job is still needed or whether you can terminate it, especially if it uses several percent of the memory.
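Whether the CPU time of a job is still increasing can also be checked by sampling it twice. The sketch below is an illustration only: 'cpu_activity' is a made-up helper, the default interval of 5 seconds is an arbitrary choice, and 'ps' reports CPU time with a resolution of one second, so short intervals can miss a slowly working job.

```shell
# Report whether the CPU time of a process changed during an interval.
# Usage: cpu_activity PID [SECONDS]
cpu_activity() {
    pid=$1
    interval=${2:-5}
    before=$(ps -o time= -p "$pid" || true)
    sleep "$interval"
    after=$(ps -o time= -p "$pid" || true)
    if [ -z "$before" ] || [ -z "$after" ]; then
        echo "gone"      # the process does not exist (anymore)
    elif [ "$before" = "$after" ]; then
        echo "idle"      # CPU time did not change
    else
        echo "busy"      # CPU time increased
    fi
}
```

An "idle" result over a few seconds is not a reason to kill a job; repeat the check with a longer interval before deciding.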

Jobs with a status of 'T' or 'Z' should always be killed.

If you see processes of yours that you recognize(!) and that should no longer be running, they are probably zombies and you should kill them.


The following is from:
      https://www.howtogeek.com/668986/how-to-use-the-linux-top-command-and-understand-its-output/

...

  • D: Uninterruptible sleep
  • R: Running
  • S: Sleeping
  • T: Traced (stopped)
  • Z: Zombie
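The status codes above also appear in the 'STAT' column of 'ps', so jobs in state 'T' or 'Z' can be listed directly. A minimal sketch, assuming the procps version of 'ps'; the function name is made up for this example.

```shell
# List own processes whose state starts with T (stopped) or Z (zombie).
# Columns printed: PID, PPID, state, CPU time, command name.
list_stopped_or_zombie() {
    ps -u "$(id -un)" -o pid= -o ppid= -o stat= -o time= -o comm= |
        awk '$3 ~ /^[TZ]/'
}

list_stopped_or_zombie
```

An empty result means that none of your processes is currently stopped or a zombie.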

ps & kill

You can check which processes you have open with:
    ps -fu $USER | less


If you cannot find where you opened a process and therefore cannot close it down properly, you can kill it with 'kill'. You only need to kill the master process. For example, if you get something like the following:


UID          PID    PPID  C STIME TTY          TIME CMD
username 1460822       1  0 May23 ?        00:01:48 tmux
username  945179 1460822  0 May25 pts/20   00:00:00 -bash
username  945272  945179  0 May25 pts/20   00:37:36 /sca/.../jupyter-notebook --no-browser
username  969070  945272  0 May25 ?        00:10:16 /sca/.../python -m ipykernel_launcher -f /.../kernel-...json
username  987828  945272  0 May25 ?        00:09:37 /sca/.../python -m ipykernel_launcher -f /.../kernel-...json

In the example above, the process with the PID (process ID) '1460822' is the master process. It does not have a "parent": its PPID (parent process ID) is 1. This is the one you need to kill; all its "children", "grandchildren", "great-grandchildren" etc. will then get killed as well.
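Following the PPID column upwards by hand can be tedious in a long listing. The sketch below automates that walk; 'find_master' is a made-up helper, and stopping at a parent that belongs to another user is an assumption added here so that the walk does not end at a system process you cannot kill.

```shell
# Walk up the process tree from a PID and print the topmost ancestor
# that still belongs to the current user.
# Usage: find_master PID
find_master() {
    pid=$1
    my_uid=$(id -u)
    while :; do
        ppid=$(ps -o ppid= -p "$pid" | tr -d ' ')
        [ -n "$ppid" ] || break              # process has vanished
        [ "$ppid" -gt 1 ] || break           # parent is PID 1: this is the master
        owner=$(ps -o uid= -p "$ppid" | tr -d ' ')
        [ "$owner" = "$my_uid" ] || break    # parent belongs to someone else
        pid=$ppid
    done
    echo "$pid"
}
```

For the listing above, 'find_master 969070' would be expected to print 1460822. The function only prints a PID; check with 'ps -fp' what that process is before you kill it, because on some systems the topmost process may be a session manager that owns all of your logins.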

Note that the processes are not always listed in parent-child order!

Sometimes processes no longer have a parent; you then need to kill each of them using its own PID.

The command to kill a process is:
    kill -9 PID

So, for the example above:
    kill -9 1460822
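Signal 9 (SIGKILL) stops a process immediately and gives it no chance to save data or clean up. A gentler variant, which is not the procedure described above but a common alternative, is to send the default signal SIGTERM first and to use SIGKILL only if the process is still there after a grace period. In this sketch, 'stop_process' and the default grace period of 5 seconds are assumptions made for the example.

```shell
# Ask a process to exit, and force it only if it does not react.
# Usage: stop_process PID [GRACE_SECONDS]
stop_process() {
    pid=$1
    grace=${2:-5}                               # seconds to wait before forcing
    kill -TERM "$pid" 2>/dev/null || return 0   # nothing to signal
    while [ "$grace" -gt 0 ]; do
        sleep 1
        kill -0 "$pid" 2>/dev/null || return 0  # process has exited
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null || true       # same effect as kill -9
}
```

For the example above, this would be called as 'stop_process 1460822'.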