top

To check the CPU and memory usage of active jobs on a server, use the command 'top':
    top

To see only the jobs of a specific user, call it with:
    top -u username
Alternatively, you can press 'u' once 'top' is open. On the 6th line you will then see:
    Which user (blank for all)
Then you can just start typing or copy-pasting the username.

By default, 'top' sorts all jobs by CPU usage. To sort them by memory usage, type 'M' (capital 'm') once 'top' is open.

To quit 'top' just press 'q'.
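
For scripts or log files, a non-interactive snapshot can be more convenient than the interactive view. The following is a minimal sketch, assuming the procps-ng version of 'top' found on most Linux servers; the function name top_snapshot is made up for this example:

```shell
# Print one non-interactive snapshot of a user's processes, sorted by memory.
# Usage: top_snapshot [USERNAME] [LINES]
top_snapshot() {
    user=${1:-$(id -un)}     # default: the user running the command
    lines=${2:-20}           # how many lines of output to keep
    # -b: batch mode (plain text), -n 1: a single iteration,
    # -u: only this user's processes, -o %MEM: sort by memory usage.
    top -b -n 1 -u "$user" -o %MEM | head -n "$lines"
}
```

For example, the following prints the summary header and the first few processes of your own user:
    top_snapshot "$USER" 15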

Output explanation

The column headings in the process list are as follows:

  • PID: Process ID.
  • USER: The owner of the process.
  • PR: Process priority.
  • NI: The nice value of the process.
  • VIRT: Amount of virtual memory used by the process. On our servers, the maximum virtual memory a job can use is currently 25% of the total memory, which means 64 GB on most of our servers.
  • RES: Amount of resident memory used by the process. This is the actual memory your process is using.
  • SHR: Amount of shared memory used by the process.
  • S: Status of the process. (See the list below for the values this field can take).
  • %CPU: The share of CPU time used by the process since the last update. It can go up to a little more than 100%. When using shared memory parallelism (OpenMP), this value can go up to 100% times the number of shared memory processes.
  • %MEM: The share of physical memory used.
  • TIME+: Total CPU time used by the task, shown as minutes:seconds.hundredths.
  • COMMAND: The command name or command line (name + options).
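
To see what 25% of the total memory amounts to on a given server, the value can be derived from /proc/meminfo. This is only a sketch: the helper name quarter_of_memtotal is hypothetical, and it assumes a Linux system where MemTotal is reported in kB:

```shell
# Print 25% of the machine's physical memory in GiB.
# Reads /proc/meminfo-style text on stdin; MemTotal is given in kB.
quarter_of_memtotal() {
    awk '/^MemTotal:/ { printf "%.1f\n", $2 * 0.25 / 1024 / 1024 }'
}
```

It would be called like this:
    quarter_of_memtotal < /proc/meminfo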

In theory, all users together, including the system, can use up to almost 100% of the total memory before things start to get really slow. But since one user usually does not know what all the others are doing, we ask each user not to use more than 25% of the total memory for all of their processes together.
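
To check your own share against the 25% guideline, the %MEM values of all your processes can be added up. A rough sketch, assuming procps 'ps' and 'awk' are available; mem_share is a hypothetical helper name, and because memory shared between processes is counted once per process, the sum can overestimate the real usage:

```shell
# Add up the %MEM of all processes owned by a user and compare with a limit.
# Usage: mem_share [USERNAME] [LIMIT_PERCENT]
mem_share() {
    user=${1:-$(id -un)}     # default: the user running the command
    limit=${2:-25}           # guideline in percent of total memory
    # "%mem=" prints the %MEM column without a header line.
    ps -u "$user" -o %mem= |
        awk -v limit="$limit" '
            { total += $1 }
            END {
                printf "%.1f%% of memory in use (limit %s%%)", total, limit
                print (total > limit ? " - over the limit" : "")
            }'
}
```

Calling mem_share without arguments checks the current user against the 25% guideline.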

As long as the CPU time (column: 'TIME+') keeps increasing, you do not have to worry about jobs with a status (column: 'S') of 'R', 'S' or 'D'. But if the CPU time stops increasing for a while, you should check whether the job is still needed or whether you can terminate it, especially if it uses several percent of memory.
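
Whether the CPU time of a job is still increasing can also be checked without watching 'top'. The sketch below compares the cumulative CPU time reported by 'ps' at two points in time; cpu_time_increasing is a hypothetical helper name, and since 'ps' reports this time in whole seconds, a job that used less than a second of CPU during the interval looks idle:

```shell
# Succeed if a process consumed CPU time during an interval.
# Usage: cpu_time_increasing PID [SECONDS]
# Returns 0 if the CPU time changed, 1 if not, 2 if the process is gone.
cpu_time_increasing() {
    pid=$1
    interval=${2:-60}
    before=$(ps -o time= -p "$pid") || return 2   # no such process
    sleep "$interval"
    after=$(ps -o time= -p "$pid") || return 2    # process ended meanwhile
    [ "$before" != "$after" ]
}
```

A possible call, with PID replaced by the process ID taken from 'top':
    cpu_time_increasing PID 30 && echo "still working"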

Jobs with a status of 'T' or 'Z' should always get killed.

And if you see processes of yours that you recognize(!) and that should no longer be running, they are probably zombies and you should kill them.


The following is from:
      https://www.howtogeek.com/668986/how-to-use-the-linux-top-command-and-understand-its-output/

...

  • D: Uninterruptible sleep
  • R: Running
  • S: Sleeping
  • T: Traced (stopped)
  • Z: Zombie
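
Using these status codes, your own stopped and zombie processes can be listed directly. A minimal sketch, assuming procps 'ps'; stuck_processes is a hypothetical helper name:

```shell
# List a user's processes whose state is stopped (T) or zombie (Z).
# Usage: stuck_processes [USERNAME]
stuck_processes() {
    user=${1:-$(id -un)}     # default: the user running the command
    # Columns: PID, state, CPU time, command name ("=" suppresses headers).
    ps -u "$user" -o pid= -o stat= -o time= -o comm= |
        awk '$2 ~ /^[TZ]/'   # keep rows whose state starts with T or Z
}
```

Calling stuck_processes without arguments checks the current user and prints nothing when no such process exists.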

ps & kill

You can check which processes you have open with:
    ps -fu $USER | less


If you cannot find where you opened a process in order to close it down properly, you can kill it with 'kill'. You only need to kill the master process. For example, if you get something like the following:

                  parent

UID          PID    PPID  C STIME TTY          TIME CMD
username 1460822       1  0 May23 ?        00:01:48 tmux
username  945179 1460822  0 May25 pts/20   00:00:00 -bash
username  945272  945179  0 May25 pts/20   00:37:36 /sca/.../jupyter-notebook --no-browser
username  969070  945272  0 May25 ?        00:10:16 /sca/.../python -m ipykernel_launcher -f /.../kernel-...json
username  987828  945272  0 May25 ?        00:09:37 /sca/.../python -m ipykernel_launcher -f /.../kernel-...json

In the example above, the process with PID (process ID) '1460822' is the master process. It does not have a "parent"; its PPID (parent process ID) is 1. This is the one you need to kill; all its "children", "grandchildren", "great-grandchildren" etc. will then get killed as well.

Note that processes are not always sorted in order!

Sometimes processes no longer have a parent; in that case you need to kill them using their own PID.
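
To find the processes that have to be killed by their own PID (master processes and processes that lost their parent), you can filter for a PPID of 1. This is a sketch under the assumption made above, namely that such processes show a PPID of 1; master_processes is a hypothetical helper name:

```shell
# List a user's top-level processes, i.e. those whose parent is PID 1.
# Usage: master_processes [USERNAME]
master_processes() {
    user=${1:-$(id -un)}     # default: the user running the command
    # Columns: PID, PPID, full command line ("=" suppresses headers).
    ps -u "$user" -o pid= -o ppid= -o args= |
        awk '$2 == 1'        # keep rows whose PPID is 1
}
```

On Linux, the following additionally draws the parent/child relations as a tree, which helps when the list is not sorted:
    ps -fu $USER --forest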

The command to kill a process is:
    kill -9 PID

So, for the example above:
    kill -9 1460822
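
Since 'kill -9' (SIGKILL) ends a process immediately without letting it clean up, it can be worth asking the process to exit with a plain 'kill' (SIGTERM) first and only forcing it if that fails. A minimal sketch of that order, not a site policy; stop_process and the 5-second default are choices made for this example:

```shell
# Ask a process to exit (SIGTERM) and force it (SIGKILL) only if it is
# still alive after a grace period. Usage: stop_process PID [SECONDS]
stop_process() {
    pid=$1
    grace=${2:-5}
    kill "$pid" || return                        # SIGTERM; stop if it cannot be sent
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process is gone
        sleep 1
        grace=$((grace - 1))
    done
    kill -9 "$pid" 2>/dev/null || true           # still alive: force it
}
```

For the example above:
    stop_process 1460822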