top
To check the CPU and memory usage of active jobs on a server, use the command 'top':
top
...
- PID: Process ID.
- USER: The owner of the process.
- PR: Process priority.
- NI: The nice value of the process.
- VIRT: Amount of virtual memory used by the process. On our servers, the maximum virtual memory a job can currently use is 25% of the total memory, which means 64 GB on most of our servers.
- RES: Amount of resident memory used by the process. This is the physical memory your process is actually using!
- SHR: Amount of shared memory used by the process.
- S: Status of the process. (See the list below for the values this field can take).
- %CPU: The share of CPU time used by the process since the last update. For a single-threaded process this can go up to a little more than 100%. With shared-memory parallelism (OpenMP), it can go up to 100% times the number of threads.
- %MEM: The share of physical memory used.
- TIME+: Total CPU time used by the task, shown as minutes:seconds.hundredths.
- COMMAND: The command name or command line (name + options).
In theory, all users together, including the system, can use almost 100% of the total memory before things start to get really slow. But since one user usually does not know what all the others are doing, we ask each user not to use more than 25% of the total memory for all of their processes together.
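To see how close you are to the 25% guideline, you can add up the %MEM values of all your processes. The following is a minimal sketch, assuming a procps-style 'ps' that accepts '-o pmem='; the function names 'sum_mem_percent' and 'my_mem_percent' are made up for this example. Memory shared between processes is counted once per process, so the total can overestimate your real usage.

```shell
# Add up numbers read from stdin (one per line) and print the total.
sum_mem_percent() {
    LC_ALL=C awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Total %MEM of all processes owned by the current user.
# Assumes a procps-style ps; "pmem=" prints the %MEM column without a header.
my_mem_percent() {
    ps -u "$(id -un)" -o pmem= | sum_mem_percent
}
```

Calling 'my_mem_percent' prints a single number that you can compare against the 25% limit.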
As long as the CPU time (column: 'TIME+') keeps increasing, you do not have to worry about jobs with the status (column: 'S') of 'R', 'S' or 'D'. But if the CPU time stops increasing for a while, you should check whether this job is still needed or whether you can terminate it, especially if it uses several percent of the memory.
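Instead of watching 'top', you can compare the CPU time of a job at two points in time. This is only a sketch, assuming a procps-style 'ps'; the function names 'cpu_time_of' and 'is_cpu_time_increasing' are hypothetical. 'ps' reports CPU time in whole seconds, so a job that uses very little CPU can look idle over a short interval.

```shell
# Print the accumulated CPU time of process $1 (format [DD-]HH:MM:SS).
cpu_time_of() {
    ps -o time= -p "$1"
}

# Exit status 0 if the CPU time of PID $1 changed within $2 seconds
# (default 10), 1 if it did not change, 2 if the process was not found.
is_cpu_time_increasing() {
    pid=$1
    interval=${2:-10}
    before=$(cpu_time_of "$pid") || return 2
    sleep "$interval"
    after=$(cpu_time_of "$pid") || return 2
    [ "$before" != "$after" ]
}
```

For example, 'is_cpu_time_increasing 945272 60 || echo "no CPU time used in the last minute"' would check the notebook process from the 'ps' example further down this page.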
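Jobs with a status of 'T' or 'Z' should always be cleaned up. Note that a zombie ('Z') has already exited and cannot be killed itself; it disappears once its parent process collects it or is terminated.
If you see processes of yours that you recognize(!) and that should not be there anymore, they are probably "zombies" (leftover processes) and you should kill them.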
The following is from:
https://www.howtogeek.com/668986/how-to-use-the-linux-top-command-and-understand-its-output/
...
- D: Uninterruptible sleep
- R: Running
- S: Sleeping
- T: Traced (stopped)
- Z: Zombie
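Using the status codes above, you can list only those of your processes that are stopped or zombies. This is a minimal sketch, assuming a procps-style 'ps'; the function names are made up for this example. The PPID is printed as well, because for a zombie it is the parent process that has to be dealt with.

```shell
# Keep only lines whose second field (the state) starts with T or Z.
# Expects lines of the form "PID STATE PPID COMMAND" on stdin.
filter_stopped_or_zombie() {
    awk '$2 ~ /^[TZ]/'
}

# List stopped or zombie processes of the current user.
# Assumes a procps-style ps; the trailing "=" suppresses each column header.
my_stopped_or_zombie() {
    ps -u "$(id -un)" -o pid=,stat=,ppid=,comm= | filter_stopped_or_zombie
}
```

Calling 'my_stopped_or_zombie' prints nothing if none of your processes is in state 'T' or 'Z'.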
ps & kill
You can check which processes you have open with:
ps -fu $USER | less
If you cannot find where you opened a process to close it down properly, you can kill it with 'kill'. You only need to kill the master (parent) process. For example, if you get something like the following:
UID PID PPID C STIME TTY TIME CMD
username 1460822 1 0 May23 ? 00:01:48 tmux
username 945179 1460822 0 May25 pts/20 00:00:00 -bash
username 945272 945179 0 May25 pts/20 00:37:36 /sca/.../jupyter-notebook --no-browser
username 969070 945272 0 May25 ? 00:10:16 /sca/.../python -m ipykernel_launcher -f /.../kernel-...json
username 987828 945272 0 May25 ? 00:09:37 /sca/.../python -m ipykernel_launcher -f /.../kernel-...json
In the example above, the process with the PID (process ID) '1460822' is the master process. It does not have a "parent" among your processes: its PPID (parent process ID) is 1. This is the one you need to kill; all its "children", "grandchildren", "great-grandchildren" etc. will then normally get killed as well.
Note that the processes are not always listed in parent-child order!
Sometimes processes no longer have a parent; then you need to kill them using their own PID.
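Because the listing is not necessarily in parent-child order, it can help to follow the PPID chain automatically. The following sketch reads "PID PPID" pairs and walks upwards from a given PID until it reaches a process whose parent is PID 1 or is not in the list (for example because the parent belongs to another user). The function name 'top_ancestor' is hypothetical.

```shell
# Print the top-most ancestor of PID $1, reading "PID PPID" pairs on stdin.
top_ancestor() {
    awk -v pid="$1" '
        { parent[$1] = $2 }
        END {
            # Walk up while the parent is a listed process other than PID 1.
            while ((pid in parent) && parent[pid] != 1 && (parent[pid] in parent))
                pid = parent[pid]
            print pid
        }'
}
```

With a procps-style 'ps', 'ps -u "$(id -un)" -o pid=,ppid= | top_ancestor 969070' would print '1460822' for the example listing above.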
The command to kill a process is:
kill -9 PID
So, for the example above:
kill -9 1460822
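'kill -9' sends SIGKILL, which gives the process no chance to clean up. A gentler variant, shown here only as a sketch and not as the procedure prescribed on this page, first sends SIGTERM and falls back to SIGKILL only if the process is still there after a few seconds. The function name 'stop_process' and the 5-second default are assumptions made for this example.

```shell
# Ask process $1 to exit with SIGTERM, wait up to $2 seconds (default 5),
# and send SIGKILL (the same as "kill -9") only if it is still there.
stop_process() {
    target=$1
    timeout=${2:-5}
    kill -TERM "$target" 2>/dev/null || return 0   # already gone, or not ours
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        kill -0 "$target" 2>/dev/null || return 0  # process has exited
        sleep 1
        waited=$((waited + 1))
    done
    kill -KILL "$target" 2>/dev/null
    return 0
}
```

For the example above, this would be called as 'stop_process 1460822'.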