CPU_load
Posted: Mon Jan 15, 2024 11:33 am
Hello
I'm puzzled by the following.
I'm trying to monitor the CPU load of a server and get a warning when the average CPU load is over 80% and a critical alert when it's over 90%. To do this, I added this line to the host's nrpe.cfg:
command[check_load]=/usr/local/nagios/libexec/check_load -r -w 38.4,38.4,38.4 -c 43.2,43.2,43.2
I got these values using the formula y = c * p / 100, where c is the number of CPUs and p is the desired threshold percentage.
The server has 48 CPUs, so for 80% I use 48 * 80 / 100 = 38.4, and for 90% I use 48 * 90 / 100 = 43.2.
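To make the arithmetic explicit, here is a minimal Python sketch of the formula above. The function name load_threshold is only an illustrative helper, not part of Nagios or NRPE.

```python
def load_threshold(cpus: int, percent: float) -> float:
    """Apply y = c * p / 100 (c = number of CPUs, p = threshold percentage)."""
    return cpus * percent / 100


CPUS = 48  # number of CPUs on the server

warning = load_threshold(CPUS, 80)
critical = load_threshold(CPUS, 90)

# Build the threshold arguments the same way they appear in nrpe.cfg.
w_arg = ",".join([f"{warning:.1f}"] * 3)
c_arg = ",".join([f"{critical:.1f}"] * 3)
print(f"-w {w_arg} -c {c_arg}")
```

This reproduces the 38.4 and 43.2 values used in the command definition.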
Nagios is reporting this:
CPU OK 01-15-2024 16:19:27 0d 1h 7m 11s 1/3 OK - load average per CPU: 1.02, 1.02, 1.02
But when I run uptime on the server, I see load average: 49.65, 49.12, 48.91
I'm confused; there is something I don't fully understand here. Shouldn't the load average reported by Nagios match the one displayed by uptime?
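For comparison, here is a small Python sketch that divides the uptime figures by the CPU count. It rests on an assumption: that the "per CPU" wording in the Nagios output means the -r option scales each load average by the number of CPUs. The helper name per_cpu is hypothetical.

```python
def per_cpu(loads, cpus):
    """Divide each load average by the CPU count, rounded to two decimals."""
    return [round(load / cpus, 2) for load in loads]


uptime_loads = [49.65, 49.12, 48.91]  # values shown by uptime
scaled = per_cpu(uptime_loads, 48)    # 48 CPUs on this server
print(scaled)
```

If that assumption holds, the scaled values are in the same range as the 1.02 that Nagios reports, and thresholds such as 38.4 would be compared against numbers close to 1 rather than close to 48.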
Many thanks for your help.