Hello
I'm puzzled by the following.
I'm trying to monitor the CPU load of a server and get a warning when the average CPU load is over 80% and a critical alert when it's over 90%. To do this, I added this line to the host's nrpe.cfg:
command[check_load]=/usr/local/nagios/libexec/check_load -r -w 38.4,38.4,38.4 -c 43.2,43.2,43.2
I got these values using the formula y = c * p / 100, where c is the number of CPUs and p is the desired threshold percentage.
The server has 48 CPUs, so for the 80% threshold I would use 48 * 80 / 100 = 38.4, and for 90% I would use 48 * 90 / 100 = 43.2.
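For reference, the arithmetic above can be sketched in a few lines of Python. This is only an illustration of the y = c * p / 100 formula; the function name `load_threshold` is hypothetical and not part of Nagios or its plugins.

```python
def load_threshold(cpu_count: int, percent: float) -> float:
    """Absolute load-average threshold for a CPU count and a percentage (y = c * p / 100)."""
    return round(cpu_count * percent / 100, 2)


cpus = 48  # number of CPUs on the server, as stated in the post
warning = load_threshold(cpus, 80)
critical = load_threshold(cpus, 90)

# Build the threshold arguments, using the same value for the 1, 5 and 15 minute averages
print(f"-w {warning},{warning},{warning} -c {critical},{critical},{critical}")
```

These are absolute load values for the whole machine, so they only make sense when the plugin compares them against the unscaled load average.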
Nagios is reporting this:
CPU OK 01-15-2024 16:19:27 0d 1h 7m 11s 1/3 OK - load average per CPU: 1.02, 1.02, 1.02
But when I run uptime on the server, it shows load average: 49.65, 49.12, 48.91
I'm confused. There is something I don't fully understand here. Shouldn't the load average reported by Nagios match the one displayed by uptime?
Many thanks for your help.
CPU_load
Re: CPU_load
Hi @kabamaru,
Looking at the documentation for the check_load plugin, it looks like the "-r" parameter you are using tells the command to "Divide the load averages by the number of CPUs (when possible)" -- Given that you have 48 CPUs, is it possible that would account for the discrepancy?
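A quick sketch of what that division would do to the numbers in this thread, assuming "-r" simply divides each load average by the CPU count. The helper name `per_cpu_load` is hypothetical and only mimics the described behaviour; it is not the plugin's actual code.

```python
def per_cpu_load(load_averages, cpu_count):
    """Divide each load average by the number of CPUs, as the -r help text describes."""
    return [round(load / cpu_count, 2) for load in load_averages]


uptime_loads = [49.65, 49.12, 48.91]  # values reported by uptime in the original post
cpus = 48

print(per_cpu_load(uptime_loads, cpus))  # [1.03, 1.02, 1.02]
```

That is close to the 1.02, 1.02, 1.02 shown by Nagios. If the thresholds are also compared against these per-CPU values, which I am assuming here, then with "-r" the equivalent of 80% and 90% would be 0.8 and 0.9 rather than 38.4 and 43.2.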
Re: CPU_load
Hi jsimon
Thank you for your time. You are right.
I have removed the -r option, and the values Nagios reports are now very similar to the ones that uptime outputs.
They can't be exactly the same because they are constantly changing.
Thank you so much for your help.
All the best
Re: CPU_load
I'm glad we were able to figure that out for you! I'll go ahead and lock the issue then.