Nagios XI did not trigger a notification

apteancloud
Posts: 47
Joined: Wed Sep 09, 2020 4:05 am

Nagios XI did not trigger a notification

Post by apteancloud »

Hi Team,

Nagios XI did not trigger a notification for a CPU load spike on one of our servers in Azure. Azure metrics show that CPU utilization spiked to 98%; because of the spike, the server hung for an hour and we had to reboot it. The Nagios plugin check we are using is shown below.

No alert was triggered in Nagios; for that time frame, the Nagios performance graph shows 17%.

Code:

[nagios@NagiosXIAzPrd ~]$ /usr/local/nagios/libexec/check_nt -H 10.179.1.33 -p 12489 -s "sprt575" -v CPULOAD -l 15,85,90
CPU Load 3% (15 min average) | '15 min avg Load'=3%;85;90;0;100

Attached are both the Azure metric graph and the Nagios performance graph for the same time frame. Please take a look.

Thanks in advance.
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: Nagios XI did not trigger a notification

Post by dchurch »

How many CPUs are in the host?

Because the spike only reached ~20%, it seems to me that NSClient is calculating the load across all CPUs (with an absolute maximum of 100%), whereas I think your assumption was that it would be reported per individual CPU, e.g. 100% for 1 CPU pegged, 200% for 2 CPUs pegged, etc.
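To make the two conventions concrete, here is a minimal sketch. The 8-core host and the per-core numbers are hypothetical, and the function names are just for illustration; this is not how NSClient is implemented, only the arithmetic behind the two ways of reporting.

```python
def averaged_cpu_percent(per_core_percent):
    """Average per-core utilization into one figure that tops out at 100%."""
    return sum(per_core_percent) / len(per_core_percent)


def summed_cpu_percent(per_core_percent):
    """Sum per-core utilization, so N pegged cores read as N * 100%."""
    return sum(per_core_percent)


# Hypothetical 8-core host with two cores pegged and the rest idle.
cores = [100.0, 100.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

print(averaged_cpu_percent(cores))  # 25.0
print(summed_cpu_percent(cores))    # 200.0
```

With the averaged convention, a warning threshold of 85% only fires when nearly every core is busy at once, which may be why a spike that hangs the server still looks small in the graph.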

You could try lowering the averaging window to, say, 5 minutes, and decreasing the check interval too. With a 15-minute average, the CPU would have to be pegged for about 7.5 minutes straight to get the needle to move to 50%. The option would then become -l 5,85,90
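A rough sketch of the arithmetic behind that suggestion, assuming the reported value is a plain mean over the window and the CPU is otherwise idle (both are simplifying assumptions, and the helper names are made up for illustration):

```python
def windowed_average(pegged_minutes, window_minutes, busy=100.0, idle=0.0):
    """Mean utilization over the window when the CPU is pegged for part of it."""
    pegged = min(pegged_minutes, window_minutes)
    return (busy * pegged + idle * (window_minutes - pegged)) / window_minutes


def minutes_to_reach(threshold, window_minutes, busy=100.0, idle=0.0):
    """Minutes of sustained full load needed before the mean hits the threshold."""
    return window_minutes * (threshold - idle) / (busy - idle)


print(windowed_average(7.5, 15))  # 50.0
print(minutes_to_reach(85, 15))   # 12.75
print(minutes_to_reach(85, 5))    # 4.25
```

Under these assumptions, the 85% warning threshold needs almost 13 minutes of sustained full load with a 15-minute window, but only a little over 4 minutes with a 5-minute window.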

I'm not sure why the value differs between the Azure console and what Nagios captured. Perhaps NSClient is miscounting the CPUs? You could try decreasing the thresholds to 12% to work around this.

Really, though, NSClient is deprecated, insecure, and hasn't been maintained since 2014. I'd consider replacing it with NCPA or NSClient++. You may have better results with NCPA, since I know it gives you the option to report load either averaged across CPUs or summed.