
Consistently inaccurate notifications

Posted: Fri Aug 13, 2021 2:30 pm
by rgage_hhsc
Our organization uses Nagios to monitor several servers on several criteria -- one in particular has picked up a very confusing pattern.

The alert being triggered is a CPU Usage limit -- a nightly maintenance task pushes CPU usage up for 10-20 minutes.
Nagios' usage graph portrays the situation accurately:
Nagios CPU Usage graph
Nagios' history is even more detailed, reporting six events every day during the spike:
Nagios 1-day History
But I consistently get THE FIRST THREE notifications about this spike every day, and nothing else: WARNING, RECOVERY, WARNING, all within a few minutes of each other; then nothing until the next day when it happens again.

Judging solely by my emails from Nagios, there are a few seconds of recovery time each day in the middle of a CPU Warning event that has been happening for months. As the graph above shows, that picture is plainly false.

This is not a critical worry; we know it's just one spike, despite what Nagios' emails are telling us … but a bug's a bug, and Nagios can't be fixed unless it's reported. So, consider this reported. :-)

Thanks!
boB