Acknowledgement Refresh

joe1871
Posts: 28
Joined: Tue Feb 01, 2011 3:36 pm

Acknowledgement Refresh

Post by joe1871

We have technicians who are using the Acknowledgement feature of Nagios to quiet an alert, but then they are not following up on the problem. This reared its very ugly head today when a disk capacity warning that had been acknowledged turned into a full drive on a key machine. We are now recovering from that minor disaster. I am surprised that an Ack of an alert would suppress that alert indefinitely. I would expect Nagios to have some logic that says: if the condition persists for X period of time after an Ack, then re-alert. Does anybody know if this is in there? Thanks.
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Acknowledgement Refresh

Post by jsmurphy

I don't think it has that functionality... at least not that I am aware of (I've never really looked). We had the same issue early on, and we deemed it to be an education issue more than an application issue: chances are that if they just acknowledged and ignored it the first time, they will probably do the same thing if it re-alerts. We made sure our users understood the importance of using the right type of downtime. There were a couple who were resistant to doing things the right way, but after the first business-visible failure, management bore down on them for misusing the system after they had been taught how to use it, and the problem pretty much went away.

If larger numbers of people are still doing it... it could indicate that your Nagios is too verbose and your engineers are struggling to work out what's urgent/legitimate and what's just white noise. Just ask them; they will tell you if they are struggling with your current alerting regime. Ultimately the monitoring is there to help them prevent failures, and if they aren't finding it useful then it's not doing its job.
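On "the right type of downtime": unlike a sticky acknowledgement, which stays in place until the service returns to OK, a scheduled downtime has an end time and lapses on its own. Below is a minimal sketch in Python that builds a fixed SCHEDULE_SVC_DOWNTIME external command and writes it to the Nagios command file. It assumes the source-install default path for the command file (adjust for your system); the helper names are hypothetical, not part of Nagios.

```python
import time

# Assumed default path for a source install of Nagios Core; adjust as needed.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def svc_downtime_command(host, service, hours, author, comment, now=None):
    """Build a SCHEDULE_SVC_DOWNTIME command for a fixed window starting now."""
    # Semicolons separate fields and newlines end the command, so reject them.
    for field in (host, service, author):
        if ";" in field or "\n" in field:
            raise ValueError(f"invalid character in {field!r}")
    if "\n" in comment:
        raise ValueError("comment must be a single line")
    start = int(time.time()) if now is None else int(now)
    end = start + int(hours * 3600)
    # Fields: host;service;start;end;fixed=1;trigger_id=0;duration;author;comment
    # (duration only matters for flexible downtime, but the field is required).
    return (f"[{start}] SCHEDULE_SVC_DOWNTIME;{host};{service};"
            f"{start};{end};1;0;{end - start};{author};{comment}\n")


def submit(command, path=COMMAND_FILE):
    """Write one external command to the Nagios command pipe."""
    with open(path, "w") as pipe:
        pipe.write(command)
```

Usage would look like submit(svc_downtime_command("fileserver01", "Disk /data", 4, "tech", "Clearing old logs")), where the host and service names are made up. After four hours the downtime expires and normal notifications resume if the problem is still there.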
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Acknowledgement Refresh

Post by mguthrie

I would second what jsmurphy said. I know of a user with a large installation who ran a cron job to automatically delete comments and acknowledgements older than X days, but I think re-tuning the notifications and addressing the personnel issues about how problems are being handled is the real fix on this one...
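A minimal sketch of that kind of cron job, in Python. It assumes the status.dat layout used by Nagios Core 3.x/4.x (servicecomment/hostcomment blocks where entry_type=4 marks an acknowledgement comment, and problem_has_been_acknowledged in the servicestatus/hoststatus blocks) plus the source-install default paths; check both against your own install. It only removes acknowledgements, which lets notifications resume for problems that are still present. Stale comments could be handled the same way with DEL_SVC_COMMENT/DEL_HOST_COMMENT and each block's comment_id. The function names are hypothetical.

```python
import re
import time

# Assumed default paths for a source install of Nagios Core; adjust as needed.
STATUS_FILE = "/usr/local/nagios/var/status.dat"
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"
ACK_ENTRY_TYPE = "4"  # entry_type assumed to mark acknowledgement comments

BLOCK_START = re.compile(r"^(\w+)\s*\{$")


def parse_status(text):
    """Return a list of (block_type, fields) tuples from status.dat text."""
    blocks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            match = BLOCK_START.match(line)
            if match:
                current = (match.group(1), {})
        elif line == "}":
            blocks.append(current)
            current = None
        elif "=" in line:
            key, _, value = line.partition("=")
            current[1][key] = value
    return blocks


def stale_ack_commands(blocks, now, max_age_days):
    """Build REMOVE_*_ACKNOWLEDGEMENT commands for acks older than max_age_days."""
    # Objects whose status still shows an active acknowledgement.
    acked = set()
    for kind, f in blocks:
        if f.get("problem_has_been_acknowledged") == "1":
            if kind == "servicestatus":
                acked.add((f["host_name"], f["service_description"]))
            elif kind == "hoststatus":
                acked.add((f["host_name"],))

    # Newest acknowledgement comment per host or service, so a fresh
    # re-acknowledgement is not undone because of an older comment.
    newest = {}
    for kind, f in blocks:
        if kind not in ("servicecomment", "hostcomment"):
            continue
        if f.get("entry_type") != ACK_ENTRY_TYPE or "entry_time" not in f:
            continue
        if kind == "servicecomment":
            key = (f["host_name"], f["service_description"])
        else:
            key = (f["host_name"],)
        newest[key] = max(newest.get(key, 0), int(f["entry_time"]))

    cutoff = now - max_age_days * 86400
    commands = []
    for key, entry_time in newest.items():
        if key not in acked or entry_time > cutoff:
            continue  # not acknowledged any more, or acknowledged recently
        if len(key) == 2:
            cmd = "REMOVE_SVC_ACKNOWLEDGEMENT;%s;%s" % key
        else:
            cmd = "REMOVE_HOST_ACKNOWLEDGEMENT;%s" % key
        commands.append(f"[{int(now)}] {cmd}\n")
    return commands


def main(max_age_days=3):
    with open(STATUS_FILE) as fh:
        blocks = parse_status(fh.read())
    commands = stale_ack_commands(blocks, time.time(), max_age_days)
    # The command file is a named pipe that Nagios reads external commands from.
    with open(COMMAND_FILE, "w") as pipe:
        pipe.writelines(commands)
```

To run it from cron, point a small wrapper script at main() (hourly, say). The three-day default for max_age_days is an arbitrary placeholder for the "X days" above.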