Keep Status until user intervention

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
greglynn85
Posts: 2
Joined: Fri Aug 17, 2012 4:28 am

Keep Status until user intervention

Post by greglynn85 »

I am new to Nagios and have been asked to look into an issue we are having.
The other day one of our services stopped and then restarted within 1-2 minutes. Nagios picked up on this, but the email was missed, so no one knew. We use NagVis as a visual reference for our servers, and I was wondering whether there is a way to keep the status of a server at Critical or Warning until a user comments on or acknowledges the problem. Is this possible, or is there another solution/plugin that could assist with this?
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Keep Status until user intervention

Post by mguthrie »

I think a much better solution would be to examine why no one was notified. You can adjust the notification settings so that there is no delay at all. If you set the max_check_attempts to 1, and the first_notification_delay to 0, then Nagios will notify immediately for the problem, and will not retry the check at all before sending an alert.
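
For illustration, the two settings mentioned above could look roughly like this in a service object definition. This is a minimal sketch, not a drop-in configuration: the host name, service description, check command, contact group, and the `generic-service` template are all placeholders assumed for the example. In Nagios XI these values are normally edited through the Core Config Manager rather than by hand.

```
# Hypothetical service definition; every name below is a placeholder.
define service {
    use                       generic-service   ; assumed template supplying the remaining required directives
    host_name                 app-server-01     ; placeholder host
    service_description       Example Service   ; placeholder description
    check_command             check_example     ; placeholder check command
    max_check_attempts        1                 ; first non-OK result is treated as a hard state, no retries
    first_notification_delay  0                 ; send the first notification without waiting
    contact_groups            service-desk      ; placeholder contact group
}
```

The trade-off is that with no retries, a single failed check result (including a transient one) will generate a notification.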
greglynn85
Posts: 2
Joined: Fri Aug 17, 2012 4:28 am

Re: Keep Status until user intervention

Post by greglynn85 »

Many thanks for your reply.

The service desk did get an email notification about this, but no one was monitoring the group mailbox at the time. We have escalations set up, but because the service recovered, we never got one.

If there was a way to keep the status of the server/service at critical/warning until user intervention then it would not be missed on the big screens in the office.

If this cannot be done, we will have to review our monitoring process internally.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Keep Status until user intervention

Post by mguthrie »

That isn't really how Nagios works. The Nagios UI *needs* to show the *current* status of what it is monitoring; if the information in the UI is out of date, it isn't really monitoring anything. I would re-examine how alerting is handled from Nagios. If alerts are being missed altogether, it might be worth emailing contacts directly, or sending the alerts somewhere they aren't just going to disappear.