I am new to Nagios and have been asked to look into an issue we are having.
The other day one of our services stopped and then restarted within 1-2 minutes. Nagios picked up on this, but the email was missed, so no one knew. We use NagVis as a visual reference for our servers, and I was wondering whether there is a way to keep the status of a server at Critical or Warning until a user comments on or acknowledges the problem. Is this possible, or is there another solution/plugin that could assist with this?
Keep Status until user intervention
Re: Keep Status until user intervention
I think a much better solution would be to examine why no one was notified. You can adjust the notification settings so that there is no delay at all. If you set max_check_attempts to 1 and first_notification_delay to 0, Nagios will notify as soon as the first check fails, and will not retry the check at all before sending an alert.
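As a rough sketch of those two settings in a service definition: only max_check_attempts and first_notification_delay come from the advice above. The host name, service description, check command and the generic-service template are placeholders I am assuming for illustration, so swap in whatever your own configuration uses.

```
define service {
    use                       generic-service    ; assumed template name, adjust to your setup
    host_name                 app-server-01      ; hypothetical host
    service_description       App Service        ; hypothetical service
    check_command             check_app_service  ; hypothetical check command
    max_check_attempts        1                  ; first failed check is already a HARD state, no retries
    first_notification_delay  0                  ; send the first notification without waiting
}
```

Bear in mind that with max_check_attempts set to 1, a single transient failure will also raise an alert, since there are no retries to filter it out.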
-
greglynn85
- Posts: 2
- Joined: Fri Aug 17, 2012 4:28 am
Re: Keep Status until user intervention
Many thanks for your reply.
The service desk did get an email notification about this, but no one was monitoring the group mailbox at the time. We have escalations set up, but because the service recovered, the escalation was never sent.
If there were a way to keep the status of the server/service at Critical/Warning until user intervention, then it would not be missed on the big screens in the office.
If this cannot be done, we will have to review our monitoring process internally.
Re: Keep Status until user intervention
That isn't really how Nagios works. The Nagios UI *needs* to be showing the *current* status information of what it is monitoring; otherwise the information in the UI is out of date, and it isn't really monitoring anything. I would look at re-examining how alerting is handled from Nagios. If alerts are being missed altogether, then it might be worth emailing contacts directly, or sending the alerts somewhere they aren't just going to disappear.