
Notifications and Status Reset

Posted: Mon May 13, 2013 3:34 am
by WillemDH
Hello,

Can someone explain how I should set up the following? We have an "on duty" BlackBerry that receives alerts only from 07:00 to 23:00, as we do like to get some sleep at night ;)
Now the problem is that we do want to receive alerts on this BlackBerry if something happened at night and the problem is still there. At the moment we monitor everything with SCOM 2007, where we execute a status reset of all our servers at 07:00, after which problems that still exist send a new alert. How should I do something like a status reset of all servers / services at 07:00? Or is there some other way to resend alerts for critical servers at 07:00?

Thanks.

Re: Notifications and Status Reset

Posted: Mon May 13, 2013 4:10 pm
by abrist
As long as there are no notification time periods that cover 23:00-24:00 and 00:00-07:00, your request should be the default behavior. See below.
From: http://nagios.sourceforge.net/docs/3_0/ ... tions.html
The fourth host or service filter that must be passed is the time period test. Each host and service definition has a <notification_period> option that specifies which time period contains valid notification times for the host or service. If the time that the notification is being made does not fall within a valid time range in the specified time period, no one gets contacted. If it falls within a valid time range, the notification gets passed to the next filter... Note: If the time period filter is not passed, Nagios will reschedule the next notification for the host or service (if its in a non-OK state) for the next valid time present in the time period. This helps ensure that contacts are notified of problems as soon as possible when the next valid time in time period arrives.
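
To make the quoted rescheduling rule concrete, here is a minimal Python sketch of "the next valid time present in the time period". It is only an illustration, not Nagios code: the function name next_valid_time and the single daily 07:00-23:00 window are assumptions taken from the original question.

```python
from datetime import datetime, time, timedelta


def next_valid_time(now: datetime,
                    start: time = time(7, 0),
                    end: time = time(23, 0)) -> datetime:
    """Return the earliest moment at or after `now` that falls inside
    a daily notification window [start, end).

    Hypothetical helper: assumes one window per day that does not
    cross midnight.
    """
    window_opens = datetime.combine(now.date(), start)
    window_closes = datetime.combine(now.date(), end)
    if now < window_opens:
        # Before the window: wait until it opens today.
        return window_opens
    if now < window_closes:
        # Inside the window: the notification can go out right away.
        return now
    # After the window: wait until it opens tomorrow.
    return window_opens + timedelta(days=1)


# A problem detected at 02:00 is held back until 07:00 the same day.
print(next_valid_time(datetime(2013, 5, 13, 2, 0)))
```

Under these assumptions, a problem that starts at night and is still present at 07:00 produces its notification when the window opens.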

Re: Notifications and Status Reset

Posted: Wed May 22, 2013 5:01 am
by WillemDH
I haven't been able to test this, but if I read this correctly, it means that if a host or service is still down at the moment a notification period becomes active, a notification will be sent out. Does this also apply when there are several notification contacts? For example:

user a: 09:00 - 23:00
user b: always available

alert comes in at 02:00 => notification is sent out to user b
meanwhile the problem is still there, will user a also get a notification at 09:00 for this problem?

Re: Notifications and Status Reset

Posted: Wed May 22, 2013 11:37 am
by slansing
If they are using the same time period, or the notification period is the same on the host/service, the above is correct. If the host/service is still down when user A's timeperiod comes online, and they are designated to receive alerts for the object, they will receive the notification that the object is still down.

Re: Notifications and Status Reset

Posted: Sat Jun 22, 2013 4:06 am
by WillemDH
Strange, because this morning at 10 am I received an alert on my "on duty" BlackBerry that a critical web application had recovered, and no other alerts. This web application has a recurring downtime from 2:00 to 7:00 each day. Today, however, the maintenance tasks went wrong and, looking into the event log, I can see that it took 3 more hours to finish its jobs.
So the on duty user / contact has a notification period from 08:00 to 23:00. As you say: "If the host/service is still down when user A's timeperiod comes online, and they are designated to receive alerts for the object, they will receive the notification that the object is still down."
Why didn't I receive any email on my BlackBerry at 08:00 that my critical web application was still down? Maybe it's important to say that the hosts and services are configured to only send an email every 1440 minutes (24 hours) instead of the default of 60 minutes. Could this be the reason?
As we'd rather not send an email every 60 minutes to all contacts, is there any way to reset the health of all critical services at, for example, 08:00, or maybe to schedule a new check for all critical hosts / services, so that when this new check fails it sends an email to the available contacts?

Thanks again for clarifying this.

Re: Notifications and Status Reset

Posted: Mon Jun 24, 2013 11:10 am
by slansing
"Maybe it's important to say that the hosts and services are configured to only send an email every 1440 minutes (24 hours) instead of the default of 60 minutes. Could this be the reason?"
Yes, this would be precisely the reason. You may want to decrease the notification interval for objects such as these, since during downtime they depend on another application (updating, pruning, or whatever else causes them to go down).
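
To show why the interval matters, here is a rough Python sketch of the situation. It is a simplified model, not Nagios code: it assumes problem notifications repeat at a fixed interval counted from the first one, and that a contact only receives those that fall inside its own notification window. The name first_delivery and the example dates are made up for illustration.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def first_delivery(problem_start: datetime,
                   problem_end: datetime,
                   interval_minutes: int,
                   window_start: time,
                   window_end: time) -> Optional[datetime]:
    """Return the first problem notification that lands inside the
    contact's daily window [window_start, window_end), or None.

    Hypothetical model: notifications go out at problem_start and then
    every interval_minutes while the problem lasts. An interval of 0
    is treated as "notify once only".
    """
    def in_window(t: datetime) -> bool:
        return window_start <= t.time() < window_end

    if interval_minutes <= 0:
        # Only the initial notification is ever sent.
        return problem_start if in_window(problem_start) else None

    step = timedelta(minutes=interval_minutes)
    t = problem_start
    while t < problem_end:
        if in_window(t):
            return t
        t += step
    return None


start = datetime(2013, 6, 22, 2, 0)   # problem begins at 02:00
end = datetime(2013, 6, 22, 10, 0)    # recovery at 10:00

# 1440-minute interval: the next notification would be 02:00 the next
# day, after recovery, so the 08:00-23:00 contact gets nothing.
print(first_delivery(start, end, 1440, time(8, 0), time(23, 0)))  # None

# 30-minute interval: a re-notification falls on 08:00.
print(first_delivery(start, end, 30, time(8, 0), time(23, 0)))  # 2013-06-22 08:00:00
```

In this model a shorter interval does not mean more mail for the on duty contact at night, since anything outside the contact's window is filtered out; it only shortens the wait once the window opens.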

Re: Notifications and Status Reset

Posted: Mon Jul 01, 2013 6:07 am
by WillemDH
OK, thanks. I've tested this and made sure all critical services have a notification interval of 30 minutes. The thread can be closed.

Re: Notifications and Status Reset

Posted: Mon Jul 01, 2013 10:32 am
by slansing
Closing as resolved.