Nagios repeated notifications with passive checks

bart2puck · Post by **bart2puck** » Thu Dec 05, 2013 2:35 pm

I have a setup where Nagios receives an SNMP trap from a device and then notifies the contact defined in config.cfg. That works great. What I am trying to accomplish is to have Nagios send another notification if the problem isn't acknowledged within a given amount of time, but I cannot get Nagios to send that second notification. I am using external commands to actually place a call as the notification, and that all works fine. I just don't see Nagios attempt to make the second notification.

I cut all my config files down to one config file for ease of reading.

Code:

#TIMEPERIODS


define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
        }

#SERVICES


##handle the trap


define service{
        host_name                       serverName
        service_description             TRAP
        is_volatile                     1
        check_command                   check-host-alive
        max_check_attempts              3
        normal_check_interval           1
        retry_check_interval            1
        active_checks_enabled           0
        passive_checks_enabled          1
        check_period                    24x7
        notification_interval           1
        notification_period             24x7
        notification_options            w,u,c
        notifications_enabled           1
        contact_groups                  admins
        }

#COMMANDS

define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
        }

define command{
        command_name  notify-host-by-sip
        command_line /usr/lib64/nagios/plugins/calls/makeCall "$NOTIFICATIONTYPE$"
}

define command{
        command_name notify-service-by-sip
        command_line /usr/lib64/nagios/plugins/calls/makeCall "$NOTIFICATIONTYPE$"
}



#CONTACT_GROUPS

define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 user_sip
        }

#CONTACTS 

define contact{
        contact_name  user_sip
        alias  useralias
        service_notification_period  24x7
        host_notification_period  24x7
        service_notification_options  w
        host_notification_options  d
        service_notification_commands notify-service-by-sip
        host_notification_commands  notify-host-by-sip
        email  someNumber@someServer
}

#HOSTS

define host{
        host_name                       localhost
        alias                           Development
        address                         serverIP
        max_check_attempts              5
        check_period                    24x7
        contact_groups                  admins
        notification_period             24x7
        }

define host{
        host_name                      serverName
        alias                           Development
        address                         someIP
        max_check_attempts              5
        check_period                    24x7
        contact_groups                  admins
        notification_period             24x7
        }

slansing · Post by **slansing** » Thu Dec 05, 2013 5:52 pm

I would recommend enabling freshness checking on your passive services. That way, if an update is not received within a given amount of time (say 10 minutes or so), it will trigger a command (usually check_dummy) that changes the state to critical and triggers a notification. http://nagios.sourceforge.net/docs/3_0/freshness.html

As for re-notifying when a problem is not acknowledged, that could be tricky. I don't believe there is really a way to trigger a notification because a host/service has not been acknowledged, other than running a freshness event handler to trigger a notification after no updates are received.
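
For reference, the freshness approach described above might look roughly like the sketch below. It assumes the `TRAP` service and `serverName` host from the first post, a 10-minute threshold, and a hypothetical command name `trap-stale`. `check_service_freshness` is the global switch in the main config file.

```
# Main config (nagios.cfg): freshness checking must be enabled globally
#   check_service_freshness=1

# Hypothetical command run when no passive result arrives in time;
# check_dummy returns the given state (2 = CRITICAL) and text
define command{
        command_name    trap-stale
        command_line    $USER1$/check_dummy 2 "No passive result received within freshness threshold"
        }

define service{
        host_name                       serverName
        service_description             TRAP
        check_command                   trap-stale
        max_check_attempts              1
        check_period                    24x7
        active_checks_enabled           0
        passive_checks_enabled          1
        check_freshness                 1
        freshness_threshold             600     ; seconds (assumed: 10 minutes)
        notification_interval           5
        notification_period             24x7
        notification_options            w,u,c
        contact_groups                  admins
        }
```

When the last result is older than `freshness_threshold`, Nagios runs `check_command` itself, even though active checks are disabled for the service.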

bart2puck · Post by **bart2puck** » Thu Dec 05, 2013 6:07 pm

thanks slansing. So it is standard to get only 1 notification when a service changes from ok to a non-ok state?

tmcdonald · Post by **tmcdonald** » Fri Dec 06, 2013 9:48 am

That is correct. You get one notification/alert when it goes down. If it's down for a week you still get only one; then, if it recovers, you may get another if you have configured it to alert on recovery.

bart2puck · Post by **bart2puck** » Fri Dec 06, 2013 10:17 am

hmm. would it be possible using escalations or anything to achieve my goal?

sreinhardt · Post by **sreinhardt** » Fri Dec 06, 2013 1:38 pm

We should clarify: it is standard to get only one alert at the time of a state change. However, the notification_interval config option on hosts and services sets the time frame after which another notification is sent if the host/service has not been acknowledged and no check has returned it to an OK state. There shouldn't be a need to use escalations unless you want to alter that behavior. Just using notification_interval should do it.
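
A minimal sketch of what that could look like, assuming the default `interval_length` of 60 seconds so that interval values are in minutes. The escalation is optional and only illustrates altering the default behavior; the `managers` contact group is hypothetical. Note that a contact's own `service_notification_options` also filter notifications, so a contact listing only `w` would not be notified of critical or unknown states.

```
define service{
        host_name                       serverName
        service_description             TRAP
        check_command                   check-host-alive
        max_check_attempts              1
        check_period                    24x7
        active_checks_enabled           0
        passive_checks_enabled          1
        notification_interval           5       ; re-notify every 5 time units; 0 would mean notify only once
        notification_period             24x7
        notification_options            w,u,c
        notifications_enabled           1
        contact_groups                  admins
        }

# Optional: from the 3rd notification on, also notify a second group
define serviceescalation{
        host_name               serverName
        service_description     TRAP
        first_notification      3
        last_notification       0       ; 0 = keep escalating until recovery
        notification_interval   10
        contact_groups          admins,managers ; "managers" is a hypothetical group
        }
```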

bart2puck · Post by **bart2puck** » Fri Dec 06, 2013 1:40 pm

does this apply when a service is using passive checks and not active ones?

abrist · Post by **abrist** » Fri Dec 06, 2013 2:18 pm

Yes. Passive checks in a problem state will still re-notify on the interval. By default they are treated as hard states, so a passive check that changes the state of an object will not wait for retries and moves immediately to notifications (if configured). Like active checks, they will continue to notify on the interval until resolved or acknowledged.
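
The main configuration options that bear on this can be sketched as follows. These are standard `nagios.cfg` directives, but the values shown are assumptions about a typical setup, not the poster's actual file.

```
# nagios.cfg (main configuration file)

# Accept commands written to the external command file
check_external_commands=1

# Accept passive service check results
accept_passive_service_checks=1

# 0 = passive host check results are treated as HARD states (the default)
passive_host_checks_are_soft=0

# Global switch; must be 1 for any notification to go out
enable_notifications=1
```

At the service level, setting `max_check_attempts` to 1 means the first non-OK result is already a hard state, so no retries are needed before notifications start.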

bart2puck · Post by **bart2puck** » Fri Dec 06, 2013 4:02 pm

ok. so I see in status.dat this:

Code:

	last_notification=1386363433
	next_notification=1386363493
	no_more_notifications=0
	notifications_enabled=1

This tells me there should be another notification 60 seconds later, but that next notification never happens. The logs show nothing after the first notification.

Here is my latest config:

Code:

#TIMEPERIODS
define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
        }
#SERVICES

##handle the trap

define service{
        host_name                       localhost
        service_description             TRAPPER
        is_volatile                     0
        check_command                   check-host-alive
        check_period                    none
        max_check_attempts              1
        normal_check_interval           1
        retry_check_interval            1
        active_checks_enabled           0
        passive_checks_enabled          1
        notification_interval           1
        notification_period             24x7
        notification_options            w,u,c
        notifications_enabled           1
        contact_groups                  admins
        }

#COMMANDS

define command{
        command_name notify-service-by-sip
        command_line /usr/lib64/nagios/plugins/calls/makeCall "$NOTIFICATIONTYPE$"
}
#CONTACT_GROUPS

define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 user_sip
        }

#CONTACTS 

define contact{
        contact_name  user_sip
        alias  useralias
        service_notification_period  24x7
        host_notification_period  24x7
        service_notification_options  w,u,c
        host_notification_options  d
        service_notification_commands notify-service-by-sip
        host_notification_commands  notify-host-by-sip
        email  someNumber@someIP
}

#HOSTS

define host{
        host_name                       localhost
        alias                           Development
        address                         ipAddress
        max_check_attempts              5
        check_period                    24x7
        contact_groups                  admins
        notification_period             24x7
        }

Any thoughts?

sreinhardt · Post by **sreinhardt** » Mon Dec 09, 2013 10:48 am

Unless the host also went down at the same time, causing the service to stop notifying because its "parent" host is down, I see no reason this should not have notified again. I would note that 1-minute notification intervals are probably a little quick; for testing, maybe set it to 5 minutes just to be sure it's not still executing the previous one. Also, which log are you looking at when you see the first message but not the second?
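
To see whether Nagios is even attempting the second notification, notification logging and the debug log may help. Below is a sketch of the relevant `nagios.cfg` settings; the debug file path is an assumption and varies by install.

```
# nagios.cfg (main configuration file)

# Write notifications to the main Nagios log
log_notifications=1

# Write passive check results to the main Nagios log
log_passive_checks=1

# Debug log: level 32 = notification information
debug_level=32
debug_verbosity=1
# Assumed path; adjust to your installation
debug_file=/var/log/nagios/nagios.debug
```

With the debug level set this way, the debug file should show why a notification was sent or suppressed each time one comes due.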

Nagios Support Forum
