Problem:
We have a switch that sends Nagios an SNMP trap when a link goes down and again when it recovers. Each trap creates an SMS notification for the Support Engineer. Sometimes a link goes down for only a few seconds, and we would like to avoid sending an SMS in that case. We would still like every trap to be recorded in Nagios, but we only want to be notified if the link is down for more than 1 min.
Attempted solution:
The first idea was to create escalations for this service. The first notification would only send an email to support and not create an SMS. After 1 min, this would be escalated to the Support Engineer via SMS. After a further 20 mins, it would be escalated to the Shadow Support Engineer.
The config is below, but during testing this did not work as intended. As soon as the link went down, the email was sent as expected, but after 1 min there was no SMS notification, so no escalation took place.
My theory:
Since the SNMP trap is passive, once the link goes down there are no further traps to trigger the escalation of this notification. Is this true? Can a problem be escalated if it's reported by a passive check?
What is the best way to handle escalation of SNMP traps?
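To make the setup in question concrete, a trap-driven passive service of this kind is typically defined along the following lines. This is only an illustrative sketch, not the exact service definition in use here: the generic-service template and the check_dummy placeholder command are assumptions, and the intervals are in minutes only if interval_length is left at its default of 60 seconds.

```
# Hypothetical sketch of a passive, trap-driven service (not the actual config)
define service {
    use                     generic-service      ; assumed template name
    host_name               switch
    service_description     link1
    check_command           check_dummy!0        ; placeholder command, assumed to be defined elsewhere
    active_checks_enabled   0                    ; Nagios never polls this service itself
    passive_checks_enabled  1                    ; state changes arrive only via submitted trap results
    max_check_attempts      1                    ; a single trap puts the service straight into a hard state
    notification_interval   1                    ; re-notification interval, in interval_length units
    notification_options    c,r                  ; notify on critical and on recovery
    contact_groups          support_email_only
}
```

With a definition like this, the only check results Nagios sees for link1 are the "down" trap and the later "recovery" trap, which is the situation the theory above describes.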
Thanks in advance.
Code:
# from /etc/nagios/escalations.cfg

# Notification 1 onwards: email only
define serviceescalation {
    host_name               switch
    service_description     link1
    first_notification      1
    last_notification       0
    contact_groups          support_email_only
    notification_interval   1
}

# Notification 2 onwards: SMS to the primary Support Engineer
define serviceescalation {
    host_name               switch
    service_description     link1
    first_notification      2
    last_notification       0
    contact_groups          sms_primary_support_engineer
    notification_interval   20
}

# Notification 3 onwards: SMS to the Shadow Support Engineer
define serviceescalation {
    host_name               switch
    service_description     link1
    first_notification      3
    last_notification       0
    contact_groups          sms_shadow_support_engineer
    notification_interval   20
}