
Nagios SNMP Trap alerting

Posted: Thu May 22, 2014 10:50 am
by VarunJK
Hi,
Is there a way to make Nagios alert only after a trap has occurred a given number of times within a defined time interval, rather than every time a trap occurs?
Please help.

Regards,
Varun

Re: Nagios SNMP Trap alerting

Posted: Thu May 22, 2014 10:57 am
by eloyd
An easy "cheat" is to use escalation rules to only alert after it's happened a few times in a row.

Re: Nagios SNMP Trap alerting

Posted: Thu May 22, 2014 12:44 pm
by VarunJK
Can we use escalation rules to control Nagios alerts in the web UI?
We have a web UI page that lists all the hosts and their services that are being checked. I do not want Nagios to alert in the UI immediately when a trap occurs; I want it to wait for a defined number of occurrences within a defined time period.

Re: Nagios SNMP Trap alerting

Posted: Thu May 22, 2014 12:49 pm
by slansing
Well, what we need to pin down here is whether you mean a notification. If so, you can control that entirely with your notification settings:

http://nagios.sourceforge.net/docs/3_0/ ... tions.html
http://nagios.sourceforge.net/docs/3_0/escalations.html

Re: Nagios SNMP Trap alerting

Posted: Fri May 23, 2014 12:12 pm
by VarunJK
Hi,
Sorry, I may not have communicated my question properly. What I mean by alert is that whenever a trap occurs, Nagios notifies me about this passive service alert in the web UI.
I am looking for a way to alter this. I do not want Nagios to alert me in the web UI every time the alert occurs. Rather, I want it to alert me only if it occurs 2 times within a 5-minute interval.

Re: Nagios SNMP Trap alerting

Posted: Fri May 23, 2014 12:35 pm
by eloyd
Sam has your answer.

Don't notify on the first error (notifications)
Or use escalations to notify on the first attempt (but don't notify anyone) and wait until it hits the third one to escalate to someone else.
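
For reference, the first option (not notifying on the first error) might be approximated with the first_notification_delay service directive from Nagios 3, which holds back the first problem notification for a number of time units. This is a minimal sketch, not a confirmed solution from this thread: the host, service, and template names are placeholders, and the 5-minute value is only illustrative.

```
# Hypothetical service definition; all names here are placeholders.
define service {
    use                       generic-service        ; assumed template
    host_name                 example-host           ; placeholder host
    service_description       Example Trap Service   ; placeholder service
    first_notification_delay  5     ; wait 5 time units (minutes at the default interval_length)
    notification_interval     60    ; re-notify every 60 time units while the problem persists
}
```

This only delays outgoing notifications: if the service returns to OK before the delay expires, no problem notification should be sent. It does not count occurrences within a time window, and it does not change what the status pages in the web UI display.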

Re: Nagios SNMP Trap alerting

Posted: Fri May 23, 2014 2:09 pm
by slansing
Precisely. You will need to read the documentation pages I linked above; they show how to set up your notification options to get the behavior you expect. If you have issues with the definitions, let us know what the problem is and what you would like to see happen, and supply us with all related configuration files: host, service, templates, and contacts.

Re: Nagios SNMP Trap alerting

Posted: Fri May 23, 2014 4:44 pm
by VarunJK
Thank you guys for your patience and your help in this regard.
Below are the passive services I have defined for checking network interface up and down:

define service {
    use                     passive-service
    service_description     Generic Event
    host_name               .*
    check_freshness         0 ; Let last state 'stick'
    # Disable flap detection so multiple events trigger
    # multiple notifications as Generic Event is a catch-all
    # service.
    flap_detection_enabled  0
    initial_state           o ; OK
}

define service {
    use                     passive-service
    name                    pcmm-passive-base
    check_freshness         0 ; Let last state 'stick'
    initial_state           o ; OK
    hostgroup_name          tekelec_hosts
    register                0
}

define service {
    use                     pcmm-passive-base
    service_description     Network Interface Down
}

define service {
    use                     pcmm-passive-base
    service_description     Network Interface Up
}

Here are the host groups defined:

define hostgroup {
    hostgroup_name  pcmm_cc38
    alias           PCMM PE Lab
    members         cc38-mgr-pcmm01a.sys.comcast.net,cc38-mgr-pcmm01b.sys.comcast.net,cc38-ma-pcmm01a.sys.comcast.net,cc38-ma-pcmm01b.sys.comcast.net,cc38-t1-pcmm01a.sys.comcast.net,cc38-t1-pcmm01b.sys.comcast.net,cc38-t2-pcmm01a.sys.comcast.net,cc38-t2-pcmm01b.sys.comcast.net,cc38-bod-pcmm01a.sys.comcast.net,cc38-bod-pcmm01b.sys.comcast.net
}

define hostgroup {
    hostgroup_name  tekelec_hosts
}

Here are the escalations defined:

define hostescalation {
    name                    pcmm-base-host
    hostgroup_name          pcmm_cc38
    contact_groups          pcmm-notify-group
    first_notification      1   ; notify right away (1st notification)
    last_notification       0   ; notify until host changes state
    notification_interval   240 ; re-notify every 4 hours (240 minutes at the default interval_length)
    escalation_period       24x7
    register                0
}

define serviceescalation {
    name                    pcmm-base-svc
    hostgroup_name          pcmm_cc38
    contact_groups          pcmm-notify-group
    first_notification      2   ; notify after 2 occurrences (1st notification)
    last_notification       5   ; notify until 5 occurrences
    notification_interval   60  ; default to 1-hour re-notify
    escalation_period       24x7
    register                0
}

Currently, whenever a network interface goes up or down, the trap is sent and Nagios alerts me in the web UI. I would like Nagios to wait for 2 occurrences within 1 hour before alerting in the web UI.

Re: Nagios SNMP Trap alerting

Posted: Tue May 27, 2014 10:34 am
by sreinhardt
As slansing and eloyd mentioned, if you wish to disable the first notification, you need an escalation that handles notification 1, not just notification 2 onward. Otherwise the default behavior, notifying the contacts/contact groups for that host/service, will still apply. Escalations cover ONLY the range of notification numbers they are configured for; outside that range the default behavior remains in effect.
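
The point above can be sketched as a pair of service escalations: one that captures notification 1 and sends it to a group with no real recipients, and one that takes over from notification 2 onward. This is an illustrative sketch under several assumptions, not the poster's actual fix. The group pcmm-null-group is hypothetical and is assumed to contain only a contact whose notification options are set to none. The service is assumed to exist on the hosts in pcmm_cc38. The definitions are written as regular objects, without name or register 0, because an object with register 0 acts only as a template and is not applied by itself.

```
# Escalation 1: swallow the first notification.
# pcmm-null-group is a hypothetical contact group with no real recipients.
define serviceescalation {
    hostgroup_name          pcmm_cc38
    service_description     Network Interface Down
    contact_groups          pcmm-null-group
    first_notification      1
    last_notification       1
    notification_interval   60
}

# Escalation 2: from the second notification onward, notify the real group.
define serviceescalation {
    hostgroup_name          pcmm_cc38
    service_description     Network Interface Down
    contact_groups          pcmm-notify-group
    first_notification      2
    last_notification       0     ; 0 = keep using this escalation until recovery
    notification_interval   60    ; must be non-zero for a second notification to occur
    escalation_period       24x7
}
```

Note that notification numbers count the notifications sent for one ongoing problem, not the number of traps received in a time window, so this arrangement approximates "skip the first alert" rather than "2 occurrences in 5 minutes". It also affects notifications only, not the status shown in the web UI.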

Re: Nagios SNMP Trap alerting

Posted: Tue May 27, 2014 2:36 pm
by VarunJK
Forgive my negligence but I am still struggling to understand -

define hostescalation {
    name                    pcmm-base-host
    hostgroup_name          pcmm_cc38
    contact_groups          pcmm-notify-group
    first_notification      2   ; notify after 2 occurrences (1st notification)
    last_notification       5   ; notify until 5 occurrences
    notification_interval   60  ; default to 1-hour re-notify
    escalation_period       24x7
    register                0
}

define serviceescalation {
    name                    pcmm-base-svc
    hostgroup_name          pcmm_cc38
    contact_groups          pcmm-notify-group
    first_notification      2   ; notify after 2 occurrences (1st notification)
    last_notification       5   ; notify until 5 occurrences
    notification_interval   60  ; default to 1-hour re-notify
    escalation_period       24x7
    register                0
}

I have specified an escalation rule to notify me for the first time when the eth4 interface goes down 2 times in 1 hour. To test it, I brought eth4 down and then up again. As per my escalation rules, I should not see the active and passive checks indicating that eth4 went down and up. Yet the UI still shows the active check and passive check alerts indicating that the interface was down and up.

What do I need to do to disable the first notification when the event occurs only once?