SNMP Trap Sender


Postby charmainequek » Wed Jun 19, 2019 3:16 am

I have service checks set up in Nagios XI 5.4.12. We are using the SNMP Trap Sender component in Nagios XI 5.4.12 to send traps to a 3rd party SNMP trap receiver. We set the SNMP Trap Sender to send only critical alerts. We DO NOT have any contacts or notifications set up in our Nagios XI. All notifications rely on the traps sent by the SNMP Trap Sender to the 3rd party SNMP receiver, which then translates each trap it receives into an incident ticket.

The issue we encounter is that the snmp trap receiver is receiving too many trap messages from the same service which translates into multiple incident tickets for the same issue. We would like to find out how we can fine tune the number of alerts sent out via the snmp trap sender.

Check settings scenario :
Check Interval: 2m
Retry Interval: 1m
Number of Retries: 5

1.01 Nagios checks service, service is OK, next check is 1.03, attempt 1/5
1.03 Nagios checks service, service is OK, next check is 1.05, attempt 1/5
1.03.30 service breaks somehow, Nagios does not know about it yet
1.05 Nagios checks service, detects thresholds have been triggered, SOFT state, NEXT check 1.06, attempt 1/5
1.06 Nagios checks service, thresholds still triggered, SOFT state, NEXT check 1.07, attempt 2/5
1.07 Nagios checks service, thresholds still triggered, SOFT state, NEXT check 1.08, attempt 3/5
1.08 Nagios checks service, thresholds still triggered, SOFT state, NEXT check 1.09, attempt 4/5
1.09 Nagios checks service, thresholds still triggered, HARD state, notifications sent, NEXT check 1.10, attempt 5/5

I have some questions based on the above scenario and settings:
1. At which point will an alert be sent through the SNMP Trap Sender? Is it at every point starting from 1.05?
2. What happens after 1.10? Does the cycle repeat from 1.01 again?
3. If the user acknowledges the issue under service management, what is the expected behaviour?
4. What setting should we change to minimise the number of SNMP trap alerts sent to the SNMP trap receiver?
5. Does the SNMP Trap Sender send alerts to the SNMP receiver based on soft / hard state, or on every check that returns a critical state (regardless of hard / soft)?
6. I am quite confused about the difference between notifications and alerts, so I would appreciate it if you could explain as clearly as possible.

Thanks!
Last edited by charmainequek on Wed Jun 19, 2019 9:15 pm, edited 1 time in total.

Re: SNMP Trap Sender

Postby ssax » Wed Jun 19, 2019 5:03 pm

See here:

https://assets.nagios.com/downloads/nag ... tions.html

And here:

https://assets.nagios.com/downloads/nag ... types.html

1. It depends on your SNMP Trap Sender settings for State Type

2. It checks on the next check_interval and if there's still a problem it will notify based on the notification_interval

3. Acknowledging allows you to acknowledge the current problem for the specified service. By acknowledging the current problem, future notifications (for the same service state) are disabled. If the "sticky" option is set to two (2), the acknowledgement will remain until the service returns to an OK state. Otherwise the acknowledgement will automatically be removed when the service changes state. If the "notify" option is set to one (1), a notification will be sent out to contacts indicating that the current service problem has been acknowledged. If the "persistent" option is set to one (1), the comment associated with the acknowledgement will survive across restarts of the Nagios process. If not, the comment will be deleted the next time Nagios restarts.

4. Only set HARD for State Type in the SNMP Trap Sender configuration.

5. Based on hard/soft state

6. Please read the information provided AND this guide:

https://assets.nagios.com/downloads/nag ... ility.html
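
To make the "sticky", "notify" and "persistent" options from point 3 more concrete, here is a minimal Python sketch that builds the ACKNOWLEDGE_SVC_PROBLEM external command line in the field order given in the Nagios Core external command documentation. The helper name build_ack_command and the host, service, author and comment values are made up for illustration. The sketch only builds the string; it does not write to the Nagios command file, and it makes no claim about how the SNMP Trap Sender reacts to an acknowledgement.

```python
import time


def build_ack_command(host, service, sticky=2, notify=1, persistent=1,
                      author="nagiosadmin", comment="Acknowledged", now=None):
    """Return one ACKNOWLEDGE_SVC_PROBLEM external command line.

    sticky=2     -> acknowledgement stays until the service returns to OK
    notify=1     -> contacts are notified that the problem was acknowledged
    persistent=1 -> the acknowledgement comment survives a Nagios restart
    """
    # External commands are prefixed with a Unix timestamp in square brackets.
    timestamp = int(time.time() if now is None else now)
    fields = [host, service, sticky, notify, persistent, author, comment]
    return f"[{timestamp}] ACKNOWLEDGE_SVC_PROBLEM;" + ";".join(str(f) for f in fields)


# Hypothetical host and service names, fixed timestamp for a repeatable result.
print(build_ack_command("web01", "HTTP", author="admin",
                        comment="Investigating", now=1560913380))
```

Submitting the resulting line means appending it to the external command file of your Nagios installation; the location of that file depends on the install, so check your own configuration.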

Re: SNMP Trap Sender

Postby charmainequek » Wed Jun 19, 2019 9:45 pm

hi ssax,

thank you for your reply.

1. Can I assume that the terms "notify" and "notifications" in your replies refer to the alerts that are sent to the 3rd party SNMP trap receiver?
At the beginning of the thread I emphasized that we DO NOT have any contacts or notifications set up in our Nagios XI. We are using only the SNMP Trap Sender to send trap messages to a 3rd party SNMP trap receiver.

2. For each of the 2 settings below in the SNMP Trap Sender, can you advise at which points trap messages are received by the trap receiver? (Please write down all the points at which trap messages will be received by the receiver, e.g. trap received at 1.05, 1.06 and 1.07 etc.)
i. Hosts : All; Services : critical; state type : Both
ii. Hosts : All; Services : critical; state type : Hard

3. Can you advise on the hard/max limit (time in minutes) to set for the retry_interval in service management under the "Check Settings" tab?

Thanks.

Re: SNMP Trap Sender

Postby tgriep » Mon Jun 24, 2019 1:57 pm

1. When someone says notify or notification, they usually mean that some sort of command was run on the Nagios server that sends an email, text message or SNMP TRAP so that someone is "notified" that there is an issue with a Host or Service that the Nagios server is monitoring.

2.
i. Hosts : All; Services : critical; state type : Both
1.05,1.06, 1.07, 1.08, 1.09
As soon as a service goes Critical, it will generate a TRAP on Both Soft and Hard States.

ii. Hosts : All; Services : critical; state type : Hard
A TRAP will be sent only at 1.09, when the service reaches the HARD state.

3. Is this what you are looking for?
check interval

This directive is used to define the number of "time units" to wait before scheduling the next "regular" check of the service. "Regular" checks are those that occur when the service is in an OK state or when the service is in a non-OK state, but has already been rechecked max_check_attempts number of times. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.


retry interval

This directive is used to define the number of "time units" to wait before scheduling a re-check of the service. Services are rescheduled at the retry interval when they have changed to a non-OK state. Once the service has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.

max check attempts

This directive is used to define the number of times that Nagios will retry the service check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the service check again.
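
As a rough illustration of how check_interval, retry_interval and max_check_attempts produce the trap timings listed in point 2, here is a minimal Python sketch that replays the scenario from the first post. It is a simplified model, not the Nagios scheduler: the names simulate and trap_minutes are made up, minutes are whole numbers (minute 5 stands for 1.05), the service is assumed to break at minute 4, and the State Type filter simply selects which CRITICAL check results would produce a trap. What happens on rechecks after 1.09 is not covered in this thread and is not modelled.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    minute: int
    state: str       # "OK" or "CRITICAL"
    state_type: str  # "SOFT" or "HARD"
    attempt: int


def simulate(fail_from, end, start=1, check_interval=2,
             retry_interval=1, max_check_attempts=5):
    """Replay service checks from `start` to `end` (inclusive, in minutes)."""
    results = []
    minute, failures = start, 0
    while minute <= end:
        if minute >= fail_from:
            # Count consecutive non-OK results up to max_check_attempts.
            failures = min(failures + 1, max_check_attempts)
            hard = failures == max_check_attempts
            results.append(CheckResult(minute, "CRITICAL",
                                       "HARD" if hard else "SOFT", failures))
            # Retries use retry_interval; once HARD, back to check_interval.
            minute += check_interval if hard else retry_interval
        else:
            failures = 0
            results.append(CheckResult(minute, "OK", "HARD", 1))
            minute += check_interval
    return results


def trap_minutes(results, state_type):
    """Minutes at which a CRITICAL trap is assumed to be sent.

    state_type is "Both" or "Hard", mirroring the component's State Type setting.
    """
    wanted = {"SOFT", "HARD"} if state_type == "Both" else {"HARD"}
    return [r.minute for r in results
            if r.state == "CRITICAL" and r.state_type in wanted]


checks = simulate(fail_from=4, end=9)
print("Both:", trap_minutes(checks, "Both"))
print("Hard:", trap_minutes(checks, "Hard"))
```

With these settings the first CRITICAL result is at 1.05 and the HARD state is reached (max_check_attempts - 1) x retry_interval = 4 minutes later, at 1.09. Under this model, raising retry_interval or max_check_attempts delays the single HARD trap, while the number of SOFT traps under "Both" grows with max_check_attempts.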