Critical notification to be sent, no need for recovery notif
Posted: Wed Nov 04, 2020 3:44 am
by Iskon
I need a little help with the notification settings. My team and I are trying to send a Critical notification but no Recovery notification. Why? We need to count how many times an interface on a switch changes its status, and Nagios flap detection can't see the change because it happens too quickly. We need the exit status to change to Critical when the interface status has changed and a notification to be sent, but no notification when the recovery comes in, because the port is usually back up 5-10 seconds after flapping and that is an OK status. Does anybody know how to do this?

Re: Critical notification to be sent, no need for recovery n
Posted: Wed Nov 04, 2020 4:26 pm
by ssax
To disable recoveries, edit the services and set the notification options under the Alert Settings tab to what you want (making sure Recovery is not selected). Or you can edit each contact and disable the recoveries in the same spot.
You can also disable flapping on them if that's impacting it.
You can also set up state stalking to log additional problem states in the State History report:
https://assets.nagios.com/downloads/nag ... lking.html
And we need to change exit status to Critical if the status changed
The plugin handles the output/exit code/status, so the plugin would need to support it. Most plugins do not know the status of the previous results, so you're not likely to find one that does what you're looking for unless you modify it or write your own. What plugin is it using?
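For illustration only, a plugin that remembers its previous result could look roughly like the sketch below. This is not how check_ifoperstatus or any existing plugin works; the function names, the state file, and the idea of passing the current value as an argument are all assumptions made for the example. The value could be anything the device reports that changes whenever the interface changes state; a plain up/down status would still miss changes that happen between two checks.

```python
import json
import sys
from pathlib import Path

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def evaluate(current, state_file):
    """Compare the current value with the one stored by the previous run.

    Hypothetical helper: returns (exit_code, message) and then stores
    the current value for the next run.
    """
    path = Path(state_file)
    previous = None
    if path.exists():
        try:
            previous = json.loads(path.read_text()).get("value")
        except (ValueError, OSError, AttributeError):
            # Unreadable or malformed state file: treat as a first run.
            previous = None
    path.write_text(json.dumps({"value": current}))

    if previous is None:
        return OK, f"OK - no previous value stored, recorded {current}"
    if previous != current:
        return CRITICAL, f"CRITICAL - value changed from {previous} to {current}"
    return OK, f"OK - value unchanged ({current})"


def main(argv):
    """Plugin entry point: argv is [current_value, state_file]."""
    if len(argv) != 2:
        print("UNKNOWN - usage: <current_value> <state_file>")
        return UNKNOWN
    code, message = evaluate(argv[0], argv[1])
    print(message)
    return code

# When used as a plugin, finish the script with:
#     sys.exit(main(sys.argv[1:]))
```

Note that with this approach the very next check returns OK again, so Nagios would still see a recovery; the recovery notification itself has to be disabled as described above.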
Re: Critical notification to be sent, no need for recovery n
Posted: Tue Nov 10, 2020 7:35 am
by Iskon
Thank you for your reply.
We are using the check_ifoperstatus Nagios plugin, which checks interfaces on network devices via SNMP.
We already have a service for up/down interface changes (Nagios detects those when the retry checks reach the limit and sends a critical notification).
Some interfaces can change state 3 times within 10 seconds and are UP after that.
We need to get an alert for those 3 state changes within 10 seconds (that would be a critical alert), and there is no need for a recovery alert if there aren't any state changes afterward and the current state of the interface is UP. That way we can detect that an interface is flapping.
So let's say that between 2 Nagios checks (with the checks running every minute) the interface changed state (it went from UP to DOWN to UP x2). After those changes its state is UP, and the state of the service would be OK because the check didn't detect any problems with the interface. We need to get an alert about those state changes.
We don't want that to be another service on the host; we want to do it all on the same service with the same plugin.
Can we do some scripting with Nagios macros in that plugin, with some conditions (what conditions?), that would skip the recovery alert if the state of the interface after flapping is UP?
Re: Critical notification to be sent, no need for recovery n
Posted: Wed Nov 11, 2020 5:21 pm
by ssax
The only way you'll get notified of every one of those state changes is if you set max_check_attempts to 1. You may want to enable is_volatile on the service as well:
https://assets.nagios.com/downloads/nag ... vices.html
Can we do some scripting with Nagios macros in that plugin with some conditions (what conditions?) that would escape that recovery alert if the state of the interface after flapping is UP?
The problem is that no macro/envar says that the service was flapping. There's is_flapping, but that won't work for what you're trying to do because the service would already be out of flapping at that point. The only place historical flapping information is stored is /usr/local/nagios/var/nagios.log (or the nagios_logentries DB table), which you would need to parse to find the flapping start/end (if an end is listed) for the service and determine whether it was flapping. Nagios still needs the OK result once it's out of flapping, though, so that it internally knows to reset the counters. Given that, the only options to stop the recovery are to disable recoveries or to modify your notification handler to parse those logs, see whether the service was flapping, and not send the notification. You would need to implement that functionality yourself if you want it, or you can create a feature request here to add some form of last_check_was_flapping functionality/macro:
https://github.com/NagiosEnterprises/nagioscore/issues
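As a rough idea of the log parsing mentioned above, something like the following Python sketch could be called from a custom notification handler. It assumes flapping entries in nagios.log look like `[epoch] SERVICE FLAPPING ALERT: host;service;STARTED; ...` with a matching STOPPED (or DISABLED) entry later; check that against your own log before relying on it. The function names and the 300-second window are made up for the example.

```python
import re

# Assumed format of flapping entries in nagios.log; verify against your log.
FLAP_RE = re.compile(
    r"^\[(\d+)\] SERVICE FLAPPING ALERT: ([^;]*);([^;]*);(STARTED|STOPPED|DISABLED);"
)


def flapping_periods(lines, host, service):
    """Return (start, end) epoch pairs for one service.

    end is None when a start was logged without a matching end.
    """
    periods = []
    start = None
    for line in lines:
        match = FLAP_RE.match(line)
        if not match or match.group(2) != host or match.group(3) != service:
            continue
        timestamp, state = int(match.group(1)), match.group(4)
        if state == "STARTED":
            if start is None:
                start = timestamp
        elif start is not None:
            # STOPPED or DISABLED closes the open flapping period.
            periods.append((start, timestamp))
            start = None
    if start is not None:
        periods.append((start, None))
    return periods


def was_flapping_recently(lines, host, service, now, window=300):
    """True if the service is still flapping or stopped within `window` seconds."""
    for _start, end in flapping_periods(lines, host, service):
        if end is None or now - end <= window:
            return True
    return False

# Possible use in a handler (host and service names are placeholders):
#     import time
#     with open("/usr/local/nagios/var/nagios.log") as log:
#         skip = was_flapping_recently(log, "switch1", "Port 1", time.time())
```

The handler would then skip sending the recovery when the function returns True and send it as usual otherwise.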