Re: Nagios sending out too many alerts
Posted: Wed Apr 11, 2012 3:39 am
by johndoe
I don't see that icon. I'm a bit lost as to what you mean by "could be because you have submitted a external command to Nagios disabling and re-enabling notifications."
All I did was create a BPI group, add notifications to it, then go to the passive template and disable notifications. It should be foolproof, really: most of the services and host checks use the "XI passive templates" and have "Skip" on the service/host notifications and "Off" on the templates, yet they still send notifications... As can be seen from the screenshots, the configs should be correct (the screenshots are all for the same service).
If by external command you mean at some point clicking on "Quick Actions > Disable notifications" on the service... It's possible that has been done in the past, but shouldn't the template settings get priority if changed more recently than that potential click? It's not viable for me to go to each service and disable notifications one by one...
What's the solution for this?
Re: Nagios sending out too many alerts
Posted: Wed Apr 11, 2012 8:45 am
by johndoe
Scott,
I had a look at the MySQL tables for both the nagios database and nagiosql... While doing that I compared a service which was working (not sending notifications with Skip/Off as previously mentioned) with one that wasn't working (still sending notifications)...
What I noticed is that in the nagiosql database (tbl_service table) the rows looked identical for both services, yet in the nagios database (nagios_services table) the rows differed in the "notifications_enabled" field...
Again, correct me if I'm wrong, but I believe this isn't the correct behaviour on the nagios side (possibly due to an earlier click on the disable/enable notifications like you mentioned before)?
Shouldn't these be synched and the latest action take priority?
Re: Nagios sending out too many alerts
Posted: Wed Apr 11, 2012 9:50 am
by scottwilkerson
johndoe wrote:If by external command you mean at some point clicking on "Quick Actions > Disable notifications" on the service... It's possible that has been done in the past but shouldn't the template settings get priority if changed more recently than that potential click?
Yes, this is what I meant.
johndoe wrote:Again, correct me if i'm wrong but I believe this isn't the correct behaviour from the nagios part (possibly due to an earlier click on the disable/enable notifications like you mentioned before)?
Shouldn't these be synched and the latest action take priority?
Actually this is the expected behavior, and I'll try to explain why.
Commands such as "Quick Actions > Disable notifications" take the highest precedence.
Let's say your system is up and running and you have notifications off for a host, but you decide you need to enable notifications for this host/service and submit the command through "Quick Actions > Enable notifications". This is held at the highest level.
Later, you make some changes to a template that disables notifications (similar to your situation). The change is applied, and all the hosts/services that use the template get the new value on the underlying template. However, you SPECIFICALLY set this host/service to be enabled at the highest level, so it stays enabled.
I hope this makes sense. During configuration changes it may not seem the best option, but you certainly wouldn't want another admin applying a configuration and having it overwrite the fact that you disabled/enabled specific hosts/services.
Re: Nagios sending out too many alerts
Posted: Wed Apr 11, 2012 10:05 am
by johndoe
So what is the best/quickest way for me to ensure that the ones that were clicked on drop that "external command" change and follow the template instead (until I click on them again)?
Re: Nagios sending out too many alerts
Posted: Wed Apr 11, 2012 1:27 pm
by scottwilkerson
If your nagios.cfg has external command logging enabled, you could go to Home -> Event Log and search for "ENABLE_SVC_NOTIFICATIONS".
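If external commands are being logged, the same search can also be done from the shell. This is only a minimal sketch: it assumes the log entries carry an "EXTERNAL COMMAND:" prefix followed by the command name and a semicolon, and the log path in the usage comment is a guess based on the /usr/local/nagios/var directory mentioned in this thread. list_notification_commands is a made-up helper name, not part of Nagios.

```shell
#!/bin/sh
# list_notification_commands LOGFILE
# Hypothetical helper: prints every log line that records an enable/disable
# notifications command for a host or a service.
# Assumes entries look like "[timestamp] EXTERNAL COMMAND: NAME;args".
list_notification_commands() {
    grep -E 'EXTERNAL COMMAND: (ENABLE|DISABLE)_(SVC|HOST)_NOTIFICATIONS;' "$1"
}

# Usage (the path is an assumption, adjust it to your install):
#   list_notification_commands /usr/local/nagios/var/nagios.log
```

Searching for both the ENABLE and DISABLE variants, for hosts as well as services, should catch any object whose notification setting was changed by hand.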
If you are not logging external commands (the default), there isn't an elegant way to see this. You could, however, stop nagios, remove the /usr/local/nagios/var/retention.dat file, and restart nagios.
service nagios stop
rm -f /usr/local/nagios/var/retention.dat
service nagios start
WARNING: This resets all of the info nagios is retaining, including ALL commands submitted. This includes problem acknowledgments as well as short-term state history (used to determine if a host/service is flapping and if it is time to send a notification). ALL hosts/services will show a pending state until they make their next check.
For some organizations this may be no big deal; for others, though, it could be a real headache if you had hundreds of notifications disabled manually or problems acknowledged.
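Before wiping the file, it may be worth checking which objects actually carry a manual override. The sketch below rests on several assumptions that should be verified against your own retention.dat: that objects are stored as "host {" / "service {" blocks with key=value lines, that each block has a modified_attributes field, and that the lowest bit of that field (value 1) marks a runtime change to notifications_enabled. list_modified_notifications is a made-up helper name, not a Nagios tool.

```shell
#!/bin/sh
# list_modified_notifications RETENTION_FILE
# Hypothetical helper: prints "type;host[;service]" for every host or service
# block whose modified_attributes value has its lowest bit set.
# ASSUMPTION: bit value 1 means notifications_enabled was changed at runtime.
list_modified_notifications() {
    awk '
        # Start of a host or service block: reset the per-block state.
        /^(host|service) [{]/ { type = $1; host = ""; svc = ""; attrs = 0; next }
        /^[ \t]*host_name=/           { sub(/^[^=]*=/, ""); host = $0; next }
        /^[ \t]*service_description=/ { sub(/^[^=]*=/, ""); svc = $0; next }
        /^[ \t]*modified_attributes=/ { sub(/^[^=]*=/, ""); attrs = $0 + 0; next }
        # End of a block: report it if the lowest bit (odd value) is set.
        /^[ \t]*[}]/ {
            if (type != "" && attrs % 2 == 1)
                print type ";" host (svc != "" ? ";" svc : "")
            type = ""
        }
    ' "$1"
}

# Usage (read-only, nagios does not need to be stopped for this):
#   list_modified_notifications /usr/local/nagios/var/retention.dat
```

If the list is short, re-submitting the matching Quick Action on those few objects may be less disruptive than deleting the whole retention file.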