We're seeing some fairly odd behavior from our Nagios installation. Whenever a reload or restart of the Nagios daemon is triggered, notifications are re-sent for all down services and hosts, regardless of whether state retention is enabled. By default we use an alert-once scheme (no re-notification), and I can see that retained states are being picked up by the monitoring system when Nagios starts. Am I missing something, or is this expected behavior?
I'm including a copy of our nagios.cfg below. Thanks for any insight you can provide.
This is odd, as state retention is enabled in your config. How are you sending the notifications: with the notification handler, an event handler, or through an escalation?
If you acknowledge an issue, does it re-alert after a Nagios start?
Notifications are sent via the standard notification handler, i.e. via the service_notification_commands and host_notification_commands options specified in our contacts. I've included a sample of one of those templates below.
define contact{
    use                             base-contact
    name                            email-contact
    service_notification_commands   service-email
    host_notification_commands      host-email
    register                        0
    service_notification_options    c,r
    host_notification_options       d,r
}
service-email and host-email are Perl notification scripts that generate some pretty HTML notifications for us, but they have no special logic beyond that.
# ls -l /dev/shm/nagios/status.dat
-rw-rw-r-- 1 nagios nagios 6926043 May 1 12:47 /dev/shm/nagios/status.dat
# ls -l /var/log/nagios/retention.dat
-rw-r--r-- 1 nagios nagios 6958858 May 1 12:37 /var/log/nagios/retention.dat
Also, notifications DO still get sent out for acknowledged issues after a start, even with state_retention enabled. A relevant host check (with confidential info removed) would be:
I don't think this number means a whole lot on a well-behaved system, since I believe retention.dat is written out during a clean Nagios exit anyway. Is it possible that Nagios isn't exiting properly during your start/restart? One thing to try is changing that number from 60 to 1 or 2 and seeing whether it affects the behavior.
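One way to check what Nagios would actually pick up on start is to stop the daemon and look at what ended up in retention.dat. Below is a minimal Python sketch of that idea, not a definitive tool. It assumes the file uses the usual `type {` / `key=value` / `}` block layout and that the fields `current_state`, `current_notification_number` and `problem_has_been_acknowledged` exist in your Nagios version. The helper names `parse_retention` and `retained_problems` and the sample data are made up for illustration.

```python
import re

# Matches a block opener such as "host {" or "service {".
BLOCK_START = re.compile(r"^(\w+)\s*\{$")


def parse_retention(text):
    """Parse retention.dat-style text into a list of (block_type, fields) tuples."""
    blocks = []
    current_type, fields = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = BLOCK_START.match(line)
        if current_type is None and match:
            current_type, fields = match.group(1), {}
        elif current_type is not None and line == "}":
            blocks.append((current_type, fields))
            current_type, fields = None, None
        elif current_type is not None and "=" in line:
            # Split on the first "=" only; values may contain "=" themselves.
            key, _, value = line.partition("=")
            fields[key] = value
    return blocks


def retained_problems(blocks):
    """List (name, notification_number, acknowledged) for non-OK hosts/services."""
    rows = []
    for block_type, f in blocks:
        if block_type not in ("host", "service"):
            continue
        # Assumed field name; state 0 means UP/OK, so it is skipped.
        if f.get("current_state", "0") == "0":
            continue
        name = f.get("host_name", "?")
        if block_type == "service":
            name += "/" + f.get("service_description", "?")
        rows.append((
            name,
            int(f.get("current_notification_number", "0")),
            f.get("problem_has_been_acknowledged", "0") == "1",
        ))
    return rows


# Hypothetical sample data, not taken from the installation in this thread.
SAMPLE = """\
info {
version=0.0.0
}
host {
host_name=web01
current_state=1
current_notification_number=1
problem_has_been_acknowledged=1
}
service {
host_name=web01
service_description=HTTP
current_state=2
current_notification_number=0
problem_has_been_acknowledged=0
}
host {
host_name=db01
current_state=0
current_notification_number=0
problem_has_been_acknowledged=0
}
"""
```

To try it against the real file, stop Nagios and call `retained_problems(parse_retention(open("/var/log/nagios/retention.dat").read()))`. If hosts that were already alerted on and acknowledged come back with a notification number of 0 or with the acknowledged flag unset, that would suggest the state is not being written out on exit, rather than being ignored on start.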