Scheduled Downtime Stops Functioning After Restart

Posted: Mon Jun 18, 2012 12:39 pm
by CGraham
When someone schedules downtime, it functions fine until someone else applies changes in the configuration manager. When the changes are applied, notifications begin sending despite the scheduled downtime. On the host status detail screen, the "half-pie" scheduled downtime icon disappears but the scheduled downtime comment remains. Additionally, if you follow the scheduled downtime link on the left, the downtime is still listed there.

UPDATE: I tried just restarting Nagios instead of using the "Apply Configuration" button. This also causes the scheduled downtime icon to disappear and notifications to begin sending.

Attempted solutions: Read through the FAQ and tried "killall nagios" followed by "service nagios start", to no avail.

Here's the messages log from the restart:

Jun 18 13:30:34 [server name redacted] nagios: PROGRAM_RESTART event encountered, restarting...
Jun 18 13:30:34 [server name redacted] nagios: ndomod: Shutdown complete.
Jun 18 13:30:34 [server name redacted] nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
Jun 18 13:30:34 [server name redacted] nagios: Nagios 3.4.1 starting... (PID=762)
Jun 18 13:30:34 [server name redacted] nagios: Local time is Mon Jun 18 13:30:34 EDT 2012
Jun 18 13:30:34 [server name redacted] nagios: LOG VERSION: 2.0
Jun 18 13:30:34 [server name redacted] nagios: ndomod: NDOMOD 1.5.1 (05-15-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Jun 18 13:30:34 [server name redacted] nagios: ndomod: Successfully connected to data sink. 0 queued items to flush.
Jun 18 13:30:34 [server name redacted] nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.

System Profile Uploaded

Re: Scheduled Downtime Stops Functioning After Restart

Posted: Mon Jun 18, 2012 2:31 pm
by agriffin
Scheduled downtime is normally written to /usr/local/nagios/var/retention.dat, which survives restarts of Nagios. The first things that come to mind are changes to nagios.cfg or permission issues with retention.dat. Have you modified Nagios' main config file? What's the output of the following command?

# ls -l /usr/local/nagios/var/retention.dat

Re: Scheduled Downtime Stops Functioning After Restart

Posted: Tue Jun 19, 2012 1:16 pm
by CGraham
Yes, I have edited the nagios.cfg file (extending the check timeout), but not the entry regarding state retention:

state_retention_file=/usr/local/nagios/var/retention.dat


[root@hostname libexec]# ls -l /usr/local/nagios/var/retention.dat
-rw------- 1 nagios users 6006132 Jun 19 14:11 /usr/local/nagios/var/retention.dat
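Since the question is whether nagios.cfg still has state retention set up as expected, a quick way to list the relevant settings in one go might look like the sketch below. The directive names are standard nagios.cfg options; the default config path /usr/local/nagios/etc/nagios.cfg and the function name show_retention_settings are assumptions made for illustration, not something stated in this thread.

```shell
#!/bin/sh
# Sketch: print the state-retention directives from a Nagios main config file.
# The default path below is an assumption (typical source-install layout);
# pass a different path as the first argument if yours differs.
show_retention_settings() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    # Match only active (uncommented) directive lines.
    grep -E '^[[:space:]]*(retain_state_information|state_retention_file|retention_update_interval|use_retained_program_state|use_retained_scheduling_info)[[:space:]]*=' "$cfg"
}
```

Calling show_retention_settings with no argument reads the assumed default path. If retain_state_information turns out to be 0, Nagios would not be saving state across restarts at all.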

Re: Scheduled Downtime Stops Functioning After Restart

Posted: Tue Jun 19, 2012 1:25 pm
by CGraham
I just created a scheduled downtime while watching the file, and no entry was written. Here's the beginning of the file, if that helps:

########################################
# NAGIOS STATE RETENTION FILE
#
# THIS FILE IS AUTOMATICALLY GENERATED
# BY NAGIOS. DO NOT MODIFY THIS FILE!
########################################
info {
created=1340129481
version=3.4.1
last_update_check=1338826943
update_available=1
update_uid=1314624658
last_version=3.2.3
new_version=3.4.1
}
program {
modified_host_attributes=3
modified_service_attributes=3
enable_notifications=1
active_service_checks_enabled=1
passive_service_checks_enabled=1
active_host_checks_enabled=1
passive_host_checks_enabled=1
enable_event_handlers=1
obsess_over_services=0
obsess_over_hosts=0
check_service_freshness=1
check_host_freshness=0
enable_flap_detection=1
enable_failure_prediction=1
process_performance_data=1
global_host_event_handler=xi_host_event_handler
global_service_event_handler=xi_service_event_handler
next_comment_id=364
next_downtime_id=47
next_event_id=25490
next_problem_id=12509
next_notification_id=1047
}
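Rather than reading through a 6 MB file by eye, the downtime blocks can be counted directly. The sketch below assumes that Nagios 3.x stores downtime in retention.dat as blocks opening with "hostdowntime {" or "servicedowntime {", in the same style as the "info {" and "program {" blocks above; the function name count_downtime_entries is hypothetical. Nagios generally rewrites this file on shutdown or restart and at the interval set by retention_update_interval, so a freshly scheduled downtime may not show up right away.

```shell
#!/bin/sh
# Sketch: count downtime blocks in a Nagios retention file.
# Assumes downtime blocks open with "hostdowntime {" or "servicedowntime {".
count_downtime_entries() {
    file="${1:-/usr/local/nagios/var/retention.dat}"
    # n+0 forces a numeric 0 when nothing matched.
    awk '/^(hostdowntime|servicedowntime) [{]/ { n++ } END { print n + 0 }' "$file"
}
```

Running it before and after a restart would show whether the downtime entries are being dropped from the file or just ignored when it is read back.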

Re: Scheduled Downtime Stops Functioning After Restart

Posted: Tue Jun 19, 2012 2:52 pm
by scottwilkerson
The scheduled downtime script runs only once per hour via cron, so you won't see the change until a couple of minutes past the hour.

Re: Scheduled Downtime Stops Functioning After Restart

Posted: Tue Jun 19, 2012 5:03 pm
by CGraham
OK, so my understanding is that any scheduled downtime isn't permanent until the top of the hour.

Can you tell me which cron job does this? And what would be the impact of increasing the number of runs per hour?

Basically we are a growing software company that is adding systems to Nagios constantly. This is causing issues with false alarms...

Re: Scheduled Downtime Stops Functioning After Restart

Posted: Tue Jun 19, 2012 8:36 pm
by scottwilkerson
This is true.

You certainly could modify the cron job to run more often. It is found in /etc/cron.d/nagiosxi, and this is the line:

01  * * * * nagios /usr/local/nagiosxi/cron/recurringdowntime.pl > /usr/local/nagiosxi/var/recurringdowntime.log 2>&1

Edit: the above isn't true. I'm editing my post because I had thought we were talking about recurring downtime.

Re: Scheduled Downtime Stops Functioning After Restart

Posted: Wed Jun 20, 2012 9:49 am
by agriffin
This is apparently a bug in the latest Nagios Core release. There's a fix posted on the bug report here. We are currently testing it and will likely ship a bug fix release soon.

Re: Scheduled Downtime Stops Functioning After Restart

Posted: Wed Jun 20, 2012 10:22 am
by CGraham
Thanks for the information. I certainly didn't remember losing my scheduled downtime every time I made changes.