Recurring Downtime and Host/Service Checks
Posted: Mon Aug 29, 2016 9:21 am
by jkinning
We have monthly maintenance windows for our various groups (Mainframe, Telecom, Network, Server Management), and I use recurring downtime to schedule the downtime each month. Yesterday was the Server Management maintenance day, and I have it set for our window of 0000-1810 to accommodate both non-prod and prod systems. Technically the window is over at 1800, but I included a 10-minute buffer to try to prevent any erroneous pages being sent to the on-call person. It was brought to my attention that someone from Server Management logged into Nagios to check the status of the hosts and services and make sure everything was good before Nagios started sending out notifications. They said nothing was shown until after 1810, and then everything appeared under the Technical Overview view. Hosts in scheduled downtime are still being monitored, or at least they appear to be, so I am wondering whether he should have seen all the hosts. He is an admin in Nagios and can see and change all hosts and services. He wanted to validate that everything was good, and correct anything Nagios was showing as a problem, before the downtime expired, to prevent notifications being sent to the on-call person.
Is there a better method to do this? Should I schedule recurring downtime and keep notifications delayed until 15 or 30 minutes later, or should these hosts and services have been visible in the Technical Overview during recurring downtime?
Re: Recurring Downtime and Host/Service Checks
Posted: Mon Aug 29, 2016 10:53 am
by bwallace
Scheduling downtime will only suppress email notifications during the specified time period. The checks still run as usual and any alerts continue to be displayed in the UI; a notification for them just will not be sent. So yes, this admin should have been able to see all hosts and services, business as usual. It is odd that he didn't see anything until the downtime had expired.
I was unable to reproduce this here on version 5.2.9. What XI version are you running? Is there anything peculiar in /var/log/httpd/error_log from around this time?
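One way to confirm from the command line that hosts in scheduled downtime are still being checked is to read the Nagios Core status file directly. The following is only a rough sketch: the path /usr/local/nagios/var/status.dat is an assumption (the real location is whatever status_file is set to in nagios.cfg), and the function name hosts_in_downtime is made up for illustration. It prints every host whose scheduled_downtime_depth is greater than zero, together with its current check state.

```shell
#!/bin/sh
# Sketch: list hosts currently in scheduled downtime and their check state.
# ASSUMPTION: status file location; check status_file in nagios.cfg.
STATUS_FILE="${STATUS_FILE:-/usr/local/nagios/var/status.dat}"

# hosts_in_downtime is a hypothetical helper name, not part of Nagios.
# $1: path to a Nagios Core status.dat file
hosts_in_downtime() {
    awk '
        # Start of a host block: reset the fields we collect.
        /^hoststatus \{/ { in_host = 1; name = ""; state = ""; depth = 0; next }
        in_host && /^[ \t]*host_name=/ {
            sub(/^[ \t]*host_name=/, ""); name = $0; next
        }
        # current_state: 0 = UP, 1 = DOWN, 2 = UNREACHABLE
        in_host && /^[ \t]*current_state=/ {
            sub(/^[ \t]*current_state=/, ""); state = $0; next
        }
        in_host && /^[ \t]*scheduled_downtime_depth=/ {
            sub(/^[ \t]*scheduled_downtime_depth=/, ""); depth = $0 + 0; next
        }
        # End of the host block: report it only if it is in downtime.
        in_host && /^[ \t]*\}/ {
            if (depth > 0) print name, "state=" state
            in_host = 0
        }
    ' "$1"
}

# Only run against the live file if it is readable on this machine.
if [ -r "$STATUS_FILE" ]; then
    hosts_in_downtime "$STATUS_FILE"
fi
```

If a host in downtime shows a state here but is missing from the web UI, that would point at a display problem rather than at the checks themselves.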
Re: Recurring Downtime and Host/Service Checks
Posted: Mon Aug 29, 2016 11:34 am
by jkinning
I am running 5.2.9 on CentOS 6.8 and I do see error messages, so I am attaching the log file.
Re: Recurring Downtime and Host/Service Checks
Posted: Mon Aug 29, 2016 3:47 pm
by bwallace
Thanks, but what client IP can we focus on in that log? What particular time frame?
Re: Recurring Downtime and Host/Service Checks
Posted: Tue Aug 30, 2016 12:29 pm
by jkinning
The time frame is 6am to 6:10pm, for any host in the Windows Prod group; yellowfin1p might be a good one to look at.
Re: Recurring Downtime and Host/Service Checks
Posted: Thu Sep 01, 2016 1:34 pm
by bwallace
Those details all look fine so between that and the error log you posted, there are not any clues to the cause of this behavior, unfortunately.
Is this reproducible on your side?
I attempted as much here but everything worked as expected.
Re: Recurring Downtime and Host/Service Checks
Posted: Thu Sep 01, 2016 2:46 pm
by jkinning
I have a note to check it out more closely next month during the maintenance window. I am not sure if it is just a fluke or what, because I've had this set up for a couple of years now and no one has said anything until just now.
Good to hear that from an "expert" view things look alright.
Re: Recurring Downtime and Host/Service Checks
Posted: Thu Sep 01, 2016 3:13 pm
by bwallace
Thanks. Should this occur again, definitely run these commands while reproducing the issue, then provide the output:
tail -f /var/log/httpd/error_log
tail -f /usr/local/nagiosxi/var/cmdsubsys.log
We can leave this thread open in the meantime...
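Since each tail -f above blocks its terminal, it may be convenient to follow both logs in one command and keep a copy to attach to the thread. This is only a sketch under a few assumptions: the function name capture_logs, the output file name, and the fixed capture duration are all hypothetical, and it relies on the GNU coreutils timeout command being installed. When tail follows more than one file it prints a "==> file <==" header before each file's lines, so entries from the two logs stay distinguishable.

```shell
#!/bin/sh
# Sketch: follow several logs at once for a fixed time and save a copy.
# capture_logs is a hypothetical helper name.
# $1: seconds to capture, $2: output file, remaining args: logs to follow
capture_logs() {
    secs="$1"; out="$2"; shift 2
    # Mark where this capture begins in the saved copy.
    echo "=== capture started $(date '+%Y-%m-%d %H:%M:%S') ===" >> "$out"
    # -n 0 skips existing content so only new lines are recorded.
    # timeout (GNU coreutils) stops tail after the requested duration.
    timeout "$secs" tail -n 0 -f "$@" | tee -a "$out"
}

# Example (log paths from the post above; the output name is an assumption):
# capture_logs 600 /tmp/downtime-debug.log \
#     /var/log/httpd/error_log /usr/local/nagiosxi/var/cmdsubsys.log
```

Start the capture, reproduce the issue in the web UI within the chosen duration, then attach the saved file.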