
Possible reoccurring downtime bug

Posted: Mon Sep 30, 2019 3:39 pm
by nickap
We have recurring downtime scheduled to trigger every Saturday and Sunday:
This service has been scheduled for fixed downtime from 10-05-2019 12:00:00 to 10-07-2019 12:00:00. Notifications for the service will not be sent out during that time period.

This service has been scheduled for fixed downtime from 09-29-2019 12:00:00 to 10-01-2019 12:00:00. Notifications for the service will not be sent out during that time period.
This month had 5 weekends, and the scheduled downtime triggered on Sunday and Monday. Is this a bug, or am I missing something?

Re: Possible reoccurring downtime bug

Posted: Mon Sep 30, 2019 4:05 pm
by benjaminsmith
Hello @nickap
nickap wrote:This month had 5 weekends, and the scheduled downtime triggered on Sunday and Monday. Is this a bug, or am I missing something?
Let's check all the time settings on the server to make sure there isn't a mismatch causing a scheduling error. Please post the output of the following commands.

The PHP time and the server time:

Code: Select all

php -r 'echo date("D M j G:i:s T Y")."\n";' 
date 
Also, check the timezone for both PHP and the server:

Code: Select all

grep "date.timezone" /etc/php.ini
ls -l /etc/localtime
Lastly, check the time reported by the database:

Code: Select all

echo "SELECT NOW();" | mysql -u root -pnagiosxi
Reference: Nagios XI Changing The System Time

Re: Possible reoccurring downtime bug

Posted: Mon Sep 30, 2019 4:23 pm
by nickap
[root@nagiosxi ~]# php -r 'echo date("D M j G:i:s T Y")."\n";'
Mon Sep 30 17:19:57 EDT 2019
[root@nagiosxi ~]# date
Mon Sep 30 17:20:01 EDT 2019
[root@nagiosxi ~]#
[root@nagiosxi ~]# grep "date.timezone" /etc/php.ini
; http://www.php.net/manual/en/datetime.c ... e.timezone
date.timezone = US/Eastern
[root@nagiosxi~]# ls -l /etc/localtime
lrwxrwxrwx 1 root root 30 Sep 24 2015 /etc/localtime -> /usr/share/zoneinfo/US/Eastern
[root@nagiosxi ~]#
[root@nagiosxi ~]# echo "SELECT NOW();" | mysql -u root -pnagiosxi
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: ...)

Re: Possible reoccurring downtime bug

Posted: Mon Sep 30, 2019 4:51 pm
by benjaminsmith
Hello @nickap,

Thanks for running those commands; that looks normal. Please send me your system profile so I can review the logs.

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file, share it in a private message, and then reply to this post to bring it up in the queue.

Re: Possible reoccurring downtime bug

Posted: Tue Oct 01, 2019 6:29 am
by nickap
Sent profile.zip via PM, thanks Ben!

Re: Possible reoccurring downtime bug

Posted: Tue Oct 01, 2019 2:58 pm
by benjaminsmith
Hi,

Thanks for sending over the system profile. Recurring downtime runs as a cron job that sends a command to Nagios to schedule the downtime. The settings are written to /usr/local/nagios/etc/recurringdowntime.cfg, and the job is working, just not at the correct times.
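For reference, Nagios Core accepts downtime requests through its external command file, and the line below shows the SCHEDULE_SVC_DOWNTIME format for the first window in this thread (10-05-2019 12:00 to 10-07-2019 12:00 EDT). This is a sketch: the host, service, and author values are hypothetical, the output file defaults to a scratch path instead of the real command file, and whether the XI cron job submits exactly this line is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: build a Nagios Core SCHEDULE_SVC_DOWNTIME external command.
# Hypothetical host/service/author values. A real server would append this
# line to the command file (often /usr/local/nagios/var/rw/nagios.cmd);
# here it goes to a scratch file unless CMD_FILE is set.
CMD_FILE=${CMD_FILE:-/tmp/nagios.cmd.example}

host=web01                    # hypothetical host_name
service=HTTP                  # hypothetical service_description
start=1570291200              # 10-05-2019 12:00 EDT (first post)
end=1570464000                # 10-07-2019 12:00 EDT
fixed=1                       # fixed downtime: runs exactly from start to end
trigger_id=0                  # not triggered by another downtime
duration=$(( end - start ))   # only used by flexible downtime, but still required

# [time] SCHEDULE_SVC_DOWNTIME;host;service;start;end;fixed;trigger_id;duration;author;comment
printf '[%s] SCHEDULE_SVC_DOWNTIME;%s;%s;%s;%s;%s;%s;%s;%s;%s\n' \
    "$(date +%s)" "$host" "$service" "$start" "$end" "$fixed" "$trigger_id" \
    "$duration" nagiosadmin "Weekend downtime" >> "$CMD_FILE"
```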

Regarding the settings, you have selected downtime to start at 12:00 on Saturday and Sunday with a duration of 48 hours, so the Sunday downtime starts while the host or service is still in Saturday's downtime. When do you want the scheduled downtime to end? Please try setting it to start on Saturday with a 48-hour duration, or to start on Saturday and Sunday with a 24-hour duration.
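The overlap can be checked with a little date arithmetic. This sketch assumes GNU date and uses America/New_York, the canonical name for the US/Eastern zone configured on this server; the dates are the first weekend from the original post.

```bash
#!/usr/bin/env bash
# Sketch: does Sunday's recurring downtime start inside Saturday's window?
# Assumes GNU date; America/New_York is the canonical name for US/Eastern.
export TZ=America/New_York LC_ALL=C

duration_min=2880                              # 48 hours, as originally configured
sat_start=$(date -d '2019-10-05 12:00' +%s)    # Saturday 12:00
sun_start=$(date -d '2019-10-06 12:00' +%s)    # Sunday 12:00
sat_end=$(( sat_start + duration_min * 60 ))   # end of Saturday's window

echo "Saturday window: $(date -d @"$sat_start" '+%a %F %H:%M') -> $(date -d @"$sat_end" '+%a %F %H:%M')"
if (( sun_start < sat_end )); then
    echo "Sunday downtime starts while Saturday's downtime is still active"
fi
```

With duration_min=1440, sat_end lands exactly on Sunday 12:00 and the overlap check no longer fires.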

Please post the recurring downtime log for any error messages. Also, try setting up a test recurring downtime schedule and then check the logs to make sure it was started at the correct time.

Code: Select all

tail /usr/local/nagiosxi/var/recurringdowntime.log

Re: Possible reoccurring downtime bug

Posted: Thu Oct 03, 2019 6:59 am
by nickap
Thanks Ben, I've removed Sunday from the scheduled downtime and changed the schedule to Saturday for 48 hours. I will post the log on Monday after the schedule kicks off.

On a side note, should the scheduled downtime duration be capped at 1440 minutes (24 hours) when more than one day is selected? Or could there at least be a description indicating that the downtimes will overlap if the duration is greater than 24 hours?
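A validation along those lines could compare the duration with the shortest gap between selected start days. The check_duration function below is hypothetical (not part of Nagios XI) and assumes days are passed as numbers sorted ascending, 0 for Sunday through 6 for Saturday.

```bash
#!/usr/bin/env bash
# Hypothetical check: warn when a recurring downtime's duration is longer than
# the shortest gap between two selected start days (1440 minutes for adjacent days).
# Usage: check_duration <minutes> <day> [<day> ...]   days sorted, 0=Sun .. 6=Sat
check_duration() {
    local duration_min=$1; shift
    local -a days=("$@")
    local min_gap=7 i gap
    for (( i = 0; i < ${#days[@]}; i++ )); do
        # Days until the next selected start, wrapping around the week
        gap=$(( (days[(i + 1) % ${#days[@]}] - days[i] + 7) % 7 ))
        (( gap == 0 )) && gap=7          # a single selected day repeats weekly
        (( gap < min_gap )) && min_gap=$gap
    done
    if (( duration_min > min_gap * 1440 )); then
        echo "overlap: ${duration_min} min exceeds ${min_gap} day(s) between starts"
        return 1
    fi
    echo "ok"
}
```

For the original schedule, check_duration 2880 0 6 reports an overlap, while check_duration 2880 6 (Saturday only) and check_duration 1440 0 6 both print ok.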

Re: Possible reoccurring downtime bug

Posted: Thu Oct 03, 2019 10:27 am
by mbellerue
nickap wrote:On a side note, should the scheduled downtime duration be capped at 1440 minutes (24 hours) when more than one day is selected? Or could there at least be a description indicating that the downtimes will overlap if the duration is greater than 24 hours?
I will see if we can add a bit of text about this to the scheduled downtime section. Restricting the duration to 24 hours might be a good idea, too. Let's see how your server handles the scheduled downtime, and from there we can look at a feature request.

Re: Possible reoccurring downtime bug

Posted: Mon Oct 07, 2019 12:28 pm
by nickap

Code: Select all

tail /usr/local/nagiosxi/var/recurringdowntime.log
check successful
candidate_timestamp: 1570896000,2019-10-12 12:00
got candidate_day_of_week: sat, checking: sat
check successful
got candidate_month_of_year: oct, checking: jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec
check successful
all parameters match, re-adjusting candidate for proper time
candidate_timestamp: 1570896000000, 2019-10-12 12:00
Downtime exists with start_time: 1570896000, and duration 172800 seconds ..
NOT SCHEDULING
I've pasted the log from the weekend. I actually got an alert at 1:00 AM Saturday for this; I think it should have been suppressed, or something is not configured correctly.
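The epoch values in that log can be decoded with GNU date to see which window the "Downtime exists" line refers to. This sketch assumes GNU date and uses America/New_York for the server's US/Eastern zone.

```bash
#!/usr/bin/env bash
# Decode the epoch values from the recurringdowntime.log excerpt above.
# Assumes GNU date; America/New_York is the canonical name for US/Eastern.
export TZ=America/New_York LC_ALL=C

start_time=1570896000   # "Downtime exists with start_time: 1570896000"
duration=172800         # "and duration 172800 seconds"

start_str=$(date -d @"$start_time" '+%a %F %H:%M %Z')
end_str=$(date -d @"$(( start_time + duration ))" '+%a %F %H:%M %Z')

echo "start: $start_str"               # Sat 2019-10-12 12:00 EDT
echo "end:   $end_str"                 # Mon 2019-10-14 12:00 EDT
echo "hours: $(( duration / 3600 ))"   # 48
```

So the existing downtime is next weekend's Saturday 12:00 to Monday 12:00 window. If the schedule still starts at 12:00, an alert at 1:00 AM Saturday would fall before that weekend's window opens.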

Re: Possible reoccurring downtime bug

Posted: Mon Oct 07, 2019 3:18 pm
by mbellerue
Can we see more of that log? The section you posted is for 2019-10-12. I'd like to see the same messages for 2019-10-05.
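Plain tail prints only the last 10 lines, which is likely why only the 2019-10-12 candidate showed up. One way to pull an earlier day is to grep for its date and print the lines after each match. The show_day name is hypothetical, and the 8 trailing lines are an assumption based on the excerpt above (candidate_timestamp through NOT SCHEDULING).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each recurring downtime log line mentioning a
# date, plus the 8 lines after it (day/month checks and the scheduling decision).
show_day() {
    local day=$1
    local log=${2:-/usr/local/nagiosxi/var/recurringdowntime.log}
    grep -A 8 -- "$day" "$log"
}

# Usage on the Nagios XI server:
#   show_day 2019-10-05
```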