Recurring Maintenance not working

acentek
Posts: 123
Joined: Thu Jul 27, 2017 2:00 pm

Re: Recurring Maintenance not working

Post by acentek »

Did you get some time to look at this yesterday or at all yet today?
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Recurring Maintenance not working

Post by npolovenko »

@acentek, Sorry, I was busy this morning with a few tickets. I was told that the Perl script is deprecated and no longer used in XI 5.5.
Based on the logs, the recurring downtime entries are being created successfully. The following cron job is then supposed to apply the new downtime rules:

01  * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1
However, as you can see, this cron job runs only once an hour, at one minute past the hour. So if you scheduled a downtime that was supposed to start in 10 minutes, and the cron job is not due to run for another 50 minutes, the job will ignore the downtime because its start timestamp is already in the past by the time the job runs.
This is probably hard to follow, so please try scheduling a downtime that starts in 2 hours and let us know if it works.
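To make the timing concrete, here is a minimal sketch of the behavior described above. It assumes a server time zone with a whole-hour UTC offset, and it assumes the script skips any start time that is already in the past when it runs (that is the explanation given in this post, not something checked against the script source). The function names are made up for illustration and are not part of Nagios XI.

```shell
#!/bin/sh
# Illustrative sketch only: models the hourly "01 * * * *" schedule.
# next_cron_run and starts_after_next_run are hypothetical helpers.

# Print the epoch time of the next hh:01 run strictly after the given time.
# Assumes the server's UTC offset is a whole number of hours.
next_cron_run() {
    _now=$1
    _run=$(( _now / 3600 * 3600 + 60 ))   # hh:01 of the current hour
    if [ "$_run" -le "$_now" ]; then
        _run=$(( _run + 3600 ))           # already passed, use the next hour
    fi
    echo "$_run"
}

# Exit status 0 if the downtime start ($2) is at or after the next cron run
# following the current time ($1), i.e. not yet in the past at that run.
starts_after_next_run() {
    [ "$2" -ge "$(next_cron_run "$1")" ]
}

now=1532115429   # example moment, 37 minutes past the hour
for delay in 600 7200; do
    if starts_after_next_run "$now" $(( now + delay )); then
        echo "start in ${delay}s: still in the future at the next cron run"
    else
        echo "start in ${delay}s: already in the past at the next cron run"
    fi
done
```

With these example values, the 10-minute case falls before the next hh:01 run and the 2-hour case falls after it, which is why a start time 2 hours out is the safer test.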
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
acentek
Posts: 123
Joined: Thu Jul 27, 2017 2:00 pm

Re: Recurring Maintenance not working

Post by acentek »

OK, but can you explain why this hasn't been working all week until I ran the recurringdowntime.pl script?

So the cron job is going to run every hour at minute *:01:

01 * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1

But my scheduled downtimes have been in place for months and those were no longer working.

I'll add one now for an hour from now; the cron job above should kick off and schedule it.

I'll post in ~2 hours with an update.
acentek
Posts: 123
Joined: Thu Jul 27, 2017 2:00 pm

Re: Recurring Maintenance not working

Post by acentek »

OK, so it's that cron job that isn't working.

Event Log
Report covers from: 2018-07-17 14:36:43 to 2018-07-18 14:36:43
Showing 1-13 of 13 total matches for 'acentek-infra'

Type Date / Time Information
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Total Processes;OK;HARD;1;PROCS OK: 177 processes
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Swap Usage;OK;HARD;1;SWAP OK - 100% free (3983 MB out of 3983 MB)
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Salt Master Server;OK;HARD;1;active
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;SSH Server;OK;HARD;1;active
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Ping;OK;HARD;1;OK - acentek-inframgmt.acentek.net: rta 0.223ms, lost 0%
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Memory Usage;OK;HARD;1;OK - 3197 / 3790 MB (84%) Free Memory, Used: 590 MB, Shared: 182 MB, Buffers + Cached: 2481 MB
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Load;OK;HARD;1;OK - load average: 0.01, 0.04, 0.05
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Cron Scheduling Daemon;OK;HARD;1;active
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=1.40% system=1.70% iowait=0.00% idle=96.90%
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;/dev/mapper/centos-root Disk Usage;OK;HARD;1;DISK OK - free space: / 31964 MB (83.56% inode=100%):
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;/dev/mapper/centos-home Disk Usage;OK;HARD;1;DISK OK - free space: /home 18638 MB (99.80% inode=100%):
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;/ Disk Usage;OK;HARD;1;DISK OK - free space: / 31964 MB (83.56% inode=100%):
Information 2018-07-18 00:00:00 CURRENT HOST STATE: acentek-inframgmt;UP;HARD;1;OK - acentek-inframgmt.acentek.net: rta 0.186ms, lost 0%

It never went into recurring maintenance today.

Thoughts?
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Recurring Maintenance not working

Post by npolovenko »

@acentek, Please schedule downtime for the host one more time, and then manually run this command from the command line:

/usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php
And let me know the output of this command.

When I did that I got a "Successfully submitted downtime for host: xxxx" message in the console. And the host successfully went into downtime.

Another thing is that since they replaced the Perl script with the PHP script, the architecture may have changed a bit. So if you have old recurring downtime entries I recommend deleting and recreating them. Make sure to check valid months and valid weekdays when creating a new downtime.

Does your host have any special characters in its name?
acentek
Posts: 123
Joined: Thu Jul 27, 2017 2:00 pm

Re: Recurring Maintenance not working

Post by acentek »

OK, so adding new recurring maintenance entries seems to work fine.

This leads me to believe we need to recreate all of the recurring maintenance entries in Nagios.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Recurring Maintenance not working

Post by npolovenko »

@acentek, I submitted a recurring downtime entry on XI 5.4.12 and then upgraded to 5.5. After I upgraded to 5.5 I manually ran this command:

/usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php
And it seems to have re-enabled all the previous recurring downtimes. Can you confirm whether that is the case on your XI? Also what version did you upgrade from?
acentek
Posts: 123
Joined: Thu Jul 27, 2017 2:00 pm

Re: Recurring Maintenance not working

Post by acentek »

OK, so I created this.

Host | Services | Comment | Start Time | Duration | Months | Weekdays | Days in Month
DHCPD-MINN | Yes | (blank) | 14:40 | 60 | All | All | All

Then I ran the PHP script.

Checking recurring config id: b1ee32ae2f6f005f0fd43cd3dca0023b
host: DHCPD-MINN, services: yes
**************************************************************

get_next_scheduled_time('14:40', '', '', '') called
current_timestamp: 1532115429, 2018-07-20 14:37
candidate_timestamp: 1532115600, 2018-07-20 14:40
all parameters match, re-adjusting candidate for proper time
candidate_timestamp: 1532115600000, 2018-07-20 14:40


So I am guessing I should have seen my host go into recurring maintenance.
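One detail in the script output is easy to check by hand: the re-adjusted candidate_timestamp has three more digits than the first one. The sketch below converts the printed epoch values and compares the two candidates. It assumes GNU date (for the `-d @EPOCH` form) and prints UTC rather than server time; `epoch_to_utc` is a made-up helper name. Whether the larger value has anything to do with the downtime not being applied is not established in this thread.

```shell
#!/bin/sh
# Sketch: sanity-check the epoch values printed by recurring_downtime.php.
# Assumes GNU date; epoch_to_utc is a hypothetical helper.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M'
}

epoch_to_utc 1532115429   # current_timestamp          -> 2018-07-20 19:37
epoch_to_utc 1532115600   # first candidate_timestamp  -> 2018-07-20 19:40

# Ratio between the re-adjusted candidate and the first candidate.
echo $(( 1532115600000 / 1532115600 ))   # -> 1000
```

19:40 UTC corresponds to 14:40 on a server running at UTC-5, so the first candidate matches the configured start time, and the second value is exactly 1000 times the first.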


Type Date / Time Information
Service Recovery 2018-07-20 14:47:44 SERVICE ALERT: WTVUB2001;Remote TX Mod Rate;OK;SOFT;2;SNMP OK - 16
Service Warning 2018-07-20 14:46:44 SERVICE ALERT: WTVUB2001;Remote TX Mod Rate;WARNING;SOFT;1;SNMP WARNING - *0*
Service Recovery 2018-07-20 14:46:34 SERVICE ALERT: BCKL04 Valere;Voltage;OK;SOFT;2;SNMP OK - 5230 Volts
Service Warning 2018-07-20 14:45:34 SERVICE ALERT: BCKL04 Valere;Voltage;WARNING;SOFT;1;SNMP WARNING - *5095* Volts
Service Recovery 2018-07-20 14:41:54 SERVICE ALERT: WTVUB2001;Remote TX Mod Rate;OK;SOFT;3;SNMP OK - 16
Service Warning 2018-07-20 14:40:54 SERVICE ALERT: WTVUB2001;Remote TX Mod Rate;WARNING;SOFT;2;SNMP WARNING - *0*
Service Warning 2018-07-20 14:39:54 SERVICE ALERT: WTVUB2001;Remote TX Mod Rate;WARNING;SOFT;1;SNMP WARNING - *0*
Service Recovery 2018-07-20 14:39:24 SERVICE ALERT: SpeedTestMN;Apache Web Server;OK;SOFT;3;active
Service Recovery 2018-07-20 14:39:14 SERVICE ALERT: acenet-speedtest;Total Processes;OK;SOFT;2;PROCS OK: 181 processes

Nope, so that's still not working.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Recurring Maintenance not working

Post by npolovenko »

@acentek, We filed a bug report and our dev team is working on it. The fix should be included in the next release.
In the meantime, you can change the following line in the crontab:

01  * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1
To:

01  * * * * nagios /usr/local/nagiosxi/cron/recurringdowntime.pl >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1
And then restart the crond daemon:

service crond restart
acentek
Posts: 123
Joined: Thu Jul 27, 2017 2:00 pm

Re: Recurring Maintenance not working

Post by acentek »

OK, so you can see that the cron jobs ran, both the PHP and the Perl script.

Jul 23 08:01:01 nagios CROND[20511]: (nagios) CMD (nagios /usr/local/nagiosxi/cron/recurringdowntime.pl >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1)
Jul 23 08:01:01 nagios CROND[20505]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php >> /usr/local/nagiosxi/var/nom.log 2>&1)
Jul 23 08:01:01 nagios CROND[20516]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php >> /usr/local/nagiosxi/var/eventman.log 2>&1)
Jul 23 08:01:01 nagios CROND[20498]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1)
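
One thing worth checking in the log lines above: the command logged for the Perl entry begins with the word `nagios`, while the PHP entries begin directly with `/usr/bin/php`. That would be consistent with the new line living in a crontab that has no user field, so that `nagios` is treated as part of the command rather than as the user to run as. This is an inference from the log format, not something confirmed in this thread. A small sketch that flags such lines (`cmd_starts_with` is a made-up helper name):

```shell
#!/bin/sh
# Sketch: report whether the command inside "CMD (...)" of a cron log line
# begins with the given word followed by a space.
# cmd_starts_with is a hypothetical helper, not a Nagios XI tool.
cmd_starts_with() {
    _line=$1
    _word=$2
    _cmd=${_line#*"CMD ("}    # drop everything up to and including "CMD ("
    case $_cmd in
        "$_word "*) return 0 ;;
        *)          return 1 ;;
    esac
}

# The two relevant lines, copied from the log above.
perl_line='Jul 23 08:01:01 nagios CROND[20511]: (nagios) CMD (nagios /usr/local/nagiosxi/cron/recurringdowntime.pl >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1)'
php_line='Jul 23 08:01:01 nagios CROND[20498]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1)'

for line in "$perl_line" "$php_line"; do
    if cmd_starts_with "$line" nagios; then
        echo "user name appears inside the command: ${line#*CMD }"
    else
        echo "command starts with a path: ${line#*CMD }"
    fi
done
```

If that is what happened here, the Perl script itself may never have been executed by cron, and the contents of recurringdowntime.log around 08:01 should show whether it produced any output.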


https://www.screencast.com/t/NG9uVrzjsB

But you can see that the HSTN_NetGuardian432G5 went into alarm when it should have been in recurring maintenance.

I am starting the process of recreating all of the recurring maintenance jobs now to see if that fixes it.