Recurring Maintenance not working
Re: Recurring Maintenance not working
Did you get a chance to look at this yesterday, or at all today?
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Recurring Maintenance not working
@acentek, sorry, I was busy this morning with a few tickets. I was told that the Perl script is deprecated and no longer used in XI 5.5.
Based on the logs, the recurring downtime entries are being created successfully. The following cron job is then supposed to execute the new downtime rules:
Code: Select all
01 * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1
However, as you can see, this cron job runs only once an hour, at one minute past the hour. So if you scheduled a downtime that was supposed to start in 10 minutes, and the cron job was not due to run for another 50 minutes, the cron job will ignore the downtime because its start time is already in the past.
It's probably hard to follow, but try scheduling a downtime that starts in 2 hours and let us know if it works.
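The timing problem described above can be modeled in a few lines. This is a minimal sketch, not the actual recurring_downtime.php logic: it assumes, as the post says, that a downtime whose start time has already passed when the cron job runs is skipped. The names next_cron_run and missed_by_cron are hypothetical.

```python
from datetime import datetime, timedelta

CRON_MINUTE = 1  # the "01" in "01 * * * *": one minute past every hour


def next_cron_run(now: datetime, minute: int = CRON_MINUTE) -> datetime:
    """Return the next time an hourly job scheduled at `minute` past the hour fires."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(hours=1)  # this hour's run already happened
    return candidate


def missed_by_cron(now: datetime, downtime_start: datetime) -> bool:
    """Assumption: the downtime is skipped if it starts before the cron job next runs."""
    return downtime_start < next_cron_run(now)


now = datetime(2018, 7, 18, 10, 11)
print(next_cron_run(now))                                # 2018-07-18 11:01:00
print(missed_by_cron(now, now + timedelta(minutes=10)))  # True
print(missed_by_cron(now, now + timedelta(hours=2)))     # False
```

This is why a start time two hours out is a safe test: at least one hourly run is guaranteed to happen before it.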
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Recurring Maintenance not working
OK, but can you explain why this hasn't been working all week, until I ran the recurringdowntime.pl script?
So the cron job runs every hour at minute :01:
01 * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1
But my scheduled downtimes have been in place for months, and they stopped working.
I'll add one now that starts an hour from now; the cron job above should pick it up and schedule it.
I'll post in ~2 hours with an update.
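The cron line quoted above is in system-crontab format (the format used by /etc/crontab and files in /etc/cron.d): five schedule fields, then the account to run as, then the command. A minimal sketch that splits such a line, using a hypothetical helper name, might look like this:

```python
from typing import NamedTuple


class SystemCronEntry(NamedTuple):
    schedule: str  # minute hour day-of-month month day-of-week
    user: str      # account the command runs as (system crontab format only)
    command: str   # everything after the user field, internal spacing preserved


def parse_system_cron_line(line: str) -> SystemCronEntry:
    """Split an /etc/crontab or /etc/cron.d style line into schedule, user and command."""
    parts = line.split(None, 6)  # at most 7 pieces; the command keeps its own spaces
    if len(parts) < 7:
        raise ValueError(f"expected 5 schedule fields, a user and a command: {line!r}")
    return SystemCronEntry(" ".join(parts[:5]), parts[5], parts[6])


entry = parse_system_cron_line(
    "01 * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php "
    ">> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1"
)
print(entry.schedule)  # 01 * * * *
print(entry.user)      # nagios
```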
Re: Recurring Maintenance not working
OK, so it's that cron job that isn't working.
Event Log
Report covers from: 2018-07-17 14:36:43 to 2018-07-18 14:36:43
Showing 1-13 of 13 total matches for 'acentek-infra'
Type Date / Time Information
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Total Processes;OK;HARD;1;PROCS OK: 177 processes
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Swap Usage;OK;HARD;1;SWAP OK - 100% free (3983 MB out of 3983 MB)
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Salt Master Server;OK;HARD;1;active
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;SSH Server;OK;HARD;1;active
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Ping;OK;HARD;1;OK - acentek-inframgmt.acentek.net: rta 0.223ms, lost 0%
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Memory Usage;OK;HARD;1;OK - 3197 / 3790 MB (84%) Free Memory, Used: 590 MB, Shared: 182 MB, Buffers + Cached: 2481 MB
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Load;OK;HARD;1;OK - load average: 0.01, 0.04, 0.05
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Cron Scheduling Daemon;OK;HARD;1;active
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=1.40% system=1.70% iowait=0.00% idle=96.90%
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;/dev/mapper/centos-root Disk Usage;OK;HARD;1;DISK OK - free space: / 31964 MB (83.56% inode=100%):
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;/dev/mapper/centos-home Disk Usage;OK;HARD;1;DISK OK - free space: /home 18638 MB (99.80% inode=100%):
Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;/ Disk Usage;OK;HARD;1;DISK OK - free space: / 31964 MB (83.56% inode=100%):
Information 2018-07-18 00:00:00 CURRENT HOST STATE: acentek-inframgmt;UP;HARD;1;OK - acentek-inframgmt.acentek.net: rta 0.186ms, lost 0%
It never went into recurring maintenance today.
Thoughts?
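One way to check whether a host ever entered downtime is to look for downtime alerts rather than state entries; the excerpt above contains only CURRENT SERVICE STATE and CURRENT HOST STATE lines. The sketch below assumes Nagios Core's usual log wording, "HOST DOWNTIME ALERT:" and "SERVICE DOWNTIME ALERT:", with the host name as the first semicolon-separated field; verify that against your own log before relying on it. downtime_events is a hypothetical helper.

```python
from typing import Iterable, List

# Assumption: Nagios Core's usual wording for downtime log entries.
DOWNTIME_MARKERS = ("HOST DOWNTIME ALERT: ", "SERVICE DOWNTIME ALERT: ")


def downtime_events(lines: Iterable[str], host: str) -> List[str]:
    """Return the lines that record a downtime alert whose first field is `host`."""
    hits = []
    for line in lines:
        for marker in DOWNTIME_MARKERS:
            idx = line.find(marker)
            if idx == -1:
                continue
            if line[idx + len(marker):].split(";")[0] == host:
                hits.append(line)
            break  # a line carries at most one marker
    return hits


excerpt = [
    "Information 2018-07-18 00:00:00 CURRENT SERVICE STATE: acentek-inframgmt;Load;OK;HARD;1;OK - load average: 0.01, 0.04, 0.05",
    "Information 2018-07-18 00:00:00 CURRENT HOST STATE: acentek-inframgmt;UP;HARD;1;OK - acentek-inframgmt.acentek.net: rta 0.186ms, lost 0%",
]
print(downtime_events(excerpt, "acentek-inframgmt"))  # []
```

An empty result for the whole day's log would match the observation that the host never went into recurring maintenance.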
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Recurring Maintenance not working
@acentek, please schedule the downtime for the host one more time, and then manually run this command from the command line:
Code: Select all
/usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php
Let me know the output of this command. When I did that, I got a "Successfully submitted downtime for host: xxxx" message in the console, and the host successfully went into downtime.
Another thing: since the Perl script was replaced with the PHP script, the architecture may have changed a bit. So if you have old recurring downtime entries, I recommend deleting and recreating them. Make sure to check the valid months and valid weekdays when creating a new downtime.
Does your host have any special characters in its name?
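To answer the special-characters question quickly across many hosts, a small check like the one below can help. The allowed set (letters, digits, ".", "_" and "-") is an assumption chosen to be conservative for troubleshooting; it is not the rule Nagios itself enforces, which is configured separately (for example via illegal_object_name_chars in nagios.cfg). special_characters is a hypothetical helper.

```python
import re
from typing import List

# Assumption: a conservative "safe" set for troubleshooting, not Nagios's own rule.
UNUSUAL = re.compile(r"[^A-Za-z0-9._-]")


def special_characters(host_name: str) -> List[str]:
    """Return each unusual character in host_name once, in order of first appearance."""
    found: List[str] = []
    for ch in UNUSUAL.findall(host_name):
        if ch not in found:
            found.append(ch)
    return found


for name in ["DHCPD-MINN", "HSTN_NetGuardian432G5", "BCKL04 Valere"]:
    print(repr(name), special_characters(name))
# 'DHCPD-MINN' []
# 'HSTN_NetGuardian432G5' []
# 'BCKL04 Valere' [' ']
```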
Re: Recurring Maintenance not working
OK, so adding new recurring maintenance entries seems to work fine.
This leads me to believe we need to recreate all of the recurring maintenance entries in Nagios.
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Recurring Maintenance not working
@acentek, I submitted a recurring downtime entry on XI 5.4.12 and then upgraded to 5.5. After upgrading, I manually ran this command:
Code: Select all
/usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php
It seems to have re-enabled all the previous recurring downtimes. Can you confirm whether that is the case on your XI? Also, what version did you upgrade from?
Re: Recurring Maintenance not working
OK, so I created this:
Host: DHCPD-MINN | Services: Yes | Comment: (none) | Start Time: 14:40 | Duration: 60 (minutes) | Months: All | Weekdays: All | Days in Month: All
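This entry leaves Months, Weekdays and Days in Month at All. As a hypothetical, simplified model of how such filters narrow down the next start time (not the actual get_next_scheduled_time implementation, whose argument order and semantics are not shown in this thread), empty filters can be treated as "all":

```python
from datetime import datetime, timedelta
from typing import Collection, Optional


def next_occurrence(now: datetime, hhmm: str,
                    months: Collection[int] = (),
                    weekdays: Collection[int] = (),
                    days: Collection[int] = (),
                    horizon_days: int = 400) -> Optional[datetime]:
    """Hypothetical model: first datetime >= now at hh:mm whose date passes every filter.

    Empty filters mean "All". months are 1-12, weekdays 0-6 (Monday=0, Python's
    convention, not necessarily XI's), days are days of the month 1-31.
    """
    hour, minute = (int(part) for part in hhmm.split(":"))
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate < now:
        candidate += timedelta(days=1)  # today's slot has passed; start from tomorrow
    for _ in range(horizon_days):
        if ((not months or candidate.month in months)
                and (not weekdays or candidate.weekday() in weekdays)
                and (not days or candidate.day in days)):
            return candidate
        candidate += timedelta(days=1)
    return None  # no matching date within the horizon (e.g. February 30)


now = datetime(2018, 7, 20, 14, 37, 9)
print(next_occurrence(now, "14:40"))                # 2018-07-20 14:40:00
print(next_occurrence(now, "14:40", weekdays={0}))  # 2018-07-23 14:40:00 (next Monday)
```

A filter combination that can never match (such as February 30) returns None, which is one reason to double-check the months and weekdays when recreating entries.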
Then I ran the PHP script:
Checking recurring config id: b1ee32ae2f6f005f0fd43cd3dca0023b
host: DHCPD-MINN, services: yes
**************************************************************
get_next_scheduled_time('14:40', '', '', '') called
current_timestamp: 1532115429, 2018-07-20 14:37
candidate_timestamp: 1532115600, 2018-07-20 14:40
all parameters match, re-adjusting candidate for proper time
candidate_timestamp: 1532115600000, 2018-07-20 14:40
So I'm guessing I should have seen my host go into recurring maintenance.
Type Date / Time Information
Service Recovery 2018-07-20 14:47:44 SERVICE ALERT: WTVUB2001;Remote TX Mod Rate;OK;SOFT;2;SNMP OK - 16
Service Warning 2018-07-20 14:46:44 SERVICE ALERT: WTVUB2001;Remote TX Mod Rate;WARNING;SOFT;1;SNMP WARNING - *0*
Service Recovery 2018-07-20 14:46:34 SERVICE ALERT: BCKL04 Valere;Voltage;OK;SOFT;2;SNMP OK - 5230 Volts
Service Warning 2018-07-20 14:45:34 SERVICE ALERT: BCKL04 Valere;Voltage;WARNING;SOFT;1;SNMP WARNING - *5095* Volts
Service Recovery 2018-07-20 14:41:54 SERVICE ALERT: WTVUB2001;Remote TX Mod Rate;OK;SOFT;3;SNMP OK - 16
Service Warning 2018-07-20 14:40:54 SERVICE ALERT: WTVUB2001;Remote TX Mod Rate;WARNING;SOFT;2;SNMP WARNING - *0*
Service Warning 2018-07-20 14:39:54 SERVICE ALERT: WTVUB2001;Remote TX Mod Rate;WARNING;SOFT;1;SNMP WARNING - *0*
Service Recovery 2018-07-20 14:39:24 SERVICE ALERT: SpeedTestMN;Apache Web Server;OK;SOFT;3;active
Service Recovery 2018-07-20 14:39:14 SERVICE ALERT: acenet-speedtest;Total Processes;OK;SOFT;2;PROCS OK: 181 processes
Nope, so that's still not working.
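As a quick sanity check on the script output above, the epoch values can be converted back to dates. The second candidate_timestamp is exactly 1000 times the first; read as seconds, it would fall far beyond year 9999. This is only an observation about the logged numbers, and the thread does not establish whether it is related to the failure. The UTC-5 offset is an assumption inferred from the script printing 14:40 for this value.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the debug output above.
CURRENT = 1532115429
CANDIDATE = 1532115600
READJUSTED = 1532115600000

# Assumption: a UTC-5 local offset, inferred only from the script printing 14:40 for this value.
LOCAL = timezone(timedelta(hours=-5))


def as_local(epoch_seconds: int) -> datetime:
    """Interpret an epoch value as seconds and show it in the assumed local offset."""
    return datetime.fromtimestamp(epoch_seconds, LOCAL)


# Latest instant a Python datetime can represent (end of year 9999, UTC), in epoch seconds.
MAX_SECONDS = datetime(9999, 12, 31, 23, 59, 59, tzinfo=timezone.utc).timestamp()

print(as_local(CURRENT))         # 2018-07-20 14:37:09-05:00
print(as_local(CANDIDATE))       # 2018-07-20 14:40:00-05:00
print(READJUSTED // CANDIDATE)   # 1000
print(READJUSTED > MAX_SECONDS)  # True: read as seconds, it lies beyond year 9999
```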
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Recurring Maintenance not working
@acentek, we filed a bug report and our dev team is working on it. The fix should be included in the next release.
In the meantime, you can change the following line in the crontab:
Code: Select all
01 * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1
To:
Code: Select all
01 * * * * nagios /usr/local/nagiosxi/cron/recurringdowntime.pl >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1
And then restart the crond daemon:
Code: Select all
service crond restart
Re: Recurring Maintenance not working
OK, so you can see that the cron jobs ran, both the PHP and the Perl script:
Jul 23 08:01:01 nagios CROND[20511]: (nagios) CMD (nagios /usr/local/nagiosxi/cron/recurringdowntime.pl >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1)
Jul 23 08:01:01 nagios CROND[20505]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php >> /usr/local/nagiosxi/var/nom.log 2>&1)
Jul 23 08:01:01 nagios CROND[20516]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php >> /usr/local/nagiosxi/var/eventman.log 2>&1)
Jul 23 08:01:01 nagios CROND[20498]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1)
https://www.screencast.com/t/NG9uVrzjsB
But you can see that HSTN_NetGuardian432G5 went into alarm when it should have been in recurring maintenance.
I am starting the process of recreating all of the recurring maintenance jobs now to see if that fixes it.
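One detail in the crond log above may be worth checking. For the Perl entry, the logged command begins with nagios, while the PHP entry's command begins with /usr/bin/php. In system crontab files (/etc/crontab and /etc/cron.d) the field after the five schedule fields names the user and is not part of the command; in a per-user crontab edited with crontab -e there is no user field, so a leading nagios would be executed as the command itself. That suggests, but does not prove, that the Perl line ended up in a per-user crontab. A minimal sketch that flags this pattern in log lines of the format shown above (command_starts_with_user is a hypothetical helper):

```python
import re

# Matches crond log lines in the format shown above, e.g.
#   Jul 23 08:01:01 nagios CROND[20511]: (nagios) CMD (nagios /usr/local/...)
CMD_RE = re.compile(r"CROND\[\d+\]: \((?P<user>[^)]*)\) CMD \((?P<cmd>.*)\)\s*$")


def command_starts_with_user(log_line: str) -> bool:
    """True if the command crond ran begins with the same word as the account it ran as."""
    match = CMD_RE.search(log_line)
    if match is None:
        return False
    words = match.group("cmd").split()
    return bool(words) and words[0] == match.group("user")


lines = [
    "Jul 23 08:01:01 nagios CROND[20511]: (nagios) CMD (nagios /usr/local/nagiosxi/cron/recurringdowntime.pl >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1)",
    "Jul 23 08:01:01 nagios CROND[20498]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1)",
]
for line in lines:
    print(command_starts_with_user(line))  # True for the Perl entry, False for the PHP entry
```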