Check Freshness running early
I have several Nagios XI 5.6.1 servers running on RHEL 7 64-bit VMs. We have several passive checks, all set up with a freshness threshold of 7200 seconds, and the check command is "check_dummy" with ARG1 set to 0 "Resetting check after 2 hours".
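To make the expectation concrete, here is a minimal sketch of the freshness rule as described in this post: a result counts as stale only once it is older than the threshold. This is a simplified model rather than the actual Nagios Core logic, and the name `is_stale` is made up for illustration.

```python
# Simplified model of a freshness check (an assumption for illustration,
# not the real Nagios Core implementation).
FRESHNESS_THRESHOLD = 7200  # seconds (2 hours), as set on the services


def is_stale(last_result_time, now, threshold=FRESHNESS_THRESHOLD):
    """Return True once the last result is older than the threshold."""
    return (now - last_result_time) > threshold


last_result = 1569277851  # epoch seconds of some earlier check result
print(is_stale(last_result, last_result + 298))   # about 5 minutes later
print(is_stale(last_result, last_result + 7201))  # just over 2 hours later
```

Under this model, a reset should never fire only a few minutes after the previous result.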
Here are some examples from our nagios.log file (service and host names redacted). As you can see, Services 1 and 4 were reset more often than the 2-hour freshness threshold allows. The passive process is a custom log scraper we wrote, and it does not have the ability to send the resets itself.
[1569277851] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569278149] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569278448] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569278747] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569279045] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569279344] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569279642] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569279941] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569280240] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569281134] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569281433] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569281732] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569282030] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569282329] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569284459] SERVICE ALERT: HOST1;SERVICE_2;OK;HARD;1;OK: Resetting check after 2 hours
[1569286497] SERVICE ALERT: HOST1;SERVICE_3;OK;HARD;1;OK: Resetting check after 2 hours
[1569287213] SERVICE ALERT: HOST1;SERVICE_4;OK;HARD;1;OK: Resetting check after 2 hours
[1569289004] SERVICE ALERT: HOST1;SERVICE_4;OK;HARD;1;OK: Resetting check after 2 hours
[1569291693] SERVICE ALERT: HOST1;SERVICE_4;OK;HARD;1;OK: Resetting check after 2 hours
[1569293185] SERVICE ALERT: HOST1;SERVICE_4;OK;HARD;1;OK: Resetting check after 2 hours
[1569308553] SERVICE ALERT: HOST1;SERVICE_5;OK;HARD;1;OK: Resetting check after 2 hours
[1569323646] SERVICE ALERT: HOST1;SERVICE_4;OK;HARD;1;OK: Resetting check after 2 hours
[1569324123] SERVICE ALERT: HOST1;SERVICE_3;OK;HARD;1;OK: Resetting check after 2 hours
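The gaps between resets can be measured directly from the log. The following is a rough sketch, assuming the log line format shown above; the function names and the three-field regular expression are illustrative choices, not part of Nagios.

```python
import re
from collections import defaultdict

# Matches "[epoch] SERVICE ALERT: host;service;..." as in the excerpt above.
LINE_RE = re.compile(r"^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);")


def reset_intervals(log_text):
    """Map (host, service) to the gaps in seconds between consecutive alerts."""
    times = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            timestamp, host, service = match.groups()
            times[(host, service)].append(int(timestamp))
    gaps = {}
    for key, stamps in times.items():
        stamps.sort()
        gaps[key] = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return gaps


def early_resets(log_text, threshold=7200):
    """Keep only the gaps shorter than the freshness threshold."""
    result = {}
    for key, gaps in reset_intervals(log_text).items():
        short = [gap for gap in gaps if gap < threshold]
        if short:
            result[key] = short
    return result


SAMPLE = """\
[1569277851] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569278149] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569278448] SERVICE ALERT: HOST1;SERVICE_1;OK;HARD;1;OK: Resetting check after 2 hours
[1569287213] SERVICE ALERT: HOST1;SERVICE_4;OK;HARD;1;OK: Resetting check after 2 hours
[1569289004] SERVICE ALERT: HOST1;SERVICE_4;OK;HARD;1;OK: Resetting check after 2 hours
"""

for (host, service), gaps in early_resets(SAMPLE).items():
    print(host, service, gaps)
```

On these sample lines the SERVICE_1 gaps come out at 298 and 299 seconds, roughly five minutes apart, which looks more like a regular schedule than a 2-hour freshness timer.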
Here is the config for Service 4 (host and service names redacted). Any assistance figuring out why we are resetting the check so often would be appreciated.
Re: Check Freshness running early
According to our documentation, under the "Check Settings" tab, you need to disable active checks, and enable passive checks for the service.
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
From your screenshot, it is not really clear whether the service is configured properly, as you selected the "Skip" option, and we haven't seen the template that is in use.
Can you verify that your active checks are disabled and passive checks enabled for your service? Also, make sure that your PHP and system time are not out of sync:
Admin > System Config > System Profile > View System Info > Date/Time
and that you don't have multiple nagios processes running on your server:
Code: Select all
ps -ef | grep nagios.cfg | grep -v grep
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Check Freshness running early
The service was created using the passive check wizard in XI. Here is the configuration from the config file:
Code: Select all
define service {
host_name XXXXXXXXXXXXXXXXXX
service_description XXXXXXXXXXXXXXXXXX
use xiwizard_passive_service
servicegroups XXXXXXXXXXXXXXXX
check_command check_dummy!0 "Resetting check after 2 hours"!!!!!!!
max_check_attempts 1
check_period xi_timeperiod_24x7
check_freshness 1
freshness_threshold 7200
event_handler XXXXXXXXXXXXXXXXXXXXXXX
notification_interval 120
notification_period xi_timeperiod_24x7
notifications_enabled 1
contact_groups XXXXXXXXXXXXXXXXXXXXXXXXX
_xiwizard passivecheck
register 1
}
I have checked the date and time:
PHP Time: Tue, 24 Sep 2019 15:39:06 -0400
System Time: Tue, 24 Sep 2019 15:39:06 -0400
It does appear that 2 copies of Nagios are running. When I run a systemctl restart nagios, it always starts a second copy. I have killed the PIDs and then run the start, and it is always running 2 copies. This appears to be the case for all 9 of my Nagios XI servers:
Code: Select all
$ ps -ef | grep nagios.cfg | grep -v grep
nagios 29289 1 0 09:22 ? 00:00:12 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 29313 29289 0 09:22 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Active checks do show as enabled, but they have no timeframe for when to run.
Re: Check Freshness running early
The odd thing is that it doesn't always reset after the same number of minutes. Here is the service history for one of the checks.
Re: Check Freshness running early
"It does appear that 2 copies of Nagios are running."
Actually, this is normal. You don't have multiple Nagios processes running. One is a child process of the other: in your ps output, PID 29313 has 29289 as its parent PID.
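The parent/child relationship can also be checked mechanically. This is a small sketch, assuming the standard `ps -ef` column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD); the helper names are made up for illustration.

```python
def parse_ps_ef(output):
    """Parse `ps -ef` style lines into dicts with uid, pid, ppid and cmd."""
    procs = []
    for line in output.strip().splitlines():
        fields = line.split(None, 7)  # the 8th field is the full command line
        if len(fields) < 8:
            continue
        procs.append({
            "uid": fields[0],
            "pid": int(fields[1]),
            "ppid": int(fields[2]),
            "cmd": fields[7],
        })
    return procs


def independent_roots(procs):
    """Return processes whose parent is not another process in the list."""
    pids = {proc["pid"] for proc in procs}
    return [proc for proc in procs if proc["ppid"] not in pids]


PS_OUTPUT = """\
nagios 29289 1 0 09:22 ? 00:00:12 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 29313 29289 0 09:22 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
"""

roots = independent_roots(parse_ps_ef(PS_OUTPUT))
print([proc["pid"] for proc in roots])
```

A single root in the result means a single Nagios instance with a child process. Two roots would point to two separately started daemons.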
Having said that, it would be easier to troubleshoot the issue if we had some more information. Can you PM me (or any other member of the Nagios Support team) your latest profile (Admin > System Profile > Download Profile), the name of the service in question, and the host it is attached to? Thank you!
Re: Check Freshness running early
Please provide the command line to build the profile. I get the following from the GUI:
PROFILE BUILD FAILED
Array
(
)
CODE: 1
Re: Check Freshness running early
Replied via PM.
Re: Check Freshness running early
In case anyone else runs into the same issue, we found the problem and fix.
As part of our maintenance procedure, we had been using the Nagios external commands to disable all checks on the server, then enable them when we were done (DISABLE_HOST_SVC_CHECKS and ENABLE_HOST_SVC_CHECKS). The enable step caused the passive services to have their active checks enabled, even though the configuration files had "active_checks_enabled 0".
This could be checked by going to the advanced section of the checks and seeing that the active checks were enabled. When we turned off active checks, everything worked properly again.
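For anyone who wants to script the last step after a maintenance window, here is a rough sketch that builds DISABLE_SVC_CHECK external commands for the passive services. It is not what we actually ran. The command file path is an assumption based on a typical /usr/local/nagios install, and the helper names and the service list are made up for illustration.

```python
import time

# Assumed default location of the external command file; verify on your system.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_external_command(command, *args, timestamp=None):
    """Build one external command line: [epoch] COMMAND;arg1;arg2"""
    stamp = int(time.time()) if timestamp is None else int(timestamp)
    parts = (command,) + tuple(str(arg) for arg in args)
    return "[{}] {}\n".format(stamp, ";".join(parts))


def disable_active_checks(host, services, timestamp=None):
    """Return DISABLE_SVC_CHECK lines for each passive service on a host."""
    return [
        format_external_command("DISABLE_SVC_CHECK", host, service, timestamp=timestamp)
        for service in services
    ]


def submit(lines, command_file=COMMAND_FILE):
    """Write the command lines to the Nagios command file (not called here)."""
    with open(command_file, "w") as handle:
        handle.writelines(lines)


# Hypothetical host and service names, matching the redacted log excerpt.
for line in disable_active_checks("HOST1", ["SERVICE_1", "SERVICE_4"], timestamp=1569324123):
    print(line, end="")
```

Running this after ENABLE_HOST_SVC_CHECKS, and then passing the lines to `submit`, would turn active checks back off for just the services that are meant to stay passive.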