After upgrading from Nagios 4.3.4 to 4.4.0, service parents no longer work correctly. When a service enters an "unknown" or "error" state, rechecking always fails. When I force a recheck via the web interface, the "next scheduled check" time advances, but "last check time" stays at the same value.
So I enabled the debug log with the following settings:
Code:
debug_level=16
debug_verbosity=2
This is the resulting debug log output:
Code:
[1530604260.156238] [016.0] [pid=1139] Scheduling a forced, active check of service 'CPU Load' on host 'backup' @ Tue Jul 3 09:50:58 2018
[1530604260.156268] [016.2] [pid=1139] Found another service check event for this service @ Tue Jul 3 09:51:34 2018
[1530604260.156279] [016.2] [pid=1139] New service check event is forced and occurs before the existing event, so the new event will be used instead.
[1530604260.156291] [016.2] [pid=1139] Scheduling new service check event.
[1530604260.156324] [016.0] [pid=1139] Attempting to run scheduled check of service 'CPU Load' on host 'backup': check options=1, latency=0.000015
[1530604260.156355] [016.2] [pid=1139] Execution parents for this service failed, so it will not be actively checked.
[1530604260.156364] [016.1] [pid=1139] Unable to run scheduled service check at this time
[1530604260.156376] [016.1] [pid=1139] Rescheduled next service check for Tue Jul 3 09:56:00 2018
[1530604260.156385] [016.0] [pid=1139] Scheduling a forced, active check of service 'CPU Load' on host 'backup' @ Tue Jul 3 09:56:00 2018
[1530604260.156392] [016.2] [pid=1139] Scheduling new service check event.
[1530604262.285685] [016.0] [pid=1139] Attempting to run scheduled check of service 'CPU Usage' on host 'backup': check options=0, latency=0.000000
[1530604262.285745] [016.2] [pid=1139] Execution parents for this service failed, so it will not be actively checked.
[1530604262.285791] [016.1] [pid=1139] Unable to run scheduled service check at this time
[1530604262.285802] [016.1] [pid=1139] Rescheduled next service check for Tue Jul 3 09:56:02 2018
[1530604262.285810] [016.0] [pid=1139] Scheduling a non-forced, active check of service 'CPU Usage' on host 'backup' @ Tue Jul 3 09:56:02 2018
[1530604262.285822] [016.2] [pid=1139] Scheduling new service check event.
This is only one example. I've seen the same behaviour on other checks with parents, and on a second Nagios instance that I also upgraded to version 4.4.0. Upgrading to 4.4.1 didn't solve the problem. The check runs without error when I remove the parents line from the service definition.
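To see how many services are affected, the debug log can be scanned for the "Execution parents for this service failed" message. The following is only a minimal sketch in Python, assuming the log line format shown above; the function name `skipped_checks` is my own and not part of Nagios.

```python
import re
from collections import Counter

# Matches the "Attempting to run scheduled check" debug line and
# captures the service description and host name from it.
ATTEMPT_RE = re.compile(
    r"Attempting to run scheduled check of service '(?P<service>[^']+)' "
    r"on host '(?P<host>[^']+)'"
)

# Message logged when a check is skipped because of its parents.
SKIP_MSG = "Execution parents for this service failed"


def skipped_checks(lines):
    """Count skipped checks per (host, service) pair.

    Assumes the skip message directly follows the "Attempting" line
    of the same service, as in the log excerpt above.
    """
    counts = Counter()
    current = None  # (host, service) from the most recent "Attempting" line
    for line in lines:
        match = ATTEMPT_RE.search(line)
        if match:
            current = (match.group("host"), match.group("service"))
        elif SKIP_MSG in line and current is not None:
            counts[current] += 1
            current = None  # count each attempt at most once
    return counts
```

It can be used by opening the debug log (the file named by `debug_file` in nagios.cfg) and passing the file object to `skipped_checks()`, then printing each host/service pair with its count.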
Here is the service definition as an example:
Code:
define service {
    use                  generic-service,graphed-service
    host_name            backup
    service_description  CPU Load
    parents              PING
    check_command        check_nrpe!check_load
    servicegroups        system-health
}
Stephan