Active Check Following a Non-OK Check Occurs Too Soon
Posted: Wed Nov 01, 2023 12:19 pm
Nagios Core 4.4.4
With the following parameters set in the cfg file for a service check:
check_period 24x7
max_check_attempts 4
check_interval 7
retry_interval 6
The following behavior was observed:
[10-29-2023 19:09:01] SERVICE ALERT: <hostname>;<service description>;CRITICAL;SOFT;1;<summary>
[10-29-2023 19:10:31] SERVICE ALERT: <hostname>;<service description>;CRITICAL;SOFT;2;<summary>
[10-29-2023 19:15:24] SERVICE ALERT: <hostname>;<service description>;CRITICAL;SOFT;3;<summary>
[10-29-2023 19:21:24] SERVICE ALERT: <hostname>;<service description>;CRITICAL;HARD;4;<summary>
The interval between SOFT 1 and SOFT 2 was 1 minute 30 seconds; it should have been 6 minutes.
The interval between SOFT 2 and SOFT 3 was 4 minutes 53 seconds; it should have been 6 minutes.
The interval between SOFT 3 and HARD 4 was 6 minutes 0 seconds, which is correct.
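For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the gaps from the log timestamps above and compares them with the configured retry interval. It assumes interval_length in nagios.cfg is left at its default of 60 seconds, so retry_interval 6 means 360 seconds. The function name retry_deltas is made up for this example. Note that these are the times the alerts were logged, which may not be exactly when the checks were scheduled or started.

```python
from datetime import datetime

# Timestamps copied from the SERVICE ALERT log lines in this post.
ALERTS = [
    ("SOFT 1", "10-29-2023 19:09:01"),
    ("SOFT 2", "10-29-2023 19:10:31"),
    ("SOFT 3", "10-29-2023 19:15:24"),
    ("HARD 4", "10-29-2023 19:21:24"),
]

RETRY_INTERVAL = 6    # retry_interval from the service definition
INTERVAL_LENGTH = 60  # assumption: interval_length is at its default of 60 seconds


def retry_deltas(alerts):
    """Return (from_label, to_label, seconds) for each consecutive pair of alerts."""
    times = [
        (label, datetime.strptime(stamp, "%m-%d-%Y %H:%M:%S"))
        for label, stamp in alerts
    ]
    return [
        (a_label, b_label, int((b_time - a_time).total_seconds()))
        for (a_label, a_time), (b_label, b_time) in zip(times, times[1:])
    ]


expected = RETRY_INTERVAL * INTERVAL_LENGTH
for start, end, seconds in retry_deltas(ALERTS):
    # Negative difference means the alert was logged earlier than the retry interval allows.
    print(f"{start} -> {end}: {seconds}s observed, {expected}s expected, {seconds - expected:+d}s")
```

The first two retries come out 270 and 67 seconds short of the expected 360 seconds, and only the last one matches.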
The Nagios server involved is a RHEL 7 Linux VM with almost no load or CPU utilization.
The Nagios Program-Wide Performance Information chart is below:
If anyone can give insight into why Nagios service checks fire earlier than scheduled during a retry, I would greatly appreciate it. If additional information is needed, please let me know.
Thanks in Advance!