HOST retry_interval is being disregarded
Posted: Tue Jan 30, 2018 10:33 am
Hi Guys,
We are running Nagios XI 5.4.4 on a CentOS 6.9 server.
I would like to bring to your attention a strange behaviour we have noticed:
the time between host check attempts doesn't match the configured retry_interval.
Please have a look at the retry timing in the following example:
[Tue Jan 30 08:51:48 2018] HOST ALERT: 1287060420_00_oneaccess-01;DOWN;SOFT;1;CRITICAL - X.X.X.X: rta nan, lost 100%
[Tue Jan 30 08:52:14 2018] HOST ALERT: 1287060420_00_oneaccess-01;DOWN;SOFT;2;CRITICAL - X.X.X.X: rta nan, lost 100%
[Tue Jan 30 08:53:12 2018] HOST ALERT: 1287060420_00_oneaccess-01;DOWN;SOFT;3;CRITICAL - X.X.X.X: rta nan, lost 100%
[Tue Jan 30 08:55:10 2018] HOST ALERT: 1287060420_00_oneaccess-01;DOWN;HARD;4;CRITICAL - X.X.X.X: rta nan, lost 100%
[Tue Jan 30 15:54:07 2018] HOST ALERT: 1287060420_00_oneaccess-01;UP;HARD;4;TEST retry_interval
[Tue Jan 30 15:55:50 2018] HOST ALERT: 1287060420_00_oneaccess-01;DOWN;SOFT;1;CRITICAL - X.X.X.X: rta nan, lost 100%
[Tue Jan 30 15:56:15 2018] HOST ALERT: 1287060420_00_oneaccess-01;DOWN;SOFT;2;CRITICAL - X.X.X.X: rta nan, lost 100%
[Tue Jan 30 15:57:35 2018] HOST ALERT: 1287060420_00_oneaccess-01;DOWN;SOFT;3;CRITICAL - X.X.X.X: rta nan, lost 100%
[Tue Jan 30 15:59:36 2018] HOST ALERT: 1287060420_00_oneaccess-01;DOWN;HARD;4;CRITICAL - X.X.X.X: rta nan, lost 100%
However, the host configuration sets retry_interval to 3 (i.e. 3 minutes):
define host {
    name                            generic-host
    max_check_attempts              4
    check_interval                  5
    retry_interval                  3
    active_checks_enabled           1
    passive_checks_enabled          1
    check_period                    24x7
    check_freshness                 1
    event_handler_enabled           1
    flap_detection_enabled          1
    flap_detection_options          o,u,
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    contact_groups                  MGMT-PBX
    notification_interval           0
    notification_period             24x7
    first_notification_delay        0
    notification_options            d,
    notifications_enabled           1
    register                        0
}
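To make the mismatch easier to see, here is a small Python sketch that computes the gap between consecutive attempts from the second batch of log lines. The helper name gaps_in_seconds is only illustrative, and the expected value assumes interval_length is left at its default of 60 seconds, so that retry_interval 3 would mean 180 seconds.

```python
from datetime import datetime

# Timestamps copied from the second batch of HOST ALERT lines
# (attempts 1 to 4 after the host was forced back to UP).
attempts = [
    "Tue Jan 30 15:55:50 2018",
    "Tue Jan 30 15:56:15 2018",
    "Tue Jan 30 15:57:35 2018",
    "Tue Jan 30 15:59:36 2018",
]

def gaps_in_seconds(stamps):
    """Return the seconds elapsed between consecutive timestamps.

    Hypothetical helper for illustration; parsing the day and month
    names assumes an English (C) locale.
    """
    times = [datetime.strptime(s, "%a %b %d %H:%M:%S %Y") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

# Assumption: interval_length = 60, so retry_interval 3 means 180 seconds.
expected = 3 * 60

for attempt, gap in enumerate(gaps_in_seconds(attempts), start=2):
    print(f"attempt {attempt}: {gap}s after the previous one (expected {expected}s)")
```

For these four lines the gaps come out at 25, 80 and 121 seconds, all well below the 180 seconds I would expect.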
Thanks in advance for your quick reply.
Best regards