Nagios is not using configured retry interval times
Posted: Mon Jul 03, 2017 3:41 pm
Hello,
We have an issue with Nagios XI, which is not using configured retry interval times as expected.
For example, for the check of host nwsrma1, we have these values configured:
Check interval 4 min
Retry interval 1 min
Max check attempts 3 attempts
But when we check the Nagios logs in /var/log/messages, we see:
Jul 3 16:35:35 possrv1 nagios: HOST ALERT: nwsrma1;DOWN;SOFT;1;CRITICAL - 10.10.10.190: rta nan, lost 100%
Jul 3 16:36:05 possrv1 nagios: HOST ALERT: nwsrma1;DOWN;SOFT;2;CRITICAL - 10.10.10.190: rta nan, lost 100%
Jul 3 16:36:20 possrv1 nagios: HOST ALERT: nwsrma1;DOWN;HARD;3;CRITICAL - 10.10.10.190: rta nan, lost 100%
Jul 3 16:36:20 possrv1 nagios: HOST NOTIFICATION: Alejandro Guida;nwsrma1;DOWN;notify-host-by-syslog;CRITICAL - 10.10.10.190: rta nan, lost 100%
Jul 3 16:36:20 possrv1 nagios: HOST NOTIFICATION: Alejandro Guida;nwsrma1;DOWN;notify-host-by-email;CRITICAL - 10.10.10.190: rta nan, lost 100%
Jul 3 16:36:26 possrv1 nagios: HOST ALERT: nwsrma1;UP;HARD;3;OK - 10.10.10.190: rta 3.030ms, lost 0%
Jul 3 16:36:26 possrv1 nagios: HOST NOTIFICATION: Alejandro Guida;nwsrma1;UP;notify-host-by-syslog;OK - 10.10.10.190: rta 3.030ms, lost 0%
Jul 3 16:36:26 possrv1 nagios: HOST NOTIFICATION: Alejandro Guida;nwsrma1;UP;notify-host-by-email;OK - 10.10.10.190: rta 3.030ms, lost 0%
As you can see, the first check is at 16:35:35, the retry check is at 16:36:05 (30 seconds later), and the last retry check before nwsrma1 is marked DOWN in HARD STATE is at 16:36:20 (15 seconds later).
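To make the timing explicit, here is a small Python sketch that computes the gaps between the three HOST ALERT lines above and compares them with the configured retry interval. It assumes the default interval_length of 60 seconds (so a retry interval of 1 means 60 seconds) and the year 2017, since syslog timestamps omit the year; the variable and function names are only illustrative.

```python
from datetime import datetime

# Timestamps copied from the three DOWN alerts in the log excerpt above.
alerts = [
    ("Jul 3 16:35:35", "SOFT", 1),
    ("Jul 3 16:36:05", "SOFT", 2),
    ("Jul 3 16:36:20", "HARD", 3),
]

# Assumption: retry interval of 1 with interval_length = 60 seconds.
EXPECTED_RETRY_SECONDS = 60


def parse(stamp, year=2017):
    # Syslog lines carry no year, so one is supplied (assumed 2017).
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


times = [parse(stamp) for stamp, _, _ in alerts]

# Seconds elapsed between each pair of consecutive alerts.
gaps = [int((later - earlier).total_seconds())
        for earlier, later in zip(times, times[1:])]

for (_, state, attempt), gap in zip(alerts[1:], gaps):
    print(f"attempt {attempt} ({state}): {gap}s after previous, "
          f"expected about {EXPECTED_RETRY_SECONDS}s")
```

Both gaps (30 and 15 seconds) are well below the 60 seconds we expected between retries.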
Can you explain what is happening, or tell us which files/logs you need to review this issue? The profile.zip of our system is attached.
Thanks in advance.
Regards.
Linux distribution: CentOS release 6.5, 64-bit
Manually installed Nagios XI, version 5.4.5