I have a remote Nagios XI server forwarding events to my main Nagios XI server via NRDP outbound/inbound transfers. All the monitors from the remote XI server are set up as passive checks on the main Nagios XI server.
My problem is that as soon as a Host Down event hits the main Nagios XI server, it generates an alert immediately. Both the host entry and the passive host template have 4 retries configured.
In fact, the host history shows the event as "Hard 1 of 4" instead of stepping through Soft 1 of 4, Soft 2 of 4, and Soft 3 of 4 before going to Hard 4 of 4.
Note: this only happens with passive host events. Passive service events step through the required number of retries before going Hard, as do active host and service events.
I'm running Nagios XI 5.4.3 on both the remote and main Nagios XI servers.
Date / Time          Host  Service               State     State Type  Attempt  Information
2017-08-06 16:08:03  x     NTP Time              OK        SOFT        2 of 4   NTP OK: Offset 0.001703500748 secs
2017-08-06 16:06:38  x     NTP Time              CRITICAL  SOFT        1 of 4   CRITICAL - Socket timeout after 10 seconds
2017-08-06 12:32:33  x     Swap Usage            OK        SOFT        2 of 4   SWAP OK - 100% free (2047 MB out of 2047 MB)
2017-08-06 12:29:57  x     Memory Usage          OK        SOFT        2 of 4   OK - 86.7% (16355192 kB) free.
2017-08-06 12:25:22  x     Linux Service - sshd  OK        SOFT        2 of 4   openssh-daemon (pid 2327) is running...
2017-08-06 12:25:22  x     Load                  OK        SOFT        2 of 4   OK - load average: 0.49, 0.59, 0.72
2017-08-06 12:21:47  x                           UP        HARD        1 of 4   OK - 10.0.32.116: rta 3.747ms, lost 0%
2017-08-06 12:20:12  x                           DOWN      HARD        1 of 4   (Host Check Timed Out On Worker: z)
2017-08-06 12:19:57  x     Swap Usage            CRITICAL  SOFT        1 of 4   (Service Check Timed Out On Worker: z)
2017-08-06 12:14:13  x                           UP        HARD        1 of 4   OK - 10.0.32.116: rta 42.305ms, lost 0%
2017-08-06 12:10:47  x     Memory Usage          CRITICAL  SOFT        1 of 4   (Service Check Timed Out On Worker: z)
2017-08-06 12:09:12  x                           DOWN      HARD        1 of 4   (Host Check Timed Out On Worker: z)
2017-08-06 12:08:47  x     Linux Service - sshd  CRITICAL  SOFT        1 of 4   (Service Check Timed Out On Worker: z)
2017-08-06 12:08:27  x     Load                  CRITICAL  SOFT        1 of 4   (Service Check Timed Out On Worker: z)
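
To make the difference between what I expect and what the log shows concrete, here is a minimal Python sketch of the two behaviours. It is only a toy model, not Nagios's actual state logic: the function name `down_sequence` and its parameters are made up for illustration. The `soft` flag is loosely named after the `passive_host_checks_are_soft` option that Nagios Core documents for nagios.cfg (by default, passive host results are treated as HARD); whether that option explains what I am seeing on XI 5.4.3 is an assumption I have not verified.

```python
def down_sequence(n_results, max_attempts=4, soft=True):
    """Toy model: state type and attempt number for consecutive DOWN results.

    soft=True  -> results step through SOFT attempts until max_attempts (what I expect)
    soft=False -> every result is recorded as HARD, attempt 1 (what the log shows)
    """
    history = []
    for i in range(1, n_results + 1):
        if not soft:
            # Result is treated as HARD straight away, so retries never count up.
            history.append(("DOWN", "HARD", 1))
        else:
            attempt = min(i, max_attempts)
            state_type = "HARD" if attempt >= max_attempts else "SOFT"
            history.append(("DOWN", state_type, attempt))
    return history


if __name__ == "__main__":
    print("expected:", down_sequence(4))
    print("observed:", down_sequence(2, soft=False))
```

The first call models the Soft 1 of 4 through Hard 4 of 4 progression that the passive service checks follow; the second models the "Hard 1 of 4" entries the passive host checks produce.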