Host recovery emails being sent while host is unreachable

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
BIB
Posts: 46
Joined: Tue Dec 27, 2016 3:13 pm

Re: Host recovery emails being sent while host is unreachabl

Post by BIB »

check_ping v2.0.3 (nagios-plugins 2.0.3)
check_icmp v2.0.3 (nagios-plugins 2.0.3)
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Host recovery emails being sent while host is unreachabl

Post by rkennedy »

Do you have any custom event handlers or notifications running by chance?

Also, please PM over a profile for me to review. (Admin -> System Profile -> Download Profile) - the one posted previously was just the text version.

I suspect your timings may be too close, or a setting is causing the problem here.

EDIT: profile received.
Former Nagios Employee
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Host recovery emails being sent while host is unreachabl

Post by rkennedy »

Do you have any global event handlers in place? I can't seem to find anything in your profile that stands out. I am seeing core workers time out in your syslog, but it's hard to say if that's related.

How often does this occur?
BIB
Posts: 46
Joined: Tue Dec 27, 2016 3:13 pm

Re: Host recovery emails being sent while host is unreachabl

Post by BIB »

This happens very often, but with no fixed period: the time between two "FALSE RECOVERY" notifications ranges from a few minutes to a couple of days.
As far as I know, we have not configured any custom event handlers. A screenshot of our event handler configuration is attached.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Host recovery emails being sent while host is unreachabl

Post by tgriep »

The next time the problem happens, can you run the check_icmp command as root on the XI server in verbose mode and post the output so we can see what the plugin is doing?
Run the example below but change the xxx.xxx.xxx.xxx to the correct IP address.

/usr/local/nagios/libexec/check_icmp -H xxx.xxx.xxx.xxx -w 3000.0,80% -c 5000.0,100% -p 5 -v
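
Since the problem is intermittent, it may be easier to catch if the plugin is run repeatedly and each result is timestamped. The following is only a rough sketch of that idea in Python, not a Nagios tool: `run_once`, `capture`, and `demo_cmd` are hypothetical names, and the stand-in command is there only so the sketch runs on any machine.

```python
import subprocess
import sys
import time
from datetime import datetime


def run_once(cmd):
    """Run the plugin once; return (timestamp, exit code, combined output)."""
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    stamp = datetime.now().isoformat(timespec="seconds")
    return stamp, result.returncode, result.stdout.strip()


def capture(cmd, runs, pause_seconds):
    """Run the command `runs` times, pausing between runs, and collect the records."""
    records = []
    for i in range(runs):
        records.append(run_once(cmd))
        if i < runs - 1:
            time.sleep(pause_seconds)
    return records


# Stand-in command (an assumption for illustration). On the XI server, replace it
# with the check_icmp command line from this post, split into a list of arguments.
demo_cmd = [sys.executable, "-c", "print('stand-in output')"]

for stamp, code, output in capture(demo_cmd, runs=2, pause_seconds=0):
    # Standard plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
    print(f"{stamp} exit={code} {output}")
```

With the real command, a run that exits 0 while the host is known to be powered off would be the case worth posting.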
Be sure to check out our Knowledgebase for helpful articles and solutions!
BIB
Posts: 46
Joined: Tue Dec 27, 2016 3:13 pm

Re: Host recovery emails being sent while host is unreachabl

Post by BIB »

Output attached. The problem occurred multiple times in a short period of time, on different hosts. Three occurrences are attached.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Host recovery emails being sent while host is unreachabl

Post by tgriep »

The output of those checks looks correct: the host was down and the check confirmed it.
What I am looking for is a case where you know the host is down but the plugin reports that it is up. That is what would trigger the recovery email.
The next time you see this happen, can you post the following file here so we can view the log entries?

/usr/local/nagios/var/nagios.log
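
Before posting, it can help to narrow the log down to the alert and notification entries for the affected host. The Python sketch below assumes the usual `[epoch] EVENT TYPE: field;field;...` line layout; `events_for_host` is a hypothetical helper, and the sample lines, host names, and timestamps are invented for illustration rather than taken from any attached log.

```python
import re
from datetime import datetime, timezone

# Assumed line shape: "[<epoch seconds>] <EVENT TYPE>: <semicolon-separated fields>"
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")

EVENT_TYPES = {
    "HOST ALERT",
    "SERVICE ALERT",
    "HOST NOTIFICATION",
    "SERVICE NOTIFICATION",
}


def events_for_host(lines, host):
    """Return (utc_time, event_type, fields) for alert/notification lines naming `host`."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        epoch, event_type, rest = match.groups()
        if event_type not in EVENT_TYPES:
            continue
        fields = rest.split(";")
        # The host name is the first field of ALERT lines and the second
        # field of NOTIFICATION lines (after the contact name).
        if host in fields[:2]:
            when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
            events.append((when, event_type, fields))
    return events


# Invented sample lines (not real log data).
SAMPLE = [
    "[1485167257] HOST ALERT: host_a;DOWN;HARD;5;CRITICAL - 10.0.0.1: rta nan, lost 100%",
    "[1485168758] SERVICE NOTIFICATION: some_contact;host_a;PING;OK;notify-service-by-email;PING OK - Packet loss = 16%",
    "[1485168760] SERVICE ALERT: host_b;PING;OK;HARD;1;PING OK - Packet loss = 0%",
]

for when, kind, fields in events_for_host(SAMPLE, "host_a"):
    print(when.isoformat(), kind, fields[-1])
```

To use it on the real file, pass an open file object for the log instead of `SAMPLE`.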
BIB
Posts: 46
Joined: Tue Dec 27, 2016 3:13 pm

Re: Host recovery emails being sent while host is unreachabl

Post by BIB »

Log file attached. This time, we received a false recovery for a ping-based service on a host that is down. The host unreachable notification and the false recovery notification are below:


Notification Type: PROBLEM
Host: TLC_s_BG_146
State: DOWN
Address: 10.34.146.4
Info: CRITICAL - 10.34.146.4: rta nan, lost 100%
Date/Time: 2017-01-23 11:27:37


*************************************************************************

Notification Type: RECOVERY
Service: TLC_PING 5000-70 10000-100
State: OK
Host: TLC_s_BG_146
Info: PING OK - Packet loss = 16%, RTA = 0.32 ms
Address: 10.34.146.4
Date/Time: 2017-01-23 11:52:38
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Host recovery emails being sent while host is unreachabl

Post by tgriep »

Here is what I think is happening.
The check interval for the host check is set to 3 minutes and the service check is set to 1 minute.
If the host came up temporarily within the 3-minute host check interval, the service check would send out a recovery but the host check would not, because it missed the brief period when the host was up.
What you could do is set the host's check interval to be shorter than the service's check interval. That would stop the service from sending notifications until the host check recovers.
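
The timing argument can be illustrated with a toy model in Python. It is a simplified sketch, not how the Nagios scheduler works: it ignores retry intervals and any on-demand checks, and the names `check_times` and `observed_up` and the "up" window are made up for the example.

```python
def check_times(interval, horizon, offset=0):
    """Minutes at which a check with the given interval runs, up to `horizon`."""
    return list(range(offset, horizon, interval))


def observed_up(times, up_start, up_end):
    """Subset of check times that fall inside the window where the host answers."""
    return [t for t in times if up_start <= t < up_end]


HORIZON = 12
host_checks = check_times(3, HORIZON)     # minutes 0, 3, 6, 9
service_checks = check_times(1, HORIZON)  # minutes 0, 1, ..., 11

# Hypothetical blip: the host answers only from minute 4 up to (not including) minute 6.
up_start, up_end = 4, 6

print("host checks that saw the host up:   ", observed_up(host_checks, up_start, up_end))     # []
print("service checks that saw the host up:", observed_up(service_checks, up_start, up_end))  # [4, 5]

# With a 1-minute host check interval, the host check also catches the blip.
print("1-minute host checks that saw it:   ", observed_up(check_times(1, HORIZON), up_start, up_end))  # [4, 5]
```

In this model the 3-minute host check never sees the host up, while the 1-minute service check sees it twice, which is the situation where a service recovery could go out while the host is still shown as down.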
If you have any more questions, let us know.
BIB
Posts: 46
Joined: Tue Dec 27, 2016 3:13 pm

Re: Host recovery emails being sent while host is unreachabl

Post by BIB »

Thank you for your response. We have already tried this, without success. We will change the check settings again and follow up on what happens.
Our problem is that the host is actually down (powered off), so no reply is possible, yet these false recovery notifications still occur.