Service alerts are issued prematurely when host goes down

fasterfourier
Posts: 16
Joined: Mon Sep 24, 2018 4:36 pm

Re: Service alerts are issued prematurely when host goes dow

Post by fasterfourier »

Looking at that more closely, this seems to be the sequence of events:

-PING service goes into soft critical at T=0s
-Host goes into soft down at T=30s
-Service notifications go out at T=70s
-Service goes into hard critical at T=70s
-Host alerts go out at T=120s
-Host goes into hard down at T=120s

The host in this case has a normal check interval of 5min, a retry check interval of 1min, and a max_check_attempts of 2. The service has a normal check interval of 5min, a retry check interval of 1min, and a max_check_attempts of 3. This is according to the configuration display table in the web UI.

I'm having trouble understanding why the above is happening when the config indicates it should happen otherwise.
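
As a rough sketch of what the settings above would imply, the following assumes a simplified model that is not taken from the Nagios source: the first failed check counts as attempt 1, each retry runs exactly one retry interval later, and the state turns hard on attempt number max_check_attempts. The function name and the model are illustrative assumptions; scheduling latency and on-demand host checks are ignored.

```python
def hard_state_time(first_failure_s, max_check_attempts, retry_interval_s):
    """Return the time (in seconds) at which a HARD state would be expected.

    Simplified, hypothetical model: attempt 1 is the first failed check,
    each further attempt happens retry_interval_s later, and the state
    becomes HARD on attempt number max_check_attempts.
    """
    retries = max_check_attempts - 1
    return first_failure_s + retries * retry_interval_s


# Service: first soft critical at T=0s, max_check_attempts=3, retry 1 min
service_hard = hard_state_time(0, 3, 60)

# Host: first soft down at T=30s, max_check_attempts=2, retry 1 min
host_hard = hard_state_time(30, 2, 60)

print(f"service expected HARD at T={service_hard}s")  # T=120s
print(f"host expected HARD at T={host_hard}s")        # T=90s
```

Under this model the service would not reach a hard state before T=120s, so a hard critical (and notification) at T=70s is earlier than the configuration suggests.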
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

I would need to see all the configurations to know for sure, but based on what you describe, the service notifications should not be going out at 70s. They should not go out until several minutes later.
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com

Post by fasterfourier »

[Attachment: Capture.JPG]
Are the configurations shown in the web GUI a comprehensive summary of all of the linked configurations? If not, what can I post here to clear this up? I can post the service and host config, along with any linked templates, if that helps.

Post by scottwilkerson »

Looking again at your nagios.cfg, I noticed you are lacking the following two directives. While these are supposed to be enabled by default, can you add them, restart Nagios, and see if the problem persists?

Code:

enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
If the problem does persist, we would likely need all the configuration files from the system to attempt to re-create the issue.
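
One way to confirm whether the two directives are explicitly set is a small script along these lines. This is only a sketch: it assumes the file consists of simple name=value lines with # comment lines, the function names are hypothetical, and the sample text stands in for the real nagios.cfg, whose location varies by install.

```python
WANTED = (
    "enable_predictive_host_dependency_checks",
    "enable_predictive_service_dependency_checks",
)


def read_directives(cfg_text):
    """Parse 'name=value' lines, skipping blank lines and '#' comments."""
    directives = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        directives[name.strip()] = value.strip()
    return directives


def report(cfg_text):
    """Return the value of each wanted directive, or 'not set'."""
    found = read_directives(cfg_text)
    return {name: found.get(name, "not set") for name in WANTED}


# Stand-in for the contents of nagios.cfg (illustrative only)
sample = """
# ENABLE PREDICTIVE HOST DEPENDENCY CHECKS
enable_predictive_host_dependency_checks=1
"""

for name, value in report(sample).items():
    print(f"{name}: {value}")
```

To run it against a real file, read the file's text and pass it to report(); a result of "not set" only means the directive is absent from that file, in which case the default applies.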

Post by fasterfourier »

Scott, it does look like I have those two options enabled in my nagios.cfg:

Code:

# ENABLE PREDICTIVE HOST DEPENDENCY CHECKS
# This option determines whether or not Nagios will attempt to execute
# checks of hosts when it predicts that future dependency logic test
# may be needed.  These predictive checks can help ensure that your
# host dependency logic works well.
# Values:
#  0 = Disable predictive checks
#  1 = Enable predictive checks (default)

enable_predictive_host_dependency_checks=1



# ENABLE PREDICTIVE SERVICE DEPENDENCY CHECKS
# This option determines whether or not Nagios will attempt to execute
# checks of service when it predicts that future dependency logic test
# may be needed.  These predictive checks can help ensure that your
# service dependency logic works well.
# Values:
#  0 = Disable predictive checks
#  1 = Enable predictive checks (default)

enable_predictive_service_dependency_checks=1
Is there a way I can get my entire config over to you privately?

Post by scottwilkerson »

You had already posted it here:
https://support.nagios.com/forum/viewto ... 84#p263025

I'm sorry; when I was searching through it yesterday, I somehow didn't see the entries.

What I don't understand is why your services are going into hard critical at T=70s. If you do in fact have max_check_attempts set to 3, you should have three 1-minute spans before the notifications go out.

Post by fasterfourier »

I don't get it either. I went back in the logs to before I did the 4.4.2 upgrade, and everything happened as expected, with 3 checks at 1-minute intervals before the notification goes out:

Code:

[08-30-2018 13:47:25] SERVICE ALERT: sbh_annap_t1;PING;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 25.93 ms
[08-30-2018 13:42:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;HARD;3;PING CRITICAL - Packet loss = 100%
[08-30-2018 13:41:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;SOFT;2;PING CRITICAL - Packet loss = 100%
[08-30-2018 13:40:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%
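
To measure the soft-to-hard timing directly from log entries like these, a sketch such as the following could be used. The regular expression is inferred only from the four lines shown above (timestamp, then host;service;state;type;attempt), so it is an assumption about the format, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the log lines above; not an official format spec.
LINE = re.compile(
    r"\[(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<type>SOFT|HARD);(?P<attempt>\d+);"
)

LOG = [
    "[08-30-2018 13:47:25] SERVICE ALERT: sbh_annap_t1;PING;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 25.93 ms",
    "[08-30-2018 13:42:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;HARD;3;PING CRITICAL - Packet loss = 100%",
    "[08-30-2018 13:41:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;SOFT;2;PING CRITICAL - Packet loss = 100%",
    "[08-30-2018 13:40:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%",
]


def parse(lines):
    """Return (timestamp, state, type, attempt) tuples, oldest first."""
    events = []
    for line in lines:
        m = LINE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%m-%d-%Y %H:%M:%S")
            events.append((ts, m["state"], m["type"], int(m["attempt"])))
    return sorted(events)


def seconds_to_hard(events, state="CRITICAL"):
    """Seconds from the first alert in `state` to the first HARD one."""
    problem = [e for e in events if e[1] == state]
    first = problem[0][0]
    hard = next(e[0] for e in problem if e[2] == "HARD")
    return (hard - first).total_seconds()


print(seconds_to_hard(parse(LOG)))  # 120.0
```

For the pre-upgrade entries above this gives 120 seconds between the first soft critical and the hard critical, which can be compared with the 70 seconds observed after the upgrade.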

Post by scottwilkerson »

After going through this, I have confirmed this is a bug in Core and have filed a bug report on GitHub:
https://github.com/NagiosEnterprises/na ... issues/584

Post by fasterfourier »

Thank you for thoroughly investigating this, Scott. Do you have any info on when the bug was introduced (so I can roll back to an unaffected version) or whether there is a workaround?

Post by scottwilkerson »

fasterfourier wrote: Thank you for thoroughly investigating this, Scott. Do you have any info on when the bug was introduced (so I can roll back to an unaffected version) or whether there is a workaround?

My best guess would be at or after 4.4.0.

I know 4.3.4 was extremely stable, and would be a good target to go to.