Re: Service alerts are issued prematurely when host goes dow
Posted: Thu Sep 27, 2018 10:57 am
by fasterfourier
Looking at that closer, this seems to be the sequence of events:
-PING service goes into soft critical at T=0s
-Host goes into soft down at T=30s
-Service notifications go out at T=70s
-Service goes into hard critical at T=70s
-Host alerts go out at T=120s
-Host goes into hard down at T=120s
The host in this case has a normal check interval of 5min, a retry check interval of 1min, and a max_check_attempts of 2. The service has a normal check interval of 5min, a retry check interval of 1min, and a max_check_attempts of 3. This is according to the configuration display table in the web UI.
I'm having trouble understanding why the above is happening when the config indicates it should happen otherwise.
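To make the mismatch concrete, here is a minimal sketch of when the service would be expected to reach a hard state given the settings quoted above. It assumes a simplified model in which the first failing check counts as attempt 1 and each retry runs exactly one retry interval after the previous check; on-demand checks and scheduling jitter are ignored. The function name `expected_hard_time` is hypothetical and is not part of Nagios.

```python
def expected_hard_time(first_failure_s, retry_interval_s, max_check_attempts):
    """Return the time (seconds after T=0) of the final check attempt.

    Simplified model, not Nagios's actual scheduler: attempt 1 is the first
    failing check, and every later attempt runs one retry interval after
    the previous one.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return first_failure_s + (max_check_attempts - 1) * retry_interval_s


# Values from the post: PING service, retry interval 1 min, max_check_attempts 3.
service_expected = expected_hard_time(
    first_failure_s=0, retry_interval_s=60, max_check_attempts=3
)
service_observed = 70  # hard critical and notifications at T=70s, per the post

print(f"expected hard critical at T={service_expected}s")
print(f"observed hard critical at T={service_observed}s")
print(f"service went hard {service_expected - service_observed}s early")
```

Under these assumptions the service should not reach hard critical before T=120s, which is 50 seconds later than what was observed.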
Re: Service alerts are issued prematurely when host goes dow
Posted: Thu Sep 27, 2018 11:10 am
by scottwilkerson
I would need to see all the configurations to know for sure, but based on what you say, the service notifications should not be going out at 70s; they shouldn't go out until several minutes later.
Re: Service alerts are issued prematurely when host goes dow
Posted: Thu Sep 27, 2018 12:50 pm
by fasterfourier
Are the configurations shown in the web GUI a comprehensive summary of all of the linked configurations? If not, what can I post here to clear this up? I can post the service and host config, along with any linked templates, if that helps.
Re: Service alerts are issued prematurely when host goes dow
Posted: Thu Sep 27, 2018 1:30 pm
by scottwilkerson
Looking again at your nagios.cfg, I noticed you are lacking the following 2 directives. While these are supposed to be enabled by default, can you add them, restart Nagios, and see if the problem persists?
Code:
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
If the problem does persist, we would likely need all the configuration files from the system to attempt to re-create the issue.
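Because these directives are easy to miss in a long nagios.cfg, a small script can check for them. The following is a rough sketch that assumes the main config consists of `key=value` lines with `#` comments; the helper names `parse_directives` and `report` are hypothetical, and the sample text is made up for illustration.

```python
REQUIRED = (
    "enable_predictive_host_dependency_checks",
    "enable_predictive_service_dependency_checks",
)


def parse_directives(cfg_text):
    """Collect key=value directives, skipping blank lines and # comments.

    If a key appears more than once, the last value wins, which is enough
    for checking single-valued directives like the two above.
    """
    directives = {}
    for raw_line in cfg_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            directives[key.strip()] = value.strip()
    return directives


def report(cfg_text):
    """Map each required directive to its value, or None if it is not set."""
    directives = parse_directives(cfg_text)
    return {name: directives.get(name) for name in REQUIRED}


# Illustrative sample only: the second directive is commented out here.
sample = """\
# ENABLE PREDICTIVE HOST DEPENDENCY CHECKS
enable_predictive_host_dependency_checks=1
# enable_predictive_service_dependency_checks=1
"""
print(report(sample))
```

To run it against a real installation, read the file's contents and pass them to `report`; the location of nagios.cfg depends on how Nagios was installed.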
Re: Service alerts are issued prematurely when host goes dow
Posted: Fri Sep 28, 2018 9:42 am
by fasterfourier
Scott, it does look like I have those two options enabled in my nagios.cfg:
Code:
# ENABLE PREDICTIVE HOST DEPENDENCY CHECKS
# This option determines whether or not Nagios will attempt to execute
# checks of hosts when it predicts that future dependency logic test
# may be needed. These predictive checks can help ensure that your
# host dependency logic works well.
# Values:
# 0 = Disable predictive checks
# 1 = Enable predictive checks (default)
enable_predictive_host_dependency_checks=1
# ENABLE PREDICTIVE SERVICE DEPENDENCY CHECKS
# This option determines whether or not Nagios will attempt to execute
# checks of service when it predicts that future dependency logic test
# may be needed. These predictive checks can help ensure that your
# service dependency logic works well.
# Values:
# 0 = Disable predictive checks
# 1 = Enable predictive checks (default)
enable_predictive_service_dependency_checks=1
Is there a way I can get my entire config over to you privately?
Re: Service alerts are issued prematurely when host goes dow
Posted: Fri Sep 28, 2018 9:52 am
by scottwilkerson
You had already posted it here:
https://support.nagios.com/forum/viewto ... 84#p263025
I'm sorry; when I was searching through it yesterday, I somehow didn't see the entries.
What I don't understand is why your services are going into hard critical at T=70s if you do in fact have max_check_attempts set to 3, because you should get three checks at 1-minute intervals before the notifications go out.
Re: Service alerts are issued prematurely when host goes dow
Posted: Fri Sep 28, 2018 10:14 am
by fasterfourier
I don't get it either. I went back in the logs to before I did the 4.4.2 upgrade, and everything happens as expected there: 3 checks at 1-minute intervals before the notification goes out:
Code:
[08-30-2018 13:47:25] SERVICE ALERT: sbh_annap_t1;PING;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 25.93 ms
[08-30-2018 13:42:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;HARD;3;PING CRITICAL - Packet loss = 100%
[08-30-2018 13:41:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;SOFT;2;PING CRITICAL - Packet loss = 100%
[08-30-2018 13:40:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%
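The same comparison can be scripted rather than read off by eye. This is a minimal sketch that parses SERVICE ALERT lines in the human-readable form shown in this post (the raw nagios.log normally uses epoch timestamps, so the pattern would need adjusting there) and measures the time from the first soft critical to the hard critical. The names `ALERT_RE` and `parse_alert` are hypothetical.

```python
import re
from datetime import datetime

# Matches lines like:
# [08-30-2018 13:40:31] SERVICE ALERT: host;service;STATE;TYPE;attempt;output
ALERT_RE = re.compile(
    r"\[(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] SERVICE ALERT: "
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[^;]*);"
    r"(?P<state_type>[^;]*);(?P<attempt>\d+);(?P<output>.*)"
)


def parse_alert(line):
    """Parse one SERVICE ALERT line into a dict, or return None if no match."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%m-%d-%Y %H:%M:%S")
    entry["attempt"] = int(entry["attempt"])
    return entry


# The three CRITICAL entries from the pre-upgrade log in this post.
LOG_LINES = [
    "[08-30-2018 13:42:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;HARD;3;PING CRITICAL - Packet loss = 100%",
    "[08-30-2018 13:41:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;SOFT;2;PING CRITICAL - Packet loss = 100%",
    "[08-30-2018 13:40:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%",
]

# Sort oldest first, since the web UI lists the newest entry at the top.
entries = sorted(filter(None, map(parse_alert, LOG_LINES)), key=lambda e: e["ts"])
first_soft = next(e for e in entries if e["state_type"] == "SOFT" and e["attempt"] == 1)
hard = next(e for e in entries if e["state_type"] == "HARD" and e["state"] == "CRITICAL")

seconds_to_hard = int((hard["ts"] - first_soft["ts"]).total_seconds())
print(f"first soft critical to hard critical: {seconds_to_hard}s")
```

For the pre-upgrade entries this gives 120 seconds, compared with the 70 seconds seen after the upgrade.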
Re: Service alerts are issued prematurely when host goes dow
Posted: Fri Sep 28, 2018 2:10 pm
by scottwilkerson
After going through this, I have confirmed this is a bug in Core and have filed a bug report on GitHub:
https://github.com/NagiosEnterprises/na ... issues/584
Re: Service alerts are issued prematurely when host goes dow
Posted: Fri Sep 28, 2018 2:36 pm
by fasterfourier
Thank you for thoroughly investigating this, Scott. Do you have any info on when the bug was introduced (so I can roll back to an unaffected version) or whether there is a workaround?
Re: Service alerts are issued prematurely when host goes dow
Posted: Fri Sep 28, 2018 2:40 pm
by scottwilkerson
fasterfourier wrote: Thank you for thoroughly investigating this, Scott. Do you have any info on when the bug was introduced (so I can roll back to an unaffected version) or whether there is a workaround?
My best guess would be at or after 4.4.0.
I know 4.3.4 was extremely stable, and it would be a good version to roll back to.