Service alerts are issued prematurely when host goes down
-
- Posts: 16
- Joined: Mon Sep 24, 2018 4:36 pm
Looking at it more closely, this seems to be the sequence of events:
-PING service goes into soft critical at T=0s
-Host goes into soft down at T=30s
-Service notifications go out at T=70s
-Service goes into hard critical at T=70s
-Host alerts go out at T=120s
-Host goes into hard down at T=120s
The host in this case has a normal check interval of 5min, a retry check interval of 1min, and a max_check_attempts of 2. The service has a normal check interval of 5min, a retry check interval of 1min, and a max_check_attempts of 3. These values come from the configuration table in the web UI.
I'm having trouble understanding why this is happening when the configuration indicates it should behave differently.
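As a rough check of the expected timing, here is a minimal Python sketch under simplified assumptions: the service was OK before the first failure, the first failing check counts as attempt 1 (SOFT), each retry runs exactly one retry interval later, and the state turns HARD on attempt max_check_attempts. The function name `expected_hard_time` is hypothetical, and check latency and scheduling jitter are ignored.

```python
# Simplified model of soft/hard state timing (hypothetical helper, not Nagios code).
# Assumes the service was OK before the first failure and that every retry
# runs exactly retry_interval_s after the previous check (no latency or jitter).


def expected_hard_time(first_soft_s: int, max_check_attempts: int, retry_interval_s: int) -> int:
    """Return the time in seconds at which the state should become HARD.

    The first failing check is attempt 1 (SOFT); each retry is one more
    attempt, and the state turns HARD on attempt max_check_attempts.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return first_soft_s + (max_check_attempts - 1) * retry_interval_s


# Service values from the web UI table: soft at T=0s, max_check_attempts=3, retry 1min.
service_hard = expected_hard_time(first_soft_s=0, max_check_attempts=3, retry_interval_s=60)
print(f"service expected HARD at T={service_hard}s (observed T=70s)")
# prints: service expected HARD at T=120s (observed T=70s)
```

Under these assumptions the service should not reach a hard state (and so should not notify) before about T=120s.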
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
I would need to see all the configurations to know for sure, but based on what you describe, the service notifications should not be going out at 70s; they shouldn't go out until several minutes later.
-
- Posts: 16
- Joined: Mon Sep 24, 2018 4:36 pm
Is the configuration shown in the web GUI a complete summary of all the linked configurations? If not, what can I post here to clear this up? I can post the service and host config, along with any linked templates, if that helps.
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Looking again at your nagios.cfg, I noticed you are missing the following two directives. Although they are supposed to be enabled by default, can you add them, restart Nagios, and see whether the problem persists?
If the problem does persist, we would likely need all the configuration files from the system to try to re-create the issue.
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
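To confirm that both directives are actually present and enabled, nagios.cfg can be scanned with a short script. This is a hypothetical helper, not part of Nagios: it assumes simple `name=value` lines with `#` comment lines, and it flags a directive that is merely absent, even though Nagios then falls back to its default (1, per the comments in the sample nagios.cfg).

```python
# Hypothetical helper (not part of Nagios): check that two directives are set
# to 1 in nagios.cfg-style text. Assumes "name=value" lines and "#" comments.

REQUIRED = (
    "enable_predictive_host_dependency_checks",
    "enable_predictive_service_dependency_checks",
)


def parse_main_cfg(text: str) -> dict[str, str]:
    """Parse name=value directives, skipping blank lines and # comment lines."""
    directives: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # A repeated key keeps only its last value in this sketch.
        directives[key.strip()] = value.strip()
    return directives


def missing_or_disabled(text: str) -> list[str]:
    """Return the required directives that are absent or not set to 1."""
    directives = parse_main_cfg(text)
    return [name for name in REQUIRED if directives.get(name) != "1"]


sample = """
# ENABLE PREDICTIVE HOST DEPENDENCY CHECKS
enable_predictive_host_dependency_checks=1
"""
print(missing_or_disabled(sample))
# prints ['enable_predictive_service_dependency_checks']
```

Run against the full nagios.cfg, an empty list means both directives are explicitly set to 1. Repeated keys such as `cfg_file` collapse to their last value here, which does not matter for these two directives.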
-
- Posts: 16
- Joined: Mon Sep 24, 2018 4:36 pm
Scott, it does look like I have those two options enabled in my nagios.cfg (relevant section below). Is there a way I can get my entire config over to you privately?
# ENABLE PREDICTIVE HOST DEPENDENCY CHECKS
# This option determines whether or not Nagios will attempt to execute
# checks of hosts when it predicts that future dependency logic test
# may be needed. These predictive checks can help ensure that your
# host dependency logic works well.
# Values:
# 0 = Disable predictive checks
# 1 = Enable predictive checks (default)
enable_predictive_host_dependency_checks=1
# ENABLE PREDICTIVE SERVICE DEPENDENCY CHECKS
# This option determines whether or not Nagios will attempt to execute
# checks of service when it predicts that future dependency logic test
# may be needed. These predictive checks can help ensure that your
# service dependency logic works well.
# Values:
# 0 = Disable predictive checks
# 1 = Enable predictive checks (default)
enable_predictive_service_dependency_checks=1
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
You had already posted it here:
https://support.nagios.com/forum/viewto ... 84#p263025
I'm sorry; when I was searching for it yesterday, I somehow missed those entries.
What I don't understand is why your services go into hard critical at T=70s if max_check_attempts is in fact set to 3: with a 1-minute retry interval, there should be three checks one minute apart before the notifications go out.
-
- Posts: 16
- Joined: Mon Sep 24, 2018 4:36 pm
I don't get it either. I went back through the logs to before I did the 4.4.2 upgrade, and there everything happens as expected: three checks at 1-minute intervals before the notification goes out:
[08-30-2018 13:47:25] SERVICE ALERT: sbh_annap_t1;PING;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 25.93 ms
[08-30-2018 13:42:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;HARD;3;PING CRITICAL - Packet loss = 100%
[08-30-2018 13:41:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;SOFT;2;PING CRITICAL - Packet loss = 100%
[08-30-2018 13:40:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%
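The excerpt above is listed newest first. A small sketch like the following puts the alerts in chronological order and measures the gap from the first SOFT alert to the first HARD problem alert. It assumes the `[MM-DD-YYYY HH:MM:SS] SERVICE ALERT: host;service;state;type;attempt;output` layout shown here (the raw log file may use a different timestamp format), the function names are hypothetical, and `re.search` is used so any leading text on a line is ignored.

```python
import re
from datetime import datetime

# Assumed line layout (as displayed above):
# [MM-DD-YYYY HH:MM:SS] SERVICE ALERT: host;service;STATE;SOFT|HARD;attempt;output
ALERT_RE = re.compile(
    r"\[(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] SERVICE ALERT: "
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[^;]*);"
    r"(?P<type>SOFT|HARD);(?P<attempt>\d+);(?P<output>.*)"
)


def parse_alerts(lines: list[str]) -> list[tuple[datetime, str, str, int]]:
    """Return (timestamp, state, state_type, attempt) tuples, oldest first."""
    alerts = []
    for line in lines:
        m = ALERT_RE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%m-%d-%Y %H:%M:%S")
            alerts.append((ts, m["state"], m["type"], int(m["attempt"])))
    return sorted(alerts)


def seconds_to_first_hard_problem(alerts):
    """Seconds from the first SOFT non-OK alert to the next HARD non-OK alert, or None."""
    soft = [a[0] for a in alerts if a[2] == "SOFT" and a[1] != "OK"]
    if not soft:
        return None
    hard = [a[0] for a in alerts if a[2] == "HARD" and a[1] != "OK" and a[0] >= soft[0]]
    return int((hard[0] - soft[0]).total_seconds()) if hard else None


log = """\
[08-30-2018 13:47:25] SERVICE ALERT: sbh_annap_t1;PING;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 25.93 ms
[08-30-2018 13:42:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;HARD;3;PING CRITICAL - Packet loss = 100%
[08-30-2018 13:41:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;SOFT;2;PING CRITICAL - Packet loss = 100%
[08-30-2018 13:40:31] SERVICE ALERT: sbh_annap_t1;PING;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%
""".splitlines()

alerts = parse_alerts(log)
for ts, state, state_type, attempt in alerts:
    print(ts.time(), state, state_type, attempt)
print("SOFT -> HARD:", seconds_to_first_hard_problem(alerts), "s")
# last line prints: SOFT -> HARD: 120 s
```

For the pre-upgrade excerpt this reports 120 seconds: attempts 1, 2, and 3 one minute apart, which matches max_check_attempts=3 with a 1-minute retry interval.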
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
After going through this, I have confirmed that this is a bug in Nagios Core and have filed a bug report on GitHub:
https://github.com/NagiosEnterprises/na ... issues/584
-
- Posts: 16
- Joined: Mon Sep 24, 2018 4:36 pm
Thank you for thoroughly investigating this, Scott. Do you have any info on when the bug was introduced (so I can roll back to an unaffected version) or whether there is a workaround?
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
My best guess is that it was introduced in 4.4.0 or later.
I know 4.3.4 was extremely stable, so it would be a good version to roll back to.