Passive Host Check not honoring check_interval
Posted: Sat Nov 04, 2017 11:02 am
by acefreakz
Hi guys, it's my first post here.
I'm a happy Nagios user!
Ok, here's the problem: on my Nagios installation (4.3.2), passive host checks seem to be ignoring the check_interval directive. Some hosts are behind a firewall, so I use passive checks for them. These passive hosts submit passive check results via NCPA to my Nagios NRDP server. It works great when the host is up, but when a passive host goes down, it takes some time to go into a critical hard state!
Here's my config for passive-host, where check_interval is set to one minute. The intention is to make the passive host go down (hard state) ASAP in order to suppress notifications for the host's services.
Code:
define host{
    name                    passive-host
    use                     generic-host
    active_checks_enabled   0
    passive_checks_enabled  1
    check_interval          1
    max_check_attempts      1
    freshness_threshold     120
    check_command           check_dummy!2!"Host is stale"
    register                0
}
I have debug logging enabled, and from the output below I observed that the check runs at an interval of 4 minutes. Why? Did I miss something?
Code:
root@jp-1:/usr/local/nagios/var# grep "puppetmas" nagios.debug | grep "Host Che"
[1509796693.001547] [008.0] [pid=593] ** Host Check Event ==> Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 1.000070 sec
[1509796933.008809] [008.0] [pid=593] ** Host Check Event ==> Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 0.000535 sec
[1509797173.001135] [008.0] [pid=593] ** Host Check Event ==> Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 1.000068 sec
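As a rough sanity check on the 4-minute figure, the gaps between the Host Check Events can be computed from the leading epoch timestamps. This is a minimal Python sketch that assumes the three debug lines above are representative; `event_gaps` is a hypothetical helper name, not part of Nagios.

```python
import re

# Debug-log lines copied from the grep output above.
DEBUG_LINES = [
    "[1509796693.001547] [008.0] [pid=593] ** Host Check Event ==> Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 1.000070 sec",
    "[1509796933.008809] [008.0] [pid=593] ** Host Check Event ==> Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 0.000535 sec",
    "[1509797173.001135] [008.0] [pid=593] ** Host Check Event ==> Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 1.000068 sec",
]

# Matches the leading "[seconds.microseconds]" timestamp of a debug line.
TIMESTAMP_RE = re.compile(r"^\[(\d+)\.(\d+)\]")


def event_gaps(lines):
    """Return the whole-second gaps between consecutive log timestamps."""
    stamps = [int(TIMESTAMP_RE.match(line).group(1)) for line in lines]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


gaps = event_gaps(DEBUG_LINES)
print(gaps)  # [240, 240]
```

Both gaps come out at 240 seconds, which matches the 4-minute interval described in the post.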
Thank you.
Re: Passive Host Check not honoring check_interval
Posted: Mon Nov 06, 2017 3:06 pm
by tgriep
Passive checks are not run by the Nagios process. They are run by remote systems, so their timing is set by the remote host and not by Nagios.
Take a look at this link for more details.
https://assets.nagios.com/downloads/nag ... hecks.html
If you want the system to generate an alert after a certain time period, you would have to enable the server to check the freshness of the last results.
Take a look at this link.
https://assets.nagios.com/downloads/nag ... hness.html
In your example, you set the freshness_threshold to 120 seconds. If you enable freshness checking by setting the check_freshness option to 1, then once Nagios detects that the result is stale, it will run the check command, which reports that the host is stale and returns a critical status.
check_freshness *: This directive is used to determine whether or not freshness checks are enabled for this service. Values: 0 = disable freshness checks, 1 = enable freshness checks
Re: Passive Host Check not honoring check_interval
Posted: Tue Nov 07, 2017 11:45 am
by acefreakz
Thanks for the response! By the way, I'm well aware of the given articles.
I have the passive host check interval defined at 60s as below (NCPA passive cfg):
Code:
[passive checks]
%HOSTNAME%|__HOST__|60 = /system/agent_version
It works fine when the host is up: it submits to Nagios NRDP once per minute. The problem arises when the passive host is down and no longer submits results to Nagios NRDP (the Nagios freshness check and freshness threshold are working fine at this stage). The 'check interval' of the passive host then happens to run every 4 minutes. I'm seeking a way to tune this 4-minute interval down to, say, 2 minutes, so that my passive host can go down (hard) sooner than the passive host's passive checks (to suppress notifications).
Any idea? Thanks!
Re: Passive Host Check not honoring check_interval
Posted: Tue Nov 07, 2017 5:11 pm
by tgriep
The check should go stale after 120 seconds and then run the active check, with maybe a little delay.
Can you look at the nagios.log file for when that host is stale and post the output here?
Do you have any settings for the host check that are overriding the template causing the longer delay?
Re: Passive Host Check not honoring check_interval
Posted: Sun Nov 12, 2017 6:30 am
by acefreakz
Sorry to get back late!
Here's the nagios log regarding the passive host getting stale:
Code:
[1510484627] Warning: The results of host 'puppetmaster.somedomain.com' are stale by 0d 0h 3m 45s (threshold=0d 0h 2m 0s). I'm forcing an immediate check of the host.
[1510484627] HOST ALERT: puppetmaster.somedomain.com;DOWN;SOFT;1;CRITICAL: Host is stale
[1510484927] Warning: The results of host 'puppetmaster.somedomain.com' are stale by 0d 0h 3m 0s (threshold=0d 0h 2m 0s). I'm forcing an immediate check of the host.
[1510484927] HOST ALERT: puppetmaster.somedomain.com;DOWN;HARD;1;CRITICAL: Host is stale
Let's get to the calculation of the 'delay'.
For the 1st alert (soft crit), the delay is estimated at 2m + 3m45s = 5m45s.
For the 2nd alert (hard crit), the delay is estimated at 2m + 3m = 5m.
So Nagios finally sends the 'host down' email notification to me after 10m45s. My goal is to reduce the delay. I think I'm good with the threshold set at 2m; it would be best if the freshness check could be executed much sooner.
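The arithmetic above can be reproduced with a short Python sketch. It follows the reading used in this post, namely that the "stale by" duration is added on top of the 2-minute threshold; that reading is an assumption, and `estimated_delays` is a hypothetical helper name.

```python
from datetime import timedelta

# Freshness threshold reported in the log (threshold=0d 0h 2m 0s).
THRESHOLD = timedelta(minutes=2)

# "stale by" durations from the two warnings in the log above.
STALE_BY = [
    timedelta(minutes=3, seconds=45),  # before the SOFT alert
    timedelta(minutes=3),              # before the HARD alert
]


def estimated_delays(threshold, stale_by):
    """Add the threshold to each 'stale by' duration (the post's estimate)."""
    return [threshold + staleness for staleness in stale_by]


delays = estimated_delays(THRESHOLD, STALE_BY)
total = sum(delays, timedelta())

print([str(d) for d in delays])  # ['0:05:45', '0:05:00']
print(total)                     # 0:10:45

# Independent cross-check: the two log timestamps are 300 s (5 min) apart.
print(1510484927 - 1510484627)   # 300
```

The 5-minute gap between the two log timestamps agrees with the estimate for the second alert.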
Do you have any settings for the host check that are overriding the template causing the longer delay?
Not that I'm aware of. I double-checked the objects.cache file, and the check interval and threshold are identical to the configuration.
Re: Passive Host Check not honoring check_interval
Posted: Mon Nov 13, 2017 4:34 pm
by tgriep
If you set your check interval to 0, that will put it into a hard state right away, removing the extra delay of going from a hard up state to a soft down state and finally to the hard down state.
That should get the check closer to 2 minutes.
Re: Passive Host Check not honoring check_interval
Posted: Mon Nov 13, 2017 11:06 pm
by acefreakz
Thanks for the reply!
If you set your check interval to 0, that will put it into a hard state right away, removing the extra delay of going from a hard up state to a soft down state and finally to the hard down state.
This doesn't work as you said; the host still goes into a soft state first. It would be great if I could make it skip the soft state.
Code:
[1510630338] HOST ALERT: puppetmaster.somedomain.com;DOWN;SOFT;1;CRITICAL: Host is stale
...
[1510630578] HOST ALERT: puppetmaster.somedomain.com;DOWN;HARD;1;CRITICAL: Host is stale
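To make the SOFT to HARD transition and its timing explicit, the two HOST ALERT lines can be split into their fields. This is a minimal Python sketch; `parse_alert` is a hypothetical helper, the field names are my own labels for the semicolon-separated values, and the elided lines ("...") between the two alerts are simply left out.

```python
import re

# HOST ALERT lines copied from the log excerpt above (elided lines omitted).
ALERT_LINES = [
    "[1510630338] HOST ALERT: puppetmaster.somedomain.com;DOWN;SOFT;1;CRITICAL: Host is stale",
    "[1510630578] HOST ALERT: puppetmaster.somedomain.com;DOWN;HARD;1;CRITICAL: Host is stale",
]

# [epoch] HOST ALERT: host;state;state_type;attempt;plugin output
ALERT_RE = re.compile(r"^\[(\d+)\] HOST ALERT: ([^;]+);([^;]+);([^;]+);(\d+);(.*)$")


def parse_alert(line):
    """Split one HOST ALERT log line into a dict of labelled fields."""
    match = ALERT_RE.match(line)
    if match is None:
        raise ValueError(f"not a HOST ALERT line: {line!r}")
    stamp, host, state, state_type, attempt, output = match.groups()
    return {
        "time": int(stamp),
        "host": host,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "output": output,
    }


alerts = [parse_alert(line) for line in ALERT_LINES]
soft_to_hard = alerts[1]["time"] - alerts[0]["time"]
print(alerts[0]["state_type"], "->", alerts[1]["state_type"], soft_to_hard, "seconds")
# SOFT -> HARD 240 seconds
```

So even with check_interval at 0, the hard state still arrives 4 minutes after the soft one in this excerpt.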
Below is the host definition from objects.cache file, check_interval is indeed set to 0.
Code:
define host {
    host_name                     puppetmaster.somedomain.com
    alias                         PuppetMaster
    address                       puppetmaster.somedomain.com
    parents                       r610.backend.somedomain.com
    check_command                 check_dummy!2!"Host is stale"
    contact_groups                admins
    notification_period           24x7
    initial_state                 o
    importance                    0
    check_interval                0.000000
    retry_interval                1.000000
    max_check_attempts            1
    active_checks_enabled         0
    passive_checks_enabled        1
    obsess                        0
    event_handler_enabled         1
    low_flap_threshold            0.000000
    high_flap_threshold           0.000000
    flap_detection_enabled        1
    flap_detection_options        a
    freshness_threshold           120
    check_freshness               1
    notification_options          r,d,u
    notifications_enabled         1
    notification_interval         60.000000
    first_notification_delay      0.000000
    stalking_options              n
    process_perf_data             0
    retain_status_information     1
    retain_nonstatus_information  1
}
Please advise
thank you.
Re: Passive Host Check not honoring check_interval
Posted: Tue Nov 14, 2017 3:21 pm
by tgriep
The only configuration difference I see from my config to yours is that I have obsess enabled.
Try that and let us know if this fixes the issue.
Re: Passive Host Check not honoring check_interval
Posted: Tue Nov 14, 2017 4:14 pm
by tgriep
One more setting you will need to change. This setting is also causing the first soft state. Change it to 0 and that should fix it for you.
# PASSIVE HOST CHECKS ARE SOFT OPTION
# This determines whether or not Nagios will treat passive host
# checks as being HARD or SOFT. By default, a passive host check
# result will put a host into a HARD state type. This can be changed
# by enabling this option.
# Values: 0 = passive checks are HARD, 1 = passive checks are SOFT
Re: Passive Host Check not honoring check_interval
Posted: Wed Nov 15, 2017 11:13 am
by acefreakz
Hi, according to the docs, the obsess property probably will not affect this, but let me try it too!
Obsess:
This directive determines whether or not checks for the host will be "obsessed" over using the ochp_command. Values: 0 = disabled, 1 = enabled (default).
I have this property disabled (0). I found that this setting holds true for passive service checks; the checks are indeed HARD. But apparently this doesn't apply to passive hosts in my case. Or is it a bug? *unsure*
Code:
# PASSIVE HOST CHECKS ARE SOFT OPTION
# This determines whether or not Nagios will treat passive host
# checks as being HARD or SOFT. By default, a passive host check
# result will put a host into a HARD state type. This can be changed
# by enabling this option.
# Values: 0 = passive checks are HARD, 1 = passive checks are SOFT
passive_host_checks_are_soft=0
Thanks again for the response. I will see if the obsess setting resolves the 'delay' issue!