Passive Host Check not honoring check_interval

Post by **acefreakz** » Sat Nov 04, 2017 11:02 am

Hi guys, this is my first post here, and I'm a happy Nagios user!

OK, here's the problem: passive host checks on my Nagios installation (4.3.2) seem to ignore the check_interval directive. Some hosts sit behind a firewall, so I use passive checks for them. These hosts submit passive check results via NCPA to my Nagios NRDP server. This works well while a host is up, but when a passive host goes down, it takes some time to reach a hard critical state!

Here's my config for passive-host, where check_interval is set to 1 (one minute). The intention is to make the passive host go down (hard state) as soon as possible, in order to suppress notifications for the host's services.

Code: Select all

define host{
    name                            passive-host
    use                             generic-host
    active_checks_enabled           0
    passive_checks_enabled          1
    check_interval                  1
    max_check_attempts              1
    freshness_threshold             120
    check_command                   check_dummy!2!"Host is stale"
    register                        0
}

I have the debug log enabled. From the output below, I observed that the check runs at an interval of 4 minutes. Why? Did I miss something?

Code: Select all

root@jp-1:/usr/local/nagios/var# grep "puppetmas" nagios.debug  | grep "Host Che"
[1509796693.001547] [008.0] [pid=593] ** Host Check Event ==> Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 1.000070 sec
[1509796933.008809] [008.0] [pid=593] ** Host Check Event ==> Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 0.000535 sec
[1509797173.001135] [008.0] [pid=593] ** Host Check Event ==> Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 1.000068 sec

Thank you.
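
The 4-minute interval can be confirmed directly from the debug entries in the post above. A minimal Python sketch, assuming the bracketed values are Unix epoch timestamps in seconds (the function name is illustrative only):

```python
# Timestamps copied from the three debug log entries in the post above
# (assumed to be Unix epoch seconds).
timestamps = [1509796693.001547, 1509796933.008809, 1509797173.001135]


def gaps_in_seconds(times):
    """Return the rounded gap between each consecutive pair of timestamps."""
    return [round(later - earlier) for earlier, later in zip(times, times[1:])]


gaps = gaps_in_seconds(timestamps)
print(gaps)                    # [240, 240]
print([g / 60 for g in gaps])  # [4.0, 4.0]
```

Both gaps come out at 240 seconds, which matches the 4-minute interval described in the post rather than the configured check_interval of 1 minute.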

Post by **tgriep** » Mon Nov 06, 2017 3:06 pm

Passive checks are not run by the Nagios process. They are run by remote systems, so their timing is set by the remote host and not by Nagios.
Take a look at this link for more details.
https://assets.nagios.com/downloads/nag ... hecks.html

If you want the system to generate an alert after a certain time period, you would have to enable the server to check the freshness of the last results.
Take a look at this link.
https://assets.nagios.com/downloads/nag ... hness.html

In your example, you set freshness_threshold to 120 seconds. If you enable freshness checking by setting the check_freshness option to 1, then once Nagios detects that the check result is stale, it will run the check command, which reports that the host is stale and returns a critical status.

check_freshness: This directive is used to determine whether or not freshness checks are enabled for this service. Values: 0 = disable freshness checks, 1 = enable freshness checks
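
The staleness idea described in this post can be illustrated with a small Python model. This is only a simplified sketch and not Nagios's actual implementation: it assumes a result is stale once its age exceeds freshness_threshold, and it ignores any extra latency allowance Nagios may apply. The function names and the example times are hypothetical.

```python
def is_stale(last_result_time, now, freshness_threshold):
    """True when the latest result is older than the threshold (all in seconds)."""
    return (now - last_result_time) > freshness_threshold


def stale_by(last_result_time, now, freshness_threshold):
    """Seconds past the threshold, or 0 while the result is still fresh."""
    return max(0, now - last_result_time - freshness_threshold)


last_result = 1000  # hypothetical time of the last passive result
threshold = 120     # freshness_threshold from the host template in this thread

for now in (1060, 1120, 1121, 1300):
    print(now, is_stale(last_result, now, threshold), stale_by(last_result, now, threshold))
```

In this model the result only becomes stale after the threshold has passed, and the actual alert still depends on when the staleness is next evaluated.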

Post by **acefreakz** » Tue Nov 07, 2017 11:45 am

Thanks for the response! By the way, I am well aware of the linked articles.

I have the passive host check interval defined at 60s, as shown below (NCPA passive cfg):

Code: Select all

[passive checks]
%HOSTNAME%|__HOST__|60 = /system/agent_version

It works fine when the host is up: it submits a result to the Nagios NRDP server every minute. The problem arises when the passive host is down and no longer submits results to NRDP. The Nagios freshness check and freshness threshold are working at this stage, but the check for the passive host runs only every 4 minutes. I'm looking for a way to tune this 4-minute interval down to, say, 2 minutes, so that the passive host goes down (hard) sooner than its passive service checks do (to suppress notifications).

Any idea? Thanks!

Post by **tgriep** » Tue Nov 07, 2017 5:11 pm

The check should go stale after 120 seconds and then run the active check, with maybe a little delay.
Can you look at the nagios.log file for when that host is stale and post the output here?
Do you have any settings for the host check that are overriding the template causing the longer delay?

Post by **acefreakz** » Sun Nov 12, 2017 6:30 am

Sorry to get back late!

Here's the nagios log regarding the passive host getting stale:

Code: Select all

[1510484627] Warning: The results of host 'puppetmaster.somedomain.com' are stale by 0d 0h 3m 45s (threshold=0d 0h 2m 0s).  I'm forcing an immediate check of the host.
[1510484627] HOST ALERT: puppetmaster.somedomain.com;DOWN;SOFT;1;CRITICAL: Host is stale
[1510484927] Warning: The results of host 'puppetmaster.somedomain.com' are stale by 0d 0h 3m 0s (threshold=0d 0h 2m 0s).  I'm forcing an immediate check of the host.
[1510484927] HOST ALERT: puppetmaster.somedomain.com;DOWN;HARD;1;CRITICAL: Host is stale

Let's calculate the delay.
For the 1st alert (soft critical), the delay is estimated at 2m + 3m45s = 5m45s.
For the 2nd alert (hard critical), the delay is estimated at 2m + 3m = 5m.
So Nagios finally sends the 'host down' email notification to me after 10m45s. My goal is to reduce this delay. I think the 2m threshold is fine; it would be best if the freshness check that enforces the threshold ran much sooner.
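
The delay arithmetic above can be reproduced from the two warning lines. A minimal Python sketch, assuming (as the calculation above does) that "stale by" counts time beyond the threshold, and that the bracketed values are Unix epoch seconds; the helper names are illustrative only:

```python
import re

# The two freshness warnings copied from the nagios.log excerpt above.
log_lines = [
    "[1510484627] Warning: The results of host 'puppetmaster.somedomain.com' "
    "are stale by 0d 0h 3m 45s (threshold=0d 0h 2m 0s).  "
    "I'm forcing an immediate check of the host.",
    "[1510484927] Warning: The results of host 'puppetmaster.somedomain.com' "
    "are stale by 0d 0h 3m 0s (threshold=0d 0h 2m 0s).  "
    "I'm forcing an immediate check of the host.",
]

DURATION = re.compile(r"(\d+)d (\d+)h (\d+)m (\d+)s")


def to_seconds(parts):
    """Convert a (days, hours, minutes, seconds) tuple of strings to seconds."""
    d, h, m, s = (int(p) for p in parts)
    return ((d * 24 + h) * 60 + m) * 60 + s


def delay_seconds(line):
    """Threshold plus the 'stale by' amount found in one warning line."""
    stale_by, threshold = (to_seconds(p) for p in DURATION.findall(line))
    return threshold + stale_by


delays = [delay_seconds(line) for line in log_lines]
total = sum(delays)
epochs = [int(line[1:line.index("]")]) for line in log_lines]

print(delays)              # [345, 300]
print(divmod(total, 60))   # (10, 45)
print(epochs[1] - epochs[0])  # 300
```

The gap between the two log timestamps is 300 seconds, which agrees with the 5m estimated for the second alert.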

tgriep wrote: Do you have any settings for the host check that are overriding the template causing the longer delay?

Not that I'm aware of. I double-checked the objects.cache file, and the check interval and threshold there are identical to the configuration.

Post by **tgriep** » Mon Nov 13, 2017 4:34 pm

If you set your check interval to 0, that will put the host into a hard state right away, removing the extra delay of going from a hard up state to a soft down state and finally to the hard down state.
That should get the check closer to 2 minutes.

Post by **acefreakz** » Mon Nov 13, 2017 11:06 pm

Thanks for the reply!

tgriep wrote: If you set your check interval to 0, that will set it in to a hard state right away removing the extra delay from going from a hard up state to a soft down state, finally to the hard down state.

This doesn't work as you described: the host still goes into a soft state first. It would be great if I could make it skip the soft state.

Code: Select all

[1510630338] HOST ALERT: puppetmaster.somedomain.com;DOWN;SOFT;1;CRITICAL: Host is stale
...
[1510630578] HOST ALERT: puppetmaster.somedomain.com;DOWN;HARD;1;CRITICAL: Host is stale

Below is the host definition from objects.cache file, check_interval is indeed set to 0.

Code: Select all

define host {
    host_name   puppetmaster.somedomain.com
    alias   PuppetMaster
    address puppetmaster.somedomain.com
    parents r610.backend.somedomain.com
    check_command   check_dummy!2!"Host is stale"
    contact_groups  admins
    notification_period 24x7
    initial_state   o
    importance  0
    check_interval  0.000000
    retry_interval  1.000000
    max_check_attempts  1
    active_checks_enabled   0
    passive_checks_enabled  1
    obsess  0
    event_handler_enabled   1
    low_flap_threshold  0.000000
    high_flap_threshold 0.000000
    flap_detection_enabled  1
    flap_detection_options  a
    freshness_threshold 120
    check_freshness 1
    notification_options    r,d,u
    notifications_enabled   1
    notification_interval   60.000000
    first_notification_delay    0.000000
    stalking_options    n
    process_perf_data   0
    retain_status_information   1
    retain_nonstatus_information    1
    }

Please advise

thank you.

Post by **tgriep** » Tue Nov 14, 2017 3:21 pm

The only configuration difference I see from my config to yours is that I have obsess enabled.

Code: Select all

obsess  1

Try that and let us know if this fixes the issue.

Post by **tgriep** » Tue Nov 14, 2017 4:14 pm

One more setting you will need to change. This setting is also causing the first soft state. Change it to 0 and that should fix it for you.

# PASSIVE HOST CHECKS ARE SOFT OPTION
# This determines whether or not Nagios will treat passive host
# checks as being HARD or SOFT. By default, a passive host check
# result will put a host into a HARD state type. This can be changed
# by enabling this option.
# Values: 0 = passive checks are HARD, 1 = passive checks are SOFT

Code: Select all

passive_host_checks_are_soft=0

Post by **acefreakz** » Wed Nov 15, 2017 11:13 am

Hi, according to the doc, the obsess property probably will not affect this, but let me try too!

Obsess:

This directive determines whether or not checks for the host will be "obsessed" over using the ochp_command. Values: 0 = disabled, 1 = enabled (default).

As for passive_host_checks_are_soft, I already have it disabled (0). I found that passive service checks do go straight to HARD, but this apparently doesn't apply to passive host checks in my case. Or is it a bug? I'm unsure.

Code: Select all

# PASSIVE HOST CHECKS ARE SOFT OPTION
# This determines whether or not Nagios will treat passive host
# checks as being HARD or SOFT.  By default, a passive host check
# result will put a host into a HARD state type.  This can be changed
# by enabling this option.
# Values: 0 = passive checks are HARD, 1 = passive checks are SOFT
passive_host_checks_are_soft=0

Thanks again for the response. I will see whether enabling obsess resolves the 'delay' issue!
