Passive Host Check not honoring check_interval

acefreakz
Posts: 9
Joined: Mon Dec 26, 2016 6:20 am

Passive Host Check not honoring check_interval

Post by acefreakz »

Hi guys, it's my first post here :D I'm a happy Nagios user!

OK, here's the problem: on my Nagios installation (4.3.2), the passive host check seems to ignore the check_interval directive. Some hosts are behind a firewall, so passive checks come into play: these passive hosts submit their check results via NCPA to my Nagios NRDP server. It works great when the host is up, but when a passive host goes down, it takes quite some time to reach a critical hard state!

Here's my config for the passive-host template, where check_interval is set to 1 minute. The intention is to make the passive host go DOWN (hard state) as soon as possible in order to suppress notifications for the host's services.

define host{
    name                            passive-host
    use                             generic-host
    active_checks_enabled           0
    passive_checks_enabled          1
    check_interval                  1
    max_check_attempts              1
    freshness_threshold             120
    check_command                   check_dummy!2!"Host is stale"
    register                        0
}
I have the debug log enabled, and from the output below I can see that the check runs at an interval of 4 minutes. Why? Did I miss something?

root@jp-1:/usr/local/nagios/var# grep "puppetmas" nagios.debug  | grep "Host Che"
[1509796693.001547] [008.0] [pid=593] ** Host Check Event ==> Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 1.000070 sec
[1509796933.008809] [008.0] [pid=593] ** Host Check Event ==> Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 0.000535 sec
[1509797173.001135] [008.0] [pid=593] ** Host Check Event ==> Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 1.000068 sec
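The 4-minute figure can be checked by subtracting the leading epoch timestamps of the debug lines above. A minimal Python sketch, assuming every line starts with the `[seconds.fraction]` prefix shown; `event_times` and `intervals` are hypothetical helper names, not part of Nagios:

```python
import re

# Debug-log lines copied from the post above.
DEBUG_LINES = [
    "[1509796693.001547] [008.0] [pid=593] ** Host Check Event ==> "
    "Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 1.000070 sec",
    "[1509796933.008809] [008.0] [pid=593] ** Host Check Event ==> "
    "Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 0.000535 sec",
    "[1509797173.001135] [008.0] [pid=593] ** Host Check Event ==> "
    "Host: 'puppetmaster.somedomain.com', Options: 3, Latency: 1.000068 sec",
]

# Assumption: each debug line begins with "[<epoch seconds>.<fraction>]".
TIMESTAMP_RE = re.compile(r"^\[(\d+\.\d+)\]")


def event_times(lines):
    """Return the leading epoch timestamp of every line that has one."""
    times = []
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


def intervals(times):
    """Seconds elapsed between each pair of consecutive events."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in intervals(event_times(DEBUG_LINES)):
        print(f"{gap:.1f} s ({gap / 60:.1f} min)")
```

Both gaps come out at roughly 240 s, i.e. the 4 minutes reported. To check a live system, replace `DEBUG_LINES` with the lines returned by the `grep` pipeline above.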
Thank you.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Passive Host Check not honoring check_interval

Post by tgriep »

Passive checks are not run by the Nagios process; they are run by remote systems, so their timing is set by the remote host, not by Nagios.
Take a look at this link for more details.
https://assets.nagios.com/downloads/nag ... hecks.html

If you want the system to generate an alert after a certain time period, you would have to enable the server to check the freshness of the last results.
Take a look at this link.
https://assets.nagios.com/downloads/nag ... hness.html

In your example, you set freshness_threshold to 120 seconds. If you enable freshness checking by setting the check_freshness option to 1, then once Nagios detects that the result is stale, it will run the check command, which reports that the host is stale with a CRITICAL status.
check_freshness *: This directive is used to determine whether or not freshness checks are enabled for this service. Values: 0 = disable freshness checks, 1 = enable freshness checks
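To make the freshness idea concrete, here is a minimal Python model of the staleness test described above. It is not Nagios's own code: `PassiveHost` and `stale_by` are hypothetical names, and "stale by" is assumed to mean how far past `freshness_threshold` the last result is, which matches the log lines later in this thread (stale by 3m 45s with a 2m threshold).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PassiveHost:
    """Hypothetical, simplified view of one host's freshness settings."""
    last_result: int              # epoch seconds of the most recent result
    freshness_threshold: int      # seconds; 120 in the passive-host template
    check_freshness: bool = True  # the directive that must be set to 1


def stale_by(host: PassiveHost, now: int) -> Optional[int]:
    """Seconds past the threshold, or None if fresh or freshness is off.

    Simplified model (assumption): it ignores when Nagios actually
    schedules its freshness-check event, so real detection can be later.
    """
    if not host.check_freshness:
        return None
    overdue = now - (host.last_result + host.freshness_threshold)
    return overdue if overdue > 0 else None


if __name__ == "__main__":
    host = PassiveHost(last_result=1000, freshness_threshold=120)
    print(stale_by(host, 1100))  # None: still within the 120 s threshold
    print(stale_by(host, 1345))  # 225, i.e. "stale by 3m 45s"
    disabled = PassiveHost(1000, 120, check_freshness=False)
    print(stale_by(disabled, 1345))  # None: freshness checking disabled
```

The timestamps in the usage lines are made up for illustration; the point is that without check_freshness enabled, no amount of silence from the remote host ever produces a stale result.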
Be sure to check out our Knowledgebase for helpful articles and solutions!
acefreakz

Re: Passive Host Check not honoring check_interval

Post by acefreakz »

Thanks for the response! By the way, I'm already aware of those articles.

I have the passive host check interval set to 60 seconds in the NCPA passive config, as below:

[passive checks]
%HOSTNAME%|__HOST__|60 = /system/agent_version
It works fine when the host is up: it submits to the Nagios NRDP server every minute. The problem arises when the passive host is down and no longer submits results to Nagios NRDP (the Nagios freshness check and freshness threshold are working fine at this stage): the 'check interval' of the passive host then happens to be every 4 minutes. I'm looking for a way to tune this 4-minute interval down to, say, 2 minutes, so that the passive host reaches a hard DOWN state sooner than its own passive checks do (to suppress notifications).

Any idea? Thanks!
tgriep

Re: Passive Host Check not honoring check_interval

Post by tgriep »

The check should go stale after 120 seconds, and then the active check should run, perhaps with a small delay.
Can you look at the nagios.log file for when that host is stale and post the output here?
Do you have any settings for the host check that are overriding the template causing the longer delay?
acefreakz

Re: Passive Host Check not honoring check_interval

Post by acefreakz »

Sorry for getting back late!

Here's the nagios.log output for when the passive host goes stale:

[1510484627] Warning: The results of host 'puppetmaster.somedomain.com' are stale by 0d 0h 3m 45s (threshold=0d 0h 2m 0s).  I'm forcing an immediate check of the host.
[1510484627] HOST ALERT: puppetmaster.somedomain.com;DOWN;SOFT;1;CRITICAL: Host is stale
[1510484927] Warning: The results of host 'puppetmaster.somedomain.com' are stale by 0d 0h 3m 0s (threshold=0d 0h 2m 0s).  I'm forcing an immediate check of the host.
[1510484927] HOST ALERT: puppetmaster.somedomain.com;DOWN;HARD;1;CRITICAL: Host is stale
Let's calculate the 'delay':
For the 1st alert (SOFT CRITICAL), the delay is roughly 2m + 3m45s = 5m45s.
For the 2nd alert (HARD CRITICAL), the delay is roughly 2m + 3m = 5m.
So Nagios finally sends me the 'host down' email notification after 10m45s. My goal is to reduce this delay. I'm fine with the threshold set at 2m; it would be best if the check that runs once the threshold is exceeded could be executed much sooner.
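The arithmetic above can be written out as a short Python sketch. It assumes, as the calculation does, that "stale by" is the time beyond `freshness_threshold`, so each alert lands threshold + stale-by after the previous result; `delay_per_alert` and `mmss` are hypothetical names. The timestamps of the two warnings give an independent check on the second figure:

```python
# Figures copied from the nagios.log excerpt above.
FRESHNESS_THRESHOLD = 120                # freshness_threshold, seconds (2m)
STALE_BY = [3 * 60 + 45, 3 * 60]         # "stale by" on the SOFT and HARD alerts
ALERT_TIMES = [1510484627, 1510484927]   # epoch of the two stale warnings


def mmss(seconds):
    """Format a number of seconds as e.g. 5m45s."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m{secs:02d}s"


def delay_per_alert(threshold, stale_by):
    """Assumption: time since the previous result = threshold + stale-by."""
    return [threshold + extra for extra in stale_by]


if __name__ == "__main__":
    delays = delay_per_alert(FRESHNESS_THRESHOLD, STALE_BY)
    for label, seconds in zip(["SOFT", "HARD"], delays):
        print(f"{label} alert after {mmss(seconds)}")
    print(f"total: {mmss(sum(delays))}")
    # Cross-check (assumption): the forced check at the first warning becomes
    # the new "last result", so the gap between the two warnings should equal
    # the second delay.
    print(f"gap between warnings: {mmss(ALERT_TIMES[1] - ALERT_TIMES[0])}")
```

With the numbers from the log this gives 5m45s and 5m00s, 10m45s in total, and the two warning timestamps are exactly 300 s (5m) apart, which agrees with the second estimate.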
Do you have any settings for the host check that are overriding the template causing the longer delay?
Not that I'm aware of. I double-checked the objects.cache file, and the check interval and threshold are identical to the configuration.
tgriep

Re: Passive Host Check not honoring check_interval

Post by tgriep »

If you set your check interval to 0, that will put the host into a hard state right away, removing the extra delay of going from a hard UP state to a soft DOWN state and finally to the hard DOWN state.
That should get the check closer to 2 minutes.
acefreakz

Re: Passive Host Check not honoring check_interval

Post by acefreakz »

Thanks for the reply!
If you set your check interval to 0, that will put the host into a hard state right away, removing the extra delay of going from a hard UP state to a soft DOWN state and finally to the hard DOWN state.
This doesn't work as you described: the host still goes into a SOFT state first. It would be great if I could make it skip the SOFT state :)

[1510630338] HOST ALERT: puppetmaster.somedomain.com;DOWN;SOFT;1;CRITICAL: Host is stale
...
[1510630578] HOST ALERT: puppetmaster.somedomain.com;DOWN;HARD;1;CRITICAL: Host is stale
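The gap between the SOFT and HARD alerts can be read off the log timestamps. A minimal Python sketch, assuming the `host;state;state_type;attempt;output` field order seen in these lines; `parse_alerts` and `soft_to_hard_seconds` are hypothetical helpers:

```python
import re

# The two HOST ALERT lines from the nagios.log excerpt above ("..." omitted).
LOG_LINES = [
    "[1510630338] HOST ALERT: puppetmaster.somedomain.com;DOWN;SOFT;1;CRITICAL: Host is stale",
    "[1510630578] HOST ALERT: puppetmaster.somedomain.com;DOWN;HARD;1;CRITICAL: Host is stale",
]

# Field layout as it appears in these lines (assumption for other lines):
# [epoch] HOST ALERT: host;state;state_type;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(?P<time>\d+)\] HOST ALERT: (?P<host>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>SOFT|HARD);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_alerts(lines):
    """Turn HOST ALERT lines into dicts; any other line is ignored."""
    alerts = []
    for line in lines:
        match = ALERT_RE.match(line)
        if match:
            alert = match.groupdict()
            alert["time"] = int(alert["time"])
            alert["attempt"] = int(alert["attempt"])
            alerts.append(alert)
    return alerts


def soft_to_hard_seconds(alerts):
    """Seconds from the first SOFT alert to the first HARD alert after it."""
    soft = next((a for a in alerts if a["state_type"] == "SOFT"), None)
    if soft is None:
        return None
    hard = next(
        (a for a in alerts
         if a["state_type"] == "HARD" and a["time"] >= soft["time"]),
        None,
    )
    return None if hard is None else hard["time"] - soft["time"]


if __name__ == "__main__":
    gap = soft_to_hard_seconds(parse_alerts(LOG_LINES))
    print(f"SOFT -> HARD: {gap} s ({gap / 60:.1f} min)")
```

For these two lines the result is 240 s, the same spacing as the Host Check Events in the first post's debug log.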
Below is the host definition from the objects.cache file; check_interval is indeed set to 0.

define host {
    host_name   puppetmaster.somedomain.com
    alias   PuppetMaster
    address puppetmaster.somedomain.com
    parents r610.backend.somedomain.com
    check_command   check_dummy!2!"Host is stale"
    contact_groups  admins
    notification_period 24x7
    initial_state   o
    importance  0
    check_interval  0.000000
    retry_interval  1.000000
    max_check_attempts  1
    active_checks_enabled   0
    passive_checks_enabled  1
    obsess  0
    event_handler_enabled   1
    low_flap_threshold  0.000000
    high_flap_threshold 0.000000
    flap_detection_enabled  1
    flap_detection_options  a
    freshness_threshold 120
    check_freshness 1
    notification_options    r,d,u
    notifications_enabled   1
    notification_interval   60.000000
    first_notification_delay    0.000000
    stalking_options    n
    process_perf_data   0
    retain_status_information   1
    retain_nonstatus_information    1
    }
Please advise :) Thank you.
tgriep

Re: Passive Host Check not honoring check_interval

Post by tgriep »

The only configuration difference I see between my config and yours is that I have obsess enabled.

obsess  1
Try that and let us know if this fixes the issue.
tgriep

Re: Passive Host Check not honoring check_interval

Post by tgriep »

There's one more setting you will need to change in nagios.cfg, as it is also causing the first SOFT state. Change it to 0 and that should fix it for you.
# PASSIVE HOST CHECKS ARE SOFT OPTION
# This determines whether or not Nagios will treat passive host
# checks as being HARD or SOFT. By default, a passive host check
# result will put a host into a HARD state type. This can be changed
# by enabling this option.
# Values: 0 = passive checks are HARD, 1 = passive checks are SOFT
passive_host_checks_are_soft=0
acefreakz

Re: Passive Host Check not honoring check_interval

Post by acefreakz »

Hi, according to the docs, the obsess property probably won't affect this, but I'll try it anyway!

Obsess:
This directive determines whether or not checks for the host will be "obsessed" over using the ochp_command. Values: 0 = disabled, 1 = enabled (default).
I have this property disabled (0). As for passive_host_checks_are_soft, it's already set to 0 (see below). My passive service checks are indeed HARD, but apparently this doesn't apply to the passive host in my case. Or is it a bug? *unsure*

# PASSIVE HOST CHECKS ARE SOFT OPTION
# This determines whether or not Nagios will treat passive host
# checks as being HARD or SOFT.  By default, a passive host check
# result will put a host into a HARD state type.  This can be changed
# by enabling this option.
# Values: 0 = passive checks are HARD, 1 = passive checks are SOFT
passive_host_checks_are_soft=0
Thanks again for the response, I'll see whether enabling obsess resolves the 'delay' issue!