Soft vs Hard States

Posted: Tue Feb 27, 2018 4:42 am
by invade
Hi.

Could someone explain to me the following events from the log:

[Mon Feb 26 04:50:11 2018] PASSIVE HOST CHECK: host1;0;OK - Feb-26 04:50:01 GMT
[Mon Feb 26 04:50:30 2018] PASSIVE SERVICE CHECK: host1;System-Partitions;0;DISK OK
[Mon Feb 26 05:50:51 2018] Warning: The results of service 'System-Partitions' on host 'host1' are stale by 0d 0h 0m 35s (threshold=0d 1h 0m 0s). I'm forcing an immediate check of the service.
[Mon Feb 26 05:50:51 2018] Warning: The results of host 'host1' are stale by 0d 0h 0m 50s (threshold=0d 1h 0m 0s). I'm forcing an immediate check of the host.
[Mon Feb 26 05:51:01 2018] HOST ALERT: host1;DOWN;SOFT;1;CRITICAL: No Recent Passive Host Checks.
[Mon Feb 26 05:51:01 2018] SERVICE ALERT: host1;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
[Mon Feb 26 05:51:01 2018] SERVICE NOTIFICATION: host1;host1;System-Partitions;CRITICAL;notifyservice-host1;CRITICAL: No Recent Passive Service Checks.

We have a Nagios system that receives passive-only checks from numerous hosts. The checks run every five minutes, have a max_check_attempts value of 1, and have a freshness threshold of one hour.

In the example above, the host went down just after 04:50, so an hour later Nagios ran the check command (check_dummy), resulting in a CRITICAL status.
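To make the timing concrete, here is a minimal sketch of the staleness arithmetic. It assumes the last passive host result carried the 04:50:01 timestamp shown in its output and that the freshness check ran at 05:50:51, as logged. The function name is made up for illustration, and Nagios may fold other terms into its own calculation.

```python
from datetime import datetime, timedelta

def staleness(last_result: datetime, now: datetime, threshold_s: int) -> timedelta:
    """How far past the freshness threshold the last result is (negative if still fresh)."""
    return (now - last_result) - timedelta(seconds=threshold_s)

# Timestamps taken from the log excerpt above (assumed to be the relevant ones).
last_host_result = datetime(2018, 2, 26, 4, 50, 1)
freshness_run = datetime(2018, 2, 26, 5, 50, 51)

overdue = staleness(last_host_result, freshness_run, 3600)
print(overdue)  # 0:00:50
```

That lines up with the "stale by 0d 0h 0m 50s" reported for the host.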

What I don't understand is why the host check went into a soft state but the service alert went into a hard state (resulting in a notification). The same thing happens for every host and service check on all our systems.

Ideally I would like both host and service alerts to use the same state in the event that no checks are received.

I did look through the forums and found what appear to be similar questions, but no conclusive explanation. Apologies if the answer is already out there.

If further configuration details are required then I can add them to the thread.

Thanks in advance.

Re: Soft vs Hard States

Posted: Tue Feb 27, 2018 12:37 pm
by mcapra
Here's the official documentation regarding state types (It's worth mentioning there's a typo on that page. CTRL+F notifified):
https://assets.nagios.com/downloads/nag ... types.html
Hard states occur for hosts and services in the following situations:
...
  • When a service check results in a non-OK state and its corresponding host is either DOWN or UNREACHABLE.
Per that, since your host went into a DOWN state before the service check entered a CRITICAL state:

Code:

[Mon Feb 26 05:51:01 2018] HOST ALERT: host1;DOWN;SOFT;1;CRITICAL: No Recent Passive Host Checks.
[Mon Feb 26 05:51:01 2018] SERVICE ALERT: host1;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
The service's critical state was flagged as a HARD state.
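As a rough illustration of the documented rules, here is a simplified model for non-OK results only. It is not Nagios's actual code, and the function and parameter names are made up for this sketch.

```python
from typing import Optional

def problem_state_type(attempt: int, max_check_attempts: int,
                       host_state: Optional[str] = None) -> str:
    """Classify a non-OK check result as HARD or SOFT (simplified model).

    host_state is the current state of the owning host when classifying a
    service result; leave it as None when classifying a host result.
    """
    # Documented rule: a non-OK service on a DOWN/UNREACHABLE host is HARD.
    if host_state in ("DOWN", "UNREACHABLE"):
        return "HARD"
    # Documented rule: retries exhausted means HARD.
    if attempt >= max_check_attempts:
        return "HARD"
    return "SOFT"

service_type = problem_state_type(1, 1, host_state="DOWN")
host_type = problem_state_type(1, 1)
print(service_type, host_type)  # HARD HARD
```

Under this reading, a host with max_check_attempts set to 1 would also be expected to go HARD on the first failure, which is why the SOFT host alert in the log looks odd.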

As for why the host is even in a soft state to begin with (because by your description it shouldn't be), can you share the host definition as well as any templates being applied?

Re: Soft vs Hard States

Posted: Wed Feb 28, 2018 4:27 pm
by npolovenko
Thanks, @mcapra!
@invade, We'll be able to tell why your host never went into a Hard State after we see how it's defined.

max_check_attempts: This directive is used to define the number of times that Nagios will retry the host check command if it returns any state other than an OK state.
A host will be in a Soft state until the maximum number of allowed retries is reached.
https://assets.nagios.com/downloads/nag ... tions.html

Code:

Setting this value to 1 will cause Nagios to generate an alert without retrying the host check.
That's probably what you want.

Re: Soft vs Hard States

Posted: Mon Mar 05, 2018 4:43 am
by invade
Below are the host and service definitions as requested:

Code:

define host{
        name                            passive-host-customers     ; The name of this host template
        notifications_enabled           1       ; Host notifications are enabled
        active_checks_enabled           0
        passive_checks_enabled          1       ;
        event_handler_enabled           1       ; Host event handler is enabled
        flap_detection_enabled          0       ; Flap detection is disabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        freshness_threshold             3600    ; Complain if the data received is more than 1 hour old.
        check_command                   host_stale   ; Report staleness
        check_interval                  0       ; Set the check interval to be 0 as this doesn't force active checks, whilst passive ones are working
        max_check_attempts              1
        check_freshness                 1
        notification_interval           0
        notification_period             24x7
        notification_options            d,u,r
        contact_groups                  admins
        register                        0       ; Don't register this template
        }

Code:

define service{
        name                            passive-service-customers ; The 'name' of this service template
        active_checks_enabled           0       ; Active service checks are disabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             0       ; Do not obsess over this service
        check_freshness                 1       ; check service 'freshness'
        freshness_threshold             3600    ; Complain if the data received is more than 1 hour old.
        check_command                   service_stale   ; Report staleness
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          0       ; Flap detection is disabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        notification_interval           0       ; Only send notifications on status change by default.
        check_period                    24x7
        check_interval                  0
        retry_interval                  1440
        max_check_attempts              1
        notification_period             24x7
        notification_options            w,u,c,r
        contact_groups                  admins
        register                        0       ; Don't register this template
        }

Re: Soft vs Hard States

Posted: Mon Mar 05, 2018 11:10 am
by scottwilkerson
In your nagios.cfg, what is this setting set to?

Code:

passive_host_checks_are_soft
From the docs:
https://assets.nagios.com/downloads/nag ... gmain.html
Passive Host Checks Are SOFT Option

Format: passive_host_checks_are_soft=<0/1>
Example: passive_host_checks_are_soft=1

This option determines whether or not Nagios will treat passive host checks as HARD states or SOFT states. By default, a passive host check result will put a host into a HARD state type. You can change this behavior by enabling this option.

0 = Passive host checks are HARD (default)
1 = Passive host checks are SOFT
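If it helps, here is a small sketch for pulling that value out of a main config file. The helper name and the sample text are made up for illustration. Only the key=value format and the default of 0 come from the docs quoted above. Letting the last occurrence win is a choice made in this sketch, not a statement about how Nagios parses the file.

```python
def read_main_option(cfg_text: str, name: str, default: str) -> str:
    """Return the value of a key=value option, or default if it is absent."""
    value = default
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, val = line.partition("=")
        if sep and key.strip() == name:
            value = val.strip()  # last occurrence wins in this sketch
    return value

# Illustrative excerpt, not a real config file.
sample = """
# excerpt in the style of nagios.cfg
check_service_freshness=1
passive_host_checks_are_soft=0
"""

setting = read_main_option(sample, "passive_host_checks_are_soft", "0")
print("SOFT" if setting == "1" else "HARD")  # HARD
```

To check a real installation, read the contents of your own nagios.cfg into cfg_text instead of using the sample string.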

Re: Soft vs Hard States

Posted: Tue Mar 06, 2018 10:01 am
by invade
I have added the full config file to this post.

Code:

passive_host_checks_are_soft=0

Re: Soft vs Hard States

Posted: Tue Mar 06, 2018 10:54 am
by scottwilkerson
Looking into this further, I believe this is a bug that was never directly addressed. I have reopened the issue here:
https://github.com/NagiosEnterprises/na ... /issues/23

Re: Soft vs Hard States

Posted: Tue Mar 06, 2018 1:11 pm
by invade
Many thanks for the info. I will watch the issue for updates.