Soft vs Hard States

invade
Posts: 29
Joined: Thu Nov 16, 2017 7:45 am

Soft vs Hard States

Post by invade »

Hi.

Could someone explain to me the following events from the log:

[Mon Feb 26 04:50:11 2018] PASSIVE HOST CHECK: host1;0;OK - Feb-26 04:50:01 GMT
[Mon Feb 26 04:50:30 2018] PASSIVE SERVICE CHECK: host1;System-Partitions;0;DISK OK
[Mon Feb 26 05:50:51 2018] Warning: The results of service 'System-Partitions' on host 'host1' are stale by 0d 0h 0m 35s (threshold=0d 1h 0m 0s). I'm forcing an immediate check of the service.
[Mon Feb 26 05:50:51 2018] Warning: The results of host 'host1' are stale by 0d 0h 0m 50s (threshold=0d 1h 0m 0s). I'm forcing an immediate check of the host.
[Mon Feb 26 05:51:01 2018] HOST ALERT: host1;DOWN;SOFT;1;CRITICAL: No Recent Passive Host Checks.
[Mon Feb 26 05:51:01 2018] SERVICE ALERT: host1;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
[Mon Feb 26 05:51:01 2018] SERVICE NOTIFICATION: host1;host1;System-Partitions;CRITICAL;notifyservice-host1;CRITICAL: No Recent Passive Service Checks.

We have a Nagios system that receives passive-only checks from numerous hosts. The checks run every five minutes, have a max_check_attempts value of 1, and have a freshness threshold of one hour.

In the example above, the host went down just after 04:50, so an hour later Nagios ran the check command (check_dummy), resulting in a critical alert status.
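
The freshness behaviour described here can be sketched roughly as follows. This is a simplified model rather than Nagios's actual implementation: the function name `is_stale` is hypothetical, and the "stale by" figure Nagios writes to the log may be computed differently from the plain subtraction used here.

```python
from datetime import datetime, timedelta


def is_stale(last_result: datetime, now: datetime, threshold: timedelta) -> bool:
    """Return True when the newest result is older than the freshness threshold.

    Hypothetical helper for illustration only; not a Nagios API.
    """
    return now - last_result > threshold


# Timestamps taken from the log excerpt above.
last_service_result = datetime(2018, 2, 26, 4, 50, 30)
freshness_check_time = datetime(2018, 2, 26, 5, 50, 51)
threshold = timedelta(seconds=3600)  # freshness_threshold of one hour

service_is_stale = is_stale(last_service_result, freshness_check_time, threshold)
```

With the values from the log, the last service result is more than one hour old at 05:50:51, so the sketch treats it as stale. That is the point at which Nagios forced the check command.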

What I don't understand is why the host check went into a soft state but the service alert went into a hard state (resulting in a notification). The same thing happens for every host and service check on all our systems.

Ideally I would like both host and service alerts to use the same state in the event that no checks are received.

I did have a look through the forums, and I can see what appear to be people asking a similar question, but no conclusive explanation. Apologies if the answer is already out there.

If further configuration details are required then I can add them to the thread.

Thanks in advance.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Soft vs Hard States

Post by mcapra »

Here's the official documentation regarding state types (It's worth mentioning there's a typo on that page. CTRL+F notifified):
https://assets.nagios.com/downloads/nag ... types.html
Hard states occur for hosts and services in the following situations:
...
  • When a service check results in a non-OK state and its corresponding host is either DOWN or UNREACHABLE.
Per that, since your host went into a DOWN state before the service check entered a CRITICAL state:

[Mon Feb 26 05:51:01 2018] HOST ALERT: host1;DOWN;SOFT;1;CRITICAL: No Recent Passive Host Checks.
[Mon Feb 26 05:51:01 2018] SERVICE ALERT: host1;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
The service's critical state was flagged as a HARD state.
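
As a rough illustration of the rule quoted above, the decision for a non-OK service result could be modelled like this. The function name and parameters are hypothetical, and this is a sketch of the documented rule, not Nagios Core's own code.

```python
def service_problem_state_type(current_attempt: int,
                               max_check_attempts: int,
                               host_state: str) -> str:
    """State type for a non-OK service result, per the documented rules.

    Hypothetical helper for illustration only; not a Nagios API.
    """
    # Documented rule: a non-OK service result is HARD when its host
    # is DOWN or UNREACHABLE, regardless of the attempt counter.
    if host_state in ("DOWN", "UNREACHABLE"):
        return "HARD"
    # Otherwise the service stays SOFT until the allowed attempts are used up.
    if current_attempt >= max_check_attempts:
        return "HARD"
    return "SOFT"
```

Note that with a max_check_attempts value of 1, as described in the original post, this sketch returns HARD on the first attempt whether or not the host is DOWN.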

As for why the host is even in a soft state to begin with (because by your description it shouldn't be), can you share the host definition as well as any templates being applied?
Former Nagios employee
https://www.mcapra.com/
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Soft vs Hard States

Post by npolovenko »

Thanks, @mcapra!
@invade, we'll be able to tell why your host never went into a hard state once we see how it's defined.

max_check_attempts: This directive is used to define the number of times that Nagios will retry the host check command if it returns any state other than an OK state.
A host will be in a Soft state until the maximum number of allowed retries is reached.
https://assets.nagios.com/downloads/nag ... tions.html

Setting this value to 1 will cause Nagios to generate an alert without retrying the host check.
That's probably what you want.

Re: Soft vs Hard States

Post by invade »

Below are the host and service definitions as requested:

define host{
        name                            passive-host-customers     ; The name of this host template
        notifications_enabled           1       ; Host notifications are enabled
        active_checks_enabled           0
        passive_checks_enabled          1       ; Passive host checks are enabled/accepted
        event_handler_enabled           1       ; Host event handler is enabled
        flap_detection_enabled          0       ; Flap detection is disabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        freshness_threshold             3600    ; Complain if the data received is more than 1 hour old.
        check_command                   host_stale   ; Report staleness
        check_interval                  0       ; Set the check interval to be 0 as this doesn't force active checks, whilst passive ones are working
        max_check_attempts              1
        check_freshness                 1
        notification_interval           0
        notification_period             24x7
        notification_options            d,u,r
        contact_groups                  admins
        register                        0       ; Don't register this template
        }

define service{
        name                            passive-service-customers ; The 'name' of this service template
        active_checks_enabled           0       ; Active service checks are disabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             0       ; Do not obsess over this service
        check_freshness                 1       ; check service 'freshness'
        freshness_threshold             3600    ; Complain if the data received is more than 1 hour old.
        check_command                   service_stale   ; Report staleness
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          0       ; Flap detection is disabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        notification_interval           0       ; Only send notifications on status change by default.
        check_period                    24x7
        check_interval                  0
        retry_interval                  1440
        max_check_attempts              1
        notification_period             24x7
        notification_options            w,u,c,r
        contact_groups                  admins
        register                        0       ; Don't register this template
        }
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Soft vs Hard States

Post by scottwilkerson »

In your nagios.cfg, what is this setting set to?

passive_host_checks_are_soft
From the docs:
https://assets.nagios.com/downloads/nag ... gmain.html
Passive Host Checks Are SOFT Option

Format: passive_host_checks_are_soft=<0/1>
Example: passive_host_checks_are_soft=1

This option determines whether or not Nagios will treat passive host checks as HARD states or SOFT states. By default, a passive host check result will put a host into a HARD state type. You can change this behavior by enabling this option.

0 = Passive host checks are HARD (default)
1 = Passive host checks are SOFT
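
Read together with the max_check_attempts description earlier in the thread, the documented behaviour for a non-OK host result can be sketched as below. The function name and parameters are hypothetical, and the fallback to the attempt counter when the option is enabled is an assumption. The sketch models what the documentation says, which is not necessarily what Nagios Core does in the freshness-check case discussed here.

```python
def host_problem_state_type(is_passive: bool,
                            passive_host_checks_are_soft: bool,
                            current_attempt: int,
                            max_check_attempts: int) -> str:
    """State type for a non-OK host result, per the documentation quoted above.

    Hypothetical helper for illustration only; not a Nagios API.
    """
    # Documented default (option = 0): a passive host check result
    # puts the host straight into a HARD state.
    if is_passive and not passive_host_checks_are_soft:
        return "HARD"
    # Assumption: otherwise the host stays SOFT until the allowed
    # number of check attempts has been reached.
    if current_attempt >= max_check_attempts:
        return "HARD"
    return "SOFT"
```

Under this reading, a host with max_check_attempts set to 1 would be expected to report HARD on its first non-OK result, whether or not that result counts as passive. That is why the SOFT;1 entry in the log looks unexpected.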
Former Nagios employee

Re: Soft vs Hard States

Post by invade »

I have added the full config file to this post.

passive_host_checks_are_soft=0
Attachment: nagios.cfg (43.75 KiB)

Re: Soft vs Hard States

Post by scottwilkerson »

Looking into this further, I believe this is a bug that was never directly addressed. I have reopened the issue here:
https://github.com/NagiosEnterprises/na ... /issues/23

Re: Soft vs Hard States

Post by invade »

Many thanks for the info. I will watch the issue for updates.