Hi.
Could someone explain to me the following events from the log:
[Mon Feb 26 04:50:11 2018] PASSIVE HOST CHECK: host1;0;OK - Feb-26 04:50:01 GMT
[Mon Feb 26 04:50:30 2018] PASSIVE SERVICE CHECK: host1;System-Partitions;0;DISK OK
[Mon Feb 26 05:50:51 2018] Warning: The results of service 'System-Partitions' on host 'host1' are stale by 0d 0h 0m 35s (threshold=0d 1h 0m 0s). I'm forcing an immediate check of the service.
[Mon Feb 26 05:50:51 2018] Warning: The results of host 'host1' are stale by 0d 0h 0m 50s (threshold=0d 1h 0m 0s). I'm forcing an immediate check of the host.
[Mon Feb 26 05:51:01 2018] HOST ALERT: host1;DOWN;SOFT;1;CRITICAL: No Recent Passive Host Checks.
[Mon Feb 26 05:51:01 2018] SERVICE ALERT: host1;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
[Mon Feb 26 05:51:01 2018] SERVICE NOTIFICATION: host1;host1;System-Partitions;CRITICAL;notifyservice-host1;CRITICAL: No Recent Passive Service Checks.
We have a Nagios system that receives passive-only checks from numerous hosts. The checks run every five minutes, have a max_check_attempts value of one, and have a freshness threshold of one hour.
In the example above, the host went down just after 04:50, so an hour later Nagios ran the check command (check_dummy), resulting in a CRITICAL alert.
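As a rough model of the freshness logic described above, a result counts as stale once the time since the last result exceeds freshness_threshold. This is a sketch, not Nagios's implementation: `is_stale` is a hypothetical helper, and the timestamps are the ones from the log. The "stale by" figure Nagios itself logs is based on the check's own timestamp and may include extra latency (for example from the additional_freshness_latency option), so it need not equal the simple difference computed here.

```python
from datetime import datetime, timedelta

# Freshness threshold from the host/service templates (3600 seconds = 1 hour).
FRESHNESS_THRESHOLD = timedelta(seconds=3600)


def is_stale(last_result: datetime, now: datetime,
             threshold: timedelta = FRESHNESS_THRESHOLD) -> bool:
    """Hypothetical helper: True when the last result is older than the threshold."""
    return now - last_result > threshold


# Timestamps taken from the log excerpt above.
last_service_result = datetime(2018, 2, 26, 4, 50, 30)  # last PASSIVE SERVICE CHECK line
freshness_check = datetime(2018, 2, 26, 5, 50, 51)      # staleness warning line

# A passive result arriving every 5 minutes keeps the service fresh...
print(is_stale(last_service_result, last_service_result + timedelta(minutes=5)))  # False
# ...but once the host stops reporting, the freshness check finds it stale.
print(is_stale(last_service_result, freshness_check))  # True
print(freshness_check - last_service_result)           # 1:00:21
```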
What I don't understand is why the host check went into a soft state but the service alert went into a hard state (resulting in a notification). The same thing happens for every host and service check on all our systems.
Ideally I would like both host and service alerts to use the same state in the event that no checks are received.
I did look through the forums and can see what appear to be people asking similar questions, but no conclusive explanation. Apologies if the answer is already out there.
If further configuration details are required then I can add them to the thread.
Thanks in advance.
Soft vs Hard States
Re: Soft vs Hard States
Here's the official documentation regarding state types (it's worth mentioning that there's a typo on that page; Ctrl+F for "notifified"):
https://assets.nagios.com/downloads/nag ... types.html
The service's critical state was flagged as a HARD state.
Per that, since your host went into a DOWN state before the service check entered a CRITICAL state:
Hard states occur for hosts and services in the following situations:
...
- When a service check results in a non-OK state and its corresponding host is either DOWN or UNREACHABLE.
Code: Select all
[Mon Feb 26 05:51:01 2018] HOST ALERT: host1;DOWN;SOFT;1;CRITICAL: No Recent Passive Host Checks.
[Mon Feb 26 05:51:01 2018] SERVICE ALERT: host1;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
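To make the quoted rule concrete, here is a simplified model in Python. It is an illustration under stated assumptions, not Nagios source code: `ServiceResult` and `service_state_type` are hypothetical names, and only two of the documented HARD-state rules for a non-OK service result are covered (the host being DOWN or UNREACHABLE, and max_check_attempts being reached).

```python
from dataclasses import dataclass


@dataclass
class ServiceResult:
    """Hypothetical container for one non-OK service check result."""
    state: str               # "WARNING", "CRITICAL" or "UNKNOWN"
    attempt: int             # current check attempt, starting at 1
    max_check_attempts: int  # value from the service definition


def service_state_type(result: ServiceResult, host_state: str) -> str:
    """Return "HARD" or "SOFT" for a non-OK result (simplified model).

    Recoveries and HARD-to-HARD state changes are deliberately not modelled.
    """
    if result.state == "OK":
        raise ValueError("this sketch only models non-OK results")
    # Documented rule: a non-OK service on a DOWN/UNREACHABLE host is HARD.
    if host_state in ("DOWN", "UNREACHABLE"):
        return "HARD"
    # Documented rule: otherwise it turns HARD once max_check_attempts is reached.
    return "HARD" if result.attempt >= result.max_check_attempts else "SOFT"


# A first CRITICAL result with retries left would normally be SOFT...
print(service_state_type(ServiceResult("CRITICAL", 1, 3), "UP"))    # SOFT
# ...but it is HARD straight away when the host is already DOWN.
print(service_state_type(ServiceResult("CRITICAL", 1, 3), "DOWN"))  # HARD
```

With max_check_attempts set to 1, as in the first post, the service result would be HARD under either rule; the host-state rule only makes a difference when retries remain.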
As for why the host is even in a soft state to begin with (because by your description it shouldn't be), can you share the host definition as well as any templates being applied?
Former Nagios employee
https://www.mcapra.com/
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Soft vs Hard States
Thanks, @mcapra!
@invade, we'll be able to tell why your host never went into a HARD state after we see how it's defined.
max_check_attempts: This directive is used to define the number of times that Nagios will retry the host check command if it returns any state other than an OK state.
A host will be in a Soft state until the maximum number of allowed retries is reached.
https://assets.nagios.com/downloads/nag ... tions.html
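The quoted definition can be illustrated with a tiny model. This is a sketch only: `host_state_type` is a hypothetical helper and recoveries are ignored. The Nth consecutive non-UP result of a host check stays SOFT until N reaches max_check_attempts, so with max_check_attempts set to 1, as described in the first post, the very first DOWN result would be expected to be HARD.

```python
def host_state_type(attempt: int, max_check_attempts: int) -> str:
    """State type of the Nth consecutive non-UP host check result (simplified)."""
    if not 1 <= attempt <= max_check_attempts:
        raise ValueError("attempt must be between 1 and max_check_attempts")
    # Retries are exhausted on the last allowed attempt, which makes the state HARD.
    return "HARD" if attempt == max_check_attempts else "SOFT"


for max_attempts in (3, 1):
    types = [host_state_type(n, max_attempts) for n in range(1, max_attempts + 1)]
    print(max_attempts, types)
# 3 ['SOFT', 'SOFT', 'HARD']
# 1 ['HARD']
```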
Setting this value to 1 will cause Nagios to generate an alert without retrying the host check.
That's probably what you want.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Soft vs Hard States
Below are the host and service definitions as requested:
Code: Select all
define host{
name passive-host-customers ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
active_checks_enabled 0 ; Active host checks are disabled
passive_checks_enabled 1 ; Passive host checks are enabled/accepted
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 0 ; Flap detection is disabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
freshness_threshold 3600 ; Complain if the data received is more than 1 hour old.
check_command host_stale ; Report staleness
check_interval 0 ; Set the check interval to be 0 as this doesn't force active checks, whilst passive ones are working
max_check_attempts 1
check_freshness 1
notification_interval 0
notification_period 24x7
notification_options d,u,r
contact_groups admins
register 0 ; Don't register this template
}
Code: Select all
define service{
name passive-service-customers ; The 'name' of this service template
active_checks_enabled 0 ; Active service checks are disabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 0 ; Do not obsess over this service
check_freshness 1 ; check service 'freshness'
freshness_threshold 3600 ; Complain if the data received is more than 1 hour old.
check_command service_stale ; Report staleness
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 0 ; Flap detection is disabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
notification_interval 0 ; Only send notifications on status change by default.
check_period 24x7
check_interval 0
retry_interval 1440
max_check_attempts 1
notification_period 24x7
notification_options w,u,c,r
contact_groups admins
register 0 ; Don't register this template
}
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Soft vs Hard States
In your nagios.cfg, what is this setting set to?
From the docs
https://assets.nagios.com/downloads/nag ... gmain.html
Code: Select all
passive_host_checks_are_soft
https://assets.nagios.com/downloads/nag ... gmain.html
Passive Host Checks Are SOFT Option
Format: passive_host_checks_are_soft=<0/1>
Example: passive_host_checks_are_soft=1
This option determines whether or not Nagios will treat passive host checks as HARD states or SOFT states. By default, a passive host check result will put a host into a HARD state type. You can change this behavior by enabling this option.
0 = Passive host checks are HARD (default)
1 = Passive host checks are SOFT
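One way to answer the question above is to read the option straight out of nagios.cfg. The sketch below uses hypothetical helper names (`read_cfg_option`, `passive_host_result_type`), assumes simple `key=value` lines, and simply keeps the last value if a key appears more than once; it then maps the value to the state type described in the quoted documentation. Pass it the contents of your nagios.cfg (its location depends on the installation). Note that the option only concerns passive host results, whereas the alert in the log came from the check command Nagios ran once the result went stale.

```python
def read_cfg_option(cfg_text: str, key: str, default: str) -> str:
    """Return the value of `key` from nagios.cfg-style text (key=value lines).

    Blank lines and lines starting with '#' are skipped; if the key appears
    more than once, this sketch keeps the last value.
    """
    value = default
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, val = line.partition("=")
        if name.strip() == key:
            value = val.strip()
    return value


def passive_host_result_type(cfg_text: str) -> str:
    """State type a passive host result gets, per the documented option (default 0)."""
    soft = read_cfg_option(cfg_text, "passive_host_checks_are_soft", "0")
    return "SOFT" if soft == "1" else "HARD"


sample_cfg = """
# hypothetical nagios.cfg excerpt
check_service_freshness=1
passive_host_checks_are_soft=0
"""
print(passive_host_result_type(sample_cfg))  # HARD
```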
Re: Soft vs Hard States
I have added the full config file to this post.
Code: Select all
passive_host_checks_are_soft=0
Attachments:
nagios.cfg (43.75 KiB)
scottwilkerson
Re: Soft vs Hard States
Looking into this further, I believe this is a bug that was never directly addressed. I have reopened the issue here:
https://github.com/NagiosEnterprises/na ... /issues/23
Re: Soft vs Hard States
Many thanks for the info. I will watch the issue for updates.