Service Availability Report seems not accurate for my servic

source888
Posts: 13
Joined: Wed Apr 12, 2017 10:10 pm

Service Availability Report seems not accurate for my servic

Post by source888 »

Dear Nagios Core experts,

I use Nagios Core to generate an availability report for one service. In the availability report chart, the total critical time for this service is 1d 1h 52m 43s, which seems much longer than what we observed in real life over the last month. From the detailed Service Log Entries I can't work out how this time is calculated. Can you help point it out?
1_1.png
2.png
3.png
Last edited by source888 on Mon Feb 17, 2020 12:21 am, edited 1 time in total.
source888
Posts: 13
Joined: Wed Apr 12, 2017 10:10 pm

Re: Service Availability Report seems not accurate for my se

Post by source888 »

In the whole of December there were only 3 occurrences of the critical alert, and 1 of them fell inside the maintenance window, so I assume it was not counted.
6.png
source888
Posts: 13
Joined: Wed Apr 12, 2017 10:10 pm

Re: Service Availability Report seems not accurate for my se

Post by source888 »

When I click into the "Service State Breakdowns" diagram, I see the following detail. In this diagram the critical time for the whole of last month is only: Critical : (0.148%) 0d 1h 6m 12s. Why is it so much smaller here?
7_7.png
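
As a sanity check on the figure quoted above, the percentage and the duration can be compared directly. This is a minimal sketch, assuming the report covers a full 31-day month; the helper name `parse_duration` is illustrative and not part of Nagios.

```python
import re
from datetime import timedelta

def parse_duration(text: str) -> timedelta:
    """Parse a duration written like '0d 1h 6m 12s'."""
    match = re.fullmatch(r"(\d+)d (\d+)h (\d+)m (\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

# Assumption: the report period is a full 31-day month.
report_period = timedelta(days=31)

breakdown_critical = parse_duration("0d 1h 6m 12s")   # from the breakdown view
summary_critical = parse_duration("1d 1h 52m 43s")    # from the report chart

for label, duration in [("breakdown", breakdown_critical),
                        ("summary", summary_critical)]:
    share = 100 * duration / report_period
    print(f"{label}: {duration} = {share:.3f}% of the period")
```

Under that assumption, 1h 6m 12s works out to about 0.148% of the period, which matches the breakdown view. The two views therefore differ in the critical duration they count, not in how the percentage is derived.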
Last edited by source888 on Mon Feb 17, 2020 12:22 am, edited 1 time in total.
tacolover101
Posts: 432
Joined: Mon Apr 10, 2017 11:55 am

Re: Service Availability Report seems not accurate for my se

Post by tacolover101 »

Can you please post your host config for an0vm020 and the service config for 'Message exchanged' from the /usr/local/nagios/var/objects.cache file?

A few things I'm noticing:
- the host is in maintenance mode at times
- the checks could have a dependency
- depending on how often these checks run (which looks to be every 24h), your retry interval and max check attempts settings could complicate things.
i.e. if Nagios is only set to check daily, then your interval could be incorrect here
source888
Posts: 13
Joined: Wed Apr 12, 2017 10:10 pm

Re: Service Availability Report seems not accurate for my se

Post by source888 »

Here's the content of the host and service config from objects.cache:
define host {
    host_name an0
    alias an0
    address 10.1.*.* (sensitive content processed)
    check_command check_tcp!55555
    event_handler change_host_svc_notification
    contact_groups eai_* (sensitive content processed)
    initial_state o
    importance 0
    check_interval 5.000000
    retry_interval 5.000000
    max_check_attempts 3
    active_checks_enabled 1
    passive_checks_enabled 1
    obsess 1
    event_handler_enabled 1
    low_flap_threshold 0.000000
    high_flap_threshold 0.000000
    flap_detection_enabled 1
    flap_detection_options a
    freshness_threshold 0
    check_freshness 0
    notification_options d
    notifications_enabled 1
    notification_interval 0.000000
    first_notification_delay 0.000000
    stalking_options n
    process_perf_data 1
    retain_status_information 1
    retain_nonstatus_information 1
    _USER nagios
    _PASS o*** (sensitive content processed)
    _SEMPURL http://an0.***.com:8080/SEMP (sensitive content processed)
    _SEC_PORT 10** (sensitive content processed)
}




define service {
    host_name an0
    service_description Message exchanged
    check_period E24x7
    check_command check_solace!com*** (sensitive content processed)
    contact_groups e_***_group (sensitive content processed)
    notification_period E_24x7
    initial_state o
    importance 0
    check_interval 5.000000
    retry_interval 5.000000
    max_check_attempts 3
    is_volatile 0
    parallelize_check 1
    active_checks_enabled 1
    passive_checks_enabled 1
    obsess 1
    event_handler_enabled 1
    low_flap_threshold 0.000000
    high_flap_threshold 0.000000
    flap_detection_enabled 1
    flap_detection_options a
    freshness_threshold 0
    check_freshness 0
    notification_options r,w,u,c
    notifications_enabled 1
    notification_interval 0.000000
    first_notification_delay 0.000000
    stalking_options n
    process_perf_data 1
    action_url /nagiosil/cgi-bin/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$' onMouseOver='showGraphPopup(this)' onMouseOut='hideGraphPopup()' rel='/nagiosil/cgi-bin/showgraph.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&period=week&rrdopts=-w+450+-j
    retain_status_information 1
    retain_nonstatus_information 1
}
Last edited by source888 on Mon Feb 17, 2020 12:23 am, edited 1 time in total.
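
For reference, the retry settings in the service definition above determine how long a problem stays SOFT before it becomes HARD. A rough sketch, assuming the default `interval_length` of 60 seconds (the posted config does not show it) and that checks run exactly on schedule; the function name is illustrative:

```python
def minutes_until_hard(retry_interval: float, max_check_attempts: int) -> float:
    """Minutes between the first failed check and the check that makes the state HARD.

    Assumes interval_length = 60 (one interval unit is one minute)
    and no scheduling delay.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    # The first failure is attempt 1; each further attempt waits one retry_interval.
    return (max_check_attempts - 1) * retry_interval

# Values from the 'Message exchanged' service definition.
print(minutes_until_hard(retry_interval=5.0, max_check_attempts=3))  # 10.0
```

Under that assumption the service is checked every 5 minutes, and a failure needs about 10 minutes of retries before it is treated as a HARD problem.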
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Service Availability Report seems not accurate for my se

Post by scottwilkerson »

source888 wrote: In the whole of December there were only 3 occurrences of the critical alert, and 1 of them fell inside the maintenance window, so I assume it was not counted.
6.png
From 12/10 to 12/11, the service went down during the downtime, but the recovery didn't come until 15h 4m 32s after the downtime ended.

That time counts and is added to the other critical times.
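
To illustrate the point, here is a small sketch that splits one critical period into the part covered by scheduled downtime and the part that is not. The timestamps are hypothetical (the exact times are only in the screenshots); only the 15h 4m 32s gap comes from this thread, and `split_by_downtime` is an illustrative helper, not Nagios code.

```python
from datetime import datetime, timedelta

def split_by_downtime(problem_start, problem_end, downtime_start, downtime_end):
    """Return (scheduled, unscheduled) durations of one problem period."""
    overlap_start = max(problem_start, downtime_start)
    overlap_end = min(problem_end, downtime_end)
    # No overlap gives a negative difference, so clamp it to zero.
    scheduled = max(overlap_end - overlap_start, timedelta(0))
    unscheduled = (problem_end - problem_start) - scheduled
    return scheduled, unscheduled

# Hypothetical times: the service goes critical inside the downtime window
# and recovers 15h 4m 32s after the window closes.
downtime_start = datetime(2019, 12, 10, 8, 0, 0)
downtime_end = datetime(2019, 12, 10, 10, 0, 0)
problem_start = datetime(2019, 12, 10, 9, 0, 0)
problem_end = downtime_end + timedelta(hours=15, minutes=4, seconds=32)

scheduled, unscheduled = split_by_downtime(
    problem_start, problem_end, downtime_start, downtime_end
)
print("scheduled:", scheduled)      # 1:00:00
print("unscheduled:", unscheduled)  # 15:04:32
```

Only the part inside the downtime window is treated as scheduled; everything after the window closes counts as ordinary critical time until the recovery arrives.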
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com
source888
Posts: 13
Joined: Wed Apr 12, 2017 10:10 pm

Re: Service Availability Report seems not accurate for my se

Post by source888 »

Hello Scott,

Yes, you're right. Following your reasoning, I now see that the availability calculation is correct.

What makes the report look wrong to me now seems to be an issue with a "soft recovery" not turning into a "hard recovery". As you can see in the following screenshot, there is one soft recovery at 2019-11-26 11:03:19, but the whole period from then until 2019-11-27 00:00:00 is counted as service critical. I assume that during this period the service showed green in the service view, since it was in a soft recovery state.
2020-02-17_11h28_33.png
Attachments
2020-02-17_11h25_57.png
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Service Availability Report seems not accurate for my se

Post by scottwilkerson »

The scheduled downtime had already ended before that 9-hour critical period, so that time counts.
source888
Posts: 13
Joined: Wed Apr 12, 2017 10:10 pm

Re: Service Availability Report seems not accurate for my se

Post by source888 »

For that 9-hour critical period, the information I can get from the availability report and the service event log is the following:

2019-11-26 10:53:19: a new service CRITICAL (HARD) starts

2019-11-26 11:03:19: an OK (SOFT) occurs

2019-11-26 20:00:01: the Nagios process restarts

2019-11-27 00:00:00: the availability report finally shows the service as OK (HARD)

So my confusion now is why this service doesn't get an OK (HARD) after the first OK (SOFT) at 2019-11-26 11:03:19. Is it possible that the bug listed as fixed in version 4.4.3 still exists, namely: * Fixed services in soft states sometimes not switching into hard states (#576) (Jake Omann)
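
The gaps between the events listed above can be worked out with plain date arithmetic. A short sketch using only the timestamps from this post; the labels are shorthand, not Nagios log text:

```python
from datetime import datetime

# Events as listed in the post above.
events = [
    ("CRITICAL (HARD)", datetime(2019, 11, 26, 10, 53, 19)),
    ("OK (SOFT)", datetime(2019, 11, 26, 11, 3, 19)),
    ("Nagios restart", datetime(2019, 11, 26, 20, 0, 1)),
    ("OK (HARD)", datetime(2019, 11, 27, 0, 0, 0)),
]

# Print the time elapsed between each pair of consecutive events.
for (prev_label, prev_time), (label, time) in zip(events, events[1:]):
    print(f"{prev_label} -> {label}: {time - prev_time}")

total = events[-1][1] - events[0][1]
print("CRITICAL (HARD) -> OK (HARD):", total)
```

This gives 10 minutes from the HARD critical to the SOFT OK, 8h 56m 42s from the SOFT OK to the restart, 3h 59m 59s from the restart to midnight, and 13h 6m 41s overall.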
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Service Availability Report seems not accurate for my se

Post by scottwilkerson »

If it was in a SOFT CRITICAL, the recovery that is logged should be SOFT. It will switch to HARD (in memory), but it is recorded as SOFT so that event handlers trigger appropriately.

Here are the docs on state types, with examples:
https://assets.nagios.com/downloads/nag ... types.html
Re:
Service experiences a SOFT recovery. Event handlers execute, but notifications are not sent, as this wasn't a "real" problem. State type is set HARD and check # is reset to 1 immediately after this happens.
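
The behaviour described in that excerpt can be sketched as a small state model. This is a simplified illustration of the SOFT/HARD rules, not Nagios's actual implementation: it ignores host state, flapping, volatile services and passive checks, and the `simulate` function is hypothetical.

```python
def simulate(results, max_check_attempts=3):
    """Return a list of (result, state_type, attempt) as they would be logged."""
    state, state_type, attempt = "OK", "HARD", 1
    log = []
    for result in results:
        if result == "OK":
            if state != "OK" and state_type == "SOFT":
                # Soft recovery: logged as SOFT on the next attempt number.
                log.append((result, "SOFT", attempt + 1))
            else:
                # Hard recovery or steady OK: logged as HARD, attempt 1.
                log.append((result, "HARD", 1))
            # After any OK result the service is HARD OK with the counter reset.
            state, state_type, attempt = "OK", "HARD", 1
        else:
            if state == "OK":
                attempt = 1          # first failure after OK
            elif state_type == "SOFT":
                attempt += 1         # still retrying
            # In a HARD problem the attempt stays at max_check_attempts.
            state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
            state = result
            log.append((result, state_type, attempt))
    return log

# A single failed check followed by a recovery never reaches a HARD problem.
for entry in simulate(["CRITICAL", "OK", "OK"]):
    print(entry)
```

In this model the recovery after a single failed check is logged as SOFT, and the next OK result is already reported as HARD with the attempt counter back at 1, which is the sequence the quoted documentation describes.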