
State Type Hard Up vs Soft Up

Posted: Wed May 29, 2019 8:30 am
by olmgroup
Hi all,
Earlier this week we were patching some servers that are monitored by Nagios XI. We scheduled downtime in Nagios for these servers, as that is part of our patching regime. Two of these monitored servers have identical Nagios configurations: both use check_tcp to poll a server on port 1790. Once the maintenance was completed, one of the servers returned with "State Type OK Hard" and the other with "State Type OK Soft", and this is affecting our service availability reports.

So 2 questions:

1) How can we fix the service availability report to accurately show when the service was available?
2) How can we stop this issue from occurring in the future?

Screenshots shown at this URL
https://photos.google.com/share/AF1QipP ... R0ajF6UUVn

Running Nagios XI Installed Version: 5.5.9

Re: State Type Hard Up vs Soft Up

Posted: Wed May 29, 2019 3:50 pm
by cdienger
The data comes from nagios.log and the files in /usr/local/nagios/var/archives/, so we could edit those if necessary, but I'm not sure why the two servers would show different results like that. I'd like to get the logs covering the 26th through the 29th, as well as a profile from Admin > System Config > System Profile > Download Profile. Please PM these to me, and compress the log files if they are not already compressed.

Re: State Type Hard Up vs Soft Up

Posted: Mon Jun 10, 2019 4:21 pm
by cdienger
The logs show that the -swas machine was actually unreachable (no route to host) while it was in downtime. When a host is not UP (DOWN or UNREACHABLE), any of its services that return a non-OK result go straight into a HARD state:
[Mon May 27 20:00:04 PDT 2019] SERVICE DOWNTIME ALERT: -swas; LOGIN;STARTED; Service has entered a period of scheduled downtime
[Mon May 27 20:00:06 PDT 2019] HOST DOWNTIME ALERT: -swas;STARTED; Host has entered a period of scheduled downtime


[Mon May 27 21:30:44 PDT 2019] HOST ALERT: -swas;DOWN;SOFT;1;CRITICAL - 172.31.120.14: rta nan, lost 100%

[Mon May 27 21:31:33 PDT 2019] SERVICE ALERT: -swas; LOGIN;CRITICAL;HARD;1;connect to address 172.31.120.14 and port 1790: No route to host

[Mon May 27 21:31:45 PDT 2019] HOST ALERT: -swas;DOWN;SOFT;2;CRITICAL - 172.31.120.14: Host unreachable @ 79.99.65.57. rta nan, lost 100%
[Mon May 27 21:32:46 PDT 2019] HOST ALERT: -swas;DOWN;SOFT;3;CRITICAL - 172.31.120.14: Host unreachable @ 79.99.65.57. rta nan, lost 100%
[Mon May 27 21:33:47 PDT 2019] HOST ALERT: -swas;DOWN;SOFT;4;CRITICAL - 172.31.120.14: Host unreachable @ 79.99.65.57. rta nan, lost 100%
[Mon May 27 21:34:45 PDT 2019] HOST ALERT: -swas;UP;SOFT;1;OK - 172.31.120.14: rta 0.819ms, lost 0%

[Mon May 27 21:41:26 PDT 2019] SERVICE ALERT: -swas; LOGIN;OK;SOFT;1;TCP OK - 0.001 second response time on 172.31.120.14 port 1790
Are you using the "Hide scheduled downtime" option found under Advanced when you run the SLA reports? Does this option make a difference?
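
For anyone who wants to pull the state type out of alert lines like the ones above, here is a minimal Python sketch. It assumes the line layout exactly as quoted in this thread (a human-readable timestamp in brackets); raw nagios.log lines normally carry a Unix epoch timestamp in the brackets instead. The names `Alert` and `parse_alert` are made up for illustration and are not part of Nagios.

```python
import re
from typing import NamedTuple, Optional

# Matches "[timestamp] HOST ALERT: ..." and "[timestamp] SERVICE ALERT: ...".
# "HOST DOWNTIME ALERT" / "SERVICE DOWNTIME ALERT" lines do not match.
ALERT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<kind>HOST|SERVICE) ALERT:\s*(?P<body>.*)$"
)


class Alert(NamedTuple):
    timestamp: str
    kind: str               # "HOST" or "SERVICE"
    host: str
    service: Optional[str]  # None for host alerts
    state: str              # e.g. UP, DOWN, OK, CRITICAL
    state_type: str         # SOFT or HARD
    attempt: int
    output: str


def parse_alert(line: str) -> Optional[Alert]:
    """Return an Alert for HOST/SERVICE ALERT lines, or None otherwise."""
    m = ALERT_RE.match(line.strip())
    if m is None:
        return None
    # Service alerts have one extra leading field (the service name).
    lead = 5 if m["kind"] == "SERVICE" else 4
    parts = [p.strip() for p in m["body"].split(";", lead)]
    if len(parts) != lead + 1 or not parts[-2].isdigit():
        return None
    host = parts[0]
    service = parts[1] if m["kind"] == "SERVICE" else None
    state, state_type, attempt, output = parts[-4:]
    return Alert(m["ts"], m["kind"], host, service, state,
                 state_type, int(attempt), output)


if __name__ == "__main__":
    sample = ("[Mon May 27 21:31:33 PDT 2019] SERVICE ALERT: -swas; LOGIN;"
              "CRITICAL;HARD;1;connect to address 172.31.120.14 and port 1790: "
              "No route to host")
    print(parse_alert(sample))
```

Filtering the parsed alerts on `state_type` makes it easy to see which transitions were recorded as SOFT and which as HARD for each host or service.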

Re: State Type Hard Up vs Soft Up

Posted: Tue Jun 11, 2019 7:57 am
by olmgroup
I have just tried the "Hide Scheduled Downtime" in the report but the output is the same

Re: State Type Hard Up vs Soft Up

Posted: Tue Jun 11, 2019 10:22 am
by lmiltchev
olmgroup wrote: I have just tried the "Hide Scheduled Downtime" in the report but the output is the same
The output shouldn't be the same unless the host is still in downtime. Both the downtime start and end times must be known for the math in the reports to work, and I am not sure whether that is the case here.
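
To illustrate why both ends of the downtime window are needed, here is a rough Python sketch of one plausible way to exclude scheduled downtime from an availability figure. This is an assumption for illustration only, not how Nagios XI actually computes its reports; `availability_pct` and its parameters are hypothetical names, times are plain seconds, and outages (and downtimes) are assumed not to overlap each other.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def _overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def availability_pct(
    window: Interval,
    outages: List[Interval],
    downtimes: List[Tuple[float, Optional[float]]],
    hide_downtime: bool = True,
) -> float:
    """Percent of the window not spent in a counted outage.

    With hide_downtime=True, outage time that falls inside scheduled
    downtime is not counted. A downtime with no end time cannot be
    excluded, so it raises ValueError.
    """
    total = window[1] - window[0]
    if total <= 0:
        raise ValueError("report window must have positive length")

    closed: List[Interval] = []
    if hide_downtime:
        for start, end in downtimes:
            if end is None:
                raise ValueError("downtime has no end time")
            closed.append((start, end))

    counted = 0.0
    for outage in outages:
        # Clip the outage to the report window first.
        clipped = (max(outage[0], window[0]), min(outage[1], window[1]))
        if clipped[1] <= clipped[0]:
            continue
        bad = clipped[1] - clipped[0]
        bad -= sum(_overlap(clipped, d) for d in closed)
        counted += max(0.0, bad)

    return 100.0 * (total - counted) / total


if __name__ == "__main__":
    window = (0.0, 7200.0)       # two-hour report window
    outages = [(5400.0, 6000.0)]  # ten-minute outage
    downtimes = [(0.0, 7200.0)]   # downtime covering the whole window
    print(availability_pct(window, outages, downtimes, hide_downtime=True))
    print(availability_pct(window, outages, downtimes, hide_downtime=False))
```

In this sketch an open-ended downtime raises an error rather than silently counting the outage, which mirrors the point that the report cannot do the subtraction without a known end time.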

Re: State Type Hard Up vs Soft Up

Posted: Thu Jun 13, 2019 3:34 am
by olmgroup
Sorry, I rechecked the output and tweaked the report times. I can confirm that once I isolated the period when the work took place and excluded scheduled downtime, the report does show 100% availability for the service.

Thanks for your assistance!

Re: State Type Hard Up vs Soft Up

Posted: Thu Jun 13, 2019 8:34 am
by lmiltchev
Great! Let us know if it is OK to close the topic then. Thank you!