duration value?
Posted: Fri Apr 08, 2016 9:36 am
by jriker1
So I am using the latest version of Nagios Core, 4.1.1 at the time of this writing, and the Duration value seems off. I just had a computer randomly reboot with no log data, and Nagios is still showing a duration of 25 days. I'm assuming that is uptime, since that's what the details page says it is, but I've restarted this thing many times for Windows updates and the like, quite apart from this failure. Is it just assuming that if there were no alerts, the system has been up for that whole time? And just to clarify, when I see:
Host Status: UP (for 25d 12h 39m 56s)
To me that means it hasn't gone down for any reason in 25 days.
Thanks.
JR
Re: duration value?
Posted: Fri Apr 08, 2016 10:16 am
by hsmith
That doesn't seem right. How often are you performing checks against this host to make sure that it's up?
Re: duration value?
Posted: Fri Apr 08, 2016 10:24 am
by jriker1
The Uptime check is using the standard generic-service service template. It's set to 24x7 with a check_interval of 10.
Thanks.
JR
EDIT: Clarifying, my Uptime check shows 0 days, 3 hours, 2 minutes. It's the built-in status view, the one you get when you click on Hosts in the sidebar, that is showing the 25-day duration. No idea where the "Duration" gets set on that particular summary page.
Re: duration value?
Posted: Fri Apr 08, 2016 10:44 am
by rkennedy
If a check runs, and the reboot starts and finishes before the next check runs, then Nagios will assume that the system didn't go down at all. You can always lower the check interval to, say, 3 minutes, and at that point it would probably catch the updates / reboot.
I do believe that a system is still pingable during updates though, so it won't be going down for very long.
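To illustrate the point about check timing, here is a rough Python sketch (a toy model, not anything Nagios actually runs). The function name and the numbers are made up for the example, and it ignores retries and soft/hard states; it only shows which scheduled checks would land inside an outage window.

```python
# Toy model, not Nagios code: all times are minutes since monitoring started.
def checks_that_see_outage(check_interval, outage_start, outage_end, horizon):
    """Return the scheduled check times that fall inside the outage window."""
    check_times = range(0, horizon + 1, check_interval)
    return [t for t in check_times if outage_start <= t < outage_end]


# Hypothetical 4-minute reboot that starts at minute 12.
print(checks_that_see_outage(10, 12, 16, 60))  # 10-minute interval: []
print(checks_that_see_outage(3, 12, 16, 60))   # 3-minute interval: [12, 15]
```

With a 10-minute interval the checks at minutes 10 and 20 both see the host up, so nothing ever reports a state change. With a 3-minute interval, two checks land inside the reboot.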
Re: duration value?
Posted: Fri Apr 08, 2016 10:57 am
by jriker1
Thanks. Out of the box, which is the case here, where does that value come from in the config? I have an Uptime check on individual servers, but at the "Hosts" level it looks like it's something pre-existing that I never set/added. Guessing also that the duration check is OS agnostic, so it doesn't actually connect to the server to get uptime information.
Should probably also ask if that display value can be overridden. So on a PC where I'm using NSClient++ to get Uptime, can that data be used to drive the "Duration" on the main page?
Thanks.
JR
Re: duration value?
Posted: Fri Apr 08, 2016 11:26 am
by jriker1
Think I figured it out but correct me if I'm wrong. Replace the check-host-alive value for check_command on the host template with a new one?
EDIT: OK that didn't fully work. So now the "Status Information" shows System Uptime - 0 day(s) 4 hour(s) 12 minute(s) but the duration is still at 2d 22h 30m 24s and ticking. Looking more and more like the duration just keeps ticking away on some internal timer until whatever process is checking if the system is up returns a failure. Thought I could tie it into real analytics, but it's not looking that way.
Thanks.
JR
Re: duration value?
Posted: Fri Apr 08, 2016 2:02 pm
by rkennedy
jriker1 wrote:Think I figured it out but correct me if I'm wrong. Replace the check-host-alive value for check_command on the host template with a new one?
EDIT: OK that didn't fully work. So now the "Status Information" shows System Uptime - 0 day(s) 4 hour(s) 12 minute(s) but the duration still at 2d 22h 30m 24s and ticking. Looking more and more like the duration just keeps ticking away on some internal timer until whatever process is checking if the system is up returns a failure in the job. Thought I could tie it into real analytics but not looking that way.
Thanks.
JR
The duration shows how long the current state (here 'UP') has been active. Since the state didn't change when you swapped the host check command, the duration just keeps increasing.
You're in the right place to change it, though: it's the check_command for that host (or for the template, as you've indicated).
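As a rough illustration of why the duration kept ticking, here is a small Python sketch that models the duration as "now minus the time of the last state change". The class and method names are hypothetical and this is not Nagios code, just the idea described above under that assumption.

```python
from datetime import datetime, timedelta


class HostStatus:
    """Toy model, not Nagios code: tracks a state and when it last changed."""

    def __init__(self, state, now):
        self.state = state
        self.last_state_change = now

    def record_check(self, state, now):
        # The timer restarts only when the reported state differs
        # from the state already on record.
        if state != self.state:
            self.state = state
            self.last_state_change = now

    def duration(self, now):
        return now - self.last_state_change


start = datetime(2016, 4, 1, 9, 0)  # arbitrary example date
host = HostStatus("UP", start)

# Check command swapped on day 2, but the result is still UP.
host.record_check("UP", start + timedelta(days=2))
print(host.duration(start + timedelta(days=3)))  # 3 days, 0:00:00

# A check finally returns DOWN, then UP again five minutes later.
host.record_check("DOWN", start + timedelta(days=3))
host.record_check("UP", start + timedelta(days=3, minutes=5))
print(host.duration(start + timedelta(days=3, minutes=15)))  # 0:10:00
```

Changing what the check reports in "Status Information" never touches the timer. Only a check result with a different state does.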