Services, Plugins, and Status

Posted: Mon Feb 23, 2015 5:22 pm
by CountryLife08
Good evening all,

I have set up a Nagios Core installation on CentOS 6.5 and everything is running nice and smooth. I am using various plugins to monitor data from our machines and network devices, and I am only having an issue with one of them. I posted earlier about check_snmp and how to modify the return status message in Nagios, and that all worked out excellently.

Yesterday a section of the network went down, and the technician who was working was confused: when the SNMP service checks timed out (the equipment was offline), the service went to "Unknown" status instead of the -c (critical) status I had configured. I checked the nagios.cfg file, and service check timeouts are set to go critical, not unknown. What am I missing?


Thanks,
J

Re: Services, Plugins, and Status

Posted: Mon Feb 23, 2015 5:32 pm
by jdalrymple
Hi,

After reviewing the post I think you're referencing (http://support.nagios.com/forum/viewtopic.php?t=30535), I have to say I'm confused as to what your solution was. As I read it, Luke left you with the info that if the SNMP check times out while trying to contact the SNMP daemon, you'll get UNKNOWN status, and that you perhaps needed a more elaborate check solution. I think he specifically meant implementing a service dependency, but it's hard to say without asking him directly. What was your solution to get the desired service status when the SNMP connection timed out? If we know that, we can try to identify why it didn't work for you.

Thanks

Re: Services, Plugins, and Status

Posted: Thu Feb 26, 2015 8:46 pm
by CountryLife08
Hey,

Sorry if I am a bit confusing. I am new to working this in depth with network monitoring, so I am still learning what means what and how things work. Originally I used the -r, -w, and -c values for working equipment and tried to use ping to check up vs. down. Alas, with some switches being in a building across town, I struck out: I can't find a way to ping abc.abc.def.def:9210, unless I am missing something in the ping/check host alive options. So I found this nifty little section of nagios.cfg and went back to trying with SNMP:

# SERVICE CHECK TIMEOUT STATE
# This setting determines the state Nagios will report when a
# service check times out - that is does not respond within
# service_check_timeout seconds.  This can be useful if a
# machine is running at too high a load and you do not want
# to consider a failed service check to be critical (the default).
# Valid settings are:
# c - Critical (default)
# u - Unknown
# w - Warning
# o - OK

service_check_timeout_state=c
This to me means that if a service check times out, then Nagios should interpret the timeout as a critical issue vs. unknown. I guess the appropriate question is actually: is check_snmp a service check, or something entirely different?

Thanks,
J

P.S. Sorry for the newbieness :D :geek:

Re: Services, Plugins, and Status

Posted: Thu Feb 26, 2015 9:31 pm
by Box293
CountryLife08 wrote: This to me means that if a service check times out, then Nagios should interpret the timeout as a critical issue vs. unknown. I guess the appropriate question is actually: is check_snmp a service check, or something entirely different?
I think I understand your issue.

First, let's talk about a timeout from a plugin perspective.

check_something -H 10.10.10.10 -t 30 argument argument blah blah

Here -t 30 sets the timeout to 30 seconds; that's how long the plugin is allowed to run.

If 10.10.10.10 cannot be reached within 30 seconds, the plugin will return an UNKNOWN state (assuming this plugin is programmed this way).

So from Nagios's perspective, the plugin completed successfully and did not time out. Nagios accepts the state as UNKNOWN.
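To make the plugin side concrete, here is a minimal Python sketch. It is not the code of check_snmp or any real plugin: `check_something`, `unreachable_host`, and `healthy_host` are hypothetical names used only for illustration. The only fixed facts it relies on are the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It shows a plugin that hits its own -t timeout and still finishes normally, handing back UNKNOWN.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_something(probe, timeout):
    """Hypothetical plugin body: run probe(timeout) and map the outcome
    to an (exit_code, message) pair, the way a plugin reports to Nagios."""
    try:
        value = probe(timeout)
    except TimeoutError:
        # The plugin gave up on its own. It still exits cleanly,
        # so Nagios sees a completed check whose result is UNKNOWN.
        return UNKNOWN, f"UNKNOWN - no response within {timeout}s"
    return OK, f"OK - value is {value}"


def unreachable_host(timeout):
    """Stand-in for a device that never answers (assumption for the demo)."""
    raise TimeoutError


def healthy_host(timeout):
    """Stand-in for a device that answers right away (assumption for the demo)."""
    return 42


if __name__ == "__main__":
    print(check_something(unreachable_host, 30))
    print(check_something(healthy_host, 30))
```

In this sketch the timeout is handled entirely inside the plugin, which is why service_check_timeout_state never comes into play.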

Now let's talk about a timeout from a Nagios perspective.

Nagios has a default setting of 60 seconds for service check timeouts. This means that if the plugin does not return a result within 60 seconds, Nagios treats this as a timeout and will return the state as per service_check_timeout_state=c.

Sooooo if I ran the plugin like:

check_something -H 10.10.10.10 -t 90 argument argument blah blah

This time I allowed 90 seconds for the plugin to run. Because this is longer than Nagios's internal 60 second timeout, once the service check has run for 60 seconds Nagios will treat this as a timeout and will return the state as per service_check_timeout_state=c.
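The interaction between the two timeouts can be modelled in a few lines of Python. This is only a simplified sketch of the behaviour described above, not Nagios's actual implementation: `reported_state` is a hypothetical function, the plugin is assumed to report UNKNOWN on its own timeout, and a tie between the two timeouts is arbitrarily treated as a Nagios timeout (in reality that would be a race).

```python
# Letters accepted by service_check_timeout_state in nagios.cfg.
TIMEOUT_STATES = {"c": "CRITICAL", "u": "UNKNOWN", "w": "WARNING", "o": "OK"}


def reported_state(plugin_timeout,
                   service_check_timeout=60,
                   service_check_timeout_state="c",
                   plugin_timeout_state="UNKNOWN"):
    """State shown for an unreachable device, given both timeouts in seconds.

    Whichever timeout expires first decides the result:
    - plugin first: the plugin exits by itself and its own state is accepted;
    - Nagios first: the check is cut off and service_check_timeout_state applies.
    """
    if plugin_timeout < service_check_timeout:
        return plugin_timeout_state
    # Assumption: equal timeouts are counted as a Nagios timeout.
    return TIMEOUT_STATES[service_check_timeout_state]


if __name__ == "__main__":
    print(reported_state(30))   # plugin gives up first (-t 30)
    print(reported_state(90))   # Nagios gives up first (-t 90)
```

With the defaults, `-t 30` yields the plugin's UNKNOWN while `-t 90` yields the CRITICAL configured in nagios.cfg, matching the two cases above.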

Does that make sense?

Re: Services, Plugins, and Status

Posted: Sat Feb 28, 2015 8:29 pm
by CountryLife08
It makes sense and I think I completely understand how the plugins and nagios timeouts work together!

Well, due to the lovely ice and snow we are out of the office and someone turned the server off. I will let you know tomorrow night if it works.

Re: Services, Plugins, and Status

Posted: Sun Mar 01, 2015 4:22 pm
by CountryLife08
Box293,


Worked like it should for SNMP! Thank you!! Now when a piece of equipment goes down it shows as critical vs. unknown.

Thank you,

J