Connection Timeout/Refused State
Posted: Thu Jul 26, 2012 3:22 am
by Fred Kroeger
Hi - Is there a way to change the state of Connection Timeout/Refused events?
Currently it returns a Critical state for these events - I would like it to be a Warning state only.
thanks... Fred
Re: Connection Timeout/Refused State
Posted: Thu Jul 26, 2012 9:08 am
by lmiltchev
What is the check (plugin) that is returning "Critical"?
Re: Connection Timeout/Refused State
Posted: Thu Jul 26, 2012 7:32 pm
by Fred Kroeger
check_nrpe - it's the standard plugin installed with NagiosXI 2011R1.8
# ./check_nrpe -h
NRPE Plugin for Nagios
Copyright (c) 1999-2008 Ethan Galstad ([email protected])
Version: 2.12
Last Modified: 03-10-2008
License: GPL v2 with exemptions (-l for more info)
SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required
ahhh... now that I've read the Options list I think the -u flag is what I need.
Thanks
Re: Connection Timeout/Refused State
Posted: Sun Jul 29, 2012 8:40 pm
by Fred Kroeger
OK - the -u flag to check_nrpe has worked, in that if I get a socket timeout from nrpe, I now get an UNKNOWN event.
However, I still get a CRITICAL event for "Timeout while attempting connection".
Is there any way I can get these service events to return an UNKNOWN/WARNING state?
In nrpe.cfg I have:
command_timeout=60
connection_timeout=300
and in nagios.cfg I have:
service_check_timeout=90
I can continue to ping the host so I don't get a Host down event but I do get multiple Critical service events with the "Timeout while attempting connection" message.
thanks... Fred
Re: Connection Timeout/Refused State
Posted: Wed Aug 01, 2012 10:11 am
by mguthrie
Check NRPE also has a -t flag that can be set for the plugin timeout, but apart from what was mentioned in the above post, I don't know of any way to change the return code from a timeout other than to modify it in the plugin source.
The other possibility would be to turn off the notification option for UNKNOWN states for these hosts/services.
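A wrapper around the plugin is another way to change the return code without touching the plugin source. The following is only a sketch: the `remap_state` and `main` helpers and the list of matched messages are assumptions made for illustration, not part of check_nrpe. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard plugin return codes.

```python
"""Hypothetical wrapper: run a plugin and downgrade timeout CRITICALs."""
import subprocess

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Substrings treated as connection problems (assumed; adjust them to the
# messages your plugin actually prints).
TIMEOUT_MARKERS = ("Timeout while attempting connection", "Connection refused")


def remap_state(exit_code, output, downgrade_to=WARNING):
    """Return downgrade_to when a CRITICAL result looks like a timeout."""
    if exit_code == CRITICAL and any(m in output for m in TIMEOUT_MARKERS):
        return downgrade_to
    return exit_code


def main(argv):
    """Run the real plugin command given in argv and remap its exit code."""
    if not argv:
        print("UNKNOWN - no plugin command given")
        return UNKNOWN
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    # Pass the plugin output through unchanged so it still shows in Nagios.
    print(proc.stdout.strip())
    return remap_state(proc.returncode, proc.stdout)

# In a real wrapper script the last line would be:
#     sys.exit(main(sys.argv[1:]))
```

The check command would then call the wrapper with the original check_nrpe command line as its arguments, so only the exit code changes and the plugin output is kept.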
Re: Connection Timeout/Refused State
Posted: Wed Aug 01, 2012 7:20 pm
by Fred Kroeger
Thanks
Yes I'm using the -t option but that is just the threshold for the command timeout.
Yes I have disabled notifications for UNKNOWN States but the problem is that a CRITICAL state is displayed on a timeout.
Also in a similar vein.....
I have a host that is down & displays a CRITICAL state - this is what I expect.
However, all of the services for that host also show CRITICAL.
So when the host goes down I get a notification for the host event and then one for every service event. Surely if the host can't be reached, Nagios should stop running the service checks and set their state to UNKNOWN?
regards.. Fred
Re: Connection Timeout/Refused State
Posted: Thu Aug 02, 2012 10:07 am
by mguthrie
As far as the services go, to my knowledge that is correct behavior. Because services can have their own contacts separate from the host, Nagios will still notify any and all service contacts according to the notification parameters. You could probably set up service dependencies to prevent this behavior, but depending on how many hosts you wanted to do this for, that could be quite a few definitions to create.
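If the number of definitions is the obstacle, they could be generated rather than written by hand. This is a rough sketch, assuming every service on a host is made to depend on one "master" service on the same host. The function names, host names and service names are made up for illustration; the directive names are the standard servicedependency ones.

```python
def service_dependency(host, master, dependent, notify_fail="w,u,c"):
    """Render one servicedependency object for a single host."""
    fields = [
        ("host_name", host),
        ("service_description", master),
        ("dependent_host_name", host),
        ("dependent_service_description", dependent),
        # "n" = never suppress execution; only notifications are held back.
        ("execution_failure_criteria", "n"),
        ("notification_failure_criteria", notify_fail),
    ]
    body = "\n".join(f"    {key:<34}{value}" for key, value in fields)
    return "define servicedependency {\n" + body + "\n}\n"


def dependencies_for(hosts, master, dependents):
    """Render dependencies for every host, skipping the master itself."""
    return "\n".join(
        service_dependency(host, master, dep)
        for host in hosts
        for dep in dependents
        if dep != master
    )


# Example with placeholder names: two hosts, two dependent services each.
print(dependencies_for(["web01", "web02"], "PING", ["Disk Usage", "CPU Load"]))
```

With these criteria the dependent services keep being checked, but their notifications are held back while the master service is in a WARNING, UNKNOWN or CRITICAL state.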
Re: Connection Timeout/Refused State
Posted: Fri Aug 03, 2012 12:30 am
by Fred Kroeger
Hi - I'm a little confused now?
If a host goes down, why would Nagios be setting all its services to Critical? Surely they should be Unknown?
Also, if a host is down, why would Nagios be checking the services anyway?
My understanding was that the service checks would be suspended until the host was online again.
Fred
Re: Connection Timeout/Refused State
Posted: Fri Aug 03, 2012 9:12 am
by lmiltchev
If you set up parent-child relationships in Nagios, and the parent host goes down, the children would be "Unreachable", instead of "Down".
However, it doesn't work this way with services. Nagios keeps checking all of the services even if the host is "Down". It makes sense to use this kind of logic, because the host may show "Critical" but the services may be perfectly fine. Consider this scenario:
Ping response is disabled on a Linux host. The host would be in a critical state, but Nagios will still be checking all of the services (disk, CPU, and memory usage, open files, etc.). All checks will be in an OK state except the Ping check. I hope this makes sense.
Re: Connection Timeout/Refused State
Posted: Sun Aug 05, 2012 11:55 pm
by Fred Kroeger
The parent/child relationships of hosts make sense.
But if we're using ping (check_icmp) to determine the reachability of a host then you would expect that if you can't ping the server, then it is DOWN.
In this scenario, why would you want to continue checking the services?
What is the purpose then of the check_host_freshness & host_freshness_check_interval settings? My understanding was that if the host freshness had exceeded its threshold, then a host check is initiated before the service check. To me there appears to be a dependency on the host being up before the service check runs.
So, in the setup described in the previous reply, how do I avoid the 10 services I am monitoring on a host going CRITICAL and triggering all the alerting when a host goes DOWN? I would be getting 11 SMS alerts (1 for the host & 1 for each service). I am interfacing Nagios into a ServiceDesk system, so it also means that I am creating 11 Incidents when a host goes Down.
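One option on the ServiceDesk side might be to filter in the notification script itself. This is only a sketch, assuming the notification command passes the $HOSTSTATE$ and $SERVICESTATE$ macro values to a script; the `should_open_incident` function and its arguments are hypothetical names, not an existing Nagios or ServiceDesk API.

```python
def should_open_incident(kind, host_state, service_state=None):
    """Decide whether a notification should create an incident.

    kind          -- "host" or "service" (which notification command fired)
    host_state    -- value of $HOSTSTATE$: UP, DOWN or UNREACHABLE
    service_state -- value of $SERVICESTATE$ for service notifications
    """
    if kind == "host":
        # One incident for the host problem itself.
        return host_state != "UP"
    # Service problems only count while the host is still reachable;
    # otherwise the host incident already covers them.
    return host_state == "UP" and service_state not in (None, "OK")


# Host down with a CRITICAL service: only the host event opens an incident.
print(should_open_incident("host", "DOWN"))                 # True
print(should_open_incident("service", "DOWN", "CRITICAL"))  # False
```

Applied to the case above, the 10 service notifications would be dropped by the script and only the host notification would reach the ServiceDesk.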