Connection Timeout/Refused State

Post by **Fred Kroeger** » Thu Jul 26, 2012 3:22 am

Hi - Is there a way to change the state of Connection Timeout/Refused events?
Currently it returns a Critical state for these events; I would like it to be a Warning state only.

thanks... Fred

Post by **lmiltchev** » Thu Jul 26, 2012 9:08 am

What is the check (plugin) that is returning "Critical"?

Post by **Fred Kroeger** » Thu Jul 26, 2012 7:32 pm

check_nrpe - it's the standard plugin installed with NagiosXI 2011R1.8

# ./check_nrpe -h
NRPE Plugin for Nagios
Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
Version: 2.12
Last Modified: 03-10-2008
License: GPL v2 with exemptions (-l for more info)
SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required

ahhh... now that I've read the Options list I think the -u flag is what I need.

Thanks

Post by **Fred Kroeger** » Sun Jul 29, 2012 8:40 pm

OK - the -u flag to check_nrpe has worked, in that if I get a socket timeout from nrpe, I now get an UNKNOWN event.

However, I still get a CRITICAL event for "Timeout while attempting connection".
Is there any way I can get these service events to return an UNKNOWN/WARNING state?
In nrpe.cfg I have:
command_timeout=60
connection_timeout=300

and in nagios.cfg I have:
service_check_timeout=90

I can continue to ping the host so I don't get a Host down event but I do get multiple Critical service events with the "Timeout while attempting connection" message.

thanks... Fred

Post by **mguthrie** » Wed Aug 01, 2012 10:11 am

Check NRPE also has a -t flag that can be set for the plugin timeout, but apart from what was mentioned in the above post, I don't know of any way to change the return code from a timeout other than to modify it in the plugin source.

The other possibility would be to turn off the notification option for UNKNOWN states for these hosts/services.
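
For anyone who would rather not patch the plugin source, another route is a small wrapper script that runs check_nrpe and rewrites the exit code when the output looks like a connection problem. Below is a minimal sketch, not a tested drop-in: the wrapper name `check_nrpe_soft`, the `PLUGIN` path, and the "Connection refused" match string are all assumptions (only the "Timeout while attempting connection" text comes from this thread), so adjust them to what your plugin actually prints.

```shell
#!/bin/sh
# Hypothetical wrapper (call it check_nrpe_soft): runs the real plugin and
# downgrades CRITICAL (2) to WARNING (1) when the output looks like a
# connection problem. Everything else passes through unchanged.

# Assumed install path; adjust for your system.
PLUGIN="${PLUGIN:-/usr/local/nagios/libexec/check_nrpe}"

# remap_state EXIT_CODE OUTPUT
# Prints the exit code the wrapper should return.
remap_state() {
    code="$1"
    output="$2"
    if [ "$code" -eq 2 ]; then
        case "$output" in
            # Match strings are assumptions; compare with your real output.
            *"Timeout while attempting connection"*|*"Connection refused"*)
                code=1 ;;
        esac
    fi
    echo "$code"
}

# Only run the plugin when arguments are supplied.
if [ "$#" -gt 0 ]; then
    output=$("$PLUGIN" "$@" 2>&1)
    status=$?
    echo "$output"
    exit "$(remap_state "$status" "$output")"
fi
```

You would then point a command definition at the wrapper instead of check_nrpe and pass the same arguments. Change `code=1` to `code=3` if you would rather see UNKNOWN than WARNING.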

Post by **Fred Kroeger** » Wed Aug 01, 2012 7:20 pm

Thanks
Yes, I'm using the -t option, but that is just the threshold for the command timeout.
Yes, I have disabled notifications for UNKNOWN states, but the problem is that a CRITICAL state is displayed on a timeout.

Also in a similar vein.....
I have a host that is down & displays a CRITICAL state - this is what I expect.
However, all of the services for that host also show CRITICAL .
So when the host goes down I get a notification for the host event and then one for every service event. Surely if the host can't be reached, Nagios should stop running the service checks and set their state to UNKNOWN?

regards.. Fred

Post by **mguthrie** » Thu Aug 02, 2012 10:07 am

As far as the services go, to my knowledge that is correct behavior. Because services can have their own contacts separate from the host, Nagios will still notify any and all service contacts according to the notification parameters. You could probably set up service dependencies to prevent this behavior, but depending on how many hosts you wanted to do this for that could be quite a few definitions to create.
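
If the number of definitions is the main obstacle, they can be generated. Here is a rough sketch of a shell function that prints one servicedependency object per service, each depending on a single "master" service on the same host. The function name `gen_service_deps`, the host `web01`, and the service names are made up for illustration; the directives are standard Nagios object options, but treat the failure criteria as a starting point rather than a recommendation.

```shell
#!/bin/sh
# gen_service_deps HOST MASTER_SERVICE DEPENDENT_SERVICE...
# Prints one servicedependency definition per dependent service.
# All names passed in are placeholders for your own hosts and services.
gen_service_deps() {
    host="$1"
    master="$2"
    shift 2
    for svc in "$@"; do
        # execution_failure_criteria n      : keep running the dependent checks
        # notification_failure_criteria w,u,c: suppress dependent notifications
        #                                      while the master is non-OK
        cat <<EOF
define servicedependency {
    host_name                       $host
    service_description             $master
    dependent_host_name             $host
    dependent_service_description   $svc
    execution_failure_criteria      n
    notification_failure_criteria   w,u,c
}

EOF
    done
}
```

For example, `gen_service_deps web01 Ping "CPU Load" "Disk Usage" > web01_deps.cfg` would write two definitions, which you could review before dropping the file into your configuration directory.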

Post by **Fred Kroeger** » Fri Aug 03, 2012 12:30 am

Hi - I'm a little confused now.
If a host goes down, why would Nagios be setting all its services to Critical? Surely they should be Unknown?
Also, if a host is down, why would Nagios be checking the services anyway?
My understanding was that the service checks would be suspended until the host was online again.

Fred

Post by **lmiltchev** » Fri Aug 03, 2012 9:12 am

If you set up parent-child relationships in Nagios and the parent host goes down, the children would be "Unreachable" instead of "Down".
However, it doesn't work this way with services. Nagios keeps checking all of the services even if the host is "Down". This logic makes sense, because the host may show "Critical" while the services are perfectly fine. Consider this scenario:
Ping responses are disabled on a Linux host. The host would be in a critical state, but Nagios would still be checking all of its services (disk, CPU, and memory usage, open files, etc.). All checks except the Ping check would be in an OK state. I hope this makes sense.

Post by **Fred Kroeger** » Sun Aug 05, 2012 11:55 pm

The parent/child relationship of hosts makes sense.
But if we're using ping (check_icmp) to determine the reachability of a host, then you would expect that if you can't ping the server, it is DOWN.
In this scenario, why would you want to continue checking the services?
What is the purpose then of the check_host_freshness & host_freshness_check_interval settings? My understanding was that if the host_freshness had exceeded its threshold, then a host_check is initiated before the service_check. To me there appears to be a dependency on the host being up before the service_check runs.

So, in the setup described in the previous reply, how do I avoid the 10 services I am monitoring on a host going CRITICAL and triggering all the alerting when a host goes DOWN? I would be getting 11 SMS alerts (1 for the host & 1 for each service). I am interfacing Nagios into a ServiceDesk system, so it also means that I am creating 11 Incidents when a host goes Down.

Nagios Support Forum
