Connection Timeout/Refused State

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia

Connection Timeout/Refused State

Post by Fred Kroeger »

Hi - Is there a way to change the state returned for Connection Timeout/Refused events?
Currently these events return a Critical state - I would like them to be a Warning state only.

thanks... Fred
Last edited by Fred Kroeger on Wed Aug 01, 2012 2:10 am, edited 1 time in total.
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: Connection Timeout/Refused State

Post by lmiltchev »

What is the check (plugin) that is returning "Critical"?
Be sure to check out our Knowledgebase for helpful articles and solutions!
Fred Kroeger

Re: Connection Timeout/Refused State

Post by Fred Kroeger »

check_nrpe - it's the standard plugin installed with Nagios XI 2011R1.8

# ./check_nrpe -h
NRPE Plugin for Nagios
Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
Version: 2.12
Last Modified: 03-10-2008
License: GPL v2 with exemptions (-l for more info)
SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required

ahhh... now that I've read the Options list I think the -u flag is what I need.

Thanks
Fred Kroeger

Re: Connection Timeout/Refused State

Post by Fred Kroeger »

OK - the -u flag to check_nrpe has worked: if I get a socket timeout from NRPE, I now get an UNKNOWN event :-)

However, I still get a CRITICAL event for "Timeout while attempting connection".
Is there any way I can get these service events to return an UNKNOWN/WARNING state?
In nrpe.cfg I have:
command_timeout=60
connection_timeout=300

and in nagios.cfg I have:
service_check_timeout=90

I can continue to ping the host so I don't get a Host down event but I do get multiple Critical service events with the "Timeout while attempting connection" message.

thanks... Fred
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Connection Timeout/Refused State

Post by mguthrie »

check_nrpe also has a -t flag that sets the plugin timeout, but apart from what was mentioned in the above post, I don't know of any way to change the return code for a timeout other than modifying it in the plugin source.

The other possibility would be to turn off the notification option for UNKNOWN states for these hosts/services.
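
One alternative to patching the plugin source is to run the plugin through a small wrapper that rewrites the exit code. The following is only a minimal sketch, not something shipped with Nagios XI: the script name remap_timeout.py, the function remap_state, and the list of matched message fragments are all assumptions for illustration, and the fragments should be adjusted to the messages you actually see.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: run a Nagios plugin and downgrade connection-level CRITICALs."""
import subprocess
import sys

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Message fragments treated as connection problems (assumed, compared in lower case).
CONNECTION_ERRORS = (
    "timeout while attempting connection",
    "connection refused",
    "socket timeout",
)


def remap_state(exit_code, output, target=WARNING):
    """Return `target` for a CRITICAL caused by a connection error.

    Codes outside 0-3 (for example a plugin killed by a signal) become UNKNOWN;
    every other result is passed through unchanged.
    """
    if exit_code not in (OK, WARNING, CRITICAL, UNKNOWN):
        return UNKNOWN
    text = output.lower()
    if exit_code == CRITICAL and any(marker in text for marker in CONNECTION_ERRORS):
        return target
    return exit_code


def main(argv):
    """Run the plugin given in argv[1:], echo its output, return the remapped code."""
    try:
        result = subprocess.run(
            argv[1:],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError as exc:
        print("UNKNOWN - could not run plugin: %s" % exc)
        return UNKNOWN
    output = result.stdout.strip()
    print(output)
    return remap_state(result.returncode, output)


# Only act when a plugin command line was actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

The command definition would then call the wrapper with the original check as its arguments, for example remap_timeout.py followed by the existing check_nrpe command line. Genuine CRITICAL results from the remote check (a full disk, say) pass through untouched, because only output matching one of the listed fragments is remapped.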
Fred Kroeger

Re: Connection Timeout/Refused State

Post by Fred Kroeger »

Thanks
Yes, I'm using the -t option, but that is just the threshold for the command timeout.
Yes, I have disabled notifications for UNKNOWN states, but the problem is that a CRITICAL state is displayed on a timeout.

Also in a similar vein.....
I have a host that is down & displays a CRITICAL state - this is what I expect.
However, all of the services for that host also show CRITICAL.
So when the host goes down I get a notification for the host event and then one for every service event. Surely if the host can't be reached, Nagios shouldn't be running the service checks, and should set their state to UNKNOWN instead?

regards.. Fred
mguthrie

Re: Connection Timeout/Refused State

Post by mguthrie »

As far as the services go, to my knowledge that is correct behavior. Because services can have their own contacts separate from the host, Nagios will still notify any and all service contacts according to the notification parameters. You could probably set up service dependencies to prevent this behavior, but depending on how many hosts you wanted to do this for, that could be quite a few definitions to create.
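
Since the number of definitions is the main cost, they could be generated rather than written by hand. This is a rough sketch under several assumptions: the host name web01, the master service name PING, and the helper names service_dependency and dependencies_for are made up for illustration. The directive names are the standard servicedependency ones, but whether to suppress execution, notifications, or both is a choice you would need to make for your own setup.

```python
def service_dependency(host, master_service, dependent_service):
    """Render one Nagios servicedependency object as config text."""
    directives = [
        ("host_name", host),
        ("service_description", master_service),
        ("dependent_host_name", host),
        ("dependent_service_description", dependent_service),
        # Skip checks and notifications for the dependent service while the
        # master service is WARNING, UNKNOWN or CRITICAL (assumed policy).
        ("execution_failure_criteria", "w,u,c"),
        ("notification_failure_criteria", "w,u,c"),
    ]
    body = "\n".join("    %-32s%s" % (name, value) for name, value in directives)
    return "define servicedependency {\n%s\n}\n" % body


def dependencies_for(host, master_service, services):
    """Make every service on `host` except the master depend on the master."""
    return [
        service_dependency(host, master_service, service)
        for service in services
        if service != master_service
    ]


if __name__ == "__main__":
    # Hypothetical host and service names; print the generated definitions.
    for definition in dependencies_for("web01", "PING", ["PING", "Disk Usage", "CPU Load"]):
        print(definition)
```

The output could be redirected into a config file and imported in the usual way. With the ping-style service as the master, a host that stops answering pings would hold back the checks and notifications of its other services instead of raising one alert per service.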
Fred Kroeger

Re: Connection Timeout/Refused State

Post by Fred Kroeger »

Hi - I'm a little confused now.
If a host goes down, why would Nagios be setting all its services to Critical? Surely they should be Unknown?
Also, if a host is down, why would Nagios be checking the services anyway?
My understanding was that the service checks would be suspended until the host was online again.

Fred
lmiltchev

Re: Connection Timeout/Refused State

Post by lmiltchev »

If you set up parent-child relationships in Nagios and the parent host goes down, the children would be "Unreachable" instead of "Down".
However, it doesn't work this way with services. Nagios keeps checking all of the services even if the host is "Down". This logic makes sense, because the host may show "Critical" while the services are perfectly fine. Consider this scenario:
Ping response is disabled on a Linux host. The host would be in a critical state. Nagios will still be checking all of the services (disk, CPU, and memory usage, open files, etc.). All checks will be in an OK state except the Ping check. I hope this makes sense.
Fred Kroeger

Re: Connection Timeout/Refused State

Post by Fred Kroeger »

The parent/child relationships of hosts make sense.
But if we're using ping (check_icmp) to determine the reachability of a host, then you would expect that if you can't ping the server, it is DOWN.
In this scenario, why would you want to continue checking the services?
What is the purpose then of the check_host_freshness and host_freshness_check_interval settings? My understanding was that if the host freshness had exceeded its threshold, then a host check is initiated before the service check. To me there appears to be a dependency on the host being up before the service check runs.

So, in the setup described in the previous reply, how do I avoid the 10 services I am monitoring on a host going CRITICAL and triggering all the alerting when a host goes DOWN? I would be getting 11 SMS alerts (1 for the host and 1 for each service). I am interfacing Nagios into a ServiceDesk system, so it also means that I am creating 11 incidents when a host goes down.