Connection Timeout/Refused State
- Posts: 588
- Joined: Wed Oct 19, 2011 11:36 pm
- Location: Perth, Western Australia
Hi - Is there a way to change the state of Connection Timeout/Refused events?
Currently it returns a Critical state for these events - I would like it to be a Warning state only.
thanks... Fred
Last edited by Fred Kroeger on Wed Aug 01, 2012 2:10 am, edited 1 time in total.
Re: Connection Timeout/Refused State
What is the check (plugin) that is returning "Critical"?
Re: Connection Timeout/Refused State
check_nrpe - it's the standard plugin installed with NagiosXI 2011R1.8
# ./check_nrpe -h
NRPE Plugin for Nagios
Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
Version: 2.12
Last Modified: 03-10-2008
License: GPL v2 with exemptions (-l for more info)
SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required
ahhh... now that I've read the Options list I think the -u flag is what I need.
Thanks
Re: Connection Timeout/Refused State
OK - the -u flag to nrpe has worked, in that if I get a socket timeout from nrpe, I now get an UNKNOWN event.
However, I still get a CRITICAL event for "Timeout while attempting connection".
Is there any way I can get these service events to return an UNKNOWN/WARNING state?
In nrpe.cfg I have:
command_timeout=60
connection_timeout=300
and in nagios.cfg I have:
service_check_timeout=90
I can continue to ping the host so I don't get a Host down event but I do get multiple Critical service events with the "Timeout while attempting connection" message.
thanks... Fred
Re: Connection Timeout/Refused State
Check NRPE also has a -t flag that can be set for the plugin timeout, but apart from what was mentioned in the above post, I don't know of any way to change the return code from a timeout other than to modify it in the plugin source.
The other possibility would be to turn off the notification option for UNKNOWN states for these hosts/services.
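A third option, short of patching the plugin source, is to call the plugin through a small wrapper that rewrites the exit code. The following is only a minimal sketch of that idea, not something shipped with Nagios or NRPE: the wrapper itself, the `remap_state` helper and the `TIMEOUT_MARKERS` substrings are all assumptions and would need adjusting to the exact messages your plugin prints.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: run a plugin and downgrade timeout-style CRITICALs."""
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Substrings treated as connection problems (assumed; match them to your output).
TIMEOUT_MARKERS = ("timeout", "connection refused")


def remap_state(exit_code, output, downgrade_to=WARNING):
    """Return downgrade_to when a CRITICAL result looks like a timeout or refusal."""
    text = output.lower()
    if exit_code == CRITICAL and any(marker in text for marker in TIMEOUT_MARKERS):
        return downgrade_to
    return exit_code


def main(argv):
    """argv is the real plugin command line, e.g. the check_nrpe call."""
    if not argv:
        print("UNKNOWN: no plugin command given")
        return UNKNOWN
    result = subprocess.run(argv, capture_output=True, text=True)
    output = result.stdout + result.stderr
    sys.stdout.write(output)  # pass the plugin's own message through unchanged
    return remap_state(result.returncode, output)

# To use this as a plugin, end the file with: sys.exit(main(sys.argv[1:]))
```

The check command would then point at the wrapper, with the original plugin and its arguments passed after it. Genuine CRITICAL results whose text does not mention a timeout or refusal keep their original state.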
Re: Connection Timeout/Refused State
Thanks
Yes, I'm using the -t option, but that is just the threshold for the command timeout.
Yes, I have disabled notifications for UNKNOWN states, but the problem is that a CRITICAL state is displayed on a timeout.
Also, in a similar vein...
I have a host that is down and displays a CRITICAL state - this is what I expect.
However, all of the services for that host also show CRITICAL.
So when the host goes down I get a notification for the host event and then one for every service event. Surely if the host can't be reached, Nagios shouldn't be running the service checks, and should set their state to UNKNOWN instead?
regards.. Fred
Re: Connection Timeout/Refused State
As far as the services go, to my knowledge that is correct behavior. Because services can have their own contacts separate from the host, Nagios will still notify any and all service contacts according to the notification parameters. You could probably set up service dependencies to prevent this behavior, but depending on how many hosts you wanted to do this for that could be quite a few definitions to create.
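Since writing the definitions by hand is the tedious part, they could be generated. The sketch below is an illustration only: the host name `web01`, the master service `PING` and the service list are hypothetical, and it assumes every other service on the host should have its notifications suppressed while the master service is in a WARNING, UNKNOWN or CRITICAL state (`notification_failure_criteria w,u,c`).

```python
def service_dependency(host, master, dependent, criteria="w,u,c"):
    """Render one servicedependency object for two services on the same host."""
    fields = [
        ("host_name", host),
        ("service_description", master),
        ("dependent_host_name", host),
        ("dependent_service_description", dependent),
        ("notification_failure_criteria", criteria),
    ]
    body = "\n".join(f"    {key:<32}{value}" for key, value in fields)
    return "define servicedependency {\n" + body + "\n}"


def dependencies_for_host(host, master, services):
    """Make every service except the master depend on the master service."""
    return "\n\n".join(
        service_dependency(host, master, name)
        for name in services
        if name != master
    )


# Hypothetical host and service names, for illustration only.
print(dependencies_for_host("web01", "PING", ["PING", "Disk Usage", "CPU Load"]))
```

The printed text can be saved into an object configuration file and checked with the usual config verification before reloading Nagios.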
Re: Connection Timeout/Refused State
Hi - I'm a little confused now.
If a host goes down, why would Nagios be setting all its services to Critical? Surely they should be Unknown?
Also, if a host is down, why would Nagios be checking the services anyway?
My understanding was that the service checks would be suspended until the host was online again.
Fred
Re: Connection Timeout/Refused State
If you set up parent-child relationships in Nagios and the parent host goes down, the children would be "Unreachable" instead of "Down".
However, it doesn't work this way with services. Nagios keeps checking all of the services even if the host is "Down". It makes sense to use this kind of logic, because the host may show "Critical" while the services are perfectly fine. Consider this scenario:
Ping response is disabled on a Linux host. The host would be in a critical state. Nagios will still be checking all of the services (disk, CPU, and memory usage, open files, etc.). All checks except the Ping check will be in an OK state. I hope this makes sense.
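The parent/child rule from the start of this reply can be pictured with a small model. This is a simplified sketch, not Nagios source code: `classify_failed_host` is a hypothetical helper, and it assumes a host whose check has failed is reported "UNREACHABLE" only when it has parents and none of them is "UP".

```python
def classify_failed_host(parent_states):
    """Classify a host whose own check failed, given its immediate parents' states.

    parent_states is a list of "UP", "DOWN" or "UNREACHABLE" strings.
    """
    # With no parents, or with at least one parent still UP, the fault is
    # assumed to lie with the host itself.
    if parent_states and all(state != "UP" for state in parent_states):
        return "UNREACHABLE"
    return "DOWN"


print(classify_failed_host([]))              # no parents defined
print(classify_failed_host(["DOWN"]))        # only parent is down
print(classify_failed_host(["UP", "DOWN"]))  # one path is still up
```

Nothing in this model applies to services: as described above, they keep being checked and reported on their own results.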
Re: Connection Timeout/Refused State
The parent/child relationships of hosts make sense.
But if we're using ping (check_icmp) to determine the reachability of a host, then you would expect that if you can't ping the server, it is DOWN.
In this scenario, why would you want to continue checking the services?
What is the purpose then of the check_host_freshness and host_freshness_check_interval settings? My understanding was that if the host freshness had exceeded its threshold, then a host check is initiated before the service check. To me there appears to be a dependency on the host being up before the service check runs.
So, in the setup described in the previous reply, how do I avoid the 10 services I am monitoring on a host going CRITICAL and triggering all the alerting when a host goes DOWN? I would be getting 11 SMS alerts (1 for the host and 1 for each service). I am interfacing Nagios into a ServiceDesk system, so it also means that I am creating 11 Incidents when a host goes Down.