Socket Timeouts

Posted: Mon Aug 09, 2010 11:55 am
by natedmac
Hello,

We currently have our Nagios server in the office, and for the majority of the time checks are fine. However, during some of our busier times we tend to get socket timeouts, as seen below. Is there a way to set this globally to, say, 20 or 30 seconds to allow some breathing room, since the checks traverse our QMOE link to our datacenter?

Notification Type: PROBLEM

Service: Memory Usage
Host: Voyager.xxx.net
Address: 172.27.3.100
State: CRITICAL
Info:
CRITICAL - Socket timeout after 10 seconds
Date/Time: 08/07/2010 16:01:58

Re: Socket Timeouts

Posted: Mon Aug 09, 2010 1:29 pm
by mmestnik
This depends on the check command. Your post is a bit ambiguous: I can't tell from it what operating system Voyager.xxx.net is running.

This setting is most likely in the Core Config Manager.

Re: Socket Timeouts

Posted: Mon Aug 09, 2010 3:04 pm
by natedmac
All Windows systems, using NSClient++. We are getting several of these, from disk space to memory to CPU usage. I will see if I can find something in the Core Config Manager to adjust the socket timeout value.

Also, can it be done globally so that all checks use a longer timeout?

Re: Socket Timeouts

Posted: Tue Aug 10, 2010 10:16 am
by mmestnik
There are many ways to do it, and at least a handful would be global. The most sensible global approach is to add the parameter to the check command object definition. Look up one of the services in question as an example and read its check command. Then look up that command's definition (navigating away from services). You will be adding some text, like " -t 45", to the command syntax, for example after "$ARG1$".

Keep in mind that any command that runs for more than 60 seconds is killed off with no chance to intervene or return data, so don't make this much bigger than 45, if you make it bigger than 20 at all. The application needs to start, load and open the socket (perhaps 3 seconds); after the timeout it will need time to close out of things (2 seconds); and this is wall time, not CPU time, so on a busy system 5 seconds can seem like 15 (when there are 2 other apps all taking 5 seconds).
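As a rough sketch of what that edit might look like: the thread never confirms which plugin the services use, so this assumes the stock check_nt plugin talking to NSClient++ on its default port 12489, with a command definition shaped like the one in the sample Nagios configuration. The command name, the port and the value of 30 are assumptions for illustration; your own definition may differ.

```
# Sketch only: assumes the services use the stock check_nt plugin.
# The only change is the trailing "-t 30", which raises the plugin
# timeout from its 10 second default to 30 seconds.
define command{
        command_name    check_nt
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$ -t 30
        }
```

Because every service that references this command inherits the change, this works as a global setting for those checks. The value stays well under the 60 second limit mentioned above.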

I'm not familiar with NSClient++'s check command. I can say that the term "socket timeout" doesn't sound like it applies; "connection timeout" would sound more accurate, or "timeout waiting for reply from peer". "Connection reset by peer" is also a very common error, though I digress.

Suffice it to say I wouldn't expect you to find a setting labeled "socket timeout value", so look for the same thing under another name. I did find a reference to this term; it's a Java programming construct. So we are only drifting further away from the sea(s) I've sailed.

Re: Socket Timeouts

Posted: Wed Aug 11, 2010 10:29 am
by natedmac
If that is the case, could this be related to the client itself? If so, is there a better client that you can recommend for the Windows hosts?

Re: Socket Timeouts

Posted: Wed Aug 11, 2010 11:55 am
by mmestnik
I'm just not a Windows user so I can't advise on how to make the best use of it.

We don't have a client (agent is a better term) for Windows, even though many hundreds of Nagios installs monitor Windows boxes. For a check command (known as a plugin) you can use check_nrpe with NSClient++, though I can't say you'd have a better experience.

What I would do is learn more about the check command you are using and find its parameter for the timeout value. Changing check commands now would only mean that you would have to learn about a different check command instead.