Socket Timeouts

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
natedmac
Posts: 20
Joined: Mon Apr 26, 2010 4:52 pm

Post by natedmac »

Hello,

Our Nagios server is in our office, and most of the time checks are fine. During some of our busier times, however, we tend to get socket timeouts, as seen below. Is there a way to raise the timeout globally to, say, 20 or 30 seconds to allow some breathing room, since the checks traverse our QMOE link to our datacenter?

Code:

Notification Type: PROBLEM

Service: Memory Usage
Host: Voyager.xxx.net
Address: 172.27.3.100
State: CRITICAL
Info:
CRITICAL - Socket timeout after 10 seconds
Date/Time: 08/07/2010 16:01:58
mmestnik
Posts: 972
Joined: Mon Feb 15, 2010 2:23 pm

Post by mmestnik »

This depends on the check command. Your post is a bit ambiguous: I can't tell from it what operating system Voyager.xxx.net is running.

This setting is most likely in the Core Config Manager.

Post by natedmac »

All are Windows systems using NSClient++. We are getting several of these, on everything from disk space to memory to CPU usage. I will see if I can find something in the Core Config Manager to adjust the socket timeout value.

Also, can it be done globally so that all checks use a longer timeout?

Post by mmestnik »

There are many ways to do it, and at least a handful would be global. The most sensible global approach is to add the parameter to the check command object definition. Look up one of the services in question as an example and read its check command. Then look up the definition of that command (navigating away from services). You will be adding some text, like " -t 45", to the command syntax, for example after "$ARG1$".

Keep in mind that any command that runs for more than 60 seconds is killed off with no chance to intervene or return data, so don't make this much bigger than 45, if it's bigger than 20 at all. The application needs to start, load, and open the socket (perhaps 3 seconds), and after the timeout it will need time to close out of things (2 seconds). Plus this is wall time and not CPU time, so on a busy system 5 seconds can seem like 15 (when there are 2 other apps all taking 5 seconds).
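
As a rough sketch of the kind of edit described above: this assumes the Windows checks go through the standard check_nt plugin (the usual companion to NSClient++) and that the command looks like the stock sample definition. The command name, port, and argument macros shown here are assumptions and may differ on your system, so match them to your own definition. The value 30 is only an example within the range discussed above, and in Nagios XI the edit would normally be made through the Core Config Manager rather than by hand in a file.

```
# Hypothetical example: stock-style check_nt command with a 30 second plugin timeout.
# command_name, port 12489, and the $ARGn$ macros are assumptions; keep whatever
# your existing definition uses and only append the "-t" option.
define command{
        command_name    check_nt
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$ -t 30
        }
```

Because every service that uses this command inherits the change, one edit covers all of those checks. A 30 second plugin timeout still leaves room under the 60 second limit for startup and cleanup.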

I'm not familiar with NSClient++'s check command. I can say that the term "socket timeout" doesn't sound like it applies; "connection timeout" would sound more accurate, or "timeout waiting for reply from peer". "Connection reset by peer" is also a very common error, though I digress.

Suffice to say, I wouldn't expect you to find a setting labeled "socket timeout value", so look for the same thing under another name. I did find a reference to this term: it's a Java programming construct... So we are only drifting further away from the sea(s) I've sailed.

Post by natedmac »

If that is the case, could this be related to the client itself? If so, is there a better client that you can recommend for the Windows hosts?

Post by mmestnik »

I'm just not a Windows user, so I can't advise on how to make the best use of it.

We don't have a client (agent is a better term) for Windows, even though many hundreds of Nagios installs monitor Windows boxes. For a check command (known as a plugin), you can use check_nrpe with NSClient++, though I can't say you'd have a better experience.

What I would do is learn more about the check command you are using and find its parameter for the timeout value. Changing check commands now would only mean that you would have to learn about the new check command...