Sporadic Timeouts on Windows

cs_nagcc
Posts: 17
Joined: Fri Dec 28, 2012 7:35 am

Sporadic Timeouts on Windows

Post by cs_nagcc »

Over the past week or so, I've noticed sporadic "Socket Timeout after 10 seconds" errors on three of my Windows servers. They all run the same kind of software, but a fourth server running the same software hasn't shown any issues. I also monitor about 20 other Windows servers that haven't had any timeout issues. When I run a test check from the Nagios server against one of the affected Windows servers, I get a response about 1/3 of the time; the other 2/3 of the time I receive the socket timeout error. I've tried increasing the timeout, but the check simply times out at the new limit instead. It almost seems like Nagios is opening a connection to the server and leaving it open, so that the next check attempt fails. When I run "netstat -a" on one of the Windows servers, there are a ton of connections in "TIME_WAIT" status from my Nagios server. Is there a reason for this? Could it be causing the problem?
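One rough way to put a number on the TIME_WAIT buildup described above is to tally `netstat -an` output by connection state, optionally restricted to the Nagios server's address. The following is only a sketch: the function name `count_states` and the sample addresses are hypothetical, and it assumes the usual Windows row layout of protocol, local address, foreign address, state.

```python
from collections import Counter


def count_states(netstat_output, remote_host=None):
    """Tally TCP connection states from `netstat -an` text.

    If remote_host is given, only rows whose foreign address has that
    host part are counted. Assumes IPv4-style "host:port" addresses.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected row: TCP  local:port  foreign:port  STATE
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        foreign, state = fields[2], fields[3]
        host = foreign.rsplit(":", 1)[0]
        if remote_host is not None and host != remote_host:
            continue
        counts[state] += 1
    return counts


# Made-up sample rows for illustration; not taken from the affected servers.
SAMPLE = """\
  TCP    10.0.0.5:12489    10.0.0.1:40112    TIME_WAIT
  TCP    10.0.0.5:12489    10.0.0.1:40113    TIME_WAIT
  TCP    10.0.0.5:3389     10.0.0.9:51000    ESTABLISHED
  TCP    0.0.0.0:12489     0.0.0.0:0         LISTENING
"""

if __name__ == "__main__":
    # Count only connections coming from the (hypothetical) Nagios host.
    print(count_states(SAMPLE, remote_host="10.0.0.1"))
```

Running the same tally on an affected server and on the unaffected fourth server would show whether the TIME_WAIT count actually differs between them.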
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Sporadic Timeouts on Windows

Post by slansing »

Hmm, sounds like a network-based issue. How many active checks are you running against these Windows servers, and how often? It's possible that the checks are stacking up since they are all hitting the same port. Do you know of any other services on those systems that may be trying to use port 12489 or 5666, or whatever you changed the default check_nt and check_nrpe ports to?
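To see whether the agent port itself accepts connections reliably, independent of the plugins, a repeated plain TCP connect can be timed out and counted. This is a minimal sketch, not a replacement for check_nt or check_nrpe: the `probe` helper and the host address in the usage section are hypothetical, and it only tests that the port accepts a connection. A check can still time out after connecting if the agent never answers, which this would not detect.

```python
import socket
import time


def probe(host, port, attempts=10, timeout=10.0, pause=0.0):
    """Open and immediately close a TCP connection `attempts` times.

    Returns a (successes, failures) tuple. No check_nt or NRPE
    protocol data is sent; only the TCP connect is exercised.
    """
    ok = failed = 0
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                ok += 1
        except OSError:
            # Covers timeouts, refused connections and unreachable hosts.
            failed += 1
        time.sleep(pause)
    return ok, failed


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder address; 12489 is the port named above.
    successes, failures = probe("192.0.2.10", 12489, attempts=9)
    print(f"{successes} connected, {failures} failed")
```

If the connects succeed every time while the real checks still fail two times out of three, the delay is more likely inside the agent than in the network path.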
cs_nagcc
Posts: 17
Joined: Fri Dec 28, 2012 7:35 am

Re: Sporadic Timeouts on Windows

Post by cs_nagcc »

There are currently 7 checks every 3 minutes. I think your theory of checks "stacking up" is exactly what is happening, but I can't tell why it started so suddenly. The system had been running fine for over a year before these socket timeouts appeared. No other processes are using the standard Nagios ports, so I'm wondering if it is blocking itself.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Sporadic Timeouts on Windows

Post by lmiltchev »

Do you see anything in the nsclient.log that can shed some light on the cause of the problem?
cs_nagcc
Posts: 17
Joined: Fri Dec 28, 2012 7:35 am

Re: Sporadic Timeouts on Windows

Post by cs_nagcc »

Unfortunately not. I can see when the service returns to the "OK" state, but I don't see anything when it times out. Is there a way to see more verbose logs?
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Sporadic Timeouts on Windows

Post by lmiltchev »

Under the [/settings/log] section, you can change this line:

level = info
to this:

level = debug
then restart the nsclient++ service.
cs_nagcc
Posts: 17
Joined: Fri Dec 28, 2012 7:35 am

Re: Sporadic Timeouts on Windows

Post by cs_nagcc »

Thanks lmiltchev. I didn't see a "level" option in the log section of the ini file, so I added it. Unfortunately, it didn't seem to change what gets logged. As a side note, I'm running an older version of the nsclient++ executable (0.3.9.327). I would like to blame this issue on the older version, but I have the same version running on my other Windows servers without issues.
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Sporadic Timeouts on Windows

Post by eloyd »

Are you running the Windows firewall service on these machines? If so, it may be rate-limiting the number of connections allowed.
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic! • Join our public Nagios Discord Server!
cs_nagcc
Posts: 17
Joined: Fri Dec 28, 2012 7:35 am

Re: Sporadic Timeouts on Windows

Post by cs_nagcc »

Nope, no firewall running on the servers.
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Sporadic Timeouts on Windows

Post by eloyd »

I can't say what it is, but I'm pretty sure it's not Nagios. I'm guessing something on the Windows server(s) is blocking after too many open connections, too many TIME_WAIT sockets, or something similar. Are these four boxes exactly the same, used the same amount, and always responding to approximately the same number of requests, or are these three used more than the fourth?
Last edited by eloyd on Tue Jul 08, 2014 8:20 am, edited 1 time in total.