Nagios Support Forum

Posted: **Sun Oct 20, 2013 1:54 pm**

Hi,

We migrated to new 2012R2.2-based VMs approximately 5 months ago, and we get far more false alarms with this version than with the previous one.

We predominantly monitor Windows servers using NSClient++, along with some ESXi servers and Cisco switches via SNMP polling. We only seem to get false alarms on services checked through NSClient++.

An example of a false alarm would be:

```
Nagios has detected a problem with this service.

Notification Type: PROBLEM

Service: Uptime
Host: psm4syslog1.fnz.com
Address: 192.168.227.47
State: CRITICAL
Info:
CRITICAL - Socket timeout after 10 seconds
Date/Time: 2013-10-20 21:35:05
```

The version of NSClient++ we are running is 0.3.9.328 (x64)

Perhaps we need to upgrade the client?

Posted: **Mon Oct 21, 2013 10:17 am**

You should not have to update the client; for some reason the check is timing out. Does this service time out constantly now, or does it return a valid result some of the time?

If it always times out, I'd recommend starting by giving the check a longer timeout and running it manually from the command line, like so:

```
/usr/local/nagios/libexec/check_nt -H windows.ip.addr -p 12489 -t 30 -v UPTIME
```

Note: "-t 30" sets a timeout of 30 seconds.
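To tell a constant timeout from intermittent flapping, one option is to repeat the manual check a few times and tally the results. Here is a minimal Python sketch, assuming the plugin lives at the path shown above. The helper names `build_command` and `tally_checks` are made up for illustration, and the exit-code mapping is the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess
from collections import Counter

# Plugin path taken from the command above; adjust if yours differs.
CHECK_NT = "/usr/local/nagios/libexec/check_nt"

# Conventional Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_command(host, timeout=30, port=12489, password=None):
    """Build the check_nt argument list for an UPTIME check (hypothetical helper)."""
    cmd = [CHECK_NT, "-H", host, "-p", str(port), "-t", str(timeout), "-v", "UPTIME"]
    if password is not None:
        cmd += ["-s", password]
    return cmd


def tally_checks(host, attempts=5, run=subprocess.run, **kwargs):
    """Run the check several times and count how often each state comes back.

    `run` is injectable so the function can be exercised without the plugin.
    """
    results = Counter()
    for _ in range(attempts):
        proc = run(build_command(host, **kwargs), capture_output=True, text=True)
        results[STATES.get(proc.returncode, "UNKNOWN")] += 1
    return results
```

Something like `print(dict(tally_checks("windows.ip.addr", attempts=5)))` on the Nagios server would then show whether every attempt is CRITICAL or whether OK results are mixed in, which would point to flapping.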

Posted: **Tue Oct 22, 2013 5:22 pm**

Hi,

It seems to flap, but it happens across the board.

I will try the command as you suggest and let you know how we get on.

Cheers,

C.

Posted: **Tue Oct 22, 2013 5:41 pm**

Hi,

So that was interesting. I checked our recent emails for a server that has been flapping with this error, targeted it, and got the following response:

```
[root@psu4nagiosxi libexec]# ./check_nt -H 10.139.1.25 -p 12489 -t 30 -v UPTIME
CRITICAL - Socket timeout after 30 seconds
```

Yet when I tried another server, I got an answer:

```
[root@psu4nagiosxi libexec]# ./check_nt -H 10.137.1.21 -s <REDACTED> -p 12489 -t 30 -v UPTIME
System Uptime - 183 day(s) 21 hour(s) 23 minute(s)
[root@psu4nagiosxi libexec]#
```

When I tried the same syntax with the original server, I still got the same timeout error. I think I will need to take this to our network guys, as it looks like Nagios is doing everything it should be doing.

Cheers,

C.

Posted: **Wed Oct 23, 2013 10:01 am**

KiwiBloke wrote: CRITICAL - Socket timeout after 30 seconds

I would guess it is one of the following issues:
1. Firewall issues
2. NSClient service not running
3. Incorrect password
4. Nagios server IP not declared in allowed hosts
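
One rough way to narrow that list down from the Nagios server is a plain TCP connect to the NSClient++ port. The Python sketch below uses only the standard library; `probe_port` is a hypothetical helper, not part of Nagios or NSClient++. How to read the result is my assumption rather than documented behaviour: "timeout" leans towards a firewall or routing problem, "refused" towards nothing listening on the port, and "open" towards the password, allowed hosts, or the agent itself not answering.

```python
import socket


def probe_port(host, port=12489, timeout=5.0):
    """Try a TCP connect and report 'open', 'refused', 'timeout' or an error string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing accepted the connection on this port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout; packets may be dropped along the way.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or no route to host.
        return f"error: {exc}"
```

For example, `probe_port("10.139.1.25")` could be compared against `probe_port("10.137.1.21")` to see whether the two servers from the earlier post differ at the TCP level.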

Best of luck!

Posted: **Mon Oct 28, 2013 4:18 pm**

Hi,

Minor breakthrough.

We think it's the version of NSClient++ we are running.

A colleague had reason to run netstat (netstat -anb) on one of our monitored hosts and discovered over 20k TIME_WAIT connections on TCP 12489 to our Nagios server.

I have repeated this on several other servers and found the same thing.
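
For anyone wanting to repeat that count without scrolling through netstat by eye, here is a small Python sketch that counts TIME_WAIT entries on the NSClient++ port in saved `netstat -an` output. It assumes the usual four-column Windows layout (protocol, local address, foreign address, state); `count_time_wait` is a made-up helper name and the sample addresses are invented.

```python
def count_time_wait(netstat_output, port=12489):
    """Count TCP rows whose local port matches and whose state is TIME_WAIT."""
    suffix = f":{port}"
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected row shape: TCP  <local addr:port>  <foreign addr:port>  <state>
        if (
            len(fields) >= 4
            and fields[0].upper() == "TCP"
            and fields[1].endswith(suffix)
            and fields[3] == "TIME_WAIT"
        ):
            count += 1
    return count


# Invented sample rows, only to show the expected input shape.
SAMPLE = """
  TCP    10.0.0.5:12489    0.0.0.0:0          LISTENING
  TCP    10.0.0.5:12489    10.0.0.9:51001     TIME_WAIT
  TCP    10.0.0.5:12489    10.0.0.9:51002     TIME_WAIT
  TCP    10.0.0.5:3389     10.0.0.7:40000     TIME_WAIT
"""

print(count_time_wait(SAMPLE))  # prints 2
```

On a monitored host the input could come from `netstat -an > netstat.txt`; a count that keeps growing between runs would match what we saw.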

A Google search found this: http://www.nsclient.org/nscp/discussion/topic/1142 which then led to this: http://support.microsoft.com/kb/2553549

Most of our monitored hosts are Windows 2008, and we have been unable to get a change window for security patching for a long time (easily over 300 days), so it seems all the servers we have been having issues with haven't been rebooted as part of any other work and so are hitting this issue.

I will need to take this up with my colleagues and press for that window!
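
To help make the case for that window, the uptime the plugin already reports could be used to flag long-running hosts before they start failing. This is a minimal Python sketch, assuming the output format shown in the earlier post ("System Uptime - N day(s) N hour(s) N minute(s)"). The helper names are hypothetical, and the 300-day default is just the figure mentioned above, not a value taken from the KB article.

```python
import re

# Matches the UPTIME output format seen earlier in this thread.
UPTIME_RE = re.compile(
    r"System Uptime - (\d+) day\(s\) (\d+) hour\(s\) (\d+) minute\(s\)"
)


def parse_uptime_days(plugin_output):
    """Return uptime in fractional days, or None if the output does not match."""
    match = UPTIME_RE.search(plugin_output)
    if match is None:
        return None
    days, hours, minutes = (int(part) for part in match.groups())
    return days + hours / 24 + minutes / 1440


def needs_reboot_window(plugin_output, threshold_days=300):
    """True if the reported uptime is at or above the (assumed) threshold."""
    days = parse_uptime_days(plugin_output)
    return days is not None and days >= threshold_days
```

A host that is already timing out will not report an uptime at all, so `parse_uptime_days` returns None for it; the idea is only to catch hosts that are still answering but getting close.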

Cheers,

C.

Keywords for other users with the same problem:
socket timeout nsclient time_wait connections uptime

Posted: **Mon Oct 28, 2013 4:28 pm**

Sounds good, let us know how it goes. This sounds like it very well may be the resolution to this particular problem.

Posted: **Wed Nov 06, 2013 10:40 pm**

Hi,

I had a bunch of low-risk, non-platform servers that were showing the same issue and got approval to apply the hotfix and reboot.

I'm not able to say yet whether this has fixed the issue, as we will need to wait another 300 days, but so far at least I have not seen any TIME_WAIT connections backing up from requests from the Nagios XI server, and the log files are under control.

Cheers

Posted: **Thu Nov 07, 2013 10:23 am**

Well.....alrighty! Let us know! You can take a look at NCPA in the interim! http://assets.nagios.com/downloads/ncpa ... g_NCPA.pdf

more frequent false alarms in 2012R2.2