lmiltchev wrote: I'll reply back when I have a chance to do that.
Let us know how things are working with the older version of NSClient++ whenever you have a chance.
Okay, I'll provide my steps and the results, so stay with me on this.
1). We are running Nagios XI 2014R2.6 on the Nagios XI monitoring server (I know that 2014R2.7 has recently been released).
2). The server that was reporting an issue is a physical server running Windows 2008 R2 64 bit.
3). I uninstalled NSClient++ 0.4.1.105-x64 from the "problem" server (I have this client installed on a few other Windows servers, and they have no issues), and then installed NSClient++ 0.3.9-x64 (the client that is linked within the Nagios XI Windows Server Wizard and is installed on the majority of our Windows 2003 and 2008 servers).
4). The important NSClient++ "NSC.INI" file entries are:
[modules]
NSClientListener.dll
CheckWMI.dll
FileLogger.dll
CheckSystem.dll
CheckDisk.dll
CheckEventLog.dll
CheckHelpers.dll
use_file=1
allowed_hosts=<IP Address of Nagios XI server>
password=<correct password>
*POSSIBLE ISSUE HERE* This is the default and is configured on all the other Windows servers, in the same VLAN, which are not experiencing issues
;# NSCLIENT PORT NUMBER
; This is the port the NSClientListener.dll will listen to.
;port=12489
[External Alias]
alias_cpu=checkCPU warn=80 crit=90 time=5m time=1m time=30s <---- This concerns me, since I configure the Nagios XI Windows Server Wizard with warn at 90 and crit at 95. This is on all the other Windows servers, too.
alias_cpu_ex=checkCPU warn=$ARG1$ crit=$ARG2$ time=5m time=1m time=30s
alias_mem=checkMem MaxWarn=80% MaxCrit=90% ShowAll=long type=physical type=virtual type=paged type=page <---- This concerns me, since I configure the Nagios XI Windows Server Wizard with warn at 90 and crit at 95. This is on all the other Windows servers, too.
alias_up=checkUpTime MinWarn=1d MinWarn=1h
[NRPE Client Handlers]
check_other=-H <some default IP address> -p 5666 -c remote_command -a arguments
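To double-check what the agent actually reads from the entries above, here is a minimal Python sketch that parses an NSC.INI-style fragment. It is only an illustration under assumptions: the [Settings] and [NSClient] section names are my guess at where those entries live (the listing above omits them), 12489 is assumed to be the listener default when the port line is commented out, and the helper names are made up for this example.

```python
import configparser

# Trimmed sample in the NSC.INI layout. The [Settings] and [NSClient] section
# names are assumptions; the post only lists the entries themselves.
SAMPLE = """\
[modules]
NSClientListener.dll
CheckSystem.dll

[Settings]
use_file=1

[NSClient]
;# NSCLIENT PORT NUMBER
; This is the port the NSClientListener.dll will listen to.
;port=12489

[External Alias]
alias_cpu=checkCPU warn=80 crit=90 time=5m time=1m time=30s
alias_mem=checkMem MaxWarn=80% MaxCrit=90% ShowAll=long type=physical
"""

DEFAULT_PORT = 12489  # assumed listener default when port= is commented out


def load_ini(text):
    # allow_no_value: [modules] lists bare DLL names with no "=".
    # interpolation=None: values such as "MaxWarn=80%" contain a literal "%".
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.optionxform = str  # keep key case exactly as written in the file
    parser.read_string(text)
    return parser


def listener_port(parser):
    # Lines starting with ";" are comments, so a commented port falls back.
    return parser.getint("NSClient", "port", fallback=DEFAULT_PORT)


def alias_thresholds(parser, alias):
    """Return key=value tokens of an alias, keeping the first of repeated keys."""
    tokens = parser.get("External Alias", alias).split()[1:]
    result = {}
    for token in tokens:
        key, _, value = token.partition("=")
        result.setdefault(key, value)
    return result


if __name__ == "__main__":
    ini = load_ini(SAMPLE)
    print("listener port:", listener_port(ini))
    print("alias_cpu:", alias_thresholds(ini, "alias_cpu"))
    print("alias_mem:", alias_thresholds(ini, "alias_mem"))
```

Under those assumptions, a commented-out port line and an explicit port=12489 resolve to the same value, so the alias thresholds are the only entries here that visibly differ from what I set in the Wizard.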
5). The following are the Nagios XI Windows Server Wizard checks for the "problem" server (they are set up manually the same way on all the other Windows servers):
check_xi_service_nsclient!nagi0sadm1n!UPTIME
check_xi_service_nsclient!nagi0sadm1n!CPULOAD!-l 5,90,95
check_xi_service_nsclient!nagi0sadm1n!USEDDISKSPACE!-l C -w 90 -c 95
***The next check identifies a difference between the default client check and the Wizard check that I configure***
check_xi_service_nsclient!nagi0sadm1n!MEMUSE!-w 90 -c 95
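To take Nagios XI out of the picture, the same checks can be run by hand from the XI server. The Python sketch below only builds the command line and does not run anything. It assumes that check_xi_service_nsclient wraps the standard check_nt plugin with -H, -p, -s and -v, and that the plugin lives in /usr/local/nagios/libexec; both are assumptions about a typical install. The function name and the 192.0.2.10 address are placeholders.

```python
import shlex

PLUGIN = "/usr/local/nagios/libexec/check_nt"  # assumed default plugin path


def check_nt_argv(xi_command, host, port=12489, timeout=30):
    """Turn 'check_xi_service_nsclient!<password>!<variable>[!<extra>]'
    into an argument list for check_nt (hypothetical helper)."""
    parts = xi_command.split("!")
    if len(parts) < 3 or parts[0] != "check_xi_service_nsclient":
        raise ValueError("unexpected command: %r" % xi_command)
    password, variable = parts[1], parts[2]
    # Extra arguments such as "-l 5,90,95" are split the way a shell would.
    extra = shlex.split(parts[3]) if len(parts) > 3 else []
    return [PLUGIN, "-H", host, "-p", str(port), "-s", password,
            "-v", variable, "-t", str(timeout)] + extra


if __name__ == "__main__":
    checks = [
        "check_xi_service_nsclient!<password>!UPTIME",
        "check_xi_service_nsclient!<password>!CPULOAD!-l 5,90,95",
        "check_xi_service_nsclient!<password>!USEDDISKSPACE!-l C -w 90 -c 95",
        "check_xi_service_nsclient!<password>!MEMUSE!-w 90 -c 95",
    ]
    for check in checks:
        argv = check_nt_argv(check, "192.0.2.10")  # placeholder address
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Raising -t above the 10-second default is deliberate here: if a check succeeds with a longer timeout, the agent is answering slowly rather than not at all.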
6). I did a little research and came across the following:
a). "http://geekpeek.net/socket-timeout-afte ... ds-nagios/"
b). "http://support.nagios.com/forum/viewtop ... =7&t=24924"
7). From the Nagios XI server console, I ran NMAP against the "problem" server:
a). Starting Nmap 5.51 ( http://nmap.org ) at 2015-05-05 11:03 CDT
Nmap scan report for <FQDN of server> (<IP Address of server>)
Host is up (0.00068s latency).
Not shown: 986 closed ports
PORT STATE SERVICE
80/tcp open http
135/tcp open msrpc
139/tcp open netbios-ssn
445/tcp open microsoft-ds
1025/tcp open NFS-or-IIS
1026/tcp open LSA-or-nterm
1027/tcp open IIS
1028/tcp open unknown
2301/tcp open compaqdiag
2381/tcp open compaq-https
3389/tcp open ms-term-serv
8400/tcp open cvd
8402/tcp open abarsd
8600/tcp open asterix
b). Then I ran the modified version: nmap -p 5666,12489:
PORT STATE SERVICE
5666/tcp closed nrpe
12489/tcp open unknown
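For repeating that port test while the checks are flapping, here is a rough Python sketch of a single TCP connect with a 10-second timeout. It is not what nmap or the plugin does internally, only an approximation: "closed" means the connection was refused straight away (as with 5666 above), while "timeout" means nothing answered within the limit. The probe function name and the 192.0.2.10 address are placeholders.

```python
import socket


def probe(host, port, timeout=10.0):
    """Return 'open', 'closed', 'timeout', or 'unreachable' for one TCP connect."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"      # no answer within the limit
    except ConnectionRefusedError:
        return "closed"       # the host answered, but nothing listens there
    except OSError:
        return "unreachable"  # routing or name-resolution problem


if __name__ == "__main__":
    for port in (5666, 12489):
        print(port, probe("192.0.2.10", port))  # placeholder address
```

An "open" result only shows that the listener accepts connections. It does not show that the agent answers a query within 10 seconds, so the port can be open while the checks still time out.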
8). I modified the following in the NSC.INI on the "problem" server, then restarted the NSClient++ service:
a). Uncommented the port line:
;# NSCLIENT PORT NUMBER
; This is the port the NSClientListener.dll will listen to.
port=12489
b). Modified the warning and critical levels:
[External Alias]
alias_cpu=checkCPU warn=90 crit=95 time=5m time=1m time=30s
alias_mem=checkMem MaxWarn=90% MaxCrit=95% ShowAll=long type=physical type=virtual type=paged type=page
9). I have verified that the "problem" server's Windows Firewall was turned off, but I noted that the NSClient++ was not listed as an exception (like I noted in a couple of other Windows servers), just in case.
10). I deleted the "problem" server's previous Services and Host, applied the configuration, added the "problem" server back into Nagios XI using the Windows Server Wizard, and have been monitoring it for a couple of hours.
11). The four Services that report "CRITICAL - Socket timeout after 10 seconds" are "Uptime", "IIS Web Server", "Memory Usage", and "Drive C: Disk Usage". The "Uptime" service has been extremely flaky and keeps flapping, but it appears to clear up after a scheduled forced immediate check.
12). System Uptime reports 569 days!! How can I clear this - a server reboot??