Nagios Support Forum

Posted: **Mon Mar 15, 2021 11:57 am**

We upgraded to Nagios XI 5.8.2 (from 5.7.2) on Friday, and since then we're getting "Critical - Socket Timeout" errors on every check that uses check_xi_service_nsclient.

No other changes have been made and I have confirmed ports are still open.

An example command is:

$USER1$/check_nt -H $HOSTADDRESS$ -s "$ARG1$" -p 12489 -v $ARG2$ $ARG3$ $ARG4$

The arguments are the usual ones: the password, then the check to run and its thresholds. Other checks on these servers are working fine; only the check_xi_service_nsclient ones are now failing.

Posted: **Mon Mar 15, 2021 5:11 pm**

If your new Nagios XI machine has a different IP address from your old one, you'll have to update the host being monitored (the one running NSClient++) so that it accepts traffic from the new IP address.
In NSClient++, that setting is in nsclient.ini:

Code:

[/settings/default]
password = 1234
# v-- modify this --v
allowed hosts = 192.168.23.44

What you could do instead is set the new machine to use the IP address of the old machine and take the old machine offline. That way, traffic to NSClient++ will "look like" it's coming from the old server.

Here's a guide on how to set a static IP address: https://support.nagios.com/kb/article/c ... s-549.html
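To check which source address the NSClient++ host will actually see (the address that has to appear in `allowed hosts`), `ip route get <target>` on the Nagios XI machine reports the `src` address it would use. A minimal sketch, assuming iproute2's usual output format; `route_src` is a hypothetical helper name, not part of Nagios XI:

```shell
# Hypothetical helper (not part of Nagios XI): read `ip route get` output on
# stdin and print the address that follows the "src" keyword, i.e. the
# source IP the monitored host will see the check arrive from.
route_src() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "src") { print $(i + 1); exit }
    }
  }'
}

# Example (assumes iproute2 is installed; <nsclient-host> is a placeholder
# for the machine running NSClient++):
#   ip route get <nsclient-host> | route_src
```

Whatever address this prints is the one that needs to be listed under `allowed hosts` in nsclient.ini. It prints nothing if the output has no `src` field.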

Posted: **Mon Mar 15, 2021 6:13 pm**

Thanks for the reply, but I probably wasn't clear in my terminology. This was an in-place update rather than a migration: it was installed from the command line on the same machine, so no IP address or machine changes happened. It's the same server, just bumped from 5.7.2 to 5.8.2.

Posted: **Tue Mar 16, 2021 2:32 pm**

If you PM me a system profile, I can diagnose further. Get one by going to Admin (top menu) => System Profile (in the left menu), then clicking the blue button.

If you're unable to generate the profile through the web interface, please try generating it from the command line by running these commands as root:

Code:

rm -rf /usr/local/nagiosxi/var/components/profile*
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT

Then send me the resulting /usr/local/nagiosxi/var/components/profile.zip file.
If the profile script fails, please include the ENTIRE output.

Posted: **Wed Mar 17, 2021 10:47 am**

The profile you sent me wasn't a complete profile, but that's okay, I'll just ask for what I need.

What is the command that runs to perform the check on a service where it says "Socket timeout"? Get this information by going into the Core Config Manager and finding the service, then clicking the Run Check Command button. A screenshot of that would be fine.

It should look like /usr/local/nagios/libexec/check_nt -H <IP ADDRESS> -s password1 -p 12489 -v FOO

Also, what are the outputs from the following commands?

Code:

ip addr
ip route
ping -c 5 google.com

And is the computer running NSClient++ firewalled or behind a NAT?

Posted: **Wed Mar 17, 2021 11:22 am**

An example of one of the commands is:

/usr/local/nagios/libexec/check_nt -H excalibur.idir.bcgov -s "OyT43s1SNLTZJ94s" -p 12489 -v CPULOAD -l 5,80,90
CRITICAL - Socket timeout

ip addr:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:10:00:0c brd ff:ff:ff:ff:ff:ff
    inet 10.220.100.188/23 brd 10.220.101.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::eb4e:ef93:a92d:967/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

ip route:

default via 10.220.101.1 dev eth0 proto static metric 100
10.220.100.0/23 dev eth0 proto kernel scope link src 10.220.100.188 metric 100

ping -c 5 google.com:

PING google.com (142.250.69.206) 56(84) bytes of data.
64 bytes from sea30s08-in-f14.1e100.net (142.250.69.206): icmp_seq=1 ttl=118 time=6.73 ms
64 bytes from sea30s08-in-f14.1e100.net (142.250.69.206): icmp_seq=2 ttl=118 time=6.43 ms
64 bytes from sea30s08-in-f14.1e100.net (142.250.69.206): icmp_seq=3 ttl=118 time=6.30 ms
64 bytes from sea30s08-in-f14.1e100.net (142.250.69.206): icmp_seq=4 ttl=118 time=6.53 ms
64 bytes from sea30s08-in-f14.1e100.net (142.250.69.206): icmp_seq=5 ttl=118 time=6.52 ms

--- google.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 6.307/6.506/6.731/0.138 ms

The servers we monitor are behind a firewall, but all the ports are open; there have been absolutely no changes other than the update to Nagios XI 5.8.2 on our monitoring server. All other checks to these servers are still working; only these specific ones have stopped. The NSClient++ version on most of the servers we monitor is 0.5.1.x. I'll also PM you the config file now in case that helps.

Posted: **Wed Mar 17, 2021 3:52 pm**

What is the output of this command from the Nagios XI machine?

Code:

hash nmap || yum install -y nmap
nmap -Pn -p 12489 excalibur.idir.bcgov
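If installing nmap isn't an option, bash can open a TCP connection itself through its `/dev/tcp/HOST/PORT` redirection. The sketch below is a rough stand-in for the nmap test, assuming bash and coreutils `timeout` are available on the Nagios XI machine; `probe_port` is a hypothetical helper name. It only shows that something accepts the TCP connection, not that NSClient++ is answering check_nt on that port.

```shell
# Hypothetical helper: succeed (exit 0) if a TCP connection to HOST:PORT
# opens within SECONDS (default 3); fail otherwise (refused, filtered,
# unresolvable host, or timed out). Uses bash's /dev/tcp redirection.
probe_port() {
  local host=$1 port=$2 secs=${3:-3}
  # Inside bash -c, $0 is the host and $1 is the port.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example:
#   if probe_port excalibur.idir.bcgov 12489; then
#     echo "12489 open"
#   else
#     echo "12489 closed or filtered"
#   fi
```

A timeout (exit status 124) usually points at a firewall silently dropping packets, while an immediate failure usually means the connection was refused.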

Posted: **Wed Mar 17, 2021 4:23 pm**

dchurch wrote: What is the output of this command from the Nagios XI machine?
Code:
hash nmap || yum install -y nmap
nmap -Pn -p 12489 excalibur.idir.bcgov

I was able to resolve this just now, and it looks like you were on the right track. I temporarily restored a copy of the VM from before the 5.8.2 update and, comparing the two, found that the update had changed the port on one of our commands from 5666 to 12489. I have no idea why, as I didn't make any changes, but after I changed it back, all of the checks are working again. Perhaps 12489 was the default, we changed it to 5666 at some point, and the upgrade then reset it to the default.
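One way to catch this kind of silent change after a future update is to pull the `-p` value out of each check_nt command line and compare it with the port you expect. A minimal sketch, assuming command lines are passed in as plain strings; `nt_port` is a hypothetical helper, not something shipped with Nagios XI, and it does not resolve macros such as `$ARG1$`:

```shell
# Hypothetical helper: print the value that follows -p in a check_nt
# command line, or "unset" if no -p option (with a value) is present.
nt_port() {
  local -a words
  local i
  # read -a splits on whitespace without glob expansion
  read -r -a words <<< "$1"
  for ((i = 0; i < ${#words[@]} - 1; i++)); do
    if [[ ${words[i]} == "-p" ]]; then
      printf '%s\n' "${words[i + 1]}"
      return 0
    fi
  done
  echo unset
}

# Example: flag a command whose port differs from the expected one
# (5666 was the working value in this thread).
#   expected=5666
#   cmd='$USER1$/check_nt -H $HOSTADDRESS$ -s "$ARG1$" -p 12489 -v $ARG2$'
#   [[ $(nt_port "$cmd") == "$expected" ]] || echo "port mismatch: $(nt_port "$cmd")"
```

Running this over the command definitions before and after an update makes a changed port stand out immediately.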

Anyway, this can be closed. Thank you.

Posted: **Wed Mar 17, 2021 4:46 pm**

Yes, 12489 was the default port for communicating with NSClient++. Like you said, the upgrade must have touched or failed to respect your port settings for that command, likely due to a flaw in our code; upgrading is not supposed to change your custom definitions.

Glad to hear you resolved it.

Critical - Socket Timeout after upgrade to 5.8.2