[solved] Socket timeout madness
-
- Posts: 7
- Joined: Sat Sep 15, 2018 4:09 pm
Hi,
My Nagios 4.2.1 reports socket timeouts on certain hosts, for both NRPE and SSH checks. I ran tcpdump on the Nagios server and on one of the affected remote servers: when Nagios attempts the check, no traffic is detected, but when I run the check from a normal shell, tcpdump on both machines sees the traffic and the check works perfectly.
This behavior is new; until today the Nagios instance was working perfectly. I deleted the logs and retention files and rebooted the Nagios container (LXC) and the remote container too. Some of the containers are on another machine and some are on the same physical machine.
Please, any ideas?
Thanks in advance
Nomar
Last edited by KalimAlRazif on Wed Sep 26, 2018 3:46 pm, edited 1 time in total.
Re: Socket timeout madness
Are the configured checks configured to use the IP address or hostname of the destination? How exactly are you running the plugins and tcpdump on the command line? If the tcpdump is capturing only a specific port or IP address and the configured checks fail due to a DNS issue, this could produce the behavior you're seeing.
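To separate the two failure modes mentioned above (a DNS problem versus a connection that never completes), a small probe can resolve the target first and only then attempt a TCP connection. This is a minimal sketch, not part of Nagios or its plugins; the function name `probe` and its return strings are made up for illustration.

```python
import socket

def probe(target, port, timeout=5.0):
    """Return 'dns_failure', 'connect_failure', or 'ok' for target:port.

    Hypothetical helper for illustration; not a Nagios plugin.
    """
    try:
        # Resolve first so a name-resolution problem is reported separately.
        infos = socket.getaddrinfo(target, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    for family, socktype, proto, _canon, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
                return "ok"
        except OSError:
            # Refused, unreachable, or timed out: try the next address.
            continue
    return "connect_failure"
```

Calling `probe(address, 22)` and `probe(address, 5666)` once with the host name and once with the IP address, from the same account that runs the checks, would show whether name resolution is the part that differs.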
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
-
- Posts: 7
- Joined: Sat Sep 15, 2018 4:09 pm
Re: Socket timeout madness
cdienger wrote: Are the configured checks configured to use the IP address or hostname of the destination? How exactly are you running the plugins and tcpdump on the command line? If the tcpdump is capturing only a specific port or IP address and the configured checks fail due to a DNS issue, this could produce the behavior you're seeing.
Hi,
Indeed, all of the affected hosts were defined by name. I changed them and they are now defined by IP, but it made no difference.
Re: Socket timeout madness
And the nagios service was restarted after making these changes, correct?
What options are you running tcpdump with? I would update it to capture port 53 traffic and also the IP addresses that the host names were pointing at.
Can you share the config files that include the check and command config?
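The broader capture suggested here (DNS on port 53 plus traffic to or from the affected hosts) can be expressed as a single tcpdump filter. The sketch below only builds the command line; the helper name `tcpdump_command` is hypothetical and the 192.0.2.x addresses are documentation placeholders, not addresses from this thread.

```python
import shlex

def tcpdump_command(host_ips, dns_port=53):
    """Build a tcpdump command capturing DNS plus traffic to/from host_ips.

    Hypothetical helper; it only assembles arguments and runs nothing.
    """
    terms = [f"port {dns_port}"] + [f"host {ip}" for ip in host_ips]
    capture_filter = " or ".join(terms)
    # -n disables name resolution in the output, -vv is the verbosity
    # already used in this thread; the filter is passed as one argument.
    return ["tcpdump", "-n", "-vv", capture_filter]

# Placeholder addresses; substitute the IPs the host names resolved to.
cmd = tcpdump_command(["192.0.2.10", "192.0.2.20"])
print(shlex.join(cmd))
```

Because the filter is not restricted to one direction or one port, a failed DNS lookup by the check would still show up in the capture.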
-
- Posts: 7
- Joined: Sat Sep 15, 2018 4:09 pm
Re: Socket timeout madness
cdienger wrote: And the nagios service was restarted after making these changes, correct? What options are you running tcpdump with? I would update it to capture port 53 traffic and also the IP addresses that the host names were pointing at. Can you share the config files that include the check and command config?
Yes, the service was restarted.
On the Nagios host:
Code: Select all
tcpdump -n dst host remote_host_ip -vv
Code: Select all
tcpdump -n src host nagios_host_ip -vv
Code: Select all
define command {
    command_name check_nrpe
    command_line $USER1$/check_nrpe -H '$HOSTADDRESS$' -c '$ARG1$' -t 30:3
}
define command {
    command_name check_ssh
    command_line $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
}
Code: Select all
define service {
    service_description ssh
    check_command check_ssh!
    check_period 24x7
    notification_period 24x7
    host_name list of hosts separated by comma
    servicegroups ssh
    contact_groups +admins,jefecg
    use generic-service
}
Code: Select all
define service {
    service_description load
    check_command check_nrpe!check_load
    check_period 24x7
    notification_period 24x7
    host_name list of hosts separated by comma
    contact_groups +admins,jefecg
    use generic-service
}
-
- Posts: 7
- Joined: Sat Sep 15, 2018 4:09 pm
Re: Socket timeout madness
No luck: using only IP addresses in the configs did not work.
-
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Socket timeout madness
@KalimAlRazif, can you run the nmap command with the host's IP address from the Nagios server and show me the output?
-
- Posts: 7
- Joined: Sat Sep 15, 2018 4:09 pm
Re: Socket timeout madness
The output of the command, executed from the Nagios host:
Code: Select all
root@nagios:~# nmap -P0 ip_of_one_of_the_failed_hosts
Starting Nmap 6.00 ( http://nmap.org ) at 2018-09-24 14:04 EDT
Nmap scan report for ipXXX.ip-XX-XX-XX.net (ip_of_one_of_the_failed_hosts)
Host is up (0.090s latency).
Not shown: 996 closed ports
PORT STATE SERVICE
22/tcp open ssh
5666/tcp open nrpe
9101/tcp open jetdirect
9103/tcp open jetdirect
Nmap done: 1 IP address (1 host up) scanned in 1.57 seconds
Code: Select all
root@nagios-new:~# nmap -P0 another_host_with_error
Starting Nmap 6.00 ( http://nmap.org ) at 2018-09-24 14:07 EDT
Nmap scan report for another_host_with_error (another_host_with_error)
Host is up (0.0094s latency).
Not shown: 996 closed ports
PORT STATE SERVICE
22/tcp open ssh
25/tcp open smtp
5666/tcp open nrpe
9102/tcp open jetdirect
Nmap done: 1 IP address (1 host up) scanned in 0.31 seconds
-
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Socket timeout madness
@KalimAlRazif, you said:
"if nagios attempts the check no traffic is detected but if I perform the check via normal shell the TCP dump on both machines detect traffic and the check works perfectly."
Can you actually run the command from the command line and show me the output? And then take a screenshot of the command failing in the web interface?
-
- Posts: 7
- Joined: Sat Sep 15, 2018 4:09 pm
Re: Socket timeout madness
Sure:
For example ssh is "failing"
Code: Select all
./check_ssh -H remote_ip
SSH OK - OpenSSH_7.2p2 Ubuntu-4ubuntu2.2 (protocol 2.0) | time=0.186555s;;;0.000000;10.000000