Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
Hi,
My Nagios 4.2.1 instance reports socket timeouts on certain hosts, for both NRPE and SSH checks. I ran tcpdump on the Nagios server and on one of the affected remote servers: when Nagios attempts the check, no traffic is detected, but when I run the same check from a normal shell, tcpdump on both machines detects the traffic and the check works perfectly.
This behavior is new; until today the Nagios instance was working perfectly. I deleted the logs and retention files and rebooted the Nagios container (LXC) and the remote container too. Some of the containers are on another machine and some are on the same physical machine.
Please, any ideas?
Thanks in advance
Nomar
Last edited by KalimAlRazif on Wed Sep 26, 2018 3:46 pm, edited 1 time in total.
Are the configured checks configured to use the IP address or hostname of the destination? How exactly are you running the plugins and tcpdump on the command line? If the tcpdump is capturing only a specific port or IP address and the configured checks fail due to a DNS issue, this could produce the behavior you're seeing.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
cdienger wrote:Are the configured checks configured to use the IP address or hostname of the destination? How exactly are you running the plugins and tcpdump on the command line? If the tcpdump is capturing only a specific port or IP address and the configured checks fail due to a DNS issue, this could produce the behavior you're seeing.
Hi,
Indeed, all of the affected hosts were defined by name. I changed them so they are now defined by IP address, but it made no difference.
And the nagios service was restarted after making these changes, correct?
What options are you running tcpdump with? I would update it to capture port 53 traffic and also the IP addresses that the host names were pointing at.
Can you share the config files that include the check and command config?
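As a minimal sketch of the capture suggested above, assuming Python is available on the Nagios server: the helper below assembles a pcap filter expression that matches DNS traffic (port 53) or any traffic to or from the monitored addresses. The function name `build_capture_filter` and the sample addresses are hypothetical and only stand in for the real failing hosts.

```python
import ipaddress


def build_capture_filter(addresses, dns_port=53):
    """Return a pcap filter matching DNS traffic or traffic to/from the given IPs."""
    # Validate each address so a typo does not silently produce a useless filter.
    hosts = [str(ipaddress.ip_address(a)) for a in addresses]
    clauses = [f"port {dns_port}"] + [f"host {h}" for h in hosts]
    return " or ".join(clauses)


if __name__ == "__main__":
    # 192.0.2.x are documentation addresses (hypothetical stand-ins).
    print(build_capture_filter(["192.0.2.10", "192.0.2.11"]))
```

The resulting string can be passed as the filter expression to tcpdump, for example `tcpdump -n -i any "<filter>"`. Because it is not restricted to the SSH or NRPE port, it would also show a check that stalls on name resolution before it ever contacts the remote host.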
cdienger wrote:And the nagios service was restarted after making these changes, correct?
What options are you running tcpdump with? I would update it to capture port 53 traffic and also the IP addresses that the host names were pointing at.
Can you share the config files that include the check and command config?
define service {
    service_description    ssh
    check_command          check_ssh!
    check_period           24x7
    notification_period    24x7
    host_name              list of hosts separated by comma
    servicegroups          ssh
    contact_groups         +admins,jefecg
    use                    generic-service
}
define service {
    service_description    load
    check_command          check_nrpe!check_load
    check_period           24x7
    notification_period    24x7
    host_name              list of hosts separated by comma
    contact_groups         +admins,jefecg
    use                    generic-service
}
But let me make some changes to the host names: the address values are IP addresses, but the host_name values are still the FQDNs of the hosts.
root@nagios:~# nmap -P0 ip_of_one_of_the_failed_hosts
Starting Nmap 6.00 ( http://nmap.org ) at 2018-09-24 14:04 EDT
Nmap scan report for ipXXX.ip-XX-XX-XX.net (ip_of_one_of_the_failed_hosts)
Host is up (0.090s latency).
Not shown: 996 closed ports
PORT STATE SERVICE
22/tcp open ssh
5666/tcp open nrpe
9101/tcp open jetdirect
9103/tcp open jetdirect
Nmap done: 1 IP address (1 host up) scanned in 1.57 seconds
root@nagios-new:~# nmap -P0 another_host_with_error
Starting Nmap 6.00 ( http://nmap.org ) at 2018-09-24 14:07 EDT
Nmap scan report for another_host_with_error (another_host_with_error)
Host is up (0.0094s latency).
Not shown: 996 closed ports
PORT STATE SERVICE
22/tcp open ssh
25/tcp open smtp
5666/tcp open nrpe
9102/tcp open jetdirect
Nmap done: 1 IP address (1 host up) scanned in 0.31 seconds
if nagios attempts the check no traffic is detected but if I perform the check via normal shell the TCP dump on both machines detect traffic and the check works perfectly.
Can you actually run the command from the command line and show me the output? And then take a screenshot of the command failing in the web interface?
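One way to capture that output in a repeatable form is sketched below, assuming Python is available on the Nagios server. The helper name `run_check` is hypothetical, and the demo command is a stand-in so the sketch runs anywhere; in practice it would be replaced by the real plugin invocation, ideally run as the same user the Nagios daemon uses (commonly `nagios`, which is an assumption about this setup). The exit-code mapping follows the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv, timeout=10):
    """Run a plugin command and return (state name, first line of its output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The command produced no result within the allowed time.
        return ("TIMEOUT", f"no response after {timeout}s")
    lines = (proc.stdout or proc.stderr).strip().splitlines()
    first = lines[0] if lines else ""
    return (STATES.get(proc.returncode, f"exit {proc.returncode}"), first)


if __name__ == "__main__":
    # Stand-in command (hypothetical); replace with the real plugin and arguments.
    demo = [sys.executable, "-c", "print('demo output'); raise SystemExit(2)"]
    print(run_check(demo))
```

Comparing the result of this run with what the web interface shows for the same service helps separate a problem in the plugin or network from a problem in how the daemon launches the check.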