
Re: how can solve Socket timeout problem

Posted: Wed Aug 17, 2016 2:49 pm
by baber
rkennedy wrote: For the record, which machine are we troubleshooting? It looks like you posted 3 separate nmap outputs.

I have problems with all 3 machines.

Re: how can solve Socket timeout problem

Posted: Wed Aug 17, 2016 2:54 pm
by rkennedy
Which one have we been working through on this thread? We do like to keep topics organized, to keep things from getting off track. With that, we'll need to work through one at a time.

Which one is onlinecard_cdc1?

Re: how can solve Socket timeout problem

Posted: Wed Aug 17, 2016 3:00 pm
by baber

I have deleted that one from Nagios monitoring, but all of these are in the same IP range and get the same error.

Re: how can solve Socket timeout problem

Posted: Wed Aug 17, 2016 4:16 pm
by bwallace


nmap 10.4.1.144 -p 12489

Starting Nmap 5.21 ( http://nmap.org ) at 2016-08-18 00:08 IRDT
mass_dns: warning: Unable to determine any DNS servers. Reverse DNS is disabled. Try using --system-dns or specify valid servers with --dns-servers
Nmap scan report for 10.4.1.144
Host is up (0.00085s latency).
PORT      STATE    SERVICE
12489/tcp filtered unknown
...that last line: "filtered" means Nmap got no reply to its probe (or got an ICMP unreachable error), which usually means a firewall is blocking access; a port that is simply closed normally shows up as "closed" instead. Since you've mentioned this occurs intermittently, could a firewall be blocking port 12489 periodically?
I would run tcpdump on the XI machine the next time this happens, timed so that the capture includes one of these failed checks; this will at least confirm or rule out the theory above.
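To see what these port states mean in practice, here is a minimal Python sketch (an illustration only, not part of Nagios or nmap) that attempts a full TCP connect from the Nagios server and classifies the result roughly the way nmap's connect scan (`-sT`) does: a completed handshake is "open", a refused connection (RST) is "closed", and silence until the timeout is "filtered". The function name `probe_tcp_port` and the 5-second default timeout are assumptions made for the example.

```python
import socket
import time


def probe_tcp_port(host: str, port: int, timeout: float = 5.0) -> tuple[str, float]:
    """Try one TCP connect and return (state, elapsed_seconds).

    state is "open" if the handshake completed, "closed" if the host
    answered with a RST, and "filtered" if nothing usable came back.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            state = "open"
    except ConnectionRefusedError:
        state = "closed"      # the host actively refused the connection
    except socket.timeout:
        state = "filtered"    # no answer at all, e.g. a firewall dropping packets
    except OSError:
        state = "filtered"    # e.g. ICMP host/network unreachable
    return state, time.monotonic() - start
```

For example, `print(probe_tcp_port("10.4.1.144", 12489))` run on the Nagios server shows which state the check_nt traffic runs into; the result depends on the network, so no output is shown here.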

====================================================

TCPDUMP

*Have the Nagios XI UI up and ready*
SSH into your Nagios XI machine and start tcpdump with this command:

tcpdump -s 0 -i any -w fileName.pcap

*If you get the error "-bash: tcpdump: command not found", install it with this command:

yum install tcpdump

+ Once tcpdump is running, go back to the Nagios UI and reproduce the issue.
+ Stop tcpdump with Ctrl+C.

The .pcap file is written to the directory you ran the tcpdump command from. You can use a tool such as WinSCP to retrieve it.
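The command above records every packet on every interface, so the file can grow large quickly. tcpdump also accepts a BPF capture filter after its options, so that only traffic to and from the agent port is written to disk. The sketch below simply assembles that command in Python; `tcpdump_command` is a hypothetical helper name, and the host and port are the ones discussed in this thread.

```python
import shlex


def tcpdump_command(host: str, port: int, outfile: str = "fileName.pcap") -> list[str]:
    """Build a tcpdump argv that only records traffic to/from one agent port.

    The trailing words form a BPF capture filter, so packets that do not
    match it are never written to the .pcap file.
    """
    return [
        "tcpdump",
        "-s", "0",       # capture whole packets
        "-i", "any",     # listen on all interfaces
        "-w", outfile,   # write raw packets to a file for Wireshark
        "host", host,
        "and", "tcp", "port", str(port),
    ]
```

`print(shlex.join(tcpdump_command("10.4.1.144", 12489)))` prints `tcpdump -s 0 -i any -w fileName.pcap host 10.4.1.144 and tcp port 12489`, which can be pasted into the SSH session instead of the unfiltered command (it still needs root, and still stops with Ctrl+C).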

=====================================================

Re: how can solve Socket timeout problem

Posted: Wed Aug 17, 2016 11:25 pm
by baber
So, thanks.

But I have 2 questions:

1 - Here I showed you 3 servers that sometimes have the same socket timeout problem, but the nmap output for only one of them shows 12489/tcp filtered unknown, and the others don't. Why?

2 - I don't have the Nagios XI UI; I can only use Nagios Core 4.1.1. Can I run tcpdump on that?

BR

Re: how can solve Socket timeout problem

Posted: Thu Aug 18, 2016 9:45 am
by bwallace
Sorry, I thought you were using XI. Yes, you can run tcpdump on Core; the same instructions apply.
So far, we have confirmed that the checks failing intermittently are check_nt, while check_nrpe is problem-free. At this point it is worth noting that, according to the NSClient developer, check_nt is deprecated in favor of check_nrpe:
"Check_nt is NOT a good protocol and is considered abandonware. NSClient++ supports it only for legacy reasons. There is generally no reason to use check_nt."
Earlier in this thread, another colleague of mine mentioned that something may be intermittently closing port 12489, just as I mentioned in my previous post. I pointed out the "filtered" state in the nmap output for that one server because it is the most obvious symptom. We can focus on that server first and then move on to the other two; we can't troubleshoot all three at once. The other two may also have port 12489 closed intermittently, or the servers themselves could be under high load at those moments and fail to respond within the configured timeout, which is why we suggested increasing it.
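The two explanations above (port 12489 being closed intermittently versus the server being too busy to answer in time) leave different traces: the first shows up as failed connections, the second as connections that succeed but slowly. Here is a minimal sketch of a probe loop that records both with timestamps, so failures can be lined up with firewall logs or load graphs. It assumes Python 3 on the Nagios server; `watch_port`, the 2-second "slow" threshold and the 10-second timeout are illustrative choices, not Nagios or check_nt settings.

```python
import socket
import time
from datetime import datetime


def watch_port(host: str, port: int, attempts: int = 10, interval: float = 5.0,
               timeout: float = 10.0, slow: float = 2.0):
    """Connect to host:port repeatedly and yield one timestamped line per attempt."""
    for i in range(attempts):
        if i:
            time.sleep(interval)  # wait between attempts, not after the last one
        stamp = datetime.now().isoformat(timespec="seconds")
        start = time.monotonic()
        try:
            # Complete the TCP handshake, then close the connection immediately.
            with socket.create_connection((host, port), timeout=timeout):
                pass
            elapsed = time.monotonic() - start
            note = "SLOW" if elapsed > slow else "ok"
        except OSError as exc:  # refused, timed out, unreachable, ...
            elapsed = time.monotonic() - start
            note = f"FAIL ({type(exc).__name__})"
        yield f"{stamp} {host}:{port} {note} {elapsed:.3f}s"
```

For example, `for line in watch_port("10.4.1.144", 12489, attempts=120, interval=30.0): print(line)` would probe every 30 seconds for about an hour. Mostly "FAIL" lines point toward the port being blocked, while "SLOW" lines point toward load and a timeout that is too tight.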

Re: how can solve Socket timeout problem

Posted: Fri Aug 19, 2016 4:02 am
by baber
I have the tcpdump file, but it is big (98 MB), and when I open it, it is not readable. What do I have to do?

Re: how can solve Socket timeout problem

Posted: Fri Aug 19, 2016 10:28 am
by bwallace
Was the issue reproduced while you ran the capture? If not, there is no need to review it.

If you saved it as a .pcap, you can open it in Wireshark.
In Wireshark, filter for the IP address of the destination server that was timing out.
The filter will look like this:
ip.addr==xxx.xxx.xxx.xxx

or filter for port 12489:
tcp.port==12489

Or to be more precise, combine the two filters like so:
ip.addr==xxx.xxx.xxx.xxx && tcp.port==12489

What you'll need to do is identify the request going from XI to the server, then check how long the server takes to respond.
If you see a request from XI with no response at all, that is a strong sign that a firewall is dropping the packets.
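The same check can be expressed in code once the relevant packets are listed. Here is a minimal sketch, assuming the packets have already been exported from Wireshark or parsed elsewhere into a simple hypothetical `Packet` record (time in seconds, addresses, ports, and TCP flags written as "S" for SYN and "SA" for SYN-ACK, a convention made up for this example rather than Wireshark's own format). It pairs every SYN sent to the agent port with the agent's SYN-ACK and reports the delay, or `None` when no reply ever arrived.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    time: float  # seconds since capture start
    src: str
    sport: int
    dst: str
    dport: int
    flags: str   # "S" = SYN, "SA" = SYN-ACK (convention for this sketch)


def handshake_delays(packets: list[Packet], server: str,
                     port: int = 12489) -> dict[tuple[str, int], Optional[float]]:
    """Return {(client ip, client port): SYN-ACK delay in seconds, or None if unanswered}."""
    pending: dict[tuple[str, int], float] = {}
    delays: dict[tuple[str, int], Optional[float]] = {}
    for p in sorted(packets, key=lambda pkt: pkt.time):
        if p.flags == "S" and p.dst == server and p.dport == port:
            key = (p.src, p.sport)
            if key not in pending:  # ignore retransmitted SYNs; time from the first one
                pending[key] = p.time
                delays[key] = None
        elif p.flags == "SA" and p.src == server and p.sport == port:
            key = (p.dst, p.dport)
            if key in pending:
                delays[key] = p.time - pending.pop(key)
    return delays
```

For example, `handshake_delays(packets, "10.4.1.144")` returns one entry per connection attempt from the Nagios server. This only covers the TCP handshake; a connection that opens fine but whose check_nt reply arrives too late would need the request and response data packets paired in the same way.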