how can solve Socket timeout problem

baber
Posts: 308
Joined: Wed Oct 21, 2015 4:39 am

Re: how can solve Socket timeout problem

Post by baber »

rkennedy wrote:For the record, which machine are we troubleshooting? It looks like you posted 3 separate nmap's.

I have the problem with all 3 machines.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: how can solve Socket timeout problem

Post by rkennedy »

Which one have we been working through on this thread? We do like to keep topics organized, to keep things from getting off track. With that, we'll need to work through one at a time.

Which one is onlinecard_cdc1?
Former Nagios Employee
baber
Posts: 308
Joined: Wed Oct 21, 2015 4:39 am

Re: how can solve Socket timeout problem

Post by baber »

rkennedy wrote:Which one have we been working through on this thread? We do like to keep topics organized, to keep things from getting off track. With that, we'll need to work through one at a time.

Which one is onlinecard_cdc1?

I have deleted that one from Nagios monitoring, but all of these hosts are in the same IP range and get the same error.
bwallace
Posts: 1145
Joined: Tue Nov 17, 2015 1:57 pm

Re: how can solve Socket timeout problem

Post by bwallace »

Code:

nmap 10.4.1.144 -p 12489

Starting Nmap 5.21 ( http://nmap.org ) at 2016-08-18 00:08 IRDT
mass_dns: warning: Unable to determine any DNS servers. Reverse DNS is disabled. Try using --system-dns or specify valid servers with --dns-servers
Nmap scan report for 10.4.1.144
Host is up (0.00085s latency).
PORT      STATE    SERVICE
12489/tcp filtered unknown
... that last line: "filtered" means nmap got no reply to its probe, which usually points to a firewall dropping the packets (a reachable port with nothing listening would normally show as "closed"). Since you've mentioned this occurs intermittently, could a firewall be blocking port 12489 periodically?
I would run a tcpdump on the XI machine the next time this happens, timed so that the capture includes one of these failed checks. That will at least confirm the theory above.
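
To catch an intermittent block in the act, a repeated probe from the Nagios host can help. The following is only a minimal sketch, assuming Python 3 is available there; `port_state` and `watch` are hypothetical helper names, not part of Nagios or nmap. It mirrors the distinction nmap draws: a refused connection means the host answered (closed), while no answer within the timeout suggests something is dropping the packets (filtered).

```python
import socket
import time


def port_state(host, port=12489, timeout=3.0):
    """Classify a TCP port as 'open', 'closed' or 'filtered' with a plain connect()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # The target replied with a reset: reachable, but nothing is listening.
        return "closed"
    except OSError:
        # Timeout or unreachable: no usable reply came back, so treat it as filtered.
        # (A name-resolution failure also lands here, so pass an IP address.)
        return "filtered"
    finally:
        sock.close()


def watch(host, port=12489, interval=30, rounds=10):
    """Print a timestamped port state every `interval` seconds."""
    for _ in range(rounds):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, host, port, port_state(host, port))
        time.sleep(interval)
```

Calling `watch("10.4.1.144")` and comparing the timestamps of any "filtered" lines with the times of the failed checks would show whether the two line up.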

====================================================

TCPDUMP

*Have the Nagios XI UI up and ready*
SSH into your Nagios XI machine and start a tcpdump using this command:

tcpdump -s 0 -i any -w fileName.pcap

*If you get the error "-bash: tcpdump: command not found", install it with this command:

yum install tcpdump

+ Once tcpdump is running, go back to the Nagios UI and reproduce the issue
+ Stop the tcpdump with Ctrl+C

The .pcap file will be written to whatever directory you issued the tcpdump command from. You can use something like WinSCP to retrieve the pcap file.
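
A capture of all traffic on every interface can grow large quickly. One way to keep it small is to add a capture filter for the monitored host and port to the same command. The sketch below is an illustration only: `tcpdump_command` and `capture` are hypothetical helper names, the host is the one from the nmap output above, and `shlex.join` assumes Python 3.8 or later. The filter expression `host ... and tcp port ...` is standard tcpdump/pcap filter syntax.

```python
import shlex
import signal
import subprocess
import time


def tcpdump_command(host, port=12489, outfile="fileName.pcap", interface="any"):
    """Build the tcpdump argument list, limited to one host and one TCP port."""
    return ["tcpdump", "-s", "0", "-i", interface, "-w", outfile,
            "host", host, "and", "tcp", "port", str(port)]


def capture(host, seconds, **kwargs):
    """Run tcpdump for `seconds`, then stop it as Ctrl+C would (needs root)."""
    proc = subprocess.Popen(tcpdump_command(host, **kwargs))
    try:
        time.sleep(seconds)
    finally:
        proc.send_signal(signal.SIGINT)  # lets tcpdump flush and close the file
        proc.wait()


# Show the equivalent shell command without running anything.
print(shlex.join(tcpdump_command("10.4.1.144")))
```

The printed line can also be pasted into the shell directly in place of the unfiltered command.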

=====================================================
Be sure to check out the Knowledgebase for helpful articles and solutions!
baber
Posts: 308
Joined: Wed Oct 21, 2015 4:39 am

Re: how can solve Socket timeout problem

Post by baber »

Thanks.

But I have 2 questions:

1 - I showed you 3 servers that have the same problem (a socket timeout from time to time), but the nmap output for only one of them shows "12489/tcp filtered unknown" and the others do not show this. Why?

2 - I don't have the Nagios XI UI; I only use Nagios Core 4.1.1. Can I run tcpdump on that?

BR
bwallace
Posts: 1145
Joined: Tue Nov 17, 2015 1:57 pm

Re: how can solve Socket timeout problem

Post by bwallace »

Sorry, I thought you were using XI. Yes, you can run a tcpdump on Core; the same instructions apply.
So far, we have confirmed that the checks failing intermittently are check_nt, whereas check_nrpe is problem free. At this point it is worth noting that, according to the NSClient developer, check_nt is deprecated in favor of check_nrpe:
"Check_nt is NOT a good protocol and is considered abandonware. NSClient++ supports it only for legacy reasons. There is generally no reason to use check_nt"
Earlier in this thread another colleague of mine mentioned that something may be intermittently closing port 12489, just as I mentioned in the previous post. I was pointing out the 'filtered' state in the nmap output for that one server since it is rather obvious. We could focus on that one first and then move on to the other two; we can't troubleshoot all three at once.
The other two may also be having port 12489 closed intermittently, or the servers themselves could be under high load at the time, preventing them from responding within the given timeout value, hence our suggestions to increase it.
baber
Posts: 308
Joined: Wed Oct 21, 2015 4:39 am

Re: how can solve Socket timeout problem

Post by baber »

I have captured a tcpdump file. It is large (98 MB), and when I open it, it is not readable. What do I have to do?
bwallace
Posts: 1145
Joined: Tue Nov 17, 2015 1:57 pm

Re: how can solve Socket timeout problem

Post by bwallace »

Was the issue reproduced while you ran the capture? If not, there is no need to review it.

If you saved it as a .pcap you can open it in Wireshark.
In Wireshark, filter for the destination server's IP address that was timing out.
The filter will look like this:
ip.addr==xxx.xxx.xxx.xxx

or filter for port 12489:
tcp.port==12489

Or to be more precise, combine the two filters like so:
ip.addr==xxx.xxx.xxx.xxx && tcp.port==12489

What you'll need to do is identify the request going from the Nagios server --> the Server. Then, how long does it take for the Server to respond?
If you see a request from the Nagios server with no response at all, that is a sure sign of a firewall dropping the packets.
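
If Wireshark is not at hand, the same question (which connection attempts got no reply, and how long the replies took) can be approximated with a short script. This is a rough sketch under several assumptions: the file is a classic .pcap as written by `tcpdump -w` (not pcapng), the traffic is IPv4, and the link type is Ethernet or Linux "cooked" capture (what `-i any` produces). `syn_report` is a hypothetical name. It pairs each SYN sent to port 12489 with the SYN-ACK that answers it.

```python
import socket
import struct

SYN, ACK = 0x02, 0x10
# pcap link type -> (link-layer header length, offset of the 2-byte protocol field)
LINK_LAYERS = {1: (14, 12), 113: (16, 14), 276: (20, 0)}
LITTLE = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
BIG = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def syn_report(pcap_path, port=12489):
    """Return (answered, unanswered) dicts keyed by (client_ip, client_port, server_ip).

    answered maps each key to the SYN -> SYN-ACK delay in seconds;
    unanswered maps each key to the capture timestamp of the first SYN.
    """
    pending, answered = {}, {}
    with open(pcap_path, "rb") as f:
        header = f.read(24)
        magic = header[:4]
        if magic in LITTLE:
            endian = "<"
        elif magic in BIG:
            endian = ">"
        else:
            raise ValueError("not a classic pcap file (pcapng is not handled)")
        # The second magic of each pair marks nanosecond timestamps.
        frac = 1e9 if magic in (LITTLE[1], BIG[1]) else 1e6
        linktype = struct.unpack(endian + "I", header[20:24])[0]
        if linktype not in LINK_LAYERS:
            raise ValueError("unsupported link type %d" % linktype)
        l2_len, proto_off = LINK_LAYERS[linktype]
        while True:
            rec = f.read(16)
            if len(rec) < 16:
                break
            sec, sub, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
            data = f.read(incl_len)
            ts = sec + sub / frac
            # Keep only IPv4 (protocol field 0x0800) frames long enough to parse.
            if len(data) < l2_len + 20 or data[proto_off:proto_off + 2] != b"\x08\x00":
                continue
            ip = data[l2_len:]
            ihl = (ip[0] & 0x0F) * 4
            if ip[9] != 6 or len(ip) < ihl + 14:  # IP protocol 6 = TCP
                continue
            src = socket.inet_ntoa(ip[12:16])
            dst = socket.inet_ntoa(ip[16:20])
            sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
            flags = ip[ihl + 13]
            if flags & SYN and not flags & ACK and dport == port:
                pending.setdefault((src, sport, dst), ts)  # ignore retransmitted SYNs
            elif flags & SYN and flags & ACK and sport == port:
                key = (dst, dport, src)
                if key in pending:
                    answered[key] = ts - pending.pop(key)
    return answered, pending
```

For example, `answered, unanswered = syn_report("fileName.pcap")` and then printing `unanswered` lists the connection attempts that never received a SYN-ACK during the capture; Wireshark remains the better tool for anything beyond this first pass.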