Re: occasional "socket timeout after 10 seconds"
Posted: Thu Aug 24, 2017 4:46 pm
by scottwilkerson
lpereira wrote: I have a few servers being monitored at remote locations with a ping round-trip time flapping between 715 and 1000 ms. The regular round-trip time is about 150 to 350 ms. Could that be the cause? Is there a way to fix it if that is the case?
It certainly could be, especially if it is accompanied by packet loss. The only real fix is to improve the connection. Failing that, for these hosts you could increase the check timeout, in the hope that it gives the connection time to establish, and raise the max check attempts to avoid excessive alerts if this is common in your environment.
Re: occasional "socket timeout after 10 seconds"
Posted: Fri Aug 25, 2017 7:43 am
by lpereira
Attached are some screenshots from the latest alert I got today. As you can see, I have 2 services in critical while the rest of the services are green. This does not happen to all the services at the same time, except of course when the server goes down. Ping is fine and the round-trip time is normal. When I do a recheck, the service goes back to green.
So the issue seems to be elsewhere. One cause might have been the round-trip time to my boxes at remote locations, but that is not the case here.
Re: occasional "socket timeout after 10 seconds"
Posted: Fri Aug 25, 2017 9:02 am
by scottwilkerson
It also depends on the command you are running for the services that are timing out. Can you run these from the command line?
How long do they take to execute?
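If it helps, a rough way to answer that is to run the check several times in a row and record how long each run takes, since a single immediate response can hide an occasional slow one. The sketch below is only an illustration: time_runs is a hypothetical helper (not part of Nagios), and it assumes GNU date for nanosecond timestamps.

```shell
# Run a command N times and print the exit code and elapsed time of each run.
# Assumption: GNU date (supports %N), as found on typical Linux hosts.
time_runs() {
  runs=$1
  shift
  i=1
  while [ "$i" -le "$runs" ]; do
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s%N)
    # Convert the nanosecond difference to milliseconds.
    echo "run=$i exit=$status elapsed_ms=$(( (end - start) / 1000000 ))"
    i=$((i + 1))
  done
}

# Example (run from the libexec directory, with your own host and arguments):
#   time_runs 20 ./check_nt -H <host> -p 12489 -v SERVICESTATE -l '<service name>' -d SHOWALL
```

Any run that takes close to 10 seconds, or exits with a non-zero code, would be worth comparing against the times of the alerts.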
Re: occasional "socket timeout after 10 seconds"
Posted: Fri Aug 25, 2017 9:24 am
by lpereira
I can run it from the CLI and the response is immediate. It's a simple service check using check_nt.
Code:
[root@nagiosxi libexec]# ./check_nt -H AGDARMSD01 -s "" -p 12489 -v SERVICESTATE -l 'NetBackup Legacy Client Service' -d SHOWALL
NetBackup Legacy Client Service: Started
Re: occasional "socket timeout after 10 seconds"
Posted: Fri Aug 25, 2017 9:49 am
by bolson
Did you run this from the XI command line:
The idea is to see if the host in question has an intermittently slow connection or packet loss corresponding to the timeouts.
Let the ping command run for hours and then attach ping.txt
Re: occasional "socket timeout after 10 seconds"
Posted: Fri Aug 25, 2017 1:37 pm
by lpereira
bolson wrote:Did you run this from the XI command line:
The idea is to see if the host in question has an intermittently slow connection or packet loss corresponding to the timeouts.
Let the ping command run for hours and then attach ping.txt
I have sent you a PM with the ping test. It ran for 3 hours with no packet loss.
Re: occasional "socket timeout after 10 seconds"
Posted: Fri Aug 25, 2017 1:54 pm
by bolson
Any timeouts for this host during this time period?
Re: occasional "socket timeout after 10 seconds"
Posted: Fri Aug 25, 2017 2:21 pm
by bolson
As you stated there were no lost packets during this time period. However, there was substantial variability in terms of response times. This variability would probably not account for your timeout issues but it does suggest some sporadic slowdowns on your network. How frequently do these timeouts occur?
Re: occasional "socket timeout after 10 seconds"
Posted: Mon Aug 28, 2017 7:51 am
by lpereira
Several times a day. But it is a random issue: I can have a timeout from a remote server and, 5 minutes later, a timeout from a server located 4 feet from my desk (literally).
Before implementing XI, I had the same boxes monitored by Nagios Core and this issue was not present then.
Re: occasional "socket timeout after 10 seconds"
Posted: Mon Aug 28, 2017 11:31 am
by bolson
Hello lpereira,
The key to resolving this will be to determine whether these timeouts correspond to elevated ping times. The ping.txt file you sent me indicates that, on that particular host, you have ping times that would likely result in sporadic or random timeouts. If we can identify a host that has NO elevated ping times but is still experiencing the timeout issue, we'll have to pursue another possible cause. Ping the host that is 4 feet from your desk. Let it run, look at it from time to time, and let us know what you observe.
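Rather than watching the output by eye, one option is to save it to a file and count the replies above a threshold. This is a minimal sketch, assuming Linux-style ping output where each reply line contains "time=<value> ms"; the function name summarize_ping and the 500 ms threshold in the example are illustrative choices, not values from this thread.

```shell
# Summarize a saved ping log: total replies, replies slower than a
# threshold (in ms), and the largest response time seen.
# Assumption: reply lines look like "... icmp_seq=1 ttl=128 time=0.345 ms".
summarize_ping() {
  # $1 = ping log file, $2 = threshold in milliseconds
  awk -v limit="$2" '
    /time[=<]/ {
      s = $0
      sub(/.*time[=<]/, "", s)   # drop everything up to and including "time="
      sub(/ ?ms.*/, "", s)       # drop the trailing unit
      rtt = s + 0
      n++
      if (rtt > max) max = rtt
      if (rtt > limit) slow++
    }
    END { printf "replies=%d slow=%d max_ms=%.1f\n", n, slow, max }
  ' "$1"
}

# Example:
#   summarize_ping ping.txt 500
```

If a host shows timeouts in XI but its log has no slow replies, that would point away from the network as the cause.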
Thank you!