lpereira wrote: I have a few servers being monitored at remote locations with a ping TTL flapping between 715 and 1000 ms. The regular TTL is about 150 to 350 ms. Could that be the cause? Is there a way to fix it if that is the case?
It certainly could be, especially if it is accompanied by packet loss. The only real fix would be to somehow improve the connection. Alternatively, you could increase the timeout on these checks in the hope that it allows a connection to be established, and set the max check attempts higher to prevent too many alerts if this is common in your environment.
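To get a feel for what a longer timeout and more check attempts would do, the same idea can be tried by hand before changing anything in XI. This is only a rough sketch: `retry_check` is a hypothetical helper name, and it merely imitates retrying a check; it is not how Nagios itself handles max check attempts. The `-t` option in the usage comment is the plugin timeout in seconds, and the values 30 and 5 are illustrative assumptions.

```shell
#!/bin/sh
# retry_check ATTEMPTS CMD...: hypothetical helper, not part of Nagios.
# Runs CMD up to ATTEMPTS times and stops at the first success (exit code 0).
retry_check() {
    attempts=$1; shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            echo "OK on attempt $i of $attempts"
            return 0
        fi
        # Pause briefly before the next attempt (illustrative value).
        if [ "$i" -lt "$attempts" ]; then sleep 1; fi
        i=$((i + 1))
    done
    echo "still failing after $attempts attempts"
    return 1
}

# Example, run from the libexec directory (30 and 5 are illustrative values):
# retry_check 5 ./check_nt -H AGDARMSD01 -s "" -p 12489 -t 30 \
#     -v SERVICESTATE -l 'NetBackup Legacy Client Service' -d SHOWALL
```

If the check only succeeds on a later attempt, that suggests a higher max check attempts value would suppress the alert, though it would not remove the underlying slowdown.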
occasional "socket timeout after 10 seconds"
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: occasional "socket timeout after 10 seconds"
Attached are some screenshots from the latest alert I got today. As you can see, I have 2 services in critical and the rest of the services are in green status. This does not happen with all the services at the same time, except of course when the server goes down. Ping is fine and the TTL is normal. When I do a recheck, the service goes back to green.
So the issue seems to be elsewhere. The TTL on my boxes at remote locations could have been one cause, but that is not the case here.
-
scottwilkerson
Re: occasional "socket timeout after 10 seconds"
It also depends on the command you are running for the services that are timing out. Can you run these from the command line?
How long do they take to execute?
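One way to answer this is to run the check several times in a row and note how long each run takes. The sketch below is only an illustration: `time_check` is a hypothetical helper name, and it measures whole seconds, which is coarse but enough to tell an immediate response from one that runs into a 10-second timeout.

```shell
#!/bin/sh
# time_check RUNS CMD...: hypothetical helper, not part of Nagios.
# Runs CMD RUNS times and prints the elapsed whole seconds and exit code of each run.
time_check() {
    runs=$1; shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        # Discard the command output; only duration and exit code are reported.
        "$@" > /dev/null 2>&1 && rc=0 || rc=$?
        end=$(date +%s)
        echo "run=$i seconds=$((end - start)) exit=$rc"
        i=$((i + 1))
    done
}

# Example: replace the placeholder with the plugin command that is timing out.
# time_check 20 ./check_nt -H <host> -p 12489 -v <variable>
```

Runs that report close to 10 seconds with a non-zero exit code would point at the plugin timeout rather than at the scheduler.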
Re: occasional "socket timeout after 10 seconds"
I can run it from the CLI and the response is immediate. It's a simple service check using check_nt.
Code: Select all
[root@nagiosxi libexec]# ./check_nt -H AGDARMSD01 -s "" -p 12489 -v SERVICESTATE -l 'NetBackup Legacy Client Service' -d SHOWALL
NetBackup Legacy Client Service: Started
-
bolson
Re: occasional "socket timeout after 10 seconds"
Did you run this from the XI command line:
The idea is to see if the host in question has an intermittently slow connection or packet loss corresponding to the timeouts.
Code: Select all
ping ip_address > ping.txt
Let the ping command run for hours and then attach ping.txt
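Since the goal is to match ping behaviour against the times of the timeouts, it may help to put a timestamp on every line of ping.txt. A minimal sketch, assuming a POSIX shell: `stamp` is a hypothetical helper name, and if ping buffers its output when piped, the timestamps could lag slightly behind the real reply times.

```shell
#!/bin/sh
# stamp: hypothetical helper that prefixes each line read from stdin
# with the local date and time, so entries can be matched against alert times.
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Example: the same ping test, with a timestamp on every line.
# ping ip_address | stamp > ping.txt
```

On Linux, the iputils version of ping also has a `-D` option that prints a Unix timestamp before each line, which may be simpler if it is available on your XI server.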
Re: occasional "socket timeout after 10 seconds"
bolson wrote: Did you run this from the XI command line:
The idea is to see if the host in question has an intermittently slow connection or packet loss corresponding to the timeouts.
Code: Select all
ping ip_address > ping.txt
Let the ping command run for hours and then attach ping.txt
I have sent you a PM with the ping test. It ran for 3 hours with no packet loss.
-
bolson
Re: occasional "socket timeout after 10 seconds"
Any timeouts for this host during this time period?
-
bolson
Re: occasional "socket timeout after 10 seconds"
As you stated there were no lost packets during this time period. However, there was substantial variability in terms of response times. This variability would probably not account for your timeout issues but it does suggest some sporadic slowdowns on your network. How frequently do these timeouts occur?
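To put a number on that variability, the response times in ping.txt can be summarized with awk. This is a sketch under a few assumptions: `ping_summary` is a hypothetical helper name, the file is assumed to contain Linux-style reply lines with `time=<number> ms`, and the 500 ms default threshold is only an illustrative cut-off between the 150 to 350 ms range reported as normal and the 715 to 1000 ms spikes.

```shell
#!/bin/sh
# ping_summary FILE [THRESHOLD_MS]: hypothetical helper.
# Assumes Linux-style ping reply lines containing "time=<number> ms".
ping_summary() {
    awk -v limit="${2:-500}" '
        match($0, /time=[0-9.]+/) {
            # Extract the number that follows "time=" (5 characters).
            t = substr($0, RSTART + 5, RLENGTH - 5) + 0
            n++
            sum += t
            if (n == 1 || t < min) min = t
            if (n == 1 || t > max) max = t
            if (t > limit) slow++
        }
        END {
            if (n == 0) { print "no replies found"; exit }
            printf "replies=%d min=%.1f avg=%.1f max=%.1f over_%sms=%d\n",
                   n, min, sum / n, max, limit, slow
        }
    ' "$1"
}

# Example:
# ping_summary ping.txt 500
```

A large gap between avg and max, or a non-zero count of replies over the threshold, would be consistent with the sporadic slowdowns described here. Lost packets are not counted by this sketch; the summary that ping prints when it exits covers those.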
Re: occasional "socket timeout after 10 seconds"
Several times a day. But it is a random issue: I can have a timeout from a remote server and, 5 minutes later, a timeout from a server located 4 feet from my desk (literally).
Before implementing XI, I had the same boxes monitored by Nagios Core, and this issue was not present then.
-
bolson
Re: occasional "socket timeout after 10 seconds"
Hello lpereira,
The key to resolving this will be to determine whether these timeouts correspond to elevated ping times. The ping.txt file you sent me indicates that, on that particular host, you have ping times that would likely result in sporadic or random timeouts. If we can identify a host that has NO elevated ping times but is still experiencing the timeout issue, we'll have to pursue another possible cause. Ping the host that is 4 feet from your desk. Let it run, look at it from time to time, and let us know what you observe.
Thank you!