occasional "socket timeout after 10 seconds"
Posted: Tue Aug 22, 2017 10:16 am
by caterpillartce
Hello,
Some of our monitored hosts randomly raise alerts reading "socket timeout after 10 seconds". This affects both check_nrpe and check_nt services, and the hosts are all Windows servers. I'd like to understand why this is happening and whether there is a fix.
Thanks!
Re: occasional "socket timeout after 10 seconds"
Posted: Tue Aug 22, 2017 1:51 pm
by bolson
Please send a copy of your system profile. You can download it by going to Admin > System Config > System Profile and clicking the Download Profile button in the top right corner.
Re: occasional "socket timeout after 10 seconds"
Posted: Tue Aug 22, 2017 1:57 pm
by caterpillartce
I PM'ed you the profile file.
Thanks!
Re: occasional "socket timeout after 10 seconds"
Posted: Tue Aug 22, 2017 2:12 pm
by bolson
Thank you for the profile. You have a MySQL table that is reporting an incorrect key error. Please execute the following from the command line and let us know whether it resolves your issue:
Code:
/usr/local/nagiosxi/scripts/repair_databases.sh
Re: occasional "socket timeout after 10 seconds"
Posted: Wed Aug 23, 2017 9:36 am
by bolson
Hello caterpillartce,
Did this resolve your issue?
Re: occasional "socket timeout after 10 seconds"
Posted: Wed Aug 23, 2017 10:23 am
by caterpillartce
It does not happen every day. So far there have been no timeout messages, but in the past we sometimes got a few timeouts every day and at other times none for a few days.
Re: occasional "socket timeout after 10 seconds"
Posted: Wed Aug 23, 2017 2:06 pm
by tgriep
It could be that at random times the Windows server becomes busy and cannot respond before the plugin's default timeout kicks in and displays the timeout message.
The default timeout for those plugins can be increased, though. Go to the Core Config Manager > Commands menu, find those commands, and add the -t option (for example, -t 60) to the command line.
That will increase the timeout to 60 seconds and hopefully fix the intermittent error message. Try that and let us know how it works out.
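As a rough sketch only, the edited command lines might look something like the following. The exact definitions on your system may differ; the macros, the port 12489, and the argument layout shown here are assumptions based on typical check_nrpe and check_nt command definitions, so only the added -t 60 is the point of the example.

```
# Assumed check_nrpe command line, with the timeout raised to 60 seconds
$USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c $ARG1$ $ARG2$

# Assumed check_nt command line, with the timeout raised to 60 seconds
$USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -t 60 -v $ARG1$ $ARG2$
```

Keep in mind that Nagios Core also enforces its own service_check_timeout in nagios.cfg (typically 60 seconds), so a plugin timeout set higher than that value would probably not take effect.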
Re: occasional "socket timeout after 10 seconds"
Posted: Wed Aug 23, 2017 2:25 pm
by caterpillartce
So the incorrect key in the table does not have much to do with the timeouts? I did the fix yesterday and can increase the timeout too, but wanted to know which one is the fix if it does get fixed.
Re: occasional "socket timeout after 10 seconds"
Posted: Thu Aug 24, 2017 9:58 am
by scottwilkerson
caterpillartce wrote: So the incorrect key in the table does not have much to do with the timeouts? I did the fix yesterday and can increase the timeout too, but wanted to know which one is the fix if it does get fixed.
It's hard to say conclusively. Several things can cause this issue: network congestion, too many threads being used on the monitoring server, or an isolated problem at the far end.
The timeout is basically saying it made the request, but didn't hear back from the remote server in the time allotted.
This generally isn't a problem on the Nagios server, and if it is intermittent I would tend to look at connectivity problems.
Re: occasional "socket timeout after 10 seconds"
Posted: Thu Aug 24, 2017 12:49 pm
by caterpillartce
Thank you for the explanation. We also occasionally receive alerts like "No data was received from host" or "could not fetch information from server". Again, this happens randomly, but most often for monitored servers located overseas. Will increasing the timeout threshold with -t in the commands reduce the occurrence of those alerts too? Thanks