occasional "socket timeout after 10 seconds"
-
- Posts: 117
- Joined: Mon Jul 11, 2016 11:22 am
occasional "socket timeout after 10 seconds"
Hello,
Some of our monitored hosts randomly raise "socket timeout after 10 seconds" alerts. This happens with both check_nrpe and check_nt services, and the hosts are all Windows servers. I'd like to understand why this is happening and whether there is a fix.
Thanks!
Re: occasional "socket timeout after 10 seconds"
Please send a copy of your system profile. You can download it by going to Admin > System Config > System Profile and clicking the Download Profile button in the top right corner.
-
- Posts: 117
- Joined: Mon Jul 11, 2016 11:22 am
Re: occasional "socket timeout after 10 seconds"
I PM'ed you the profile file.
Thanks!
Re: occasional "socket timeout after 10 seconds"
Thank you for the profile. You have a MySQL table which is reporting an incorrect key error. Please execute the following from the command line and let us know whether it resolves your issue:
Code:
/usr/local/nagiosxi/scripts/repair_databases.sh
Re: occasional "socket timeout after 10 seconds"
Hello caterpillartce,
Did this resolve your issue?
-
- Posts: 117
- Joined: Mon Jul 11, 2016 11:22 am
Re: occasional "socket timeout after 10 seconds"
It does not happen every day. So far there have been no timeout messages, but in the past we sometimes got a few timeouts every day and at other times none for a few days.
Re: occasional "socket timeout after 10 seconds"
It could be that at random times the Windows server becomes busy and cannot respond before the plugin's default timeout kicks in and displays the timeout message.
However, the default timeout for those plugins can be increased by going to the Core Config Manager > Commands menu, finding those commands, and adding the -t option to the command, like the example below.
Code:
-t 60
That will increase the timeout to 60 seconds and hopefully fix the intermittent error message. Try that and let us know how it works out.
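To illustrate why a longer timeout can help with a busy host, here is a minimal Python sketch. It is not the plugins' actual code: `slow_server` is a hypothetical local stand-in for an agent that is slow to answer, `check` is a hypothetical function playing the role of the plugin with its -t value, and the delays are scaled down to fractions of a second.

```python
import socket
import threading
import time


def slow_server(listener, delay):
    """Hypothetical stand-in for a busy agent: replies only after `delay` seconds."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener was closed
        with conn:
            time.sleep(delay)
            try:
                conn.sendall(b"OK")
            except OSError:
                pass  # the client already gave up and disconnected


def check(port, timeout):
    """Hypothetical check: return the reply, or a plugin-style timeout message."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as s:
            return s.recv(16).decode()
    except socket.timeout:
        return f"socket timeout after {timeout} seconds"


listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen()
port = listener.getsockname()[1]
threading.Thread(target=slow_server, args=(listener, 0.5), daemon=True).start()

short_result = check(port, 0.1)  # timeout shorter than the server's delay
long_result = check(port, 2.0)   # timeout longer than the server's delay
print(short_result)
print(long_result)
```

The same slow responder fails the first check and passes the second; only the time the client is willing to wait has changed.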
-
- Posts: 117
- Joined: Mon Jul 11, 2016 11:22 am
Re: occasional "socket timeout after 10 seconds"
So the incorrect key in the table does not have much to do with the timeouts? I ran the repair yesterday and can increase the timeout too, but I wanted to know which one is the actual fix if the problem does go away.
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: occasional "socket timeout after 10 seconds"
caterpillartce wrote: so the incorrect key in the table does not have much to do with the timeouts? I did the fix yesterday and can increase the timeout too, but wanted to know which one is the fix if it does get fixed.
It's hard to say conclusively. Several things can cause this issue: network congestion, too many threads being used on the monitoring server, or an isolated problem at the far end.
The timeout is basically saying it made the request, but didn't hear back from the remote server in the time allotted.
This generally isn't a problem on the Nagios server, and if intermittent I would tend to look at connectivity problems.
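One way to look for intermittent connectivity problems is to time TCP connections from the Nagios server to the agent port. Below is a rough Python sketch, assuming the default ports (5666 for NRPE, 12489 for check_nt against NSClient++). `probe` and `summarize` are hypothetical helpers, not part of Nagios, and the sketch only times the TCP handshake; it does not speak the agent protocol.

```python
import socket
import time


def probe(host, port, timeout=10.0):
    """Return the TCP connect time in seconds, or None if it fails or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers socket.timeout and connection refused
        return None


def summarize(samples):
    """Count failed probes and report the slowest successful connect."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(ok),
        "slowest": max(ok) if ok else None,
    }
```

For example, `summarize([probe("192.0.2.10", 5666) for _ in range(20)])`, where 192.0.2.10 is a placeholder address, run at intervals against an affected host, would show whether connects occasionally fail or approach the 10-second limit.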
-
- Posts: 117
- Joined: Mon Jul 11, 2016 11:22 am
Re: occasional "socket timeout after 10 seconds"
Thank you for the explanation. We also occasionally receive alerts like "No data was received from host" or "could not fetch information from server". Again, this happens randomly, but most often for the servers we monitor overseas. Will increasing the timeout threshold with -t in the commands reduce the occurrence of those alerts too? Thanks