occasional "socket timeout after 10 seconds"

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
caterpillartce
Posts: 117
Joined: Mon Jul 11, 2016 11:22 am

occasional "socket timeout after 10 seconds"

Post by caterpillartce »

Hello,

We have an issue where some of the monitored hosts randomly raise "socket timeout after 10 seconds" alerts. This happens with both check_nrpe and check_nt services, and the hosts are all Windows servers. I'd like to understand why this is happening and whether there is a fix.

Thanks!
bolson

Re: occasional "socket timeout after 10 seconds"

Post by bolson »

Please send a copy of your system profile. You can download it by going to Admin > System Config > System Profile and clicking the Download Profile button in the top right corner.
caterpillartce
Posts: 117
Joined: Mon Jul 11, 2016 11:22 am

Re: occasional "socket timeout after 10 seconds"

Post by caterpillartce »

I PM'ed you the profile file.

Thanks!
bolson

Re: occasional "socket timeout after 10 seconds"

Post by bolson »

Thank you for the profile. You have a MySQL table that is reporting an incorrect key error. Please run the following from the command line and let us know whether it resolves your issue:

Code:

/usr/local/nagiosxi/scripts/repair_databases.sh
bolson

Re: occasional "socket timeout after 10 seconds"

Post by bolson »

Hello caterpillartce,

Did this resolve your issue?
caterpillartce
Posts: 117
Joined: Mon Jul 11, 2016 11:22 am

Re: occasional "socket timeout after 10 seconds"

Post by caterpillartce »

It does not happen every day. There have been no timeout messages so far, but in the past we sometimes got a few timeouts a day and at other times none for a few days.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: occasional "socket timeout after 10 seconds"

Post by tgriep »

It could be that at random times the Windows server becomes busy and cannot respond before the plugin's default timeout kicks in, which produces the timeout message.
However, the default timeout for those plugins can be increased: go to Core Config Manager > Commands, find those commands, and add the -t option to each command, as in the example below.

Code:

-t 60
That will increase the timeout to 60 seconds and hopefully fix the intermittent error message. Try that and let us know how it works out.
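To make the placement of -t concrete, here is a minimal sketch of a check_nrpe command line with an explicit timeout. The plugin directory is assumed to be the usual /usr/local/nagios/libexec, and the host address and NRPE command name are placeholders rather than values from this thread. The helper only prints the command line so you can compare it with what is defined in Core Config Manager.

```shell
#!/bin/sh
# Assumed default plugin directory on a Nagios XI server; adjust if yours differs.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# Print a check_nrpe command line with an explicit timeout.
# $1 = host address, $2 = timeout in seconds, $3 = NRPE command name
nrpe_cmd() {
    printf '%s/check_nrpe -H %s -t %s -c %s\n' "$PLUGIN_DIR" "$1" "$2" "$3"
}

# 192.0.2.10 and check_cpu are placeholders, not values from this thread.
nrpe_cmd 192.0.2.10 60 check_cpu
```

Running the printed command by hand from the Nagios server is a quick way to see how long the Windows host actually takes to answer before changing the command definition.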
caterpillartce
Posts: 117
Joined: Mon Jul 11, 2016 11:22 am

Re: occasional "socket timeout after 10 seconds"

Post by caterpillartce »

So the incorrect key in the table does not have much to do with the timeouts? I ran the repair yesterday and can increase the timeout too, but I'd like to know which one is the actual fix if the problem does go away.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: occasional "socket timeout after 10 seconds"

Post by scottwilkerson »

caterpillartce wrote: So the incorrect key in the table does not have much to do with the timeouts? I ran the repair yesterday and can increase the timeout too, but I'd like to know which one is the actual fix if the problem does go away.
It's hard to say conclusively. Several things can cause this issue: network congestion, too many threads in use on the monitoring server, or an isolated problem at the far end.

The timeout is basically saying it made the request, but didn't hear back from the remote server in the time allotted.

This generally isn't a problem on the Nagios server, and if it is intermittent, I would tend to look at connectivity problems first.
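One rough way to check for an intermittent connectivity problem is to repeat the same check many times and count how often it fails. The sketch below is only an illustration: count_failures is a hypothetical helper, not part of Nagios, and the host address and plugin path in the usage comment are assumptions.

```shell
#!/bin/sh
# Run a command N times and print how many runs failed (non-zero exit).
# $1 = number of attempts; remaining arguments = the command to run.
count_failures() {
    attempts=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Hypothetical usage (host address and plugin path are assumptions):
# count_failures 20 /usr/local/nagios/libexec/check_nrpe -H 192.0.2.10 -t 10
```

The probes run back to back with no pause between them. Spacing them out over a longer period, for example from cron, would give a better picture of a problem that only shows up a few times a day.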
Former Nagios employee
caterpillartce
Posts: 117
Joined: Mon Jul 11, 2016 11:22 am

Re: occasional "socket timeout after 10 seconds"

Post by caterpillartce »

Thank you for the explanation. We also occasionally receive alerts like "No data was received from host" or "could not fetch information from server". Again, this happens randomly, but most often on the servers we monitor overseas. Will increasing the timeout threshold with -t in the commands reduce the occurrence of those alerts too? Thanks