Random Nagios Time-out or Description/table: no response

Posted: Fri Oct 05, 2018 8:55 am
by lrocca
Hi everyone,
We are using Nagios XI and are having an issue with some checks; the problems occur at random. We are seeing these types of errors:
ERROR: Alarm signal (Nagios time-out)
ERROR: Description/Type table : No response from remote host

The community string is the same on all the hosts that we are monitoring (600), snmpd is running, and the firewall is disabled.
The errors show up at various times during the day, but if we manually force the affected check it returns OK. We have increased the Nagios service timeout to 180, but with no luck.

Does anyone have any suggestions? Thanks!

Re: Random Nagios Time-out or Description/table: no response

Posted: Fri Oct 05, 2018 11:17 am
by ssax
Generally, the Alarm signal (Nagios time-out) error is related to the low default timeout of 5 seconds that some of the SNMP plugins set. An SNMP query can often take longer than 5 seconds. You would need to look at which plugins your services are using; more than likely you'll need to edit your commands or services to add -t 30 so that the plugin doesn't use its default timeout of 5 seconds.
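
As a rough sketch of that change, a command definition with the longer plugin timeout might look like the following. The plugin name (check_snmp_storage.pl), the command name, and the argument layout are assumptions for illustration; check your own plugin's --help output to confirm that it accepts -t and which other options it expects.

```
# Hypothetical command definition: the plugin and its arguments are assumed,
# only the trailing "-t 30" is the change being suggested.
define command {
    command_name    check_snmp_storage_t30
    command_line    $USER1$/check_snmp_storage.pl -H $HOSTADDRESS$ -C $ARG1$ -m $ARG2$ -w $ARG3$ -c $ARG4$ -t 30
}
```

The same -t 30 could instead be appended to the arguments of an individual service if only a few checks are affected.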

The Description/Type table : No response from remote host issue is likely caused by the remote device being unable to respond for some reason. You will need to investigate the remote devices further to see why they aren't responding during that time.

Re: Random Nagios Time-out or Description/table: no response

Posted: Mon Oct 08, 2018 2:44 am
by lrocca
Thanks for your answer! We will try adding -t to all the checks that show that error and will let you know. As for the Description/Type error, we have seen that it often occurs when the host's memory or CPU usage is at 90-100%.

Re: Random Nagios Time-out or Description/table: no response

Posted: Mon Oct 08, 2018 12:03 pm
by ssax
You could also look at increasing your max_check_attempts and/or retry_interval in order to reduce the false positives on them.
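
For illustration only, a service definition with those directives raised might look something like this. The host name, service description, check command, template name, and the specific values are all placeholders, not recommendations for your environment.

```
# Hypothetical service definition: names and values are placeholders.
define service {
    use                     generic-service         ; assumed template name
    host_name               example-host            ; placeholder
    service_description     SNMP Storage            ; placeholder
    check_command           check_snmp_storage_example   ; placeholder command
    check_interval          5       ; normal interval between checks
    retry_interval          2       ; wait between re-checks after a non-OK result
    max_check_attempts      5       ; non-OK results needed before a hard state
}
```

With settings like these, a single slow or missed SNMP response leaves the service in a soft state and is re-checked several times before a hard problem state (and any notification) is reached.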