
Can anyone explain

Posted: Fri Dec 30, 2016 3:48 pm
by ruffsense
I have a couple of OL servers which I monitor with different timeouts, but I still sometimes get SNMP UNKNOWN states that disappear after a couple of seconds. This happens with different service checks.

Re: Can anyone explain

Posted: Fri Dec 30, 2016 3:51 pm
by avandemore
What does the log say?

Re: Can anyone explain

Posted: Fri Dec 30, 2016 4:01 pm
by ruffsense
[1483053156] SERVICE ALERT: USER21;CPU Load;UNKNOWN;SOFT;1;No answer from host
[1483053300] SERVICE ALERT: WGCluster;Received Packets;UNKNOWN;SOFT;1;UNKNOWN - Non-numeric value found: Timeout: No Response from x.x.x.x.
[1483056509] SERVICE ALERT: XS0009;Hardware;UNKNOWN;SOFT;1;Compaq/HP Agent Check: ERROR: No snmp response from x.x.x.x (alarm)
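If you want to pull these entries out of nagios.log to see when and how often they occur, a minimal Python sketch could look like this. It assumes the field order seen in the lines above (host;service;state;state type;attempt;plugin output) and treats the bracketed number as a Unix timestamp; `parse_alert` and `ServiceAlert` are hypothetical names, not part of Nagios.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# Matches lines such as:
# [1483053156] SERVICE ALERT: USER21;CPU Load;UNKNOWN;SOFT;1;No answer from host
# Assumed field order: host;service;state;state type;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE ALERT: "
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[A-Z]+);"
    r"(?P<state_type>SOFT|HARD);(?P<attempt>\d+);(?P<output>.*)$"
)


class ServiceAlert(NamedTuple):
    when: datetime
    host: str
    service: str
    state: str
    state_type: str
    attempt: int
    output: str


def parse_alert(line: str) -> Optional[ServiceAlert]:
    """Return the parsed alert, or None if the line is not a SERVICE ALERT."""
    m = ALERT_RE.match(line.strip())
    if m is None:
        return None
    return ServiceAlert(
        when=datetime.fromtimestamp(int(m["ts"]), tz=timezone.utc),
        host=m["host"],
        service=m["service"],
        state=m["state"],
        state_type=m["state_type"],
        attempt=int(m["attempt"]),
        output=m["output"],  # last field, so it may itself contain ';'
    )


if __name__ == "__main__":
    log = [
        "[1483053156] SERVICE ALERT: USER21;CPU Load;UNKNOWN;SOFT;1;No answer from host",
        "[1483053300] SERVICE ALERT: WGCluster;Received Packets;UNKNOWN;SOFT;1;"
        "UNKNOWN - Non-numeric value found: Timeout: No Response from x.x.x.x.",
    ]
    # Print only SOFT alerts: the transient ones that clear on a retry.
    for line in log:
        alert = parse_alert(line)
        if alert and alert.state_type == "SOFT":
            print(alert.when.isoformat(), alert.host, alert.service, alert.output)
```

Grouping the SOFT alerts by host and time of day is one way to spot whether the timeouts cluster around a particular device or period.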

Re: Can anyone explain

Posted: Tue Jan 03, 2017 11:05 am
by rkennedy
What timeout values are currently set on the check_snmp command for these services, and what is your Nagios check timeout set to? I've seen SNMP take as long as 5 minutes to respond at times, so you may just need to increase this threshold.
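The two timeouts interact: Nagios kills any plugin still running after `service_check_timeout` (set in nagios.cfg, documented default 60 seconds), so the shorter of the two is the one that actually applies. A minimal Python sketch, assuming the plugin timeout is the value passed to check_snmp with `-t`; `CheckTiming`, `effective_timeout` and `timeout_warning` are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Optional

# Documented default of service_check_timeout in nagios.cfg (seconds).
NAGIOS_DEFAULT_SERVICE_CHECK_TIMEOUT = 60


@dataclass
class CheckTiming:
    plugin_timeout: int  # seconds, e.g. the value given to check_snmp -t (assumption)
    service_check_timeout: int = NAGIOS_DEFAULT_SERVICE_CHECK_TIMEOUT


def effective_timeout(t: CheckTiming) -> int:
    """Longest a single check can wait before it is given up on.

    Nagios kills a plugin that outlives service_check_timeout, so raising
    the plugin's own timeout beyond that value has no effect.
    """
    return min(t.plugin_timeout, t.service_check_timeout)


def timeout_warning(t: CheckTiming) -> Optional[str]:
    """Return a hint when the plugin timeout can never be reached."""
    if t.plugin_timeout > t.service_check_timeout:
        return (
            f"plugin timeout {t.plugin_timeout}s exceeds "
            f"service_check_timeout {t.service_check_timeout}s; raise both"
        )
    return None
```

For example, with a 5-minute plugin timeout and the default Nagios setting, `effective_timeout(CheckTiming(plugin_timeout=300))` returns 60, so both values would need raising.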

Re: Can anyone explain

Posted: Tue Jan 03, 2017 3:46 pm
by ruffsense
rkennedy wrote: What are your current values set for timing out on checks on the check_snmp command running on these services, and what is your Nagios check timeout set to? I've seen SNMP take as long as 5 minutes to respond at times, so you may just need to increase this threshold.
The thresholds are 5 minutes, 10 minutes, and even 15 minutes.

Re: Can anyone explain

Posted: Tue Jan 03, 2017 4:11 pm
by ssax
Those errors indicate that the checks sometimes receive no response at all (because of a network or connectivity problem, an issue with the remote SNMP daemon, etc.). They clear on their own because of the max_check_attempts and retry_interval options: each alert is a SOFT state (SOFT;1 in your log), so Nagios retries the check every retry_interval, and the next successful attempt returns the service to OK before it ever becomes a HARD state. Do you see anything in the remote devices' logs? How does the load look on the remote device? Do you have an IPS/firewall device in between that could be blocking it?
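To illustrate why a single failed check shows up as SOFT and then disappears, here is a deliberately simplified Python model of the soft/hard logic. It is a sketch, not Nagios' actual implementation: it assumes any OK result resets the attempt counter, ignores flapping, volatile services, host states and dependencies, and assumes the default interval_length of 60 seconds so retry_interval is in minutes. `classify` and `minutes_until_hard` are hypothetical names.

```python
from typing import List


def classify(results: List[str], max_check_attempts: int) -> List[str]:
    """Label each check result as 'OK' or 'STATE;SOFT|HARD;attempt'.

    Simplified model: consecutive non-OK results count up attempts; the
    state becomes HARD (and would notify) once max_check_attempts is hit.
    """
    labels = []
    attempt = 0
    for result in results:
        if result == "OK":
            attempt = 0  # recovered before reaching HARD: no notification
            labels.append("OK")
        else:
            # The attempt counter stops at max_check_attempts once HARD.
            attempt = min(attempt + 1, max_check_attempts)
            kind = "HARD" if attempt >= max_check_attempts else "SOFT"
            labels.append(f"{result};{kind};{attempt}")
    return labels


def minutes_until_hard(max_check_attempts: int, retry_interval: float) -> float:
    """Time from the first failure until the problem turns HARD.

    Attempt 1 is the first failure; each retry, spaced retry_interval
    apart, adds one attempt.
    """
    return (max_check_attempts - 1) * retry_interval


if __name__ == "__main__":
    # One UNKNOWN followed by a good retry: stays SOFT, like the log above.
    print(classify(["OK", "UNKNOWN", "OK"], max_check_attempts=3))
    # A persistent problem: SOFT, SOFT, then HARD.
    print(classify(["UNKNOWN"] * 4, max_check_attempts=3))
```

With `max_check_attempts=3` and `retry_interval=1`, a problem must persist for about 2 minutes of retries before it becomes HARD, which is why the transient SNMP timeouts never reach that point.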