Can anyone explain
I have a couple of OL servers that I monitor with different timeouts. Even so, I sometimes get an SNMP UNKNOWN result, and after a couple of seconds it is gone. This happens with different service checks.
I don't insult, I diagnose.
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: Can anyone explain
[1483053156] SERVICE ALERT: USER21;CPU Load;UNKNOWN;SOFT;1;No answer from host
[1483053300] SERVICE ALERT: WGCluster;Received Packets;UNKNOWN;SOFT;1;UNKNOWN - Non-numeric value found: Timeout: No Response from x.x.x.x.
[1483056509] SERVICE ALERT: XS0009;Hardware;UNKNOWN;SOFT;1;Compaq/HP Agent Check: ERROR: No snmp response from x.x.x.x (alarm)
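To see whether the UNKNOWN results cluster at particular times or on particular hosts, alert lines like the ones above can be tallied with a short script. This is only a rough sketch: the field order (host;service;state;state type;attempt;output) is read off the three lines quoted here, and the function names are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches "[epoch] SERVICE ALERT: host;service;state;state_type;attempt;output",
# the layout seen in the three log lines quoted in this post.
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE ALERT: "
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[^;]*);"
    r"(?P<state_type>[^;]*);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_alert(line):
    """Return a dict for a SERVICE ALERT line, or None if the line does not match."""
    match = ALERT_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # The bracketed number is a Unix epoch timestamp; convert it to UTC.
    record["time"] = datetime.fromtimestamp(int(record["ts"]), tz=timezone.utc)
    record["attempt"] = int(record["attempt"])
    return record


def count_soft_unknowns(lines):
    """Count SOFT UNKNOWN alerts per host (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        record = parse_alert(line)
        if record and record["state"] == "UNKNOWN" and record["state_type"] == "SOFT":
            counts[record["host"]] += 1
    return counts


SAMPLE = [
    "[1483053156] SERVICE ALERT: USER21;CPU Load;UNKNOWN;SOFT;1;No answer from host",
    "[1483053300] SERVICE ALERT: WGCluster;Received Packets;UNKNOWN;SOFT;1;"
    "UNKNOWN - Non-numeric value found: Timeout: No Response from x.x.x.x.",
    "[1483056509] SERVICE ALERT: XS0009;Hardware;UNKNOWN;SOFT;1;"
    "Compaq/HP Agent Check: ERROR: No snmp response from x.x.x.x (alarm)",
]

if __name__ == "__main__":
    for line in SAMPLE:
        rec = parse_alert(line)
        print(rec["time"].isoformat(), rec["host"], rec["service"], rec["state_type"])
    print(count_soft_unknowns(SAMPLE))
```

Feeding it the whole log file rather than three lines would show whether the timeouts hit several hosts in the same minute, which would point at the network or the monitoring server rather than a single device.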
Re: Can anyone explain
What are your current values set for timing out on checks on the check_snmp command running on these services, and what is your Nagios check timeout set to? I've seen SNMP take as long as 5 minutes to respond at times, so you may just need to increase this threshold.
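To compare the two timeouts in question, something like the sketch below can help. It builds a check_snmp command line with an explicit `-t` plugin timeout, checks that this value stays below the Nagios `service_check_timeout`, and times a manual run. The plugin path, the `public` community string, the example address, and the value 60 for `service_check_timeout` are all assumptions; substitute whatever your install and nagios.cfg actually use.

```python
import subprocess
import time

# Assumed plugin location; adjust to wherever check_snmp lives on your system.
PLUGIN = "/usr/local/nagios/libexec/check_snmp"


def build_check_snmp(host, oid, community="public", timeout=10, plugin=PLUGIN):
    """Build a check_snmp argument list with an explicit plugin timeout (-t)."""
    return [plugin, "-H", host, "-C", community, "-o", oid, "-t", str(timeout)]


def plugin_fits_core_timeout(plugin_timeout, service_check_timeout=60):
    """True if the plugin gives up before Nagios would kill the check.

    service_check_timeout is the nagios.cfg directive; 60 is an assumed value.
    """
    return plugin_timeout < service_check_timeout


def time_check(cmd, hard_limit):
    """Run a command and return (exit_code, seconds); exit_code is None on timeout."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=hard_limit)
        return proc.returncode, time.monotonic() - start
    except subprocess.TimeoutExpired:
        return None, time.monotonic() - start


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; 1.3.6.1.2.1.1.3.0 is sysUpTime.0.
    cmd = build_check_snmp("192.0.2.10", "1.3.6.1.2.1.1.3.0", timeout=30)
    print(" ".join(cmd))
    print("plugin timeout below core timeout:", plugin_fits_core_timeout(30, 60))
    # Uncomment on the Nagios server to measure a real run:
    # print(time_check(cmd, hard_limit=60))
```

Running the timed check by hand a few times, as the nagios user, shows how long the device really takes to answer compared with the timeout configured on the service.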
Former Nagios Employee
Re: Can anyone explain
rkennedy wrote: What are your current values set for timing out on checks on the check_snmp command running on these services, and what is your Nagios check timeout set to? I've seen SNMP take as long as 5 minutes to respond at times, so you may just need to increase this threshold.
The thresholds are 5 minutes, 10 minutes, and even 15 minutes.
Re: Can anyone explain
Those errors indicate that the checks occasionally receive no response, whether because of a network connectivity problem, an issue with the remote SNMP daemon, or something else. They go away because of the max_check_attempts and retry_interval options: a failed check is retried at retry_interval while it is still in a SOFT state, and a successful retry clears it before max_check_attempts is reached. Do you see anything in the remote devices' logs? How does the load look on the remote device? Do you have an IPS or firewall device in between that could be blocking it?
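The way max_check_attempts turns a one-off timeout into a SOFT alert that clears on the next retry can be illustrated with a small simulation. This is a simplified model for illustration only, not Nagios's actual state machine, and `track_states` is a made-up name; it ignores retry_interval timing and only follows the attempt counter.

```python
def track_states(results, max_check_attempts=3):
    """Label each check result as SOFT or HARD in a simplified Nagios-like model.

    Returns a list of (result, state_type, attempt) tuples.
    """
    history = []
    failures = 0  # consecutive non-OK results, capped at max_check_attempts
    for result in results:
        if result == "OK":
            # An OK that follows a still-soft problem counts as a soft recovery;
            # any other OK is an ordinary hard OK on attempt 1.
            if 0 < failures < max_check_attempts:
                state_type, attempt = "SOFT", failures + 1
            else:
                state_type, attempt = "HARD", 1
            failures = 0
        else:
            failures = min(failures + 1, max_check_attempts)
            # The problem only becomes HARD once every attempt has failed.
            state_type = "HARD" if failures >= max_check_attempts else "SOFT"
            attempt = failures
        history.append((result, state_type, attempt))
    return history


if __name__ == "__main__":
    # A single missed SNMP response followed by a good retry stays SOFT.
    for row in track_states(["OK", "UNKNOWN", "OK", "OK"]):
        print(row)
    # Three failures in a row reach max_check_attempts and go HARD.
    for row in track_states(["UNKNOWN", "UNKNOWN", "UNKNOWN"]):
        print(row)
```

In this model a lone UNKNOWN shows up as `SOFT;1`, like the alerts posted earlier in the thread, and disappears on the next successful retry without ever reaching a HARD state.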