Can anyone explain

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
ruffsense
Posts: 140
Joined: Thu Apr 11, 2013 12:40 am

Can anyone explain

Post by ruffsense »

I have a couple of OL servers that I monitor with different timeouts. Still, I sometimes get an SNMP UNKNOWN state, and a few seconds later it is gone. This happens with different service checks.
I don't insult, I diagnose.
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Can anyone explain

Post by avandemore »

What does the log say?
Previous Nagios employee
ruffsense
Posts: 140
Joined: Thu Apr 11, 2013 12:40 am

Re: Can anyone explain

Post by ruffsense »

[1483053156] SERVICE ALERT: USER21;CPU Load;UNKNOWN;SOFT;1;No answer from host
[1483053300] SERVICE ALERT: WGCluster;Received Packets;UNKNOWN;SOFT;1;UNKNOWN - Non-numeric value found: Timeout: No Response from x.x.x.x.
[1483056509] SERVICE ALERT: XS0009;Hardware;UNKNOWN;SOFT;1;Compaq/HP Agent Check: ERROR: No snmp response from x.x.x.x (alarm)
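
For anyone reading these entries: the bracketed number is a Unix epoch timestamp, and the fields after "SERVICE ALERT:" are separated by semicolons (host, service, state, state type, attempt number, plugin output). Below is a minimal Python sketch for decoding such lines. The function name `parse_service_alert` and the dictionary keys are made up for illustration and are not part of Nagios.

```python
import re
from datetime import datetime, timezone

# Matches "[<epoch>] SERVICE ALERT: <semicolon-separated fields>"
ALERT_RE = re.compile(r"^\[(\d+)\] SERVICE ALERT: (.*)$")


def parse_service_alert(line):
    """Parse one SERVICE ALERT log line into a dict, or return None.

    Hypothetical helper for illustration only; the field order is taken
    from the log lines quoted in this thread.
    """
    match = ALERT_RE.match(line.strip())
    if match is None:
        return None
    epoch, rest = match.groups()
    # Split at most 5 times so semicolons in the plugin output survive.
    fields = rest.split(";", 5)
    if len(fields) != 6:
        return None
    host, service, state, state_type, attempt, output = fields
    return {
        "time": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "output": output,
    }


alert = parse_service_alert(
    "[1483053156] SERVICE ALERT: USER21;CPU Load;UNKNOWN;SOFT;1;No answer from host"
)
```

Here `alert["time"]` is 2016-12-29 23:12:36 UTC, and `alert["state_type"]` is `SOFT` with attempt 1, meaning the first failed check and not yet a confirmed problem.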
I don't insult, I diagnose.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Can anyone explain

Post by rkennedy »

What are your current values set for timing out on checks on the check_snmp command running on these services, and what is your Nagios check timeout set to? I've seen SNMP take as long as 5 minutes to respond at times, so you may just need to increase this threshold.
Former Nagios Employee
ruffsense
Posts: 140
Joined: Thu Apr 11, 2013 12:40 am

Re: Can anyone explain

Post by ruffsense »

rkennedy wrote:What are your current values set for timing out on checks on the check_snmp command running on these services, and what is your Nagios check timeout set to? I've seen SNMP take as long as 5 minutes to respond at times, so you may just need to increase this threshold.
The threshold is 5 minutes, 10 minutes, and even 15 minutes.
I don't insult, I diagnose.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Can anyone explain

Post by ssax »

Those errors indicate that at times the checks do not receive a response (whether it's a network connectivity issue, a remote SNMP daemon issue, etc.). They go away because of the max_check_attempts and retry_interval options: a failed check is retried after retry_interval, and if a retry succeeds before max_check_attempts is reached, the problem never becomes a HARD state. Do you see anything in the remote devices' logs? How does the load look on the remote device? Do you have an IPS/firewall device in between that could be blocking it?
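
To illustrate the SOFT/HARD behaviour described above, here is a rough Python sketch. It is a simplified model, not Nagios Core's actual logic: the function name `classify` is hypothetical, and a default of 3 for `max_check_attempts` is assumed purely for the example. It ignores retry_interval timing and only counts consecutive failed checks.

```python
def classify(results, max_check_attempts=3):
    """Label each check result as a SOFT or HARD state.

    Simplified model for illustration: a non-OK result stays SOFT until
    it has been seen max_check_attempts times in a row, then turns HARD.
    An OK result that arrives while the problem is still SOFT is a soft
    recovery; otherwise the OK state is HARD.
    """
    labelled = []
    failures = 0  # consecutive non-OK results so far
    for state in results:
        if state == "OK":
            soft_recovery = 0 < failures < max_check_attempts
            state_type = "SOFT" if soft_recovery else "HARD"
            failures = 0
        else:
            failures = min(failures + 1, max_check_attempts)
            state_type = "HARD" if failures >= max_check_attempts else "SOFT"
        labelled.append((state, state_type))
    return labelled


# One missed SNMP response followed by a successful retry:
transient = classify(["OK", "UNKNOWN", "OK"])

# Three missed responses in a row:
persistent = classify(["UNKNOWN", "UNKNOWN", "UNKNOWN"])
```

In the first case the UNKNOWN result is only ever SOFT, which matches the `UNKNOWN;SOFT;1` entries in the log above. In the second case the third result is labelled HARD, which is the point at which a problem is treated as confirmed.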