[Solved] Nagios and random snmp error
Hi!
I have the following setup, which worked well until I added an additional 250 checks (492 in total).
Configuration: Nagios Core 3.5.0 + Ndo2db, connected to a WAGO 750-880
Last week I was looking at the event log and found a lot of lines like these:
[14-04-2014 13:53:40] SERVICE ALERT: Sys_room1;Fire alarm;OK;SOFT;2;OK :0 state
[14-04-2014 13:52:45] SERVICE ALERT: Sys_room1;Fire alarm;UNKNOWN;SOFT;1;SNMP REQUEST ERROR : No response from remote host '192.168.21.1'.
The strange thing is that if I force the same check right after I get the error, the correct value is returned. I can't see any pattern in these errors.
If anyone could give me some pointers on where to look, I would be very happy.
Thank you.
Last edited by notic on Tue May 06, 2014 6:31 am, edited 1 time in total.
-
- -fno-stack-protector
- Posts: 4366
- Joined: Mon Nov 19, 2012 12:10 pm
Re: Nagios and random snmp error
How often is this happening? What kind of load and disk io do you have on the system when this is happening?
Code: Select all
top | head -n 2
free -m
iostat
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
Re: Nagios and random snmp error
Hi!
Thank you.
This is a completely random error on random services. Let's say I have 250 checks against one host (the WAGO): a random 10% of the checks will fail. It fails throughout the day.
Configuration is an IBM X-Blade with 4 processors + 3 GB RAM + 40 GB SCSI hard drive
iostat
Code: Select all
Linux 2.6.32-431.11.2.el6.x86_64 (nagios)  04/15/2014  _x86_64_  (4 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.82    0.00    1.53    0.37    0.00   92.28
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
vda              59.36        17.86       826.48    1493650   69119066
dm-0            104.23        17.76       826.48    1485154   69119088
dm-1              0.00         0.03         0.00       2376          0
free -m
Code: Select all
             total       used       free     shared    buffers     cached
Mem:          2887       1579       1307          0        163        957
-/+ buffers/cache:        457       2429
Swap:         3023          0       3023
top | head -n 2
Code: Select all
top - 07:24:04 up 23:16, 1 user, load average: 0.54, 0.58, 0.53
Tasks: 134 total, 4 running, 130 sleeping, 0 stopped, 0 zombie
-
- -fno-stack-protector
- Posts: 4366
- Joined: Mon Nov 19, 2012 12:10 pm
Re: Nagios and random snmp error
OK, so we don't have a whole lot of checks, less than 500 is pretty light. You don't have excessive disk, memory, or cpu utilization. What plugin are you using to check these devices?
Re: Nagios and random snmp error
Hi.
The plugins used for this host are check_snmp and check_centreon_snmp_value. Both checks give me random errors. The funny thing is that the same checks work fine on other hosts.
I
-
- -fno-stack-protector
- Posts: 4366
- Joined: Mon Nov 19, 2012 12:10 pm
Re: Nagios and random snmp error
Um, so are you using a Nagios-based monitoring solution? In the past, when I have seen this, it has usually been either a ulimit on max connections or an issue on the way to the remote device, such as the packet getting dropped. It is UDP, after all.
Code: Select all
ulimit -a
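Since SNMP runs over UDP, one rough way to see whether requests are being dropped is to repeat a single query many times with retries turned off and count the failures. The following is only a sketch: `probe_loss` is a made-up helper name, and the community string `public` and the sysUpTime OID in the usage line are placeholders rather than values from this thread.

```shell
#!/bin/sh
# Run a command N times and report how many of the runs failed.
# Usage: probe_loss COUNT COMMAND [ARGS...]
probe_loss() {
    runs=$1
    shift
    fail=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # A non-zero exit status (e.g. an SNMP timeout) counts as a failure.
        "$@" >/dev/null 2>&1 || fail=$((fail + 1))
        i=$((i + 1))
    done
    echo "$fail/$runs failed"
}

# Example (assumed community string and OID; -t 1 = 1 s timeout, -r 0 = no retries):
# probe_loss 100 snmpget -v 2c -c public -t 1 -r 0 192.168.21.1 1.3.6.1.2.1.1.3.0
```

If the failure rate from the command line is close to what Nagios reports, the loss is probably happening on the network or on the device rather than inside Nagios.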
Re: Nagios and random snmp error
Hi!
I set up a new configuration with Nagios Core on the same hardware configuration as before. Same number of checks, same errors. This time I tried using only check_snmp.
[root@nagios ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 22951
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 22951
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
-
- -fno-stack-protector
- Posts: 4366
- Joined: Mon Nov 19, 2012 12:10 pm
Re: Nagios and random snmp error
Considering it is happening with a new system, and your ulimits seem well within reason, it really would seem to be more of an issue with that device. How long have you been monitoring this device with snmp? How many checks are you running against it at one time, say within a 5 minute window? What kind of device, model and manufacturer is it?
Re: Nagios and random snmp error
Hi,
The device is brand new; it's a WAGO 750-880 Ethernet controller. Right now I am running 115 SNMP checks against it, but next week I will need to add 90 new checks.
Right now the checks are in 3 time groups (important "realtime" values, states, "constants"): 50 checks = 3 min, 30 checks = 20 min, 35 checks = 70 min window.
The funny thing is that I am running almost 200 checks against each Cisco Catalyst switch without any problems.
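As a rough sanity check on those numbers, the average request rate can be worked out as below. This assumes each group is spread evenly over its window (50 checks every 3 minutes, and so on), which may not match how Nagios actually schedules them; `avg_rate` is just an illustrative name.

```shell
#!/bin/sh
# Average SNMP check rate for the three groups described above,
# assuming each group repeats once per window and is spread evenly.
avg_rate() {
    awk 'BEGIN {
        per_min = 50/3 + 30/20 + 35/70   # checks per minute, all groups
        printf "%.2f checks/min (%.2f per second)\n", per_min, per_min / 60
    }'
}

avg_rate   # prints: 18.67 checks/min (0.31 per second)
```

Well under one request per second on average is very little, so if the device is overloaded it would more likely be from checks arriving in bursts than from the sustained rate.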
-
- -fno-stack-protector
- Posts: 4366
- Joined: Mon Nov 19, 2012 12:10 pm
Re: Nagios and random snmp error
Considering your Cisco devices handle it just fine, the other networking devices along the way appear to be in working order. I honestly think you may be hitting a limit on what that device can process at one time. However, looking at the documentation they provide for it, there may be an OID that we can walk to get more information.
Code: Select all
snmpwalk -v 2c -c [community string] -t 30 -O n [host or IP] 1.3.6.1.2.1.11
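That subtree is the standard MIB-II SNMP group, which includes the agent's own packet counters: snmpInPkts (.1.3.6.1.2.1.11.1.0) and snmpOutPkts (.1.3.6.1.2.1.11.2.0). A small sketch for pulling those two values out of the walk output follows. `snmp_pkt_counters` is a hypothetical helper, and it assumes the usual Net-SNMP `-O n` line format (`OID = Counter32: value`) and that the device actually exposes both counters.

```shell
#!/bin/sh
# Read `snmpwalk -O n` output on stdin and print the agent's
# received/sent SNMP packet counters and their difference.
snmp_pkt_counters() {
    awk '
        $1 == ".1.3.6.1.2.1.11.1.0" { in_pkts  = $NF }   # snmpInPkts
        $1 == ".1.3.6.1.2.1.11.2.0" { out_pkts = $NF }   # snmpOutPkts
        END { printf "in=%s out=%s diff=%d\n", in_pkts, out_pkts, in_pkts - out_pkts }
    '
}

# Example (placeholders as in the command above):
# snmpwalk -v 2c -c [community string] -t 30 -O n [host or IP] 1.3.6.1.2.1.11 | snmp_pkt_counters
```

Running it twice a few minutes apart and comparing might help: if the "in" counter grows faster than "out", the device is receiving requests it does not answer; if "in" lags behind the number of requests sent, packets are likely being lost before they reach it. Counter wrap-around is ignored here.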