Nagios Support Forum

Posted: **Thu May 03, 2012 8:07 am**

I just deployed a new server to upgrade a Nagios 2.x system to Nagios 3.x. It's monitoring 1300 hosts and about 3400 services. The old operating system is AIX, but I'm replacing it with Linux. Checks are scheduled every 5 minutes, and once it gets rolling I get random services flapping. Typically about 150 are in a flapping state at the same time, but which ones is completely random.

The ones that are flapping go to UNKNOWN with "unable to determine plugin output". I've verified from the command line that I see the same thing intermittently:

Code:

/bin/ping -n -U -w 30 -c 5 xxx.yyy.zzz
connect: No buffer space available

/bin/ping -n -U -w 30 -c 5 xxx.yyy.zzz
PING xxx.yyy.zzz (x.x.x.x) 56(84) bytes of data.
64 bytes from x.x.x.x: icmp_seq=1 ttl=254 time=0.456 ms
64 bytes from x.x.x.x: icmp_seq=2 ttl=254 time=0.398 ms

I was thinking the system was running out of available TCP connections, but looking at netstat and the files in /proc/sys/net/core, the number of connections looks relatively low. I'm not very familiar with this area of Linux, so I could certainly be missing something.

My first thought was memory, but it seems fine:

Code:

Mem:   4043832k total,  3048080k used,   995752k free,   161512k buffers
Swap:  6094840k total,       76k used,  6094764k free,   482860k cached

Anyone ever seen this?

Posted: **Thu May 03, 2012 3:23 pm**

I figured it out. I noticed these messages in /var/log/messages:

Code:

kernel: Neighbour table overflow.

Following the info here, http://www.gnulinuxclub.org/index.php?o ... &Itemid=49 I was able to fix the issue.
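To confirm the same diagnosis on another machine, a minimal sketch along these lines may help. The function names `count_overflows` and `neigh_status` are hypothetical helpers, and the 80% "near limit" cutoff is an arbitrary assumption, not a kernel value.

```shell
#!/bin/sh
# Sketch only: helper names and the 80% cutoff are assumptions for illustration.

# Count "Neighbour table overflow" kernel messages in a log file ($1).
count_overflows() {
    # grep -c prints 0 and exits non-zero when nothing matches, so mask the status.
    grep -c 'Neighbour table overflow' "$1" || true
}

# Classify neighbour-table usage.
# $1 = current number of neighbour entries, $2 = gc_thresh3 (the hard limit).
neigh_status() {
    if [ "$1" -ge "$2" ]; then
        echo "FULL"
    elif [ $(( $1 * 100 / $2 )) -ge 80 ]; then
        echo "NEAR_LIMIT"
    else
        echo "OK"
    fi
}
```

On a live system the inputs could come from `ip -4 neigh show | wc -l` and `cat /proc/sys/net/ipv4/neigh/default/gc_thresh3`, for example `neigh_status "$(ip -4 neigh show | wc -l)" "$(cat /proc/sys/net/ipv4/neigh/default/gc_thresh3)"`, and `count_overflows /var/log/messages` shows how often the kernel has logged the overflow.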

The fix:

Code:

[root@xxxxx ~]# cat /proc/sys/net/ipv4/neigh/default/gc_thresh*
128
512
1024
[root@xxxxx ~]# echo 512 > /proc/sys/net/ipv4/neigh/default/gc_thresh1
[root@xxxxx ~]# echo 2048 > /proc/sys/net/ipv4/neigh/default/gc_thresh2
[root@xxxxx ~]# echo 4096 > /proc/sys/net/ipv4/neigh/default/gc_thresh3
[root@xxxxx ~]# cat /proc/sys/net/ipv4/neigh/default/gc_thresh*
512
2048
4096

Within 2 minutes of making this change, all of the errors cleared. Since values written under /proc do not survive a reboot, I rebooted to see if the problem reappeared, and it did.

Adding these lines to /etc/sysctl.conf will make the changes persist through a reboot:

Code:

net.ipv4.neigh.default.gc_thresh1 = 512
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096
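For anyone scripting this, here is a hedged sketch that prints those three lines for a given set of thresholds. `neigh_sysctl_lines` is a hypothetical helper, and the ascending-order check is a sanity check of my own, not something the kernel is known to enforce.

```shell
#!/bin/sh
# Sketch only: neigh_sysctl_lines is a hypothetical helper name.

# Print sysctl.conf lines for the neighbour-table thresholds.
# $1 = gc_thresh1, $2 = gc_thresh2, $3 = gc_thresh3
neigh_sysctl_lines() {
    # Assumed sanity check: refuse thresholds that are not in ascending order.
    if [ "$1" -gt "$2" ] || [ "$2" -gt "$3" ]; then
        echo "thresholds must be in ascending order" >&2
        return 1
    fi
    printf 'net.ipv4.neigh.default.gc_thresh1 = %s\n' "$1"
    printf 'net.ipv4.neigh.default.gc_thresh2 = %s\n' "$2"
    printf 'net.ipv4.neigh.default.gc_thresh3 = %s\n' "$3"
}
```

Run as root, `neigh_sysctl_lines 512 2048 4096 >> /etc/sysctl.conf` followed by `sysctl -p` should load the values from /etc/sysctl.conf without a reboot, and `sysctl net.ipv4.neigh.default.gc_thresh3` shows the value currently in effect.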

Posted: **Thu May 03, 2012 7:18 pm**

I vaguely remember having the same or a similar problem when I migrated from SuSE to CentOS. If I remember correctly, the workaround I used was to switch from check_ping to check_icmp, which didn't consume whichever resource it was that I was running out of :p I think in general check_icmp is just more resource efficient.

Thanks for sharing your workaround!


No buffer space available
