NRPE FreeBSD limits
Posted: Thu Dec 04, 2014 11:11 am
by tjay
Hi guys,
It seems I am hitting some limits in either FreeBSD or NRPE.
I have about 10 commands configured in NRPE but when i set the server live in the nagios config it will be hit and miss over if i get the result.
I have tracked the issue down to the client machine and found the following in /var/log/messages:
Code: Select all
Dec 4 14:27:13 client-1 kernel: sonewconn: pcb 0xfffff803adeb0ab8: Listen queue overflow: 8 already in queue awaiting acceptance
Dec 4 14:44:38 client-1 kernel: sonewconn: pcb 0xfffff803adeb0ab8: Listen queue overflow: 8 already in queue awaiting acceptance
Dec 4 14:45:12 client-1 kernel: sonewconn: pcb 0xfffff803425f6310: Listen queue overflow: 8 already in queue awaiting acceptance
Dec 4 14:52:13 client-1 kernel: sonewconn: pcb 0xfffff803425f6310: Listen queue overflow: 8 already in queue awaiting acceptance
Dec 4 15:18:21 client-1 kernel: sonewconn: pcb 0xfffff803425f6310: Listen queue overflow: 8 already in queue awaiting acceptance
Dec 4 15:43:41 client-1 kernel: sonewconn: pcb 0xfffff801af1457a8: Listen queue overflow: 8 already in queue awaiting acceptance
I have increased some of the kernel buffers with sysctl to try to get past this:
Code: Select all
kern.ipc.somaxconn
net.inet.tcp.sendspace
net.inet.tcp.recvspace
net.inet.udp.recvspace
It seems that the max queue length for NRPE is set at five, meaning I can only run five checks on NRPE at once.
netstat -LAan
Code: Select all
Current listen queue sizes (qlen/incqlen/maxqlen)
Tcpcb            Proto Listen    Local Address
fffff80342156000 tcp4  0/0/5     *.5666
fffff801af7b5000 tcp6  0/0/5     *.5666
fffff801afaadc00 tcp4  0/0/50    127.0.0.1.8080
fffff801af7b6400 tcp4  0/0/128   *.8081
fffff8000e087400 tcp4  0/0/128   *.80
fffff8000e7cc000 tcp4  0/0/10    127.0.0.1.25
fffff8000e7b5000 tcp4  0/0/128   127.0.0.1.5432
fffff801f4b19c00 tcp6  0/0/128   ::1.5432
fffff8000e7c6800 tcp4  0/0/100   *.9090
fffff8000e7cc400 tcp4  0/0/128   *.22
fffff8000e7cc800 tcp6  0/0/128   *.22
                 unix  0/0/128   /tmp/.s.PGSQL.5432
                 unix  0/0/4     /var/run/devd.pipe
Does anyone have any suggestions on how I can get around this limitation, either through FreeBSD or NRPE?
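For anyone who wants to spot small listen queues without reading the table by eye, here is a rough Python sketch that parses `netstat -LAan`-style output. The sample rows are copied from the post above; the function name `small_queues` and the threshold of 10 are just illustrative choices, not anything NRPE or FreeBSD defines.

```python
import re

# Sample rows copied from the netstat output in the post (trimmed).
SAMPLE = """\
Current listen queue sizes (qlen/incqlen/maxqlen)
Tcpcb Proto Listen Local Address
fffff80342156000 tcp4 0/0/5 *.5666
fffff801af7b5000 tcp6 0/0/5 *.5666
fffff8000e087400 tcp4 0/0/128 *.80
unix 0/0/4 /var/run/devd.pipe
"""

# Optional Tcpcb column, then proto, qlen/incqlen/maxqlen, local address.
ROW = re.compile(r"^(?:[0-9a-f]+\s+)?(\S+)\s+(\d+)/(\d+)/(\d+)\s+(\S+)$")


def small_queues(text, threshold=10):
    """Return (proto, local_address, maxqlen) for listeners with maxqlen below threshold."""
    hits = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            proto, _qlen, _incqlen, maxqlen, addr = m.groups()
            if int(maxqlen) < threshold:
                hits.append((proto, addr, int(maxqlen)))
    return hits


for proto, addr, maxqlen in small_queues(SAMPLE):
    print(f"{proto:5} {addr:22} maxqlen={maxqlen}")
```

With the sample above this flags the two NRPE listeners on port 5666 and the devd pipe; header lines are skipped because they do not match the row pattern.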
Re: NRPE FreeBSD limits
Posted: Thu Dec 04, 2014 3:02 pm
by sreinhardt
I can't say I've seen this one before. Is this a current version of nrpe? Is it from ports or straight from source?
Re: NRPE FreeBSD limits
Posted: Thu Dec 04, 2014 8:44 pm
by tjay
sreinhardt wrote:I can't say I've seen this one before. Is this a current version of nrpe? Is it from ports or straight from source?
This is from ports; it was installed with pkg:
Code: Select all
NRPE - Nagios Remote Plugin Executor
Copyright (c) 1999-2008 Ethan Galstad ([email protected])
Version: 2.15
Last Modified: 09-06-2013
License: GPL v2 with exemptions (-l for more info)
SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required
TCP Wrappers Available
Re: NRPE FreeBSD limits
Posted: Fri Dec 05, 2014 1:50 pm
by emislivec
tjay wrote:I have about 10 commands configured in NRPE but when i set the server live in the nagios config it will be hit and miss over if i get the result.
What do you mean by "miss over if i get the result"? Do the check_nrpe checks timeout? Have you been able to test the check commands manually?
Are you running NRPE with xinetd or as a standalone daemon? NRPE's listen queue size only matters when run as a daemon.
tjay wrote:Code: Select all
Dec 4 15:43:41 client-1 kernel: sonewconn: pcb 0xfffff801af1457a8: Listen queue overflow: 8 already in queue awaiting acceptance
tjay wrote:It seems that the max queue length for NRPE is set at five, meaning i can only run five checks on NRPE at once
The kernel messages are for another process listening on a socket with a wait queue of size 8. What is the load like on the host?
NRPE has a default of 5 that can be changed with listen_queue_size in nrpe.cfg, but this has no effect under xinetd. Nagios shouldn't be running 10 checks on the same host at the same time, it should spread them out. Also, at least five connections should be able to be queued before the queue fills. I'm not sure if this is a problem with NRPE itself.
What is the output of this from your nagios server as root:
Code: Select all
/usr/local/nagios/libexec/check_nrpe -H <problem host address>
Re: NRPE FreeBSD limits
Posted: Fri Dec 05, 2014 5:07 pm
by tjay
emislivec wrote:
What do you mean by "miss over if i get the result"? Do the check_nrpe checks timeout? Have you been able to test the check commands manually?
Are you running NRPE with xinetd or as a standalone daemon? NRPE's listen queue size only matters when run as a daemon.
The NRPE daemon is running standalone.
And what I mean by hit and miss is that sometimes 1-3 of the checks on that host will get one of the following errors:
Code: Select all
CHECK_NRPE: Error - Could not complete SSL handshake.
(Return code of 141 is out of bounds)
but when I force a check again it completes fine and returns OK.
If I tail -f /var/log/messages while the checks are being run, the Listen queue overflow error comes up as soon as a check fails.
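A side note on the "Return code of 141" message: Nagios expects plugins to exit with 0 to 3, and POSIX shells report a process killed by signal N as 128 + N. Under that convention 141 would be signal 13, which is SIGPIPE on typical POSIX systems, i.e. the check was writing to a connection that had already gone away. This is only an interpretation, not something confirmed in this thread; the helper name `describe_status` below is made up for illustration.

```python
import signal


def describe_status(code):
    """Interpret a plugin exit status, assuming the shell's 128 + N signal convention."""
    if 0 <= code <= 3:
        # The four states Nagios accepts from a plugin.
        return ("OK", "WARNING", "CRITICAL", "UNKNOWN")[code]
    if code > 128:
        try:
            return "killed by " + signal.Signals(code - 128).name
        except ValueError:
            pass  # not a known signal number on this platform
    return "out of bounds"


print(describe_status(141))
```

If that reading is right, the 141 errors and the SSL handshake errors would be two symptoms of the same dropped connection rather than separate problems.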
emislivec wrote:
The kernel messages are for another process listening on a socket with a wait queue of size 8. What is the load like on the host?
My understanding is that the kernel message is saying that there are 8 in the queue at the moment, not that 8 is the limit. The load on the machine is very low as it is a new machine waiting to be put into production.
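One way to sanity-check that reading: as far as I recall, FreeBSD's sonewconn() refuses a new connection once the queue length exceeds 3 * maxqlen / 2 (integer division) and logs the current queue length. Treat that formula as an assumption to verify against sys/kern/uipc_socket.c for your release. The arithmetic under that assumption, with a hypothetical helper `overflow_qlen`:

```python
def overflow_qlen(maxqlen):
    """Smallest queue length at which the assumed check (qlen > 3 * maxqlen / 2) trips."""
    return (3 * maxqlen) // 2 + 1


# Compare a few backlog sizes; 5 is the NRPE default mentioned in this thread.
for limit in (5, 8, 15):
    print(f"maxqlen={limit:3} -> overflow logged at qlen={overflow_qlen(limit)}")
```

If the formula holds, a listener with maxqlen 5 would log "8 already in queue", which fits the NRPE sockets on port 5666 rather than some other process with a limit of 8.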
emislivec wrote:
NRPE has a default of 5 that can be changed with listen_queue_size in nrpe.cfg, but this has no effect under xinetd. Nagios shouldn't be running 10 checks on the same host at the same time, it should spread them out. Also, at least five connections should be able to be queued before the queue fills. I'm not sure if this is a problem with NRPE itself.
What is the output of this from your nagios server as root:
Code: Select all
/usr/local/nagios/libexec/check_nrpe -H <problem host address>
The manual check of the services with the check_nrpe command returns OK, but when I add it to Nagios and it is scheduled, some of the tests fail.
What is the risk of increasing the listen_queue_size? And is there a rule of thumb for setting it, e.g. that it has to match the total number of checks on the machine?
Re: NRPE FreeBSD limits
Posted: Mon Dec 08, 2014 11:28 am
by emislivec
tjay wrote:And what i mean by hit and miss is...
I had read that wrong ("..be hit, and miss over..."), this makes more sense.
I'm still unclear why you're seeing the errors you are. If the system load is low, nrpe should be able to accept the connections fast enough to avoid a backlog.
Increasing the listen queue size would consume a bit more system resources. 5 is fairly conservative, you should be able to increase this to the number of checks run against the host (this seems a good rule of thumb).
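A small sketch of that rule of thumb, assuming an NRPE build that actually understands listen_queue_size. The cap at somaxconn reflects the general behaviour that the kernel clamps a listen backlog to kern.ipc.somaxconn; the default of 128 used here is an assumption, so check it with sysctl on the host. Both function names are made up for illustration.

```python
def suggested_listen_queue_size(num_checks, somaxconn=128, current=5):
    """Rule of thumb: at least one slot per check, never below the current value,
    and never above the kernel-wide cap (assumed to be kern.ipc.somaxconn)."""
    return min(max(current, num_checks), somaxconn)


def nrpe_cfg_line(num_checks, **kwargs):
    """Render the nrpe.cfg line for the suggested size."""
    return f"listen_queue_size={suggested_listen_queue_size(num_checks, **kwargs)}"


print(nrpe_cfg_line(10))
```

For the roughly 10 checks mentioned at the start of the thread this would suggest a value of 10; going somewhat higher costs very little.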
Re: NRPE FreeBSD limits
Posted: Wed Dec 10, 2014 12:08 pm
by tjay
emislivec wrote:tjay wrote:And what i mean by hit and miss is...
I had read that wrong ("..be hit, and miss over..."), this makes more sense.
I'm still unclear why you're seeing the errors you are. If the system load is low, nrpe should be able to accept the connections fast enough to avoid a backlog.
Increasing the listen queue size would consume a bit more system resources. 5 is fairly conservative, you should be able to increase this to the number of checks run against the host (this seems a good rule of thumb).
I have added the following line to nrpe.cfg
and restarted nrpe, but I am still getting the issues and getting
Code: Select all
nrpe[76268]: Unknown option specified in config file '/usr/local/etc/nrpe.cfg' - Line 18
in my /var/log/messages.
Re: NRPE FreeBSD limits
Posted: Wed Dec 10, 2014 12:25 pm
by tjay
Looks like the listen_queue_size option was added in 2.16:
https://github.com/NagiosEnterprises/nr ... 33add157f4
But the nrpe2 port that pkg installs is only at 2.15:
http://www.freshports.org/net-mgmt/nrpe/
This also seems to be the case on CentOS:
Code: Select all
yum list installed nrpe
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: centos.mirror.constant.com
* epel: mirror.vorboss.net
* extras: centos.mirror.constant.com
* updates: centos.mirror.constant.com
Installed Packages
nrpe.x86_64 2.15-2.el6 @epel
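To check this on other hosts before touching nrpe.cfg, something like the following could compare the version from the NRPE banner against 2.16. The minimum of 2.16 is taken from the commit linked above, not from release notes, and the function names are hypothetical. Versions are compared as integer tuples so that 2.9 does not sort above 2.16.

```python
import re

# Banner lines as printed by the nrpe binary earlier in this thread (trimmed).
BANNER = """\
NRPE - Nagios Remote Plugin Executor
Version: 2.15
"""


def nrpe_version(banner):
    """Extract the version from an NRPE banner as a tuple of ints, e.g. (2, 15)."""
    m = re.search(r"^Version:\s*(\d+(?:\.\d+)*)", banner, re.MULTILINE)
    if m is None:
        raise ValueError("no Version line found")
    return tuple(int(part) for part in m.group(1).split("."))


def supports_listen_queue_size(banner, minimum=(2, 16)):
    """True if the banner's version is at least the assumed minimum."""
    return nrpe_version(banner) >= minimum


print(nrpe_version(BANNER), supports_listen_queue_size(BANNER))
```

With the 2.15 banner this reports the option as unsupported, which matches the "Unknown option specified in config file" error.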
Re: NRPE FreeBSD limits
Posted: Wed Dec 10, 2014 5:19 pm
by sreinhardt
You are correct, that is the same version on Cent, although I cannot say that I have ever seen this issue in Cent. Are you able to compile from source? I think you are going to find either compilation of 2.16 or modification of 2.15 and compilation of that is needed to correct this.
Re: NRPE FreeBSD limits
Posted: Thu Dec 11, 2014 4:14 am
by tjay
sreinhardt wrote:You are correct, that is the same version on Cent, although I cannot say that I have ever seen this issue in Cent. Are you able to compile from source? I think you are going to find either compilation of 2.16 or modification of 2.15 and compilation of that is needed to correct this.
I have cloned the repo on GitHub (https://github.com/NagiosEnterprises/nrpe) and compiled it.
I have set it up on the FreeBSD machine I was having issues with, and it seems to be doing better:
Code: Select all
netstat -LAan
Current listen queue sizes (qlen/incqlen/maxqlen)
Tcpcb            Proto Listen    Local Address
fffff8000e7b5c00 tcp4  0/0/15    *.5666
fffff8031a13f800 tcp6  0/0/15    *.5666
I have enabled the machine on my Nagios instance as well, and it seems to cope better now.
Do we know when 2.16 is going to be released?
Also does anyone know the official repo for NRPE?
If I google it I get
http://exchange.nagios.org/directory/Ad ... or/details
but it looks like
https://github.com/NagiosEnterprises/nrpe is more up to date.