NSCA leaking stale processes for user 'nagios'

Posted: Wed Aug 10, 2016 2:40 pm
by eclypse
The NSCA addon in our Nagios XI instance is leaking stale processes, like the one below:

Code: Select all

nagios    3461 12679  0 Aug05 ?        00:00:00 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd 

Code: Select all

# ps -ef | grep nsca | wc -l 
160 
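Note that `grep nsca` also matches the grep process itself, and anything else with "nsca" in its command line (such as send_nsca), so the figure above may be slightly inflated. A minimal sketch of a stricter count follows. It assumes the usual `ps -ef` layout, where the command is the eighth field, and the function name `count_nsca_inetd` is made up for illustration:

```shell
# Count only nsca server processes that were started in inetd mode.
# Reads `ps -ef`-style lines on stdin; field 8 is assumed to be the command.
count_nsca_inetd() {
    awk '{ n = split($8, parts, "/"); if (parts[n] == "nsca" && $0 ~ /--inetd/) c++ } END { print c + 0 }'
}

# Usage: ps -ef | count_nsca_inetd
```

Matching on the basename of the command means both `nsca` and `/usr/local/nagios/bin/nsca` are counted, while `grep nsca` is not.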
As described in the old, unresolved Debian bug report below, running 'strace' against the process ID causes the process to terminate.

https://bugs.debian.org/cgi-bin/bugrepo ... bug=631447

Is this a known issue? Is there a newer nsca binary/configuration that resolves this?

Here is my xinetd nsca config file (/etc/xinetd.d/nsca):

Code: Select all

# default: on
# description: NSCA (Nagios Service Check Acceptor)
service nsca
{
        flags           = REUSE
        socket_type     = stream        
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nsca
        server_args     = -c /usr/local/nagios/etc/nsca.cfg --inetd
        log_on_failure  += USERID
        disable         = no
        #only_from      = 127.0.0.1
        instances       = UNLIMITED
        per_source      = UNLIMITED
        cps             = 5000 0 
}
Here are the important bits from /usr/local/nagios/etc/nsca.cfg:

Code: Select all

pid_file=/var/run/nsca.pid
server_port=5667
nsca_user=nagios
nsca_group=nagios
debug=0
command_file=/usr/local/nagios/var/rw/nagios.cmd
alternate_dump_file=/usr/local/nagios/var/rw/nsca.dump
aggregate_writes=0
append_to_file=0
max_packet_age=30
password=myfakepassword
decryption_method=1

Re: NSCA leaking stale processes for user 'nagios'

Posted: Wed Aug 10, 2016 5:04 pm
by ssax
The latest stable version is 2.9.1. You can check your send_nsca version with this:

Code: Select all

/usr/local/nagios/libexec/send_nsca -V
You should be able to compile a new send_nsca from here: (it's version 2.9.2-RC1)

https://github.com/NagiosEnterprises/ns ... a-2-9-2RC1

Re: NSCA leaking stale processes for user 'nagios'

Posted: Wed Aug 10, 2016 5:05 pm
by tgriep
Was it the same host that caused all of the connections to the XI server?

Re: NSCA leaking stale processes for user 'nagios'

Posted: Fri Aug 12, 2016 2:52 pm
by eclypse
ssax wrote: The latest stable version is 2.9.1. You can check your send_nsca version with this:

Code: Select all

/usr/local/nagios/libexec/send_nsca -V
You should be able to compile a new send_nsca from here: (it's version 2.9.2-RC1)

https://github.com/NagiosEnterprises/ns ... a-2-9-2RC1

Thanks, I downloaded and compiled the new versions of nsca and send_nsca. I have updated /usr/local/nagios/bin/nsca on my Nagios host to the 2.9.2-RC1 version. The clients, I found, are still using a fairly old version of send_nsca (2.7.2). Things seem to be working OK so far. I'll look into upgrading the clients if I continue to see issues.

I suspect this entry in the ChangeLog is what may resolve the issue at hand, though I wasn't able to pinpoint the actual code change:

Code: Select all

2.9.x - xx/xx/xxxx
------------------
- Race condition when opening command file (mvela / John Frickson)
tgriep wrote:Was it the same host that caused all of the connections to the XI server?
I'm not quite sure. I have ~11 hosts that all run the same passive check through their local cron jobs, every 10 minutes.

Re: NSCA leaking stale processes for user 'nagios'

Posted: Mon Aug 15, 2016 10:06 am
by lmiltchev
Things seem to be working OK so far. I'll look into upgrading the clients if I continue to see issues.
Sounds good! Let us know if it is all right to close this thread.

Re: NSCA leaking stale processes for user 'nagios'

Posted: Tue Aug 30, 2016 3:07 pm
by eclypse
I have returned from a two-week vacation and see that we have accumulated more than 300 nsca processes, so it appears the problem persists in the newer version of the nsca daemon.

Code: Select all

# ps -ef | grep nsca | wc -l
312
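To tell long-lived stragglers apart from connections that are merely in flight, it may help to look at process age rather than a raw count. The sketch below assumes a `ps` that supports the `etimes` (elapsed seconds) output field. The function name `stale_nsca_pids` and the one-hour default threshold are illustrative choices, not something taken from NSCA itself:

```shell
# Print PIDs of nsca processes that have been running longer than a threshold.
# Reads "PID ELAPSED_SECONDS COMMAND..." lines on stdin, as produced by
# `ps -eo pid=,etimes=,args=`. The threshold (seconds) defaults to 3600,
# which is an arbitrary assumption.
stale_nsca_pids() {
    awk -v max="${1:-3600}" '{ n = split($3, parts, "/") } parts[n] == "nsca" && $2 + 0 > max + 0 { print $1 }'
}

# Usage: ps -eo pid=,etimes=,args= | stale_nsca_pids 3600
```

Since the clients submit a check every 10 minutes, any nsca process older than an hour is very likely one of the leaked ones.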

Re: NSCA leaking stale processes for user 'nagios'

Posted: Wed Aug 31, 2016 12:10 pm
by tgriep
Is the same remote host keeping the connections open, or are different hosts involved?
Did you upgrade all of the clients as well?
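One way to answer the first question might be to tally the TCP connections on the NSCA port by remote address. This is only a sketch: it assumes `ss -tn` output with the State column first and the local and peer addresses in fields 4 and 5, and the function name `connections_by_peer` is hypothetical. It counts connections in any state, since a leaked process could be holding a socket that is no longer established:

```shell
# Tally TCP connections to the NSCA port by remote address.
# Reads `ss -tn`-style lines on stdin, assumed to look like:
#   State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
# The port defaults to 5667 (server_port in nsca.cfg).
connections_by_peer() {
    awk -v port="${1:-5667}" '
        $4 ~ (":" port "$") {
            peer = $5
            sub(/:[0-9]+$/, "", peer)   # drop the remote port
            count[peer]++
        }
        END { for (p in count) print count[p], p }
    ' | sort -rn
}

# Usage: ss -tn | connections_by_peer 5667
```

If a single address dominates the output, that client is the first candidate for a send_nsca upgrade.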