Nagios Support Forum

Posted: **Thu Oct 13, 2016 11:15 am**

Hello,

I have a number of Windows servers running NSClient++ that report their status back to Nagios via NSCA. Xinetd is handling NSCA connections on the Nagios side, and what seems to be happening is that a few of the processes that handle incoming connections become 'stuck'.

Here are the log messages of a normal connection:

Oct 13 00:03:17 nagiosxi xinetd[52000]: START: nsca pid=48086 from=::ffff:X.X.X.X
Oct 13 00:03:17 nagiosxi nsca[48086]: Handling the connection...
Oct 13 00:03:18 nagiosxi nsca[48086]: Time difference in packet: 0 seconds for host VERESEN-AD01
Oct 13 00:03:18 nagiosxi nsca[48086]: SERVICE CHECK -> Host Name: 'X', Service Description: 'Memory Usage', Return Code: '0', Output: ': '
Oct 13 00:03:18 nagiosxi nsca[48086]: Attempting to write to nagios command pipe
Oct 13 00:03:18 nagiosxi nsca[48086]: End of connection...
Oct 13 00:03:18 nagiosxi xinetd[52000]: EXIT: nsca status=0 pid=48086 duration=1(sec)

As you can see, xinetd sees the incoming connection and passes it off to NSCA. Once NSCA detects the end of the connection, xinetd terminates the process.

What I'm noticing fairly periodically is this:

Oct 13 03:00:07 nagiosxi xinetd[52000]: START: nsca pid=48075 from=::ffff:X.X.X.X
Oct 13 03:00:07 nagiosxi nsca[48075]: Handling the connection...

This connection came in 5-6 hours ago (server time), but that PID is still running on the Nagios server:

nagios   48075  0.0  0.0   8500   768 ?        Ss   03:00   0:00 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
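
A rough sketch for spotting these in the log: pair each xinetd `START: nsca pid=...` line with its matching `EXIT` line and report the PIDs that never got one. It assumes the syslog format shown in the excerpts above; the function name `find_unfinished` and the sample lines are only illustrative, and a real run would read the lines from `/var/log/messages` instead.

```python
import re

# Patterns follow the xinetd log lines quoted in this thread.
START_RE = re.compile(r"xinetd\[\d+\]: START: nsca pid=(\d+)")
EXIT_RE = re.compile(r"xinetd\[\d+\]: EXIT: nsca\b.*?\bpid=(\d+)")


def find_unfinished(lines):
    """Return {pid: start_line} for nsca STARTs that have no later EXIT."""
    open_pids = {}
    for line in lines:
        m = START_RE.search(line)
        if m:
            # A reused PID simply replaces the older entry.
            open_pids[m.group(1)] = line.rstrip("\n")
            continue
        m = EXIT_RE.search(line)
        if m:
            open_pids.pop(m.group(1), None)
    return open_pids


# Illustrative input built from the log excerpts in this post.
sample = [
    "Oct 13 00:03:17 nagiosxi xinetd[52000]: START: nsca pid=48086 from=::ffff:X.X.X.X",
    "Oct 13 00:03:18 nagiosxi xinetd[52000]: EXIT: nsca status=0 pid=48086 duration=1(sec)",
    "Oct 13 03:00:07 nagiosxi xinetd[52000]: START: nsca pid=48075 from=::ffff:X.X.X.X",
]

for pid, line in find_unfinished(sample).items():
    print(pid, "started but never exited:", line)
```

Connections still in flight at the end of the log will also show up, so anything reported this way is only a candidate until it is checked against `ps`.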

These PIDs that never terminate build up on the Nagios server and eventually cause issues unless they are flushed out. There doesn't seem to be any correlation between source IPs and the stuck PIDs: the IPs that leave a stuck process behind work without issue on their other connections. This is leaving me scratching my head as to what to do. Below are the relevant config files. Keep in mind when going over them that I have a lot of Windows servers reporting back to Nagios across various subnets.

/etc/xinetd.d/nsca
# default: on
# description: NSCA (Nagios Service Check Acceptor)
service nsca
{
        flags           = REUSE
        socket_type     = stream
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nsca
        server_args     = -c /usr/local/nagios/etc/nsca.cfg --inetd
        log_on_failure  += USERID
        disable         = no
        instances       = unlimited
        per_source      = unlimited
}

/etc/xinetd.conf
#
# This is the master xinetd configuration file. Settings in the
# default section will be inherited by all service configurations
# unless explicitly overridden in the service configuration. See
# xinetd.conf in the man pages for a more detailed explanation of
# these attributes.

defaults
{
# The next two items are intended to be a quick access place to
# temporarily enable or disable services.
#
#	enabled		=
#	disabled	=

# Define general logging characteristics.
	log_type	= SYSLOG daemon info
	log_on_failure	= HOST
	log_on_success	= PID HOST DURATION EXIT

# Define access restriction defaults
#
#	no_access	=
#	only_from	=
#	max_load	= 0
	cps		= 50 10
	instances	= 50
	per_source	= 50

# Address and networking defaults
#
#	bind		=
#	mdns		= yes
	v6only		= no

# setup environmental attributes
#
#	passenv		=
	groups		= yes
	umask		= 002

# Generally, banners are not used. This sets up their global defaults
#
#	banner		=
#	banner_fail	=
#	banner_success	=
}

includedir /etc/xinetd.d

Thanks for any input

Posted: **Thu Oct 13, 2016 2:28 pm**

You can try setting debug=1 in /usr/local/nagios/etc/nsca.cfg and restarting xinetd. Wait for (or induce) one of these stalled connections, then review the log. Let us know what you see.
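
To notice when a stalled connection has actually occurred, one option is to look at how long each `nsca --inetd` process has been alive. The sketch below parses the text produced by `ps -eo pid,etime,args` and lists nsca processes older than a threshold. The helper names, the 60-second default, and the sample `ps` text are assumptions made for illustration; normal connections in the log above finish in about a second.

```python
import os


def parse_etime(etime):
    """Convert a ps etime value, '[[dd-]hh:]mm:ss', to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def stalled_nsca(ps_output, max_age=60):
    """Return [(pid, age_seconds)] for 'nsca ... --inetd' older than max_age."""
    stalled = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # header line or something unexpected
        pid, etime, args = fields
        argv = args.split()
        if os.path.basename(argv[0]) == "nsca" and "--inetd" in argv:
            age = parse_etime(etime)
            if age > max_age:
                stalled.append((int(pid), age))
    return stalled


# Hypothetical ps output; only the command line is taken from this thread.
sample = """\
  PID     ELAPSED COMMAND
48075    05:31:02 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
48086       00:01 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
"""

for pid, age in stalled_nsca(sample):
    print(f"nsca pid {pid} has been running for {age} seconds")
```

On a live system the input could come from `subprocess.run(["ps", "-eo", "pid,etime,args"], capture_output=True, text=True).stdout`.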

Posted: **Wed Oct 19, 2016 12:27 pm**

Hi,

This setting was already enabled, so it would seem that the output I'm getting in my logs is already as verbose as possible. Is there perhaps a debug option elsewhere that I can try enabling to help shed some light on this?

Thanks!

Posted: **Wed Oct 19, 2016 4:57 pm**

If you're keen on doing some testing, this branch should fix the issue: NagiosEnterprises/nsca

Can you install, test and report your results?

Posted: **Wed Oct 26, 2016 2:14 pm**

Thanks,

I'll give it a try and let you know the results.

Posted: **Wed Oct 26, 2016 2:21 pm**

Great, would be nice to get confirmation on this newer version.

Posted: **Thu Nov 03, 2016 10:09 am**

Hello,

After replacing the nsca binary with the updated one provided, it seems the issue is still occurring. Could this simply be a capacity issue? We have passive checks coming in from thousands of different services.

Posted: **Thu Nov 03, 2016 10:29 am**

Can you attach your /usr/local/nagios/etc/nsca.cfg and /var/log/messages? Preferably a messages file which contains logs during such an incident.

It's possibly capacity related. A system can only handle a finite number of sockets/connections, but the defaults are generally pretty high. You'd really have to be hitting this with a lot of concurrent connections to reach some type of ceiling there.
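
One way to get a feel for the load is to count how many NSCA connections xinetd starts per second. The posted xinetd.conf sets `cps = 50 10` in its defaults, and the nsca service file does not override it, so bursts above 50 connections in one second would be worth knowing about. This is only a sketch that assumes the syslog format shown earlier in the thread; the function names and the sample lines are illustrative, and a high rate would not by itself explain the stuck processes.

```python
import re
from collections import Counter

# Captures the syslog timestamp (one-second resolution) of each nsca START.
START_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+xinetd\[\d+\]: START: nsca\b"
)


def starts_per_second(lines):
    """Count xinetd 'START: nsca' lines for each timestamp."""
    counts = Counter()
    for line in lines:
        m = START_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def busy_seconds(lines, limit=50):
    """Return {timestamp: count} for seconds with more than `limit` starts."""
    return {ts: n for ts, n in starts_per_second(lines).items() if n > limit}


# Illustrative input; a real run would iterate over /var/log/messages.
sample = [
    "Oct 13 00:03:17 nagiosxi xinetd[52000]: START: nsca pid=48086 from=::ffff:X.X.X.X",
    "Oct 13 00:03:17 nagiosxi xinetd[52000]: START: nsca pid=48087 from=::ffff:X.X.X.X",
    "Oct 13 03:00:07 nagiosxi xinetd[52000]: START: nsca pid=48075 from=::ffff:X.X.X.X",
]

# A limit of 1 is used here only so the tiny sample produces a result.
print(busy_seconds(sample, limit=1))
```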

Posted: **Fri Nov 04, 2016 1:05 pm**

It's quite possible it's a connection limit; we have hundreds, if not a thousand, service checks coming in via NSCA. I have bundled the requested information, but it's too large to attach on the board, so I've PM'd you a URL you can use to access the data.

Thanks!

Posted: **Fri Nov 04, 2016 2:12 pm**

It's generic, but it may be time to take a look at https://labs.nagios.com/2012/01/30/nagi ... g-disk-io/

Please let us know if you have any questions about anything in that document.

Xinetd NSCA issue
