Xinetd NSCA issue
Posted: Thu Oct 13, 2016 11:15 am
Hello,
I have a number of Windows servers running nsclient that report their status back to Nagios via NSCA. xinetd handles the NSCA connections on the Nagios side, and what seems to be happening is that a few of the processes handling incoming connections become 'stuck'.
In the log messages of a normal connection (first excerpt below), xinetd sees the incoming connection and passes it off to NSCA. Once NSCA detects the end of the connection, xinetd terminates the process.
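What I'm noticing fairly periodically is a connection that starts but never logs an end (second excerpt below). That connection came in 5-6 hours ago, relative to the server's clock, but its PID is still running on the Nagios server, as the process listing below shows.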
These PIDs that don't terminate build up on the Nagios server and eventually cause issues unless flushed out. There doesn't seem to be any correlation between source IPs and these PIDs: the IPs that trigger an errant xinetd process work without issue on other connections. This has me scratching my head as to what to do. Below are the relevant config files. Keep in mind when going over them that I have a lot of Windows servers reporting back to Nagios across various subnets.
Thanks for any input 
Here are the log messages of a normal connection:
Oct 13 00:03:17 nagiosxi xinetd[52000]: START: nsca pid=48086 from=::ffff:X.X.X.X
Oct 13 00:03:17 nagiosxi nsca[48086]: Handling the connection...
Oct 13 00:03:18 nagiosxi nsca[48086]: Time difference in packet: 0 seconds for host VERESEN-AD01
Oct 13 00:03:18 nagiosxi nsca[48086]: SERVICE CHECK -> Host Name: 'X', Service Description: 'Memory Usage', Return Code: '0', Output: ': '
Oct 13 00:03:18 nagiosxi nsca[48086]: Attempting to write to nagios command pipe
Oct 13 00:03:18 nagiosxi nsca[48086]: End of connection...
Oct 13 00:03:18 nagiosxi xinetd[52000]: EXIT: nsca status=0 pid=48086 duration=1(sec)
What I'm noticing fairly periodically is this:
Oct 13 03:00:07 nagiosxi xinetd[52000]: START: nsca pid=48075 from=::ffff:X.X.X.X
Oct 13 03:00:07 nagiosxi nsca[48075]: Handling the connection...
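For anyone wanting to spot these connections without eyeballing the log, here is a minimal Python sketch that pairs xinetd START lines with EXIT lines and reports the PIDs that never logged an EXIT. It assumes the syslog format shown in the excerpts above; the function name `find_unfinished` is my own, and which log file xinetd writes to on a given system is an assumption you would need to check.

```python
import re

# Patterns modelled on the xinetd lines in the excerpts above.
START_RE = re.compile(r"xinetd\[\d+\]: START: nsca pid=(\d+)")
# Permissive on what sits between "nsca" and "pid=" (e.g. status=0).
EXIT_RE = re.compile(r"xinetd\[\d+\]: EXIT: nsca\b.*?\bpid=(\d+)")


def find_unfinished(lines):
    """Return {pid: start_line} for nsca START entries with no later EXIT."""
    open_conns = {}
    for line in lines:
        start = START_RE.search(line)
        if start:
            # A reused PID simply overwrites the older entry.
            open_conns[start.group(1)] = line.rstrip("\n")
            continue
        done = EXIT_RE.search(line)
        if done:
            # The default tolerates EXIT lines whose START was rotated away.
            open_conns.pop(done.group(1), None)
    return open_conns
```

Feeding it an open log file (a file object iterates line by line) should return the PIDs that are candidates for being stuck. Connections that are simply still in progress at the end of the log will show up too, so the result is a list to cross-check against the process table, not proof.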
The PID from that connection is still running hours later:
nagios 48075 0.0 0.0 8500 768 ? Ss 03:00 0:00 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
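The same check can be made against the process table instead of the log. This is only a sketch, assuming a `ps` that supports `-o etime` (elapsed time as `[[DD-]hh:]mm:ss`); the 300-second threshold and the function names are hypothetical choices of mine, picked because a normal connection above lasts about a second.

```python
import subprocess


def etime_to_seconds(etime):
    """Convert the ps ELAPSED format [[DD-]hh:]mm:ss to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def stale_nsca(ps_output, max_age=300):
    """Return [(pid, age_seconds)] for 'nsca ... --inetd' older than max_age."""
    stale = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # pid, etime, full command line
        if len(fields) < 3:
            continue
        pid, etime, args = fields
        if "nsca" in args and "--inetd" in args:
            age = etime_to_seconds(etime)
            if age > max_age:
                stale.append((int(pid), age))
    return stale


def current_ps_output():
    # Separate -o options with empty headers suppress the header row.
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "etime=", "-o", "args="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `stale_nsca(current_ps_output())` from cron would give a list of PIDs to review. I have left out any killing of processes on purpose, since that only hides whatever is keeping the connection open.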
/etc/xinetd.d/nsca
# default: on
# description: NSCA (Nagios Service Check Acceptor)
service nsca
{
    flags = REUSE
    socket_type = stream
    wait = no
    user = nagios
    group = nagios
    server = /usr/local/nagios/bin/nsca
    server_args = -c /usr/local/nagios/etc/nsca.cfg --inetd
    log_on_failure += USERID
    disable = no
    instances = unlimited
    per_source = unlimited
}
/etc/xinetd.conf
#
# This is the master xinetd configuration file. Settings in the
# default section will be inherited by all service configurations
# unless explicitly overridden in the service configuration. See
# xinetd.conf in the man pages for a more detailed explanation of
# these attributes.
defaults
{
    # The next two items are intended to be a quick access place to
    # temporarily enable or disable services.
    #
    # enabled =
    # disabled =
    # Define general logging characteristics.
    log_type = SYSLOG daemon info
    log_on_failure = HOST
    log_on_success = PID HOST DURATION EXIT
    # Define access restriction defaults
    #
    # no_access =
    # only_from =
    # max_load = 0
    cps = 50 10
    instances = 50
    per_source = 50
    # Address and networking defaults
    #
    # bind =
    # mdns = yes
    v6only = no
    # setup environmental attributes
    #
    # passenv =
    groups = yes
    umask = 002
    # Generally, banners are not used. This sets up their global defaults
    #
    # banner =
    # banner_fail =
    # banner_success =
}
includedir /etc/xinetd.d
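Since the comment at the top of xinetd.conf says defaults are inherited unless a service overrides them, here is a rough Python sketch of which settings end up applying to nsca once the two files above are combined. It is a simplified model and not xinetd's actual parser: I am assuming `=` replaces the inherited value, `+=` appends to it and `-=` removes from it, and the function names are made up for illustration.

```python
import re

ATTR_RE = re.compile(r"(\w+)\s*(\+=|-=|=)\s*(.*)")


def parse_blocks(text):
    """Parse 'defaults' and 'service NAME' blocks into {name: [(attr, op, value)]}."""
    blocks, current, pending = {}, None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "{" and pending is not None:
            current, pending = pending, None
            blocks[current] = []
        elif line == "}":
            current = None
        elif current is None:
            # Block header; other directives such as includedir are ignored.
            words = line.split()
            if words[0] == "defaults":
                pending = "defaults"
            elif words[0] == "service" and len(words) == 2:
                pending = words[1]
        else:
            match = ATTR_RE.match(line)
            if match:
                blocks[current].append(match.groups())
    return blocks


def effective(defaults, service):
    """Overlay service attributes on defaults (assumed semantics, see above)."""
    merged = {}
    for attr, op, value in list(defaults) + list(service):
        values = value.split()
        if op == "=":
            merged[attr] = values
        elif op == "+=":
            merged[attr] = merged.get(attr, []) + values
        else:  # "-="
            merged[attr] = [v for v in merged.get(attr, []) if v not in values]
    return merged
```

Under those assumptions, running it over the concatenated contents of /etc/xinetd.conf and /etc/xinetd.d/nsca shows that the nsca service keeps `cps = 50 10` from the defaults while `instances` and `per_source` are overridden to unlimited, so nothing on the xinetd side caps how many lingering nsca processes can accumulate.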