NSCA 2.9 client problem
Posted: Wed Mar 20, 2019 10:52 am
by lhozzan
Hello.
I would like to ask for your support with an upgrade. Our installation uses NSCA (server and clients) v2.7 by default, but this version is not available on the newer CentOS 7. Only NSCA server v2.9 is available there.
After upgrading our monitoring server (CentOS 6 -> CentOS 7, NSCA 2.7 -> NSCA 2.9), everything works fine with NSCA clients v2.7.
But when I upgrade an NSCA client from v2.7 to v2.9, the number of connections in the CLOSE-WAIT state keeps rising on the Nagios server machine (see screenshot). I did this upgrade on only one server! If I do it on a few servers, the number rises much faster. It seems there is a bug where many connections are created but only a few are closed.
In fact, the problem appears when the NSCA server and the NSCA client are on the same version, and they do not work as I expected.
Do you have any idea what the problem might be? Thank you for your effort.
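To put numbers on the growth described above, the connections on the NSCA port can be counted per TCP state. This is only a minimal sketch: it assumes the default NSCA port 5667 and the usual Linux `netstat -nat` column layout, and the function name `count_states` is made up for illustration. Note that netstat spells the state CLOSE_WAIT. A socket in that state means the remote side has already closed the connection and the local process has not yet closed its own end.

```shell
# Count TCP connections per state for one local port.
# Reads `netstat -nat` output on stdin; the port is the first argument.
count_states() {
    awk -v port="$1" '
        # $4 is the local address, $6 is the TCP state
        $1 ~ /^tcp/ && $4 ~ (":" port "$") { n[$6]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}

# Example, on the Nagios server:
#   netstat -nat | count_states 5667
```

Running it every minute or so (for example under `watch`) should show whether the CLOSE_WAIT count really keeps growing after a client is upgraded.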
Re: NSCA 2.9 client problem
Posted: Thu Mar 21, 2019 4:34 pm
by cdienger
Are these both 2.9.2? I've yet to reproduce it, but I would like you to run the following on the XI side while data is sent from the NSCA client:
strace -p PID
where PID is the pid associated with the process opening port 5667. You can find this with:
netstat -nap | grep 5667
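The two steps above can be combined so the PID does not have to be copied by hand. A rough sketch, assuming the Linux `netstat -nap` layout where the last column is `PID/program`; the helper name `pid_for_port` is hypothetical. netstat generally needs to run as root to show PIDs of processes owned by other users.

```shell
# Print the PID of the process listening on a TCP port.
# Reads `netstat -nap` output on stdin; the port is the first argument.
pid_for_port() {
    awk -v port="$1" '
        # $4 = local address, $6 = state, $7 = PID/program name
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") {
            split($7, a, "/"); print a[1]; exit
        }
    '
}

# Example: attach strace to the listener and write the trace to a file.
#   strace -f -o strace_nsca.log -p "$(netstat -nap | pid_for_port 5667)"
```

The `-f` flag follows child processes and `-o` writes to a file instead of the terminal; whether child processes matter here depends on how the NSCA daemon is run, so treat that part as a suggestion.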
Re: NSCA 2.9 client problem
Posted: Fri Mar 22, 2019 3:43 am
by lhozzan
cdienger wrote: Are these both 2.9.2? I've yet to reproduce it, but I would like you to run the following on the XI side while data is sent from the NSCA client:
Yes, this problem occurs only when NSCA server v2.9 receives notifications from NSCA client v2.9.
Currently we are running NSCA server v2.9 with NSCA clients v2.7 (on CentOS 7 we install the v2.7 client from the EPEL 6 repository; I am sure this is not a good approach).
I wrote the traces to files, because we have a large installation and just a few minutes produce several MB of logs. I have shared them with you on my GoogleDrive. For comparison, I captured the correct behavior with v2.7 clients (strace_nsca27.log) and then upgraded one client to v2.9 (strace_nsca29.log). After this I had to downgrade it back to v2.7.
Thank you for your effort.
Re: NSCA 2.9 client problem
Posted: Fri Mar 22, 2019 3:24 pm
by cdienger
What was the IP of the sending client? It would probably be a good idea to get a tcpdump from both the server and the client as well while you try to send data from the client.
Code:
yum -y install tcpdump
tcpdump -s 0 -i any port 5667 -w output.pcap
Start a tcpdump on both machines, run the send_nsca command, then use CTRL+C to stop it and provide the pcap files from both machines.
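Once the captures exist, it can help to look only at the segments that close connections. The sketch below builds a pcap filter for FIN and RST segments on one port; the helper name `close_filter` is made up, and the filter syntax is the standard `tcp[tcpflags]` form from the pcap-filter manual. In the server-side capture, a FIN from the client that is never followed by a FIN from the server would be consistent with sockets piling up in CLOSE-WAIT.

```shell
# Build a pcap filter that matches only FIN or RST segments on one TCP port.
close_filter() {
    printf 'tcp port %s and (tcp[tcpflags] & (tcp-fin|tcp-rst) != 0)' "$1"
}

# Example: read a saved capture and show only connection teardown packets.
#   tcpdump -nn -r output.pcap "$(close_filter 5667)"
```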
Re: NSCA 2.9 client problem
Posted: Mon Mar 25, 2019 6:57 am
by lhozzan
Hello.
The server IP is 45.33.80.18 and the client IP is 178.79.165.168. I have added the requested files to my GoogleDrive.
Is there anything else I can do for you? Where do you think the problem is?
Thank you for your effort.
EDIT:
I spotted this in the server syslog:
Code:
Mar 24 20:49:07 localhost nsca[13344]: Network server accept failure (24: Too many open files)
So NSCA stopped accepting notifications from clients, and Nagios sent us a huge number of alert messages. After a restart, NSCA looks OK.
I had already spotted this problem in the testing environment, where the default file descriptor limit was 1024. In production we had raised this value to 2048, but it seems that is not enough. We are raising it again to 4096 and hope that helps.
Why does NSCA server v2.9 open so many files?
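Error 24 is EMFILE, the per-process descriptor limit, and every open socket (including ones stuck in CLOSE-WAIT) counts against it. To see how close the daemon is to that limit, the open descriptors can be compared with the soft limit through /proc. A minimal sketch for Linux only; the function name `fd_usage` is made up, and reading another user's /proc entries usually requires root.

```shell
# Print "<open descriptors> <soft limit>" for a process, using /proc (Linux only).
fd_usage() {
    pid="$1"
    # Each entry in /proc/PID/fd is one open descriptor (files, sockets, pipes).
    open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # Fourth field of the "Max open files" row is the soft limit.
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "$open $limit"
}

# Example, with 13344 being the nsca PID from the syslog line:
#   fd_usage 13344
```

If the first number keeps climbing toward the second while CLOSE-WAIT sockets accumulate, raising the limit only postpones the accept failure.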
Re: NSCA 2.9 client problem
Posted: Mon Mar 25, 2019 3:51 pm
by cdienger
lhozzan wrote: Why does NSCA server v2.9 open so many files?
I'm not sure at this point. Please keep an eye on the system though and let us know if you need to increase it further.
Re: NSCA 2.9 client problem
Posted: Tue Mar 26, 2019 4:18 am
by lhozzan
Hello.
cdienger wrote: Please keep an eye on the system though and let us know if you need to increase it further.
OK, agreed.
Do you have any idea why NSCA client v2.9 causes so many connections in the CLOSE-WAIT state? Do you have an estimated time for a solution? Is there anything you need from our side to speed it up?
Thank you for your effort.
Re: NSCA 2.9 client problem
Posted: Tue Mar 26, 2019 2:28 pm
by cdienger
I was able to get similar results when a host makes a lot of connections. To work around it, I increased the per_source option in /etc/xinetd.d/nsca:
Code:
service nsca
{
        per_source      = 999
        flags           = REUSE
        socket_type     = stream
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nsca
        server_args     = -c /usr/local/nagios/etc/nsca.cfg --inetd
        log_on_failure  += USERID
        disable         = no
        only_from       = 192.168.55.4
}
Make sure to restart xinetd after making the change:
service xinetd restart
Re: NSCA 2.9 client problem
Posted: Wed Mar 27, 2019 5:34 am
by lhozzan
Hello.
Thank you for the workaround. We run NSCA as a standalone service, not as part of xinetd.
OK, I will try this and let you know later.
Many thanks for your support.
Re: NSCA 2.9 client problem
Posted: Wed Mar 27, 2019 12:11 pm
by cdienger
Sounds good. Keep us posted!