Nagios Support Forum

Posted: **Wed Mar 20, 2019 10:52 am**

Hello.

I would like to ask for your support with an upgrade. Our installation uses NSCA (server and clients) v2.7 by default, but this version is not available on the newer CentOS 7. Only NSCA server v2.9 is available there.

So, after upgrading our monitoring server (CentOS 6 -> CentOS 7, NSCA 2.7 -> NSCA 2.9), everything works fine with NSCA clients v2.7.

But when I upgrade NSCA clients from v2.7 to v2.9, the number of connections in the CLOSE-WAIT state on the Nagios server machine keeps rising (see screenshot). I did this upgrade on only one server! If I do it on a few servers, the number rises much faster. It seems there is a bug where many connections are created but only a few are closed.
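To put a number on the growth described above, the CLOSE-WAIT sockets on the NSCA port can be counted directly. The following is a minimal sketch, assuming a Linux server where `/proc/net/tcp` is readable and NSCA listens on port 5667 over IPv4 (for IPv6, `/proc/net/tcp6` uses the same column layout). The function name `count_close_wait` is hypothetical and not part of NSCA or Nagios.

```python
# /proc/net/tcp encodes the TCP state as a hex code; 08 is CLOSE_WAIT.
CLOSE_WAIT = "08"


def count_close_wait(proc_net_tcp_lines, port=5667):
    """Count sockets in CLOSE_WAIT whose local port equals `port`."""
    count = 0
    for line in proc_net_tcp_lines:
        fields = line.split()
        # Skip the header row and anything too short to be a socket entry.
        if len(fields) < 4 or ":" not in fields[1]:
            continue
        # local_address looks like "0100007F:1623" (hex IP, hex port).
        local_port_hex = fields[1].rsplit(":", 1)[1]
        try:
            local_port = int(local_port_hex, 16)
        except ValueError:
            continue
        if local_port == port and fields[3] == CLOSE_WAIT:
            count += 1
    return count
```

Passing the open file object for `/proc/net/tcp` to `count_close_wait` every few seconds while one v2.9 client is sending should show whether the count grows steadily or levels off.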

In fact, I am facing a problem where the NSCA server and NSCA client are on the same version and do not work as I expected.

Do you have any idea what the problem could be? Thank you for your effort.

Posted: **Thu Mar 21, 2019 4:34 pm**

Are these both 2.9.2? I have yet to reproduce it, but I would like you to run the following on the XI side while data is sent from the NSCA client:

strace -pPID

where PID is the pid associated with the process opening port 5667. You can find this with:

netstat -nap | grep 5667

Posted: **Fri Mar 22, 2019 3:43 am**

> Are these both 2.9.2? I have yet to reproduce it, but I would like you to run the following on the XI side while data is sent from the NSCA client:

Yes, this problem occurs only when NSCA server v2.9 receives notifications from an NSCA client v2.9.
Currently we are running NSCA server v2.9 with NSCA clients v2.7 (on CentOS 7 we install the client from the EPEL 6 repository; I am sure this isn't a good way).

I wrote the traces to files, because we have a large installation and just a few minutes produce several MB of logs. I have shared them with you on my Google Drive. For comparison, I captured the correct behavior from the v2.7 clients (strace_nsca27.log) and then upgraded one client to v2.9 (strace_nsca29.log). After that I had to downgrade it back to v2.7.

Thank you for your effort.

Posted: **Fri Mar 22, 2019 3:24 pm**

What was the IP of the sending client? It would probably be a good idea to get a tcpdump from both the server and the client as well while you try to send data from the client.

yum -y install tcpdump
tcpdump -s 0 -i any port 5667 -w output.pcap

Start a tcpdump on both machines, run the send_nsca command, then use CTRL+C to stop it and provide the pcap files from both machines.

Posted: **Mon Mar 25, 2019 6:57 am**

Hello.
The server IP is 45.33.80.18 and the client IP is 178.79.165.168. I have added the requested files to my Google Drive.
Is there anything else I can do for you? Where do you think the problem is?
Thank you for your effort.

EDIT:
I spotted this in the server syslog:

Mar 24 20:49:07 localhost nsca[13344]: Network server accept failure (24: Too many open files)

Yes, NSCA stopped accepting notifications from clients and Nagios sent us a huge number of alert messages. After a restart, NSCA looks OK.
I spotted this problem in the testing environment, where the default file descriptor limit was 1024. In production we had raised this value to 2048. It seems that isn't enough. We are raising it again to 4096 and hope that helps.

Why does NSCA server v2.9 open too many files?

Posted: **Mon Mar 25, 2019 3:51 pm**

> Why does NSCA server v2.9 open too many files?

I'm not sure at this point. Please keep an eye on the system though and let us know if you need to increase it further.

Posted: **Tue Mar 26, 2019 4:18 am**

Hello.

> Please keep an eye on the system though and let us know if you need to increase it further.

OK, agreed.

Do you have any idea why NSCA client v2.9 causes so many connections in the CLOSE-WAIT state? Or do you have an estimated time for a solution? Is there anything you need from our side to speed it up?
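For reference, CLOSE-WAIT means the remote side has already closed the connection while the local application has not yet closed its own socket. The sketch below is not a diagnosis of NSCA; it only reproduces that sequence on the loopback interface, and the function name is hypothetical.

```python
import socket


def peer_close_leaves_server_side_open():
    """Walk through the sequence that puts a server-side socket in CLOSE-WAIT."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    conn.settimeout(5)  # avoid hanging forever if something goes wrong

    client.close()          # the peer sends FIN
    data = conn.recv(1024)  # returns b"" once the FIN has arrived

    # Between the FIN arriving and conn.close(), the kernel reports this
    # socket as CLOSE-WAIT. A server that never reaches close() keeps the
    # descriptor open and the socket stays in that state.
    still_open = conn.fileno() != -1

    conn.close()            # leaving CLOSE-WAIT is the application's job
    listener.close()
    return data, still_open
```

A steadily growing CLOSE-WAIT count on the server therefore points at accepted connections that the receiving process has not closed after the client finished.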

Thank you for your effort.

Posted: **Tue Mar 26, 2019 2:28 pm**

I was able to get similar results if a host makes a lot of connections. To work around it, I increased the per_source option in /etc/xinetd.d/nsca:

service nsca
{
        per_source      = 999
        flags           = REUSE
        socket_type     = stream
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nsca
        server_args     = -c /usr/local/nagios/etc/nsca.cfg --inetd
        log_on_failure  += USERID
        disable         = no
        only_from       = 192.168.55.4
}

Make sure to restart xinetd after making the change:

service xinetd restart

Posted: **Wed Mar 27, 2019 5:34 am**

Hello.

Thank you for the workaround. We have implemented NSCA as a standalone service, not as part of xinetd.

OK, I will try this and let you know later.

Many thanks for your support.

Posted: **Wed Mar 27, 2019 12:11 pm**

Sounds good. Keep us posted!

NSCA 2.9 client problem
