NSCA close/POLLNVAL/accept bug - patch attached

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
mib
Posts: 6
Joined: Sun Apr 13, 2014 11:58 pm

NSCA close/POLLNVAL/accept bug - patch attached

Post by mib »

nsca-2.7.2-uom.patch
(1.28 KiB) Downloaded 224 times
I believe I've found a significant bug in the nsca daemon. Under heavy connection load we are seeing this hang nsca (forever) every few days, and more frequently freeze until the remote disconnects.

In summary: nsca closes client connection file descriptors on error or eof and relies on poll returning POLLNVAL to clean up after them (in particular to remove their handlers and take them out of the poll set). If in the same loop accept() returns the same FD as a recently closed FD the wrong handler will be run on it.

This can result in calling recv() on an FD with no data to read and O_NONBLOCK not set.

Awiddershiem's commit to nsca-aw here here works around this bug by setting O_NONBLOCK earlier, but that code is quite divergent from the nsca core now.

Attached (I hope) is a minimal-change patch that fixes the behavior. It's not the best patch, which would be to redesign nsca to manage the handlers and remove the one-shot behavior where the loop clears them each time (which results in amusing comments such as "DO NOT REMOVE! 01/29/2007 single process daemon will fail if this is removed").

This patch is against 2.7.2 but it should apply cleanly to the current r2763. It includes a fix to another problem I don't think can actually be triggered, where the poll loop could theoretically run a handler for a just-added FD because npfds++ is run inside the loop which tests against <npfds.

I'm happy to go through the bug code path in more detail if anyone wants, it was quite tricky to track down.
Last edited by mib on Tue Sep 30, 2014 7:09 pm, edited 1 time in total.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: NSCA close/POLLNVAL/accept bug - patch attached

Post by tmcdonald »

Thanks for the patch! While the support forums get seen a lot by the support staff, our developers are busy high up on a snow-covered mountain peak in a monastery working on their code and they don't often visit the forums. A great place to submit bug reports (or better yet, bug reports with patches included) would be http://tracker.nagios.com for our Nagios XI product, and http://tracker.nagios.org for our Nagios Core product.
Former Nagios employee
mib
Posts: 6
Joined: Sun Apr 13, 2014 11:58 pm

Re: NSCA close/POLLNVAL/accept bug - patch attached

Post by mib »

Is NSCA part of Nagios Core? When I downloaded the Nagios Core tarball it wasn't included.
mib
Posts: 6
Joined: Sun Apr 13, 2014 11:58 pm

Re: NSCA close/POLLNVAL/accept bug - patch attached

Post by mib »

I have logged this on the tracker as:

http://tracker.nagios.org/view.php?id=644
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: NSCA close/POLLNVAL/accept bug - patch attached

Post by sreinhardt »

Thanks for the tracker bug! No core and nsca are separate products and are not inclusive of each other. They do work together, but one is not needed for the other to operate in any way.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
mib
Posts: 6
Joined: Sun Apr 13, 2014 11:58 pm

Re: NSCA close/POLLNVAL/accept bug - patch attached

Post by mib »

My question is really:

Who, if anyone, is maintaining NSCA and would fix this bug?

Given that it appears the commercial Nagios XI doesn't use NSCA, and NSCA is not part of nagios-core. Two-year-old bug reports on the NSCA part of the tracker are still in "NEW" state.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: NSCA close/POLLNVAL/accept bug - patch attached

Post by sreinhardt »

Actually, we do use nsca in several of our commercial products, and do maintain it as issues become apparent. Most likely either myself or one of the other C devs will take a look and modify or apply the patch as needed. I have some plugins work that is going to take precedence, but I will absolutely take a look at this patch and implementation of it if no one else beats me to it.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
mib
Posts: 6
Joined: Sun Apr 13, 2014 11:58 pm

Re: NSCA close/POLLNVAL/accept bug - patch attached

Post by mib »

Fantastic, thanks for your prompt responses and assistance.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: NSCA close/POLLNVAL/accept bug - patch attached

Post by abrist »

No problem. We actually maintain the github repo for nsca: https://github.com/NagiosEnterprises/nsca
It would be a great help if you could submit your patch with your explanation there.
Thanks!
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
mib
Posts: 6
Joined: Sun Apr 13, 2014 11:58 pm

Re: NSCA close/POLLNVAL/accept bug - patch attached

Post by mib »

The minimal work-around patch is in this commit:

https://github.com/mbattersby/nsca/comm ... 468d03f514

for which I also sent a pull request.
Locked