[Nagios-devel] Bug in ndoutils / ndo2db

Posted: Tue Dec 16, 2008 4:05 pm
by Guest

Hello List,

I believe I have found a bug in ndo2db.c in ndoutils.

The problem is:
ndo2db forks a child for each accepted connection.
The parent process calls waitpid _once_ for each received SIGCHLD.

But *nix does not guarantee the delivery of every SIGCHLD: if several children exit while a SIGCHLD is already pending, the signals are merged into one.
--> the children that were never waited for remain as zombies, and their number keeps increasing.

Usually this is no problem, because the number of forks and waits is quite small and the probability of a lost SIGCHLD is even smaller (practically zero).
In our setup, clients close the connection frequently and we have over 500 Nagios instances reporting to one central ndo2db, which raises the number of forks, waits and lost SIGCHLDs to a significant level.

The attached patch repeats the call to waitpid until no finished children (zombies) are left.
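
To illustrate the idea outside of ndo2db, here is a minimal, self-contained sketch in C. It is not code from ndo2db.c: the names reap_children and demo_coalesced_sigchld are hypothetical and exist only for this example. The demo blocks SIGCHLD while several children exit, so that their signals are merged into one pending signal, and then lets a handler that loops on waitpid clean up all of them in a single invocation.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t handler_calls = 0;
static volatile sig_atomic_t children_reaped = 0;

/* SIGCHLD handler (hypothetical name): reap every exited child, not just one. */
static void reap_children(int sig)
{
    int saved_errno = errno;   /* waitpid() may overwrite errno */
    (void)sig;
    handler_calls++;
    /* With WNOHANG, waitpid() returns 0 if the remaining children are still
     * running and -1 if there are no children at all, so the loop never blocks. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        children_reaped++;
    errno = saved_errno;
}

/* Fork n children while SIGCHLD is blocked so that their signals merge,
 * then unblock the signal. Returns how many children the handler reaped,
 * or -1 if the setup failed. */
static int demo_coalesced_sigchld(int n)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask;
    pid_t pids[64];
    int i;

    if (n < 0 || n > 64)
        return -1;
    handler_calls = 0;
    children_reaped = 0;

    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    for (i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] == 0)
            _exit(0);          /* child: exit immediately */
        if (pids[i] < 0) {     /* fork failed: continue with fewer children */
            n = i;
            break;
        }
    }

    /* Wait until every child has exited. WNOWAIT leaves each one as a
     * zombie, so the actual reaping is still left to the handler. */
    for (i = 0; i < n; i++) {
        siginfo_t info;
        while (waitid(P_PID, (id_t)pids[i], &info, WEXITED | WNOWAIT) == -1
               && errno == EINTR)
            ;
    }

    /* The pending SIGCHLD is delivered before this call returns. */
    sigprocmask(SIG_UNBLOCK, &block, NULL);

    /* Restore the previous handler and signal mask. */
    sigaction(SIGCHLD, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return (int)children_reaped;
}
```

Calling demo_coalesced_sigchld(5) should report that all five children were reaped, even though the handler typically runs only once. A handler that called waitpid a single time would leave the rest behind as zombies in the same situation. Saving and restoring errno is not part of my patch; it is a common precaution in signal handlers that I added to the sketch.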

Regards,
Joey5337 / Tilo Renz

Attachment: Lost_SIGCHLD.diff

diff -U 3 -H -E -d -r -- ndoutils-1.4b7/src/ndo2db.c ndoutils-1.4b7-Lost_SIGCHLD/src/ndo2db.c
--- ndoutils-1.4b7/src/ndo2db.c	2007-10-31 19:17:05.000000000 +0100
+++ ndoutils-1.4b7-Lost_SIGCHLD/src/ndo2db.c	2008-12-16 16:52:41.000000000 +0100
@@ -622,10 +622,15 @@
 
 	/* cleanup children that exit, so we don't have zombies */
 	if(sig==SIGCHLD){
-		waitpid(-1,NULL,WNOHANG);
-		return;
+		pid_t childpid = 0;
+		/* perhaps a SIGCHLD got lost - repeat until no zombies left */
+		do {
+			childpid = waitpid(-1,NULL,WNOHANG); 
+			} while (childpid > 0);
+			return;
 	        }
 
+	/* program termination - caught SIGQUIT, SIGTERM, SIGINT, SIGSEGV or SIGFPE */
 	/* cleanup the socket */
 	ndo2db_cleanup_socket();
 





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]