[Nagios-devel] Possible patch to cure CGI's not finding data for



Our status.dat file is about 37MB. We occasionally find that valid services do not show up on a status.cgi or extinfo.cgi page. People either get confused or, if they know about the problem, refresh the page to get the real data they need. Since status.dat is written to a temp file that is moved into place once the file is closed, it should never have partial contents. But, in our case at least, we were seeing results from the CGIs as if the file were only partially written. The problem with the current implementation is that the file can be closed without its contents being completely flushed to disk by the time it is moved in to replace the old file. To test this, I took a service from the end of the status.dat file and reloaded a CGI page for it as quickly as I could for many iterations. On average, about every 30th time the page acted as if the service didn't exist.
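
The test described above can be approximated without a browser. The following is only a minimal sketch and not the method used in this message: it reads a file directly instead of going through the CGIs, and the names file_contains and count_misses are hypothetical. It counts how many of N reads fail to find a marker string, for example the description of a service near the end of status.dat.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `needle` occurs in the file at `path`, 0 if it does not,
 * and -1 if the file cannot be opened. The file is scanned in chunks of
 * up to 4095 bytes, one line at a time, so a needle that spans a newline
 * or a chunk boundary is not detected. */
static int file_contains(const char *path, const char *needle)
{
    char line[4096];
    int found = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (!found && fgets(line, sizeof(line), fp) != NULL) {
        if (strstr(line, needle) != NULL)
            found = 1;
    }
    fclose(fp);
    return found;
}

/* Probe the file `iterations` times and return how many probes did not
 * find the needle. A file that cannot be opened counts as a miss. */
static int count_misses(const char *path, const char *needle, int iterations)
{
    int misses = 0;

    for (int i = 0; i < iterations; i++) {
        if (file_contains(path, needle) != 1)
            misses++;
    }
    return misses;
}
```

Running count_misses() against the live file while Nagios keeps rewriting it would give a miss count comparable to the "every 30th time" figure, under the assumption that the CGIs fail for the same reason a direct read would.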

That seemed like quite a high failure rate, so I added an fflush() before the fclose() and an fsync() right after the fclose(). This virtually guarantees that the file is completely written before the temp file is moved in to replace the outdated file. After making the change I could not get a single failed page in more than 200 iterations of viewing the same page.
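
A minimal sketch of the write, flush, sync, rename sequence follows. It is not the attached patch, and write_file_durably and its parameters are hypothetical names chosen for illustration. One deliberate difference from the description above: the sketch calls fsync() before fclose(), because fclose() also closes the descriptor that belongs to the stream, and fsync() on a closed descriptor fails with EBADF.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fsync() */
#include <stdio.h>
#include <unistd.h>

/* Write `data` to `temp_path`, force it to disk, then rename the temp
 * file over `final_path`. Returns 0 on success and -1 on any failure,
 * removing the temp file when it was created but could not be finished. */
static int write_file_durably(const char *temp_path, const char *final_path,
                              const char *data)
{
    FILE *fp = fopen(temp_path, "w");

    if (fp == NULL)
        return -1;

    if (fputs(data, fp) == EOF        /* data goes into the stdio buffer */
        || fflush(fp) != 0            /* stdio buffer -> kernel */
        || fsync(fileno(fp)) != 0) {  /* kernel cache -> disk, fd still open */
        fclose(fp);
        remove(temp_path);
        return -1;
    }
    if (fclose(fp) != 0) {
        remove(temp_path);
        return -1;
    }
    /* On POSIX systems rename() replaces final_path atomically, so a
     * reader opens either the old file or the complete new one. */
    if (rename(temp_path, final_path) != 0) {
        remove(temp_path);
        return -1;
    }
    return 0;
}
```

Checking every return value matters here: if the flush or sync fails, the sketch leaves the old file in place rather than replacing it with a possibly incomplete one.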

The other files that could have the same problem are retention.dat, comments.dat and downtime.dat, so for completeness' sake I applied the same change to each of them.

I'm attaching a patch file made against our 2.7 version. I looked at the 3.0 code and it is not substantially different. The context is the same, but the line numbers differ, so the patch doesn't apply to 3.0 as is. I'm quite sure that a similar fix will work properly for 3.0.

If anyone else is having this problem, you might want to try this patch and see if it fixes your problems as well. It is probably a good candidate for a bug fix if it proves to be a valuable modification. I don't know whether smaller installations of Nagios are having issues like this, but I suspect it is possible, since actually flushing to disk is handled by the OS on its own timetable unless forced with fsync().

If you try this modification, please let me know of any issues you have.

Cary Petterborg





Attachment: flush-sync-patch.diff (text/x-patch, 2594 bytes, created Wed, 15 Jul 2009 17:19:56 GMT)

Index: xdata/xrddefault.c
===================================================================
--- xdata/xrddefault.c	(revision 509)
+++ xdata/xrddefault.c	(revision 510)
@@ -81,8 +81,8 @@
 	char *input=NULL;
 	char *temp_ptr;
 	mmapfile *thefile;
-							      
 
+
 	/* initialize the location of the retention file */
 	strncpy(xrddefault_retention_file,DEFAULT_RETENTION_FILE,sizeof(xrddefault_retention_file)-1);
 	strncpy(xrddefault_temp_file,DEFAULT_TEMP_FILE,sizeof(xrddefault_temp_file)-1);
@@ -344,7 +344,9 @@
 		fprintf(fp,"\t}\n\n");
 	        }
 
+	fflush(fp);
 	fclose(fp);
+	fsync(fd);
 
 	/* move the temp file

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives