[Nagios-devel] Possible patch to cure CGI's not finding data for
Posted: Wed Jul 15, 2009 10:37 pm
Our status.dat file is about 37MB. We occasionally find that valid services do not show up on a status.cgi or extinfo.cgi page. People either get confused or, if they know about the problem, refresh the page to get the real data they need. Since status.dat is written to a temp file that is moved into place once the file is closed, it should never have partial contents. But, in our case at least, we were seeing results from the CGIs as if the file were only partially written. The problem with the current implementation is that the file can be closed while its contents are not yet completely flushed to disk when it is moved in to replace the old file. To test this, I picked a service from the end of the status.dat file and reloaded a CGI page as quickly as I could for many iterations. On average, about every 30th time the page acted as if the service didn't exist.
That seems like a high failure rate, so I added an fflush() before the fclose() and an fsync() right after the fclose(). This virtually guarantees that the file is completely written before the temp file is moved in to replace the outdated file. After making the change, I could not get a single failed page in more than 200 iterations of viewing the same page.
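The sequence described above (write to a temp file, flush, sync, close, then move into place) can be sketched roughly as follows. This is a minimal illustration and not the attached patch: write_file_durably and its parameters are hypothetical names, and a POSIX system is assumed. One deliberate difference: the sketch calls fsync() on the descriptor before fclose(), because fclose() closes the underlying descriptor and an fsync() on a closed descriptor fails with EBADF.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not a Nagios function): write `data` to `temp_path`,
 * force it to disk, then rename it over `final_path`.
 * Returns 0 on success, -1 on failure. */
int write_file_durably(const char *temp_path, const char *final_path,
                       const char *data)
{
    FILE *fp = fopen(temp_path, "w");
    if (fp == NULL)
        return -1;

    if (fputs(data, fp) == EOF)
        goto fail;
    if (fflush(fp) != 0)            /* stdio buffer -> kernel */
        goto fail;
    if (fsync(fileno(fp)) != 0)     /* kernel -> disk, descriptor still open */
        goto fail;
    if (fclose(fp) != 0) {
        unlink(temp_path);
        return -1;
    }
    /* rename() replaces final_path atomically on the same filesystem */
    if (rename(temp_path, final_path) != 0) {
        unlink(temp_path);
        return -1;
    }
    return 0;

fail:
    fclose(fp);
    unlink(temp_path);
    return -1;
}
```

Checking every return value matters here: if the flush or sync fails, the temp file is removed and the previous status file stays in place instead of being replaced by a short one.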
The other files that could have the same problem (included for completeness' sake) are retention.dat, comments.dat and downtime.dat, so I applied the same change in principle to each of these.
I'm attaching a patch file made against our 2.7 version. I looked at the 3.0 code and it is not substantially different: the context is the same, but the line numbers differ, so the patch doesn't apply to 3.0 as-is. I'm quite sure that a similar fix will work properly for 3.0.
If anyone else is having this problem, you might want to try this patch and see if it fixes your problems as well. It is probably a good candidate for a bug fix if it proves to be a valuable modification. I don't know whether smaller installations of Nagios have issues like this, but I suspect it is possible, since actually flushing to disk is handled by the OS on its own timetable unless forced with fsync().
If you try this modification, please let me know of any issues you have.
Cary Petterborg
Attachment: flush-sync-patch.diff
Index: xdata/xrddefault.c
===================================================================
--- xdata/xrddefault.c	(revision 509)
+++ xdata/xrddefault.c	(revision 510)
@@ -81,8 +81,8 @@
 	char *input=NULL;
 	char *temp_ptr;
 	mmapfile *thefile;
-							      
 
+
 	/* initialize the location of the retention file */
 	strncpy(xrddefault_retention_file,DEFAULT_RETENTION_FILE,sizeof(xrddefault_retention_file)-1);
 	strncpy(xrddefault_temp_file,DEFAULT_TEMP_FILE,sizeof(xrddefault_temp_file)-1);
@@ -344,7 +344,9 @@
 		fprintf(fp,"\t}\n\n");
 	        }
 
+	fflush(fp);
 	fclose(fp);
+	fsync(fd);
 
 	/* move the temp file
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]