
*BIG BUG*

Posted: Mon May 23, 2011 10:17 am
by niebais
OS: Centos 5.5
Nagios Version: 2011R1.2

Symptoms:
When errors are showing and I click the "Open Service Problems" or "All Service Problems" button, nothing appears.

How to duplicate:
Wait until errors appear in the "Service Status Summary".

Click on "Open Service Problems" or "All Service Problems". No services appear: the page finishes loading but stays completely white and blank.

I attached the image so you can see what is happening.

Please help me get this one fixed quickly, it is serious.

Re: *BIG BUG*

Posted: Mon May 23, 2011 11:14 am
by niebais
Ok,
So here's something that jolted it back into submission, but it happened on Friday and I'm worried because it happened again today.

I did the following:
service nagios stop
service ndo2db stop
service mysqld stop
cd /usr/local/nagiosxi/scripts
./repairmysql.sh nagios *

The script reported some DB errors and also mentioned that I shouldn't use the -q option in some places. So I changed line 44 of the script to:
$cmd --safe-recover -r $t
Then I reran it.

Next I killed ALL the Nagios daemons and I stopped postgresql.
Then I started up Nagios and applied the configuration. Things came back after that.
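
For anyone repeating this, the stop-and-repair steps above can be wrapped in a small script. This is only a rough sketch, not an official Nagios XI procedure. The run helper, the recover_nagios_db function and the DRY_RUN switch are made up for illustration, and the restart order at the end is an assumption (the post only says Nagios was started again). The trailing * from the command above is left out because its meaning isn't clear, and the line 44 change and the postgresql stop are not covered.

```shell
#!/bin/sh
# Sketch of the recovery sequence from the post. DRY_RUN defaults to 1, so the
# commands are only printed; set DRY_RUN=0 to really run them (as root).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_nagios_db() {
    # Stop everything that writes to the MySQL tables before repairing them.
    run service nagios stop
    run service ndo2db stop
    run service mysqld stop
    # Repair the nagios database with the script shipped with Nagios XI.
    run sh -c 'cd /usr/local/nagiosxi/scripts && ./repairmysql.sh nagios'
    # Assumption: bring the services back in the reverse order.
    run service mysqld start
    run service ndo2db start
    run service nagios start
}
```

Calling recover_nagios_db with the default DRY_RUN=1 only lists the commands, which makes it easy to check the sequence before running anything for real.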

Re: *BIG BUG*

Posted: Mon May 23, 2011 11:36 am
by mguthrie
If you see this occur again, run:

tail -f /var/log/httpd/error_log

and then load that page.

We'll run some tests on our end and see if we can recreate this.

Re: *BIG BUG*

Posted: Mon May 23, 2011 11:39 am
by niebais
Thanks, I hope it doesn't happen again.

Re: *BIG BUG*

Posted: Tue May 24, 2011 9:28 am
by mguthrie
If there was corruption in one of the tables, the backend call for the XML may have timed out and returned false, which would cause the blank page. So far we haven't been able to recreate anything similar unless we stop the mysqld process. If this issue shows up again, let us know.
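
One way to check for this from the command line is to request the backend URL with a time limit and see whether anything XML-like comes back. The sketch below is an illustration only: looks_like_xml and check_backend are hypothetical helper names, the 10 second limit is an arbitrary choice, and the "starts with <" test is a crude stand-in for real XML parsing.

```shell
# Succeed only if the file is non-empty and its first non-whitespace
# character is "<" (a crude check, not a real XML parse).
looks_like_xml() {
    body_file="$1"
    [ -s "$body_file" ] || return 1
    first_char=$(tr -d ' \t\r\n' < "$body_file" | cut -c1)
    [ "$first_char" = "<" ]
}

# Hypothetical helper: fetch a backend URL and report whether it returned XML.
check_backend() {
    url="$1"
    tmp=$(mktemp)
    # --max-time bounds the request, so a hung backend is reported instead of
    # being waited on indefinitely.
    if curl --silent --max-time 10 --output "$tmp" "$url" && looks_like_xml "$tmp"; then
        echo "backend OK"
        status=0
    else
        echo "backend returned no usable XML"
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

It would be called with the same kind of URL the interface uses, for example check_backend "http://myserver/nagiosxi/backend/?cmd=getservicestatus&username=USER&ticket=TICKET", where USER and TICKET are placeholders for a real backend login.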

Re: *BIG BUG*

Posted: Fri May 27, 2011 10:06 am
by niebais
Thanks, I will. So far it's been stable since I went through the "DB cleanup document" for Nagios XI. I had to truncate the logentries table and one other table, but since then the load on our server has dropped as well, and I haven't noticed anything missing yet.

Re: *BIG BUG*

Posted: Fri May 27, 2011 10:23 am
by admin
Thanks for the report on this!

Re: *BIG BUG*

Posted: Fri Jun 17, 2011 9:45 am
by niebais
Ok,
This problem keeps happening once per week. Here's what I see in the logs when this problem happens:

[Fri Jun 17 08:40:40 2011] [error] [client 10.35.42.161] PHP Warning:  DOMDocument::load(http://myserver/nagiosxi/backend/?cmd=getservicestatus&username=myhelpgroup&ticket=sugqs3qv6vlsd7j3o3c7990biruhr94n7jf0l8vo7ilio6ke8tih5gpmbl56t6r2) [<a href='domdocument.load'>domdocument.load</a>]: failed to open stream: HTTP request failed!  in /usr/local/nagiosxi/html/includes/components/helpdeskmap/alerts.php on line 148, referer: http://myserver/nagiosxi/includes/components/helpdeskmap/
[Fri Jun 17 08:40:40 2011] [error] [client 10.35.42.161] PHP Warning:  DOMDocument::load() [<a href='domdocument.load'>domdocument.load</a>]: I/O warning : failed to load external entity "http://myserver/nagiosxi/backend/?cmd=getservicestatus&username=myhelpgroup&ticket=sugqs3qv6vlsd7j3o3c7990biruhr94n7jf0l8vo7ilio6ke8tih5gpmbl56t6r2" in /usr/local/nagiosxi/html/includes/components/helpdeskmap/alerts.php on line 148, referer: http://myserver/nagiosxi/includes/components/helpdeskmap/
[Fri Jun 17 08:40:41 2011] [notice] child pid 8901 exit signal Segmentation fault (11)
[Fri Jun 17 08:40:41 2011] [error] [client 10.35.42.161] PHP Notice:  Undefined index:  2556 in /usr/local/nagiosxi/html/includes/components/helpdeskmap/alerts.php on line 114, referer: http://myserver/nagiosxi/includes/components/helpdeskmap/
[Fri Jun 17 08:40:41 2011] [error] [client 10.35.42.161] PHP Notice:  Undefined index:  2556 in /usr/local/nagiosxi/html/includes/components/helpdeskmap/alerts.php on line 114, referer: http://myserver/nagiosxi/includes/components/helpdeskmap/
[Fri Jun 17 08:40:42 2011] [notice] child pid 10649 exit signal Segmentation fault (11)
It appears to be related to the database getting out of sync somehow. However, the httpd child process segfaults when this occurs. Not a great scenario. Any ideas on how to prevent this from occurring in the future? It also looks like it might be related to the backendapi component that we have in place.
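
To keep an eye on how often this recurs, the error log can be summarized with a couple of grep counts. This is a minimal sketch; summarize_error_log is a made-up function name, and the two patterns are simply taken from the log lines quoted above.

```shell
# Summarize an Apache error log: count segfaulted children and PHP warnings.
# Usage: summarize_error_log /var/log/httpd/error_log
summarize_error_log() {
    log_file="$1"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    segfaults=$(grep -c 'exit signal Segmentation fault' "$log_file" || true)
    warnings=$(grep -c 'PHP Warning' "$log_file" || true)
    echo "segfaults=$segfaults warnings=$warnings"
}
```

Running it before and after a repair or upgrade gives a quick indication of whether the segfaults have actually stopped.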

Re: *BIG BUG*

Posted: Fri Jun 17, 2011 10:38 am
by mguthrie
We did just release version 2011R1.4 yesterday, which has some revisions to the nagios init script. I'm wondering if that might take care of this issue.

Re: *BIG BUG*

Posted: Fri Jun 17, 2011 11:15 am
by niebais
I'll update it on this coming Tuesday. I hope it does :)