Nagios Support Forum

Posted: **Tue Jul 29, 2014 3:34 pm**

We are having a critical problem with our production server and are looking for your help.

The monitoring engine gradually bogged down and got worse until this morning, when it stopped scheduling host and service checks altogether. It looks to be a Nagios XI problem: when we log in to Core, the checks are running. At that point, the top command showed ndo2db taking up 98%-100% of the CPU.

We found several cases in the Nagios XI support forum that reported a similar problem, and we tried restarting httpd, nagios, mysqld, and postgresql, with no success.

As a disclaimer, we upgraded yesterday to the Interface Table v 0.05-1 and pnp4nagios 0.6.21. (Unsupported, we know, but we have installed and run them successfully on 2 test instances.)

thanks,
Penny Karr | IT Infrastructure Monitoring
Harvard Vanguard Medical Associates, an Affiliate of Atrius Health
254 Second Avenue | Needham, MA 02494
P (781) 292-1853 | F (781) 292-1980 | http://www.harvardvanguard.org
Email: [email protected]

Posted: **Wed Jul 30, 2014 10:41 am**

Thanks for the heads-up on the changes; I doubt they have much, if any, effect. This is something that we have been made aware of recently, as you mentioned with the other posts. Did you have a chance to restart ndo2db? I think that was the only service I did not see mentioned as restarted, and restarting it seems to be a very temporary resolution.
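
For anyone following along, the restarts discussed so far could be scripted roughly as below. This is only a sketch: it assumes each service named in this thread (ndo2db, nagios, mysqld, httpd, postgresql) is a SysV init script that responds to `service <name> restart`, and the function name `restart_services` and the `DRY_RUN` switch are made up here for illustration. The thread does not prescribe a restart order, so the caller chooses it.

```shell
#!/bin/sh
# restart_services NAME...
# Restart each named service in the order given.
# Assumption: every NAME is an init script usable as `service NAME restart`.
# With DRY_RUN=1 the commands are printed instead of executed.
restart_services() {
    for svc in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "service $svc restart"
        else
            # Keep going if one restart fails, but report it on stderr.
            service "$svc" restart || echo "restart of $svc failed" >&2
        fi
    done
}

# Example (commented out so that sourcing this file changes nothing):
# DRY_RUN=1 restart_services ndo2db nagios
```

Running it once with `DRY_RUN=1` shows exactly which commands would be issued before anything on the production server is touched.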

Posted: **Wed Jul 30, 2014 11:00 am**

Hi,
Yes, we have restarted ndo2db too. I just did it again for good measure, but it did not work. The monitoring engine is still not running. Any other ideas, please?

Posted: **Wed Jul 30, 2014 5:05 pm**

Hi Penny,
Let's take a look at a few log files to make sure that, even if XI is displaying things incorrectly, Core is running and checking things properly. In the process we can pick up why the separation is happening. Could you send an email to [email protected] so that I can take the ticket and work with you on it more quickly? Please run the following commands and send the resulting /tmp/support.log file:

tail -n 250 /usr/local/nagios/var/nagios.log >> /tmp/support.log
tail -n 250 /var/log/mysqld.log >> /tmp/support.log
tail -n 250 /var/log/httpd/error_log >> /tmp/support.log
tail -n 250 /var/log/httpd/access_log >> /tmp/support.log
tail -n 250 /var/log/httpd/ssl_error_log >> /tmp/support.log
tail -n 250 /var/log/httpd/ssl_access_log >> /tmp/support.log
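
The same collection step could also be wrapped in a small function, sketched below. The assumptions are mine, not part of the request above: the function name `collect_logs` is hypothetical, a header line is written before each log so the sections of /tmp/support.log can be told apart, and a log that is missing or unreadable is noted in the output rather than producing only a `tail` error on the terminal.

```shell
#!/bin/sh
# collect_logs OUTFILE LOG...
# Append the last 250 lines of each readable LOG to OUTFILE.
# A header line marks where each log starts; missing or unreadable
# logs are recorded with a short note instead of being skipped silently.
collect_logs() {
    out=$1
    shift
    for log in "$@"; do
        echo "===== $log =====" >> "$out"
        if [ -r "$log" ]; then
            tail -n 250 "$log" >> "$out"
        else
            echo "(not found or not readable)" >> "$out"
        fi
    done
}

# Example with the paths listed above (commented out so that sourcing
# this file has no side effects):
# collect_logs /tmp/support.log \
#     /usr/local/nagios/var/nagios.log \
#     /var/log/mysqld.log \
#     /var/log/httpd/error_log \
#     /var/log/httpd/access_log \
#     /var/log/httpd/ssl_error_log \
#     /var/log/httpd/ssl_access_log
```

Like the original commands, this appends to the output file, so an old /tmp/support.log should be removed first if a fresh capture is wanted.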

Posted: **Thu Jul 31, 2014 8:11 am**

Hi Spenser,
I've opened a ticket and included the support.log file with it.

As an update, we rebooted the server, and yesterday I applied a patched version of utils-backend.inc that tmcdonald recommended for a similar issue. Unfortunately, it didn't work.

Please let me know if you need anything else.

thanks,
Penny

help! monitoring engine stopped can't restart
