
Monitoring Engine Stops and Will Not Start

Posted: Mon Aug 18, 2014 9:52 am
by mrochelle
I'm posting this event for feedback and/or recommendations. I've had the monitoring engine stop, with the symptom shown below, 4 or 5 times across 2 of my Nagios XI servers. The problem happens 10-15 minutes after an Apply Configuration:
[Attached screenshot: Monitoring Engine Problem.JPG]
In addition, the ndo2db process sits at 99% CPU.
Every attempt to restart the monitoring engine and ndo2db ends with the same symptom.
To recover, I restore from a recent backup taken before the configuration change. Unfortunately, last night my local backups had corrupted tables, so I had to restore from my latest VM image.
I've researched the forum for a fix or recommendations, but the threads I found with similar symptoms were usually resolved outside the forum.
Comments or thoughts are welcome from anyone who has experienced this problem or resolved it permanently.
Marcus

System:
Nagios XI Version : 2014R1.4
 nagprod01.<domain>.com 2.6.32-279.11.1.el6.x86_64 x86_64
 CentOS release 6.3 (Final)
 Gnome is not installed

Apache Information
PHP Version: 5.3.3
 Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; MS-RTC LM 8; .NET4.0C; .NET4.0E; .NET CLR 1.1.4322; AskTbORJ/5.15.9.29495)
 Server Name: 
 Server Address: 
 Server Port: 80

Date/Time
PHP Timezone: America/Chicago 
 PHP Time: Mon, 18 Aug 2014 09:33:20 -0500
 System Time: Mon, 18 Aug 2014 09:33:20 -0500

Nagios XI Data
License ends in: NSMOSV

 nagios (pid 52266) is running...
 NPCD running (pid 2264).
 ndo2db (pid 2527) is running...
 CPU Load 15: 2.52 
 Total Hosts: 1919 
 Total Services: 10062 

Re: Monitoring Engine Stops and Will Not Start

Posted: Mon Aug 18, 2014 10:54 am
by abrist
There are a number of possible causes for these symptoms: crashed tables, very large tables, lack of disk space, a failed upgrade, incorrect NDO settings, excessive load, insufficient hardware resources, etc.
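One of the causes above, running out of disk space, is quick to rule out. The following is a minimal Python 3 sketch (shutil.disk_usage is not available in the Python 2.6 that CentOS 6 ships) that reports free space for a few paths. The path list and the 10% threshold are assumptions for illustration, not Nagios XI settings; /var/lib/mysql is the usual MySQL data directory on CentOS but should be verified on your server.

```python
import os
import shutil

# Assumed paths to check; adjust to match your own layout.
DEFAULT_PATHS = ("/", "/var/lib/mysql", "/usr/local/nagios")


def check_free_space(paths=DEFAULT_PATHS, min_free_pct=10.0):
    """Return {path: (percent_free, ok)} for each path that exists.

    min_free_pct is a hypothetical example threshold.
    """
    results = {}
    for path in paths:
        if not os.path.exists(path):
            continue  # skip paths that are not present on this host
        usage = shutil.disk_usage(path)
        pct_free = 100.0 * usage.free / usage.total
        results[path] = (round(pct_free, 1), pct_free >= min_free_pct)
    return results


if __name__ == "__main__":
    for path, (pct, ok) in check_free_space().items():
        status = "OK" if ok else "LOW"
        print(f"{path}: {pct}% free [{status}]")
```

Paths that do not exist are silently skipped, so the same list can be reused across servers with slightly different layouts.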

Do you have nagios.log or syslog entries from one of the problematic restarts?
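To collect the kind of log excerpt requested here, a minimal Python 3 sketch that keeps only the most recent lines mentioning a set of keywords. The log locations (/usr/local/nagios/var/nagios.log for Nagios Core under Nagios XI, /var/log/messages for CentOS 6 syslog) are assumed typical defaults, and the keyword list and function name are hypothetical.

```python
import os
from collections import deque

# Assumed log locations; adjust if your install differs.
LOG_PATHS = ("/usr/local/nagios/var/nagios.log", "/var/log/messages")
# Hypothetical keyword list; extend it with whatever you are chasing.
KEYWORDS = ("ndo2db", "nagios", "Error")


def tail_matching(path, keywords=KEYWORDS, max_lines=50):
    """Return up to max_lines of the most recent lines containing a keyword."""
    matches = deque(maxlen=max_lines)  # older matches fall off the front
    with open(path, errors="replace") as handle:
        for line in handle:
            if any(word in line for word in keywords):
                matches.append(line.rstrip("\n"))
    return list(matches)


if __name__ == "__main__":
    for log_path in LOG_PATHS:
        # Only read logs that exist and that this user may read.
        if os.path.exists(log_path) and os.access(log_path, os.R_OK):
            print(f"== {log_path} ==")
            for entry in tail_matching(log_path):
                print(entry)
```

Running it right after a failed restart, before any restore, captures the evidence that is otherwise lost when the server is rolled back.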

Re: Monitoring Engine Stops and Will Not Start

Posted: Mon Aug 18, 2014 11:23 am
by BanditBBS
I am having the exact same issue. If I wait about 15 minutes, NDO2DB usage drops back to normal and everything is fine. This has been happening to me for a couple of weeks, and I have another thread open about it. Everything seems fine so far since I upgraded to 1.4. It has only been a couple of hours, though; the true test will be how everything behaves after a week of use.

Re: Monitoring Engine Stops and Will Not Start

Posted: Mon Aug 18, 2014 11:58 am
by mrochelle
After the VM restore I don't have any logs, but I'll do a better job of capturing data next time. When the last failure started at 07:45 PM Sunday night on my primary Nagios server, all focus was on getting it running again. However, after the image restore I ran /usr/local/nagiosxi/scripts/backup_xi.sh to get a good backup and got the following error:
Backing up MySQL databases...
mysqldump: Got error: 144: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed when using LOCK TABLES
Error backing up MySQL database 'nagios' - check the password in this script!
Also, when I had the problem and attempted a regular restore, I got a similar error on the MySQL database 'nagios'. In hindsight, I suspect this table crash was the source of the problem.
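Since the crashed table only surfaced in the backup script's output, it may help to scan that output automatically. A minimal Python 3 sketch, assuming the message format shown in the error above ("Table './<db>/<table>' is marked as crashed ..."); the function name is hypothetical.

```python
import re

# Matches messages such as:
#   Table './nagios/nagios_statehistory' is marked as crashed and last
#   (automatic?) repair failed
# Group 1 is the database, group 2 is the table.
CRASHED_RE = re.compile(r"Table '\./([^/']+)/([^']+)' is marked as crashed")


def find_crashed_tables(output):
    """Return sorted, de-duplicated (database, table) pairs reported as crashed."""
    return sorted(set(CRASHED_RE.findall(output)))


# Example: feed it the captured output of backup_xi.sh or mysqldump.
sample = (
    "Backing up MySQL databases...\n"
    "mysqldump: Got error: 144: Table './nagios/nagios_statehistory' is marked "
    "as crashed and last (automatic?) repair failed when using LOCK TABLES\n"
)
for database, table in find_crashed_tables(sample):
    print(f"{database}.{table}")
```

The regex only recognizes messages that include the './database/table' path; a message naming the table without a path would need a second pattern.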

Re: Monitoring Engine Stops and Will Not Start

Posted: Mon Aug 18, 2014 4:46 pm
by abrist
mrochelle: Are you still having issues with the db?
If so, have you attempted to repair the db?
http://assets.nagios.com/downloads/nagi ... tabase.pdf
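The linked PDF's exact steps are not reproduced here. As an alternative sketch, MySQL's standard mysqlcheck client can repair named tables with its --repair option (which issues REPAIR TABLE; "marked as crashed" errors like error 144 are typically reported by MyISAM tables). The Python 3 helper below only builds the command lines, one per database, so they can be reviewed before running; the function name and the root user are assumptions, and taking a file-level backup of the MySQL data directory first is prudent.

```python
from collections import defaultdict


def mysqlcheck_commands(tables, user="root"):
    """Group (database, table) pairs into one mysqlcheck --repair call per database.

    The user name is an assumption; -p makes mysqlcheck prompt for the password.
    """
    by_database = defaultdict(set)
    for database, table in tables:
        by_database[database].add(table)  # de-duplicate table names
    return [
        ["mysqlcheck", "-u", user, "-p", "--repair", database, *sorted(names)]
        for database, names in sorted(by_database.items())
    ]


if __name__ == "__main__":
    crashed = [("nagios", "nagios_statehistory")]
    for argv in mysqlcheck_commands(crashed):
        # Printed rather than executed so each command can be reviewed first.
        print(" ".join(argv))
```

Printing instead of calling subprocess keeps the sketch safe to run on a production server; the output can be pasted into a shell once the backup is in place.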

Re: Monitoring Engine Stops and Will Not Start

Posted: Mon Aug 18, 2014 5:30 pm
by mrochelle
I cleaned up the DB problem using the recommended commands. I'm confident the problem is resolved. Thanks for asking.