Monitoring Engine Stops and Will Not Start

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
mrochelle
Posts: 238
Joined: Fri May 04, 2012 11:20 am
Location: Heart of America


Post by mrochelle

I'm posting this event for feedback and recommendations. I've had the monitoring engine stop with the following symptom 4 or 5 times across 2 of my Nagios XI servers. The problem occurs 10 to 15 minutes after an Apply Configuration update:
[Attachment: Monitoring Engine Problem.JPG (screenshot of the symptom)]
In addition, the ndo2db process sits at 99% CPU.
Every attempt to restart the monitoring engine and ndo2db ends with the same symptom.
To recover, I restore from a recent backup taken before the configuration change. Unfortunately, last night I hit table corruption with my local backups and had to restore from my latest VM image.
I've researched the forum for a fix or recommendations, but the threads I found with similar symptoms were usually resolved outside the forum.
Comments or thoughts are welcome from anyone who has experienced this problem or resolved it permanently.
Marcus

System:
Nagios XI Version : 2014R1.4
 nagprod01.<domain>.com 2.6.32-279.11.1.el6.x86_64 x86_64
 CentOS release 6.3 (Final)
 Gnome is not installed

Apache Information
PHP Version: 5.3.3
 Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; MS-RTC LM 8; .NET4.0C; .NET4.0E; .NET CLR 1.1.4322; AskTbORJ/5.15.9.29495)
 Server Name: 
 Server Address: 
 Server Port: 80

Date/Time
PHP Timezone: America/Chicago 
 PHP Time: Mon, 18 Aug 2014 09:33:20 -0500
 System Time: Mon, 18 Aug 2014 09:33:20 -0500

Nagios XI Data
License ends in: NSMOSV

 nagios (pid 52266) is running...
 NPCD running (pid 2264).
 ndo2db (pid 2527) is running...
 CPU Load 15: 2.52 
 Total Hosts: 1919 
 Total Services: 10062 
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Monitoring Engine Stops and Will Not Start

Post by abrist

There are a number of possible causes for these symptoms: crashed tables, very large tables, no free disk space, a failed upgrade, incorrect NDO settings, excessively high load, insufficient hardware resources, etc.
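Running out of disk space is one of the quicker causes to rule out. The following is a minimal sketch, not a Nagios-supplied tool: it assumes Python 3.6 or later is installed (CentOS 6 ships Python 2.6 by default), and the paths checked (`/` and `/var/lib/mysql`) and the 10% free-space threshold are hypothetical defaults you should adjust.

```python
import os
import shutil

# Hypothetical paths to check: the root filesystem and the usual MySQL
# data directory on CentOS. Adjust both to match your own server.
DEFAULT_PATHS = ("/", "/var/lib/mysql")


def low_disk_paths(paths=DEFAULT_PATHS, min_free_fraction=0.10):
    """Return (path, free_fraction) for every existing path whose filesystem
    has less than min_free_fraction of its total space free."""
    low = []
    for path in paths:
        if not os.path.exists(path):
            continue  # skip locations that do not exist on this host
        usage = shutil.disk_usage(path)
        if usage.total == 0:
            continue  # pseudo-filesystems can report zero size
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            low.append((path, free_fraction))
    return low


if __name__ == "__main__":
    for path, fraction in low_disk_paths():
        print(f"WARNING: only {fraction:.1%} free on {path}")
```

An empty result means every checked filesystem is above the threshold; it says nothing about the other causes listed above.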

Do you have nagios.log/syslog logs from one of the problematic restarts?
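To have those logs ready the next time a restart fails, a small helper can pull the most recent relevant lines before anything is restored. This is a hedged sketch: it assumes Python 3.6 or later, and the log locations (`/usr/local/nagios/var/nagios.log` and `/var/log/messages`), the keywords, and the 200-line limit are assumptions to verify on your own host.

```python
import sys
from collections import deque

# Assumed log locations on a CentOS 6 / Nagios XI host; verify them on your
# system. An empty keyword tuple means "keep every line" (nagios.log is
# already specific to the monitoring engine).
LOG_FILES = {
    "/usr/local/nagios/var/nagios.log": (),
    "/var/log/messages": ("nagios", "ndo2db"),
}


def tail_matching(lines, keywords=("nagios", "ndo2db"), limit=200):
    """Return the last `limit` lines mentioning any keyword (case-insensitive)."""
    wanted = tuple(k.lower() for k in keywords)
    recent = deque(maxlen=limit)  # older matches fall off the front
    for line in lines:
        text = line.rstrip("\n")
        if not wanted or any(k in text.lower() for k in wanted):
            recent.append(text)
    return list(recent)


if __name__ == "__main__":
    for path, keywords in LOG_FILES.items():
        try:
            with open(path, errors="replace") as fh:
                matches = tail_matching(fh, keywords)
        except OSError as exc:
            print(f"could not read {path}: {exc}", file=sys.stderr)
            continue
        print(f"===== {path} =====")
        print("\n".join(matches))
```

Redirecting the output to a file right after a failed restart preserves the evidence even if the server is later rolled back to a VM image.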
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Monitoring Engine Stops and Will Not Start

Post by BanditBBS

I am having the exact same issue. If I wait 15 minutes, NDO2DB usage drops back to normal and everything is fine after that. This has been happening for a couple of weeks, and I have another thread open about it. Everything seems fine so far since I upgraded to 1.4. It has only been a couple of hours, though; the true test will be how everything runs after a week of use.
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
mrochelle
Posts: 238
Joined: Fri May 04, 2012 11:20 am
Location: Heart of America

Re: Monitoring Engine Stops and Will Not Start

Post by mrochelle

After the VM restore I don't have any logs, but I'll do a better job of capturing data next time. The last failure started at 7:45 PM Sunday night on my primary Nagios server, so all focus was on getting it running again. However, after the image restore I ran /usr/local/nagiosxi/scripts/backup_xi.sh to get a good backup and got the following error:
Backing up MySQL databases...
mysqldump: Got error: 144: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed when using LOCK TABLES
Error backing up MySQL database 'nagios' - check the password in this script!
Also, when I had the problem and attempted a regular restore, I got a similar error on the MySQL database 'nagios'. In hindsight, I suspect this crashed table was the source of the problem.
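When a backup or restore fails like this, it helps to list exactly which tables MySQL reports as crashed. The sketch below is a hypothetical helper, not part of Nagios XI: it assumes Python 3.6 or later and only recognizes the `Table './<database>/<table>' is marked as crashed` form shown in the mysqldump error above; other error wordings are not handled.

```python
import re
import sys

# Captures the database and table from the "./<database>/<table>" path that
# MySQL prints in its "marked as crashed" errors.
CRASHED_RE = re.compile(r"Table '\./([^/']+)/([^']+)' is marked as crashed")


def crashed_tables(output):
    """Return unique (database, table) pairs from crash errors, in first-seen order."""
    found = []
    for pair in CRASHED_RE.findall(output):
        if pair not in found:
            found.append(pair)
    return found


if __name__ == "__main__":
    # Reads captured command output on stdin and prints database.table names.
    for database, table in crashed_tables(sys.stdin.read()):
        print(f"{database}.{table}")
```

For example, saving this as a hypothetical `find_crashed.py` and running `/usr/local/nagiosxi/scripts/backup_xi.sh 2>&1 | python3 find_crashed.py` would print `nagios.nagios_statehistory` for the error above.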
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Monitoring Engine Stops and Will Not Start

Post by abrist

mrochelle: Are you still having issues with the db?
If so, have you attempted to repair the db?
http://assets.nagios.com/downloads/nagi ... tabase.pdf
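The linked document is the procedure to follow. Purely as an illustration of one repair route: error 144 ("marked as crashed and last (automatic?) repair failed") is typically reported for MyISAM tables, and MySQL's `REPAIR TABLE` statement works on MyISAM tables. The sketch below assumes Python 3.6 or later and assumes the affected NDO tables are MyISAM (check with `SHOW TABLE STATUS`); it only generates the SQL and does not connect to MySQL.

```python
import sys


def quote_ident(name):
    """Backtick-quote a MySQL identifier, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def repair_statements(tables):
    """Build one REPAIR TABLE statement for each (database, table) pair."""
    return [f"REPAIR TABLE {quote_ident(db)}.{quote_ident(table)};"
            for db, table in tables]


if __name__ == "__main__":
    # Arguments are given as database.table, e.g. nagios.nagios_statehistory
    pairs = []
    for arg in sys.argv[1:]:
        db, sep, table = arg.partition(".")
        if not sep or not db or not table:
            sys.exit(f"expected database.table, got {arg!r}")
        pairs.append((db, table))
    print("\n".join(repair_statements(pairs)))
```

Saved as a hypothetical `repair_sql.py`, running `python3 repair_sql.py nagios.nagios_statehistory` prints ``REPAIR TABLE `nagios`.`nagios_statehistory`;``, which can be reviewed and then fed to the `mysql` client.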
mrochelle
Posts: 238
Joined: Fri May 04, 2012 11:20 am
Location: Heart of America

Re: Monitoring Engine Stops and Will Not Start

Post by mrochelle

I cleaned up the DB problem using the recommended commands. I'm confident the problem is resolved. Thanks for asking.