Monitoring Engine Process failing to start

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Post by BanditBBS »

The left three are green checks, the right three are blue exclamation marks. Under Monitoring Engine Process it shows stopped and I cannot get it to start.

EDIT: Don't know what happened, but it eventually restarted, was not working for 30 minutes or so....weird.
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github

Post by BanditBBS »

I just needed to apply changes again and it's doing it again. We've only added a few host and service groups and made a few minor changes to services. This all of a sudden started happening today and I am seeing no errors.

I even rebooted the server as a last resort and it is taking forever to start the monitoring engine.

Yeah, it took 15 minutes to show all 6 as green.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Post by sreinhardt »

Well, those check marks are updated via cron, but that should run at least every 30 seconds. While they were showing as blue, was the nagios process started and appearing to run checks? I assume you tried it, but was write\verify not working or showing any errors either?
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.

Post by BanditBBS »

sreinhardt wrote: Well, those check marks are updated via cron, but that should run at least every 30 seconds. While they were showing as blue, was the nagios process started and appearing to run checks? I assume you tried it, but was write\verify not working or showing any errors either?
The nagios process was up, but if you look at the Monitoring Engine Event Queue dashlet, everything just stacks up and the hosts are all greyed out.

Write verify shows zero errors. Everything functions fine once it starts after 15 minutes.
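
A process that is up but not scheduling checks can be hard to tell apart from a healthy one by its PID alone. One rough indicator is how recently the engine last rewrote its status file, since Nagios Core normally refreshes it every few seconds while its event loop is running. The sketch below assumes GNU `stat` and the usual source-install location `/usr/local/nagios/var/status.dat`; that path is not stated in this thread, so confirm it against `status_file` in nagios.cfg. `file_age_seconds` is a hypothetical helper name.

```shell
# Print how many seconds ago a file was last modified (GNU stat assumed).
file_age_seconds() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 1
    echo $(( now - mtime ))
}

# Assumed default location; confirm against status_file in nagios.cfg.
STATUS_FILE=/usr/local/nagios/var/status.dat

if [ -e "$STATUS_FILE" ]; then
    echo "status file last written $(file_age_seconds "$STATUS_FILE") seconds ago"
else
    echo "status file not found at $STATUS_FILE"
fi
```

An age that keeps growing for minutes while the process is listed as running would be consistent with the engine being stuck before or outside its normal event loop, though it does not say why.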

Post by sreinhardt »

How large of an environment are we talking, both hosts\service counts and system resources? 15 min seems like quite the delay for something that should be pretty instantaneous in most cases.

Post by BanditBBS »

sreinhardt wrote: How large of an environment are we talking, both hosts\service counts and system resources? 15 min seems like quite the delay for something that should be pretty instantaneous in most cases.
This was working fine last night...all of a sudden started doing this today.

131 hosts and 1436 services, so rather small. The server has 32 GB of RAM and 8 cores.

Post by sreinhardt »

Small... more like complete and total overkill with those hardware specs. OK, there goes that idea. Would you be willing to tail the nagios log during a restart and send it over?

Code:

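# Follow the Nagios log in the background and copy new lines to /tmp/nagios.log
tail -f /usr/local/nagios/var/nagios.log 2>&1 | tee -a /tmp/nagios.log &
service nagios restart
# Run this once the engine has finished starting; it stops every running tail
killall tail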
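
Because `killall tail` ends every `tail` process it is allowed to signal, and because a slow startup means the capture has to stay open for a while, a slightly more defensive variant is sketched here. It records the PID of its own `tail` and stops only that one. The function names and the output file `/tmp/nagios-restart.log` are hypothetical; the log path is the one from the commands above.

```shell
LOG=${LOG:-/usr/local/nagios/var/nagios.log}   # log path from the commands above
OUT=${OUT:-/tmp/nagios-restart.log}            # hypothetical output file

# Follow only lines written from now on, and remember the PID of this tail.
start_capture() {
    tail -n 0 -f "$LOG" >> "$OUT" 2>&1 &
    CAPTURE_PID=$!
}

# Stop only the tail started above, leaving any other tail processes alone.
stop_capture() {
    kill "$CAPTURE_PID" 2>/dev/null
    wait "$CAPTURE_PID" 2>/dev/null || true
}

# Usage:
#   start_capture
#   service nagios restart
#   ... wait until the interface shows the engine as running ...
#   stop_capture
```

Leaving the capture open until the status icons turn green means the log lines written during the delay, if there are any, end up in the output file too.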

Post by BanditBBS »

Here you go:

Code:

[1406668437] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
[1406668439] Nagios 4.0.7 starting... (PID=29516)
[1406668439] Local time is Tue Jul 29 16:13:59 CDT 2014
[1406668439] LOG VERSION: 2.0
[1406668439] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1406668439] qh: core query handler registered
[1406668439] nerd: Channel hostchecks registered successfully
[1406668439] nerd: Channel servicechecks registered successfully
[1406668439] nerd: Channel opathchecks registered successfully
[1406668439] nerd: Fully initialized and ready to rock!
[1406668439] wproc: Successfully registered manager as @wproc with query handler
[1406668439] wproc: Registry request: name=Core Worker 29519;pid=29519
[1406668439] wproc: Registry request: name=Core Worker 29518;pid=29518
[1406668439] wproc: Registry request: name=Core Worker 29520;pid=29520
[1406668439] wproc: Registry request: name=Core Worker 29525;pid=29525
[1406668439] wproc: Registry request: name=Core Worker 29521;pid=29521
[1406668439] wproc: Registry request: name=Core Worker 29524;pid=29524
[1406668439] wproc: Registry request: name=Core Worker 29522;pid=29522
[1406668439] wproc: Registry request: name=Core Worker 29527;pid=29527
[1406668439] wproc: Registry request: name=Core Worker 29528;pid=29528
[1406668439] wproc: Registry request: name=Core Worker 29526;pid=29526
[1406668439] wproc: Registry request: name=Core Worker 29523;pid=29523
[1406668439] wproc: Registry request: name=Core Worker 29529;pid=29529
[1406668439] ndomod: NDOMOD 2.0.0 (02-28-2014) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1406668439] ndomod: Successfully connected to data sink.  0 queued items to flush.
[1406668439] ndomod registered for process data
[1406668439] ndomod registered for log data'
[1406668439] ndomod registered for system command data'
[1406668439] ndomod registered for event handler data'
[1406668439] ndomod registered for notification data'
[1406668439] ndomod registered for comment data'
[1406668439] ndomod registered for downtime data'
[1406668439] ndomod registered for flapping data'
[1406668439] ndomod registered for program status data'
[1406668439] ndomod registered for host status data'
[1406668439] ndomod registered for service status data'
[1406668439] ndomod registered for adaptive program data'
[1406668439] ndomod registered for adaptive host data'
[1406668439] ndomod registered for adaptive service data'
[1406668439] ndomod registered for external command data'
[1406668439] ndomod registered for aggregated status data'
[1406668439] ndomod registered for retention data'
[1406668439] ndomod registered for contact data'
[1406668439] ndomod registered for contact notification data'
[1406668439] ndomod registered for acknowledgement data'
[1406668439] ndomod registered for state change data'
[1406668439] ndomod registered for contact status data'
[1406668439] ndomod registered for adaptive contact data'
[1406668439] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1406668439] Warning: Host 'CDM Checks - Linux' has no default contacts or contactgroups defined!
[1406668439] Successfully launched command file worker with pid 29535

Once everything was green (15 mins), service check entries started showing up.

EDIT: All of a sudden now it starts faster than the page can reload.....I just need a drink!

EDIT2: And hours later it started taking 10+ minutes to restart again. I'm beginning to think this is a load thing, but I'm not sure about anything at this point.
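
Every entry in the excerpt above falls within a two-second window, so the stall itself would only show up in later lines of a longer capture. Each Nagios log line begins with an epoch timestamp in square brackets, which makes it possible to look for the pause mechanically. The following is a minimal sketch, assuming a POSIX `awk`; `log_gaps` is a hypothetical helper name, not part of Nagios.

```shell
# Print each point in a Nagios log where more than <threshold> seconds
# passed between two consecutive timestamped entries.
# Usage: log_gaps <threshold_seconds> < logfile
log_gaps() {
    awk -v threshold="$1" '
        # Only consider lines that start with "[<epoch>] "
        /^\[[0-9]+\] / {
            # Strip the surrounding brackets from the first field
            ts = substr($1, 2, length($1) - 2) + 0
            if (seen && ts - prev > threshold)
                printf "%d second gap before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }
    '
}

# Example: report pauses longer than one minute in the captured log
#   log_gaps 60 < /tmp/nagios.log
```

The line printed after a gap is the first thing the engine logged once it resumed, which may hint at what it was waiting on.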
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Post by tmcdonald »

You probably won't have any meaningful data during the long restarts, but do your localhost CPU load graphs show any patterns before the failures?
Former Nagios employee

Post by BanditBBS »

Sort of.....
chart.jpeg
Also, load was never over 2.5, so I should have plenty of horsepower.
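
To put that figure in context, load average is often read relative to the number of cores, with values well below 1.0 per core usually taken to mean the CPUs are not saturated. This is only a rule of thumb, and on Linux the load average also counts processes blocked on I/O. A small sketch of the arithmetic, using the numbers quoted in this thread; `load_per_core` is a hypothetical helper name.

```shell
# Print a load average divided by the number of cores, to two decimals.
load_per_core() {
    # $1 = load average, $2 = core count
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On a live Linux system (assumes /proc/loadavg and nproc are available):
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"

# With the peak load and core count mentioned in this thread:
load_per_core 2.5 8
```

A result of roughly a third of one core's worth of load supports the point that raw CPU capacity is unlikely to explain a multi-minute startup on its own.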