Monitoring Engine Process failing to start
The left three are green checks, the right three are blue exclamation marks. Under Monitoring Engine Process it shows stopped and I cannot get it to start.
EDIT: Don't know what happened, but it eventually restarted, was not working for 30 minutes or so....weird.
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
Re: Monitoring Engine Process failing to start
I just needed to apply changes again and it's doing it again. We've only added a few host and service groups and made a few minor changes to services. This started happening all of a sudden today and I am seeing no errors.
I even rebooted the server as last resort and it is taking forever to start the monitoring engine.
Yeah, it took 15 minutes to show all 6 as green.
-
sreinhardt
- -fno-stack-protector
- Posts: 4366
- Joined: Mon Nov 19, 2012 12:10 pm
Re: Monitoring Engine Process failing to start
Well those check marks are updated via cron, but that should be run at least every 30 seconds. While they were showing as blue, was the nagios process started and appearing to check? I assume you tried, but write\verify not working or showing any errors either?
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
Re: Monitoring Engine Process failing to start
sreinhardt wrote: Well those check marks are updated via cron, but that should be run at least every 30 seconds. While they were showing as blue, was the nagios process started and appearing to check? I assume you tried, but write\verify not working or showing any errors either?
The nagios process was up, but if you look at the Monitoring Engine Event Queue dashlet, everything just stacks up and the hosts are all greyed out.
Write verify shows zero errors. Everything functions fine once it starts after 15 minutes.
sreinhardt
Re: Monitoring Engine Process failing to start
How large of an environment are we talking, both hosts\service counts and system resources? 15 min seems like quite the delay for something that should be pretty instantaneous in most cases.
Re: Monitoring Engine Process failing to start
sreinhardt wrote: How large of an environment are we talking, both hosts\service counts and system resources? 15 min seems like quite the delay for something that should be pretty instantaneous in most cases.
This was working fine last night... all of a sudden it started doing this today.
131 Hosts 1436 Services, so rather small. The server has 32GB ram and 8 cores.
sreinhardt
Re: Monitoring Engine Process failing to start
Small? More like complete and total overkill with those hardware specs. OK, there goes that idea. Would you be willing to tail the nagios log during a restart and send it over?
Code:
tail -f /usr/local/nagios/var/nagios.log 2>&1 | tee -a /tmp/nagios.log &
service nagios restart
killall tail
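One caveat with the commands above: killall tail also stops any other tail running on the box. A minimal sketch of the same capture that stops only its own tail follows. The names start_log_capture, stop_log_capture and CAPTURE_PID are made up for illustration, and -n 0 is an assumption that skips the existing last lines a plain tail -f would print first.

```shell
# start_log_capture LOG OUT
# Follow LOG in the background and append every new line to OUT.
# Remembers the PID of this particular tail in CAPTURE_PID (hypothetical name).
start_log_capture() {
    tail -n 0 -f "$1" >> "$2" 2>&1 &
    CAPTURE_PID=$!
}

# stop_log_capture
# Stop only the tail started by start_log_capture, leaving other tails alone.
stop_log_capture() {
    kill "$CAPTURE_PID" 2>/dev/null || true
    wait "$CAPTURE_PID" 2>/dev/null || true
}

# Intended use, with the paths from the post above:
#   start_log_capture /usr/local/nagios/var/nagios.log /tmp/nagios.log
#   service nagios restart
#   ... wait until the engine is back ...
#   stop_log_capture
```

Leaving the capture running until all six indicators are green again means the 15 minute stall ends up in the output file too.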
Re: Monitoring Engine Process failing to start
Here you go:
Once everything was green (15 minutes), service check entries started showing up.
EDIT: All of a sudden it now starts faster than the page can reload..... I just need a drink!
EDIT2: And hours later it started taking 10+ minutes to restart again. I'm beginning to think this is a load thing, but I'm not sure about anything at this point.
Code:
[1406668437] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
[1406668439] Nagios 4.0.7 starting... (PID=29516)
[1406668439] Local time is Tue Jul 29 16:13:59 CDT 2014
[1406668439] LOG VERSION: 2.0
[1406668439] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1406668439] qh: core query handler registered
[1406668439] nerd: Channel hostchecks registered successfully
[1406668439] nerd: Channel servicechecks registered successfully
[1406668439] nerd: Channel opathchecks registered successfully
[1406668439] nerd: Fully initialized and ready to rock!
[1406668439] wproc: Successfully registered manager as @wproc with query handler
[1406668439] wproc: Registry request: name=Core Worker 29519;pid=29519
[1406668439] wproc: Registry request: name=Core Worker 29518;pid=29518
[1406668439] wproc: Registry request: name=Core Worker 29520;pid=29520
[1406668439] wproc: Registry request: name=Core Worker 29525;pid=29525
[1406668439] wproc: Registry request: name=Core Worker 29521;pid=29521
[1406668439] wproc: Registry request: name=Core Worker 29524;pid=29524
[1406668439] wproc: Registry request: name=Core Worker 29522;pid=29522
[1406668439] wproc: Registry request: name=Core Worker 29527;pid=29527
[1406668439] wproc: Registry request: name=Core Worker 29528;pid=29528
[1406668439] wproc: Registry request: name=Core Worker 29526;pid=29526
[1406668439] wproc: Registry request: name=Core Worker 29523;pid=29523
[1406668439] wproc: Registry request: name=Core Worker 29529;pid=29529
[1406668439] ndomod: NDOMOD 2.0.0 (02-28-2014) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1406668439] ndomod: Successfully connected to data sink. 0 queued items to flush.
[1406668439] ndomod registered for process data
[1406668439] ndomod registered for log data'
[1406668439] ndomod registered for system command data'
[1406668439] ndomod registered for event handler data'
[1406668439] ndomod registered for notification data'
[1406668439] ndomod registered for comment data'
[1406668439] ndomod registered for downtime data'
[1406668439] ndomod registered for flapping data'
[1406668439] ndomod registered for program status data'
[1406668439] ndomod registered for host status data'
[1406668439] ndomod registered for service status data'
[1406668439] ndomod registered for adaptive program data'
[1406668439] ndomod registered for adaptive host data'
[1406668439] ndomod registered for adaptive service data'
[1406668439] ndomod registered for external command data'
[1406668439] ndomod registered for aggregated status data'
[1406668439] ndomod registered for retention data'
[1406668439] ndomod registered for contact data'
[1406668439] ndomod registered for contact notification data'
[1406668439] ndomod registered for acknowledgement data'
[1406668439] ndomod registered for state change data'
[1406668439] ndomod registered for contact status data'
[1406668439] ndomod registered for adaptive contact data'
[1406668439] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1406668439] Warning: Host 'CDM Checks - Linux' has no default contacts or contactgroups defined!
[1406668439] Successfully launched command file worker with pid 29535
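Every line in the excerpt above carries an epoch timestamp in square brackets, so a stall would show up as a jump between two consecutive timestamps. A rough sketch for finding such jumps in a captured log follows. The name log_gaps is a hypothetical helper and the 60 second default threshold is an arbitrary assumption.

```shell
# log_gaps FILE [MIN_GAP]
# Print each log line that arrived at least MIN_GAP seconds (default 60)
# after the previous timestamped line.
log_gaps() {
    awk -v min="${2:-60}" '
        # Nagios log lines start with "[<epoch seconds>] "
        /^\[[0-9]+\] / {
            # Strip the surrounding brackets from the first field
            ts = substr($1, 2, length($1) - 2) + 0
            if (seen && ts - prev >= min)
                printf "%d second gap before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }
    ' "$1"
}
```

Running something like `log_gaps /tmp/nagios.log 300` on a capture that spans the whole slow restart should point at the first line logged after the stall.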
Re: Monitoring Engine Process failing to start
You probably won't have any meaningful data during the long restarts, but do your localhost CPU load graphs show any patterns before the failures?
Former Nagios employee
Re: Monitoring Engine Process failing to start
Sort of.....
Load was also never over 2.5, so I should have plenty of horsepower.
[Attached file not available]