Blank Services and Active checks disabled
Posted: Sat Sep 12, 2015 6:57 pm
by CFT6Server
I have been noticing a worsening issue when applying configuration changes. It's taking longer and longer for services to show up after I apply a configuration. Previously I might have to wait a minute or so before the services within a host showed up after applying. Today it finally gave in and they never came back, so the services on all hosts remain blank. The services are there in CCM, though. I also noticed that the active checks show up with blue icons.
[Attachment: system ok.JPG]
Usually this goes away once the services are back, but it seems to be stuck now. Did I reach some limit here? I looked through the logs and there don't seem to be any issues. I did notice these messages in /var/log/messages and adjusted the kernel message queue settings.
Code:
Sep 12 16:27:01 kdcnagxi01 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 128000 of 15736 messages and 131072000 of 131072000 bytes in the queue. See README for kernel tuning options.
Sep 12 16:27:03 kdcnagxi01 ndo2db: Message sent to queue.
Sep 12 16:27:03 kdcnagxi01 ndo2db: Warning: queue send error, retrying...
Sep 12 16:27:04 kdcnagxi01 ndo2db: Message sent to queue.
But that doesn't seem to help.
I am now seeing this at the end of the nagios.log file:
Code:
Sep 12 16:41:19 kdcnagxi01 nagios: Successfully launched command file worker with pid 21332
Sep 12 16:41:19 kdcnagxi01 nagios: Caught SIGSEGV, shutting down...
It seems the nagios process won't stay running.
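For anyone comparing their own numbers, the usage figures in the ndo2db warning above can be pulled out of the log line with a short script. This is only a minimal sketch: the function name `parse_queue_usage` and the dictionary keys are made up for illustration, and the pattern assumes the warning is worded exactly as in the log excerpt in this post.

```python
import re

# Matches the usage figures in the ndo2db "Retrying message send" warning.
# Assumes the wording shown in the /var/log/messages excerpt above.
USAGE_RE = re.compile(r"using (\d+) of (\d+) messages and (\d+) of (\d+) bytes")


def parse_queue_usage(line):
    """Hypothetical helper: return the four counters from the warning, or None."""
    match = USAGE_RE.search(line)
    if match is None:
        return None
    used_msgs, max_msgs, used_bytes, max_bytes = map(int, match.groups())
    return {
        "used_msgs": used_msgs,
        "max_msgs": max_msgs,
        "used_bytes": used_bytes,
        "max_bytes": max_bytes,
        # True when the byte counter has reached its reported limit.
        "bytes_full": used_bytes >= max_bytes,
    }


# The warning line exactly as posted above.
SAMPLE = (
    "Sep 12 16:27:01 kdcnagxi01 ndo2db: Warning: Retrying message send. "
    "This can occur because you have too few messages allowed or too few "
    "total bytes allowed in message queues. You are currently using 128000 "
    "of 15736 messages and 131072000 of 131072000 bytes in the queue. "
    "See README for kernel tuning options."
)

if __name__ == "__main__":
    print(parse_queue_usage(SAMPLE))
```

In the line posted here the byte counter equals its limit, which would fit the queue being full at the moment the warning was logged.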
Re: Blank Services and Active checks disabled
Posted: Mon Sep 14, 2015 8:56 am
by tmcdonald
What version of XI are you running? 2014R1.1 introduced some improvements to the CCM Apply Config process that greatly reduce the time it takes.
Re: Blank Services and Active checks disabled
Posted: Mon Sep 14, 2015 12:01 pm
by CFT6Server
We are currently using 2014R2.6. Since this was gradual, I suspect it could be the size of our database or the number of service checks? (Wild stab here.)
Re: Blank Services and Active checks disabled
Posted: Mon Sep 14, 2015 2:25 pm
by tgriep
You may have to adjust the kernel message queue limits on your system. Take a look at this link and see if that resolves it for you.
Code:
https://support.nagios.com/wiki/index.php/Nagios_XI:FAQs#Upgrade_to_2011R3.x_Issues
Another thing to check is whether you are hitting a PHP timeout when you apply the configuration. This link describes how to fix that.
Code:
https://support.nagios.com/wiki/index.php/Nagios_XI:FAQs#Apply_Configuration_Page_Stalls_Out.2C_Never_Completes
Let us know if this resolves it for you.
Re: Blank Services and Active checks disabled
Posted: Mon Sep 14, 2015 3:51 pm
by CFT6Server
I had already adjusted the timeout in php.ini to 120 and increased the kernel message queue limits, but no luck. I just added another service and it looks to be stuck again.
Code:
[1442263448] Successfully launched command file worker with pid 22507
[1442263448] Caught SIGSEGV, shutting down...
Code:
# service nagios status
nagios is not running
It does seem to get worse as more services are added, and it has been gradual up to this point. Applying configuration completes successfully, so no problem there. The services just remain blank for all hosts. Checking services under CCM shows that they are there.
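The bracketed numbers in the nagios.log lines above are Unix epoch seconds, which makes them awkward to line up against the syslog-style timestamps in /var/log/messages. A minimal sketch for converting them follows; the helper name `readable` is hypothetical, and the output is in UTC rather than the server's local time.

```python
import re
from datetime import datetime, timezone

# nagios.log lines start with "[<epoch seconds>] ".
STAMP_RE = re.compile(r"^\[(\d+)\]\s*(.*)$")


def readable(line):
    """Hypothetical helper: swap the leading epoch stamp for a UTC timestamp."""
    match = STAMP_RE.match(line)
    if match is None:
        return line  # leave lines without a leading stamp untouched
    when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return when.strftime("%Y-%m-%d %H:%M:%S") + " UTC  " + match.group(2)


if __name__ == "__main__":
    for entry in (
        "[1442263448] Successfully launched command file worker with pid 22507",
        "[1442263448] Caught SIGSEGV, shutting down...",
    ):
        print(readable(entry))
```

Both lines carry the same stamp, so the SIGSEGV was logged within the same second as the command file worker launch.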
Re: Blank Services and Active checks disabled
Posted: Mon Sep 14, 2015 4:04 pm
by jdalrymple
An interesting statistic I've found is the gap between the time the nagios parent process is launched and the time its first child starts:
Code:
[jdalrymple@localhost ~]$ sudo ps -ef | grep nagios.cfg | grep -v grep
nagios 57630 1 3 06:13 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 57640 57630 3 06:13 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
In my case it started at the exact same time (0613) because it's a small environment. What does yours look like?
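The comparison above can also be scripted, which helps when there are many child processes. What follows is a rough sketch only: it assumes the standard `ps -ef` column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD), and the function names are made up for illustration.

```python
def parse_ps_line(line):
    """Split one `ps -ef` row into the fields needed here (hypothetical helper)."""
    uid, pid, ppid, cpu, stime, tty, cputime, cmd = line.split(None, 7)
    return {"pid": int(pid), "ppid": int(ppid), "stime": stime, "cmd": cmd}


def parent_child_start_times(ps_output):
    """Return (parent STIME, child STIME) pairs for the listed processes."""
    procs = [parse_ps_line(row) for row in ps_output.splitlines() if row.strip()]
    listed_pids = {proc["pid"] for proc in procs}
    pairs = []
    for parent in procs:
        if parent["ppid"] in listed_pids:
            continue  # this row is itself a child of another listed process
        for child in procs:
            if child["ppid"] == parent["pid"]:
                pairs.append((parent["stime"], child["stime"]))
    return pairs


# The two rows from the ps output above.
SAMPLE = """\
nagios 57630 1 3 06:13 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 57640 57630 3 06:13 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
"""

if __name__ == "__main__":
    print(parent_child_start_times(SAMPLE))
```

STIME only has minute resolution here, so a gap of a few seconds will not show up. Asking ps for the `lstart` column (for example `ps -o pid,ppid,lstart,cmd`) prints full start timestamps, though the parsing above would need adjusting for that format.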
Re: Blank Services and Active checks disabled
Posted: Mon Sep 14, 2015 4:06 pm
by tgriep
Maybe the nagios process isn't shutting down all of the way and needs more time to do so.
Take a look at this link and see if this helps.
https://support.nagios.com/wiki/index.p ... ely_manner
Re: Blank Services and Active checks disabled
Posted: Mon Sep 14, 2015 4:07 pm
by WillemDH
How many hosts / services do you have if I may ask?
Re: Blank Services and Active checks disabled
Posted: Mon Sep 14, 2015 4:36 pm
by jdalrymple
WillemDH wrote:How many hosts / services do you have if I may ask?
I assume you're asking me. My installs are transient. This one is about 4 hours old. It has 1 host and 8 services.

Re: Blank Services and Active checks disabled
Posted: Mon Sep 14, 2015 4:54 pm
by CFT6Server
jdalrymple wrote:An interesting statistic I've found is the gap between the time the nagios parent process is launched and the time its first child starts:
Code:
[jdalrymple@localhost ~]$ sudo ps -ef | grep nagios.cfg | grep -v grep
nagios 57630 1 3 06:13 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 57640 57630 3 06:13 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
In my case it started at the exact same time (0613) because it's a small environment. What does yours look like?
I can't get this because Nagios won't stay running.