Blank Services and Active checks disabled

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.

Post by CFT6Server »

I have been noticing a steadily worsening issue when applying configuration changes: it takes longer and longer for services to show up afterwards. Previously I might have to wait a minute or so before the services within a host showed up after applying the configuration. Today I think it finally gave in: the services never came back, so the services on all hosts remain blank. The services are there in CCM, though. I also noticed that the active checks show up with blue icons.
[Attachment: system ok.JPG]
Usually this goes away once the services are back, but it seems to be stuck now. Did I reach some limit here? I looked through the logs and there don't seem to be any issues. I did notice these messages in /var/log/messages and adjusted the kernel settings accordingly.

Code:

Sep 12 16:27:01 kdcnagxi01 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 128000 of 15736 messages and 131072000 of 131072000 bytes in the queue. See README for kernel tuning options.
Sep 12 16:27:03 kdcnagxi01 ndo2db: Message sent to queue.
Sep 12 16:27:03 kdcnagxi01 ndo2db: Warning: queue send error, retrying...
Sep 12 16:27:04 kdcnagxi01 ndo2db: Message sent to queue.

But adjusting the kernel settings doesn't seem to have helped.
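
For readers comparing their own ndo2db warnings with the one above, the counters can be pulled out of the log line programmatically. This is a minimal sketch, not part of ndo2db or Nagios XI: the names `parse_queue_warning` and `bytes_exhausted` are hypothetical, and the regular expression is an assumption based only on the wording of the warning quoted in this post.

```python
import re

# Pattern inferred from the ndo2db warning quoted in this thread.
# It is an assumption, not a documented ndo2db log format.
QUEUE_RE = re.compile(
    r"using (?P<msgs_used>\d+) of (?P<msgs_max>\d+) messages "
    r"and (?P<bytes_used>\d+) of (?P<bytes_max>\d+) bytes"
)


def parse_queue_warning(line):
    """Return the four queue counters as ints, or None if the line does not match."""
    match = QUEUE_RE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


def bytes_exhausted(stats):
    """True when the warning reports that no bytes are left in the queue."""
    return stats["bytes_used"] >= stats["bytes_max"]


# The warning line from the post above, copied verbatim.
SAMPLE = (
    "Sep 12 16:27:01 kdcnagxi01 ndo2db: Warning: Retrying message send. "
    "This can occur because you have too few messages allowed or too few "
    "total bytes allowed in message queues. You are currently using 128000 "
    "of 15736 messages and 131072000 of 131072000 bytes in the queue. "
    "See README for kernel tuning options."
)

if __name__ == "__main__":
    stats = parse_queue_warning(SAMPLE)
    print(stats, "bytes exhausted:", bytes_exhausted(stats))
```

On the line posted above, the two byte counters are equal (131072000 of 131072000), which is consistent with the queue having been full when the warning was logged.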

I am now seeing this at the end of the nagios.log file:

Code:

Sep 12 16:41:19 kdcnagxi01 nagios: Successfully launched command file worker with pid 21332
Sep 12 16:41:19 kdcnagxi01 nagios: Caught SIGSEGV, shutting down...

It seems the nagios process isn't staying up.

Post by tmcdonald »

What version of XI are you running? 2014R1.1 introduced some improvements to the CCM Apply Configuration process that greatly reduce the time it takes.

Post by CFT6Server »

We are currently running 2014R2.6. Since the slowdown was gradual, I suspect it could be related to the size of our database or the number of service checks (a wild stab here).

Post by tgriep »

You may have to tune the kernel message queue limits on your system. Take a look at this link and see if that resolves it for you.

https://support.nagios.com/wiki/index.php/Nagios_XI:FAQs#Upgrade_to_2011R3.x_Issues

Another thing to check is whether you are hitting a PHP timeout when applying the configuration. This link describes how to fix that.

https://support.nagios.com/wiki/index.php/Nagios_XI:FAQs#Apply_Configuration_Page_Stalls_Out.2C_Never_Completes
Let us know if this resolves it for you.

Post by CFT6Server »

I had already raised the timeout in php.ini to 120 and increased the kernel message queue limits, but no luck. I just added another service and it looks to be stuck again.

Code:

[1442263448] Successfully launched command file worker with pid 22507
[1442263448] Caught SIGSEGV, shutting down...

Code:

# service nagios status
nagios is not running

It does seem to get worse as more services are added, and it has gradually degraded to this point. Applying the configuration completes successfully with no problems; the services just remain blank for all hosts. Checking the services under CCM shows that they are there.
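
The nagios.log entries above carry Unix epoch timestamps in square brackets, which makes them awkward to line up against the dated entries in /var/log/messages. The following is a minimal sketch for converting them and picking out the SIGSEGV entries. The helper names `parse_nagios_log_line` and `find_crashes` are hypothetical, and matching on the string "Caught SIGSEGV" is an assumption based on the two lines quoted in this post.

```python
import re
from datetime import datetime, timezone

# nagios.log lines begin with a Unix timestamp in square brackets.
LOG_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def parse_nagios_log_line(line):
    """Return (UTC datetime, message) for one log line, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return when, match.group(2)


def find_crashes(lines):
    """Return the UTC timestamps of every line that reports a caught SIGSEGV."""
    crashes = []
    for line in lines:
        parsed = parse_nagios_log_line(line)
        if parsed is not None and "Caught SIGSEGV" in parsed[1]:
            crashes.append(parsed[0])
    return crashes


# The two lines from the post above, copied verbatim.
LOG_LINES = [
    "[1442263448] Successfully launched command file worker with pid 22507",
    "[1442263448] Caught SIGSEGV, shutting down...",
]

if __name__ == "__main__":
    for when in find_crashes(LOG_LINES):
        print("SIGSEGV at", when.isoformat())
```

The timestamps are converted to UTC here; the server's local time zone is not stated in the thread, so any comparison with /var/log/messages needs that offset applied.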

Post by jdalrymple »

One statistic I've found interesting is the gap between the time the nagios parent process is launched and the time its first child starts:

Code:

[jdalrymple@localhost ~]$ sudo ps -ef | grep nagios.cfg | grep -v grep
nagios    57630      1  3 06:13 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    57640  57630  3 06:13 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

In my case both started at exactly the same time (06:13) because it's a small environment. What does yours look like?
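
The comparison described above can be roughly automated by parsing the `ps -ef` output. This is a minimal sketch under several assumptions: the daemon parent is taken to be the row with PPID 1 (as in the output above), the "first child" is taken to be the child with the lowest PID, and the function names are hypothetical. Note that the STIME column of `ps -ef` only has minute resolution, so gaps shorter than a minute show up as 0.

```python
from datetime import datetime


def parse_ps_line(line):
    """Split one `ps -ef` row (UID PID PPID C STIME TTY TIME CMD) into the fields used here."""
    _uid, pid, ppid, _c, stime, _tty, _time, cmd = line.split(None, 7)
    return {"pid": int(pid), "ppid": int(ppid), "stime": stime, "cmd": cmd}


def parent_child_gap_minutes(ps_lines):
    """Minutes between the start of the nagios parent and its first child.

    Returns None if either process is missing, or if STIME is not in HH:MM
    form (ps prints a date instead for processes started on an earlier day).
    """
    rows = [parse_ps_line(line) for line in ps_lines if line.strip()]
    # Assumption: the daemonized parent is the row whose PPID is 1.
    parent = next((row for row in rows if row["ppid"] == 1), None)
    if parent is None:
        return None
    children = [row for row in rows if row["ppid"] == parent["pid"]]
    if not children:
        return None
    # Assumption: the lowest PID among the children is the first one started.
    child = min(children, key=lambda row: row["pid"])
    try:
        parent_start = datetime.strptime(parent["stime"], "%H:%M")
        child_start = datetime.strptime(child["stime"], "%H:%M")
    except ValueError:
        return None
    return int((child_start - parent_start).total_seconds() // 60)


# The two rows from the post above, copied verbatim.
PS_OUTPUT = [
    "nagios    57630      1  3 06:13 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg",
    "nagios    57640  57630  3 06:13 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg",
]

if __name__ == "__main__":
    print("gap in minutes:", parent_child_gap_minutes(PS_OUTPUT))
```

A sketch like this only helps while the nagios process is still alive, so it has to be run shortly after a restart.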

Post by tgriep »

Maybe the nagios process isn't shutting down completely and needs more time to do so.
Take a look at this link and see if this helps.
https://support.nagios.com/wiki/index.p ... ely_manner

Post by WillemDH »

How many hosts / services do you have if I may ask?

Post by jdalrymple »

WillemDH wrote: How many hosts / services do you have if I may ask?

I assume you're asking me. My installs are transient. This one is about 4 hours old. It has 1 host and 8 services :)

Post by CFT6Server »

jdalrymple wrote: I've found an interesting statistic to be the one between the time that the nagios parent process is launched and the 1st child:

Code:

[jdalrymple@localhost ~]$ sudo ps -ef | grep nagios.cfg | grep -v grep
nagios    57630      1  3 06:13 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    57640  57630  3 06:13 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

In my case it started at the exact same time (0613) because it's a small environment. What does yours look like?

I can't get this output because Nagios won't stay running.