Blank Services and Active checks disabled

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Blank Services and Active checks disabled

Post by CFT6Server »

tgriep wrote: What is puzzling is this line from your output.txt file.
/etc/init.d/nagios: line 141: kill: (19280) - No such process
Do you have multiple people accessing your system and applying the config?

Could you run the following and post back the results?

Code:

cat /usr/local/nagios/var/nagios.lock
ps -ef |grep bin/nagios
And the following line is missing from the /etc/sudoers file. Can you add it and see if that helps?

Code:

NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/reset_config_perms.sh
The output.txt was taken when the service was already stuck, so when I applied the configuration, the Nagios process was already not running.

Code:

# cat /usr/local/nagios/var/nagios.lock
3818

Code:

# ps -ef |grep bin/nagios
root     16061 15935  0 09:45 pts/0    00:00:00 grep bin/nagios
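The lock file records PID 3818, but no nagios process shows up in ps, so the lock looks stale. A minimal sketch of that check, assuming the lock file path from this thread; the function name `check_lock` is made up for illustration:

```shell
#!/bin/sh
# Report whether the PID stored in a lock file belongs to a live process.
# Default path is the one used in this thread; check_lock is a hypothetical name.
check_lock() {
    lockfile="${1:-/usr/local/nagios/var/nagios.lock}"
    if [ ! -f "$lockfile" ]; then
        echo "no lock file"
        return 0
    fi
    pid=$(cat "$lockfile")
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    # Run as root, otherwise a live process owned by another user also fails here.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running: $pid"
    else
        echo "stale: $pid"
        return 1
    fi
}
```

Calling `check_lock` with no argument checks the default path. A "stale" result would match what the two commands above show.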
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Blank Services and Active checks disabled

Post by CFT6Server »

jdalrymple wrote:
CFT6Server wrote:Really seeking some guidance here guys from Nagios. I cannot make any changes without running into this issue and starting to impact us. Any ideas?
One thing that I'm a bit vague on - how are you (ever) getting it back up? It sounds like you're unable to get the service up right now?

To answer your earlier question about what the reconfigure script does:

1) look for files that require updating
2) create and download the appropriate files from nagiosql
3) config verify
4) nagios restart
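Steps 3 and 4 amount to gating the restart on a successful verify. A rough sketch of just those two steps, assuming the stock paths from this thread and `nagios -v` for verification. The variable and function names are made up for illustration, and this is not the actual reconfigure script:

```shell
#!/bin/sh
# Assumed defaults; override through the environment if your install differs.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
RESTART_CMD="${RESTART_CMD:-service nagios restart}"

verify_then_restart() {
    # nagios -v checks the config and exits non-zero on errors
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        # left unquoted on purpose so the command string splits into words
        $RESTART_CMD
    else
        echo "config verify failed, not restarting"
        return 1
    fi
}
```

Running the verify by hand without the output redirect shows the actual error list if it fails.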

When you're trying to start the nagios service, it just tanks and nothing useful is showing up in nagios.log? Is ndo2db starting and running OK? Anything in dmesg or /var/log/messages?

What do you get if you try to start nagios interactively?

Code:

/usr/local/nagios/bin/nagios -c /usr/local/nagios/etc/nagios.cfg

The only way I can get this back up and running is to reboot and then keep trying to start the Nagios service until it stays running. I do not have a solid workaround that gets it running every time. Each time it fails, it takes a long time before I can get it running again. This is really hindering us, because I cannot make any changes without losing the XI box and the performance data for at least an hour or two. A simple reboot does not work, and any restart of the server means Nagios stops running.

So far, NOTHING shows in the logs, and we've looked at various things. (Reference nagios.log files can be found in this thread.) I've reviewed the mysqld logs and found nothing. I've looked through most of the logs in /var/log, /usr/local/nagios/var and /usr/local/nagiosxi/var and I just can't seem to find the smoking gun.

One thing I did notice is that when applying configuration, nagios starts up and I see mysqld and ndo2db running. Shortly after, ndo2db stops, and then nagios stops as well.
The way I've been testing is to just restart the nagios service by running "service nagios start". I can try the command provided above.
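To pin down the order in which the processes die, something like the following could be left running in a second terminal right after the start. This is only a sketch, assuming `pgrep` is available and that the process names are exactly nagios, ndo2db and mysqld; the function names are made up:

```shell
#!/bin/sh
# Print "name=up" or "name=down" for an exact process name.
proc_state() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "$1=up"
    else
        echo "$1=down"
    fi
}

# $1 = number of samples, $2 = seconds between samples
watch_procs() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "$(date '+%H:%M:%S') $(proc_state nagios) $(proc_state ndo2db) $(proc_state mysqld)"
        i=$((i + 1))
        sleep "$2"
    done
}
```

For example, `watch_procs 120 1` samples once a second for two minutes, and the timestamps can then be lined up against nagios.log and the mysqld log.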
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Blank Services and Active checks disabled

Post by ssax »

Try removing the lock file:

Code:

rm /usr/local/nagios/var/nagios.lock
Then try to start nagios:

Code:

service nagios start
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Blank Services and Active checks disabled

Post by jdalrymple »

Did we get your mysqld.log cleaned up? It's not really clear what transpired after you started seeing the crashed tables.
Have you tried launching nagios interactively to see if anything shows up?

Code:

/usr/local/nagios/bin/nagios -c /usr/local/nagios/etc/nagios.cfg
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Blank Services and Active checks disabled

Post by CFT6Server »

I get the following: /usr/local/nagios/bin/nagios: invalid option -- 'c'

I did run it doing:
/usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg

I see this in the logs

Code:

Successfully launched command file worker with pid 19045
Segmentation fault
and the Nagios process does not start. I really need some escalation on this issue. I feel like we don't have a good grasp of what is happening, we are at a bit of a halt, and it's been a while.
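A segmentation fault means the process was killed by SIGSEGV, which the shell reports as exit status 139 (128 + 11 on Linux). A small wrapper, sketched here with a made-up function name, makes it easy to tell a signal death apart from a normal error exit when starting nagios by hand:

```shell
#!/bin/sh
# Run a command and say whether it exited normally or was killed by a signal.
# By shell convention, a status above 128 means "killed by signal (status - 128)",
# although a program can also choose to exit with such a value itself.
run_and_report() {
    "$@"
    status=$?
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
    return "$status"
}
```

Usage would be `run_and_report /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg`. If it reports signal 11, enabling core dumps with `ulimit -c unlimited` before the run and opening the resulting core file in gdb would be the usual next step for getting a backtrace.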

FYI - this does not fix the issue
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/reset_config_perms.sh

The main symptom is this: on a normal XI system, when you apply configuration, the list of services disappears and returns pretty quickly. In our implementation, this has been getting worse. With more checks added, it takes longer and longer for the services to come back, and now they do not come back at all. So what is it doing during this step that is timing out? This is where I need your expertise to help figure out what that might be.

How does ndo2db interact with mysql and nagios? Perhaps that is something we can look at?
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Blank Services and Active checks disabled

Post by lmiltchev »

I believe jdalrymple meant to ask you to run this:

Code:

/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Do you get any errors when you try to start nagios this way?

Is opening an email support ticket an option for you? If it is, email us at [email protected]. Type "Blank Services and Active checks disabled" in the email's subject field and provide a URL link to this thread in the email's body. Thank you!
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Blank Services and Active checks disabled

Post by CFT6Server »

Thanks. I've done that. Email sent.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Blank Services and Active checks disabled

Post by ssax »

Locking because this has been moved into a ticket.

Thank you
Locked