Blank Services and Active checks disabled

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Blank Services and Active checks disabled

Post by CFT6Server »

Memory and CPU don't seem to be an issue here, and reboots no longer bring the service back. This has become a high priority, as we are now losing perf data and basically nothing works until we get the service running.

Code: Select all

top - 14:06:46 up  1:54,  1 user,  load average: 0.12, 0.52, 0.44
Tasks: 312 total,   2 running, 310 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.2%sy,  0.0%ni, 99.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8060788k total,  5003352k used,  3057436k free,    74716k buffers
Swap:  2064380k total,     1360k used,  2063020k free,  3752276k cached

Code: Select all

# free -m
             total       used       free     shared    buffers     cached
Mem:          7871       4905       2966         12         73       3664
-/+ buffers/cache:       1168       6703
Swap:         2015          1       2014

The only messages that come up beforehand are the warning messages about the intervals that I PM'd you about. I waited a long time, then manually started the Nagios service, and it is staying up now.

So I think something could be happening on the DB side. When the configuration is applied, something runs in the background, and until that finishes the service will not stay running. I don't know the whole workflow of Nagios XI, so I will look to you guys to comment.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Blank Services and Active checks disabled

Post by tgriep »

When an apply config is run, it logs the output to the cmdsubsys.log file. Can you run the following tail command, apply the config, and PM me the output?

Code: Select all

tail -f /usr/local/nagiosxi/var/cmdsubsys.log
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Blank Services and Active checks disabled

Post by CFT6Server »

Sent via PM. The logs look clean, I think, with similar output each time.
I tailed all the logs in /usr/local/nagios/var and /var/log, and I don't see any errors when I apply the configuration.
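
As an alternative to tailing each file, a minimal sketch like the following could scan everything modified around the time of the apply. `recent_errors` is a hypothetical helper, not part of Nagios XI, and the search terms are only a guess at what a relevant message might contain.

```shell
# Print lines mentioning errors from files under the given directories
# that were modified within the last N minutes.
# Usage: recent_errors MINUTES DIR...
recent_errors() {
    minutes=$1
    shift
    # -mmin is supported by GNU and BSD find; grep -H prefixes the file name.
    find "$@" -type f -mmin "-$minutes" \
        -exec grep -iHE 'error|fail|crash' {} + 2>/dev/null
}

# Example, run shortly after applying the configuration:
#   recent_errors 10 /usr/local/nagios/var /usr/local/nagiosxi/var /var/log
```

Including /var/log in the search also covers logs from other services, such as the database server, that would not show up when watching the Nagios logs alone.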

After several attempts to restart the nagios service, it came back. This is the output of the ps query you wanted.

Code: Select all

# ps -ef | grep nagios.cfg | grep -v grep
nagios   19337     1 19 22:20 ?        00:00:05 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   19363 19337  0 22:20 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Blank Services and Active checks disabled

Post by tgriep »

The only thing I can think of is that the verification of the config files is taking a long time because of the warnings, and that could cause it to not restart cleanly.
Try resolving the warnings and see if that speeds it up for you.
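
One way to test this theory is to run the verification step by hand and time it. The sketch below assumes the binary and config paths shown in the ps output earlier in the thread, and that the `-v` summary contains a line of the form "Total Warnings: N". `total_warnings` is a hypothetical helper, not something shipped with Nagios.

```shell
# Extract the warning total from `nagios -v` output read on stdin.
# Assumes the summary contains a line such as "Total Warnings: 7".
total_warnings() {
    awk -F': *' '/^Total Warnings:/ {print $2}'
}

# Example: time the verification, then pull out the warning count.
#   time /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg > /tmp/verify.out
#   total_warnings < /tmp/verify.out
```

If the verification finishes in a few seconds, slow config checking is unlikely to be what stops the restart.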

You can go through this document to maximize XI performance by creating a RAM disk, offloading the MySQL database, and applying the other tips in there.
https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Could you post this file so we can review it?

Code: Select all

/etc/init.d/nagios
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Blank Services and Active checks disabled

Post by WillemDH »

When was the last time you were able to apply? Would restoring a backup from before that time be possible? If it's a VM, you could restore to a second VM (nagios_clone, for example) and test there.
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Blank Services and Active checks disabled

Post by CFT6Server »

Honestly, I can't remember; we have multiple people adding hosts and services. I just noticed things gradually going downhill. Rolling back might be tough, as I can't track what has been added and changed.
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Blank Services and Active checks disabled

Post by CFT6Server »

tgriep wrote: The only thing I can think of is that the verification of the config files is taking a long time because of the warnings, and that could cause it to not restart cleanly.
Try resolving the warnings and see if that speeds it up for you.

You can go through this document to maximize XI performance by creating a RAM disk, offloading the MySQL database, and applying the other tips in there.
https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Could you post this file so we can review it?

Code: Select all

/etc/init.d/nagios

Here it is.
nagios.txt
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Blank Services and Active checks disabled

Post by CFT6Server »

I will go over the performance side, but it doesn't add up. I am not seeing any performance or resource hits, yet the nagios service doesn't stay running and we can't find out why. I basically just keep trying to restart it and eventually it runs, but the logs aren't showing anything. If this were a performance issue, it shouldn't prevent the service from running. I've watched the resource usage, and the system is not taxed at all when applying the configuration.

Can someone explain how the whole workflow works when I hit Apply Configuration? Maybe we can go through each step to find out where it is failing or hanging. I reviewed the size of the cache and precache files, and they don't seem too big, but perhaps in the Nagios world they are? Without full knowledge of the process, it will be hard to troubleshoot further, especially now that we have exhausted what the logs are showing us.
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Blank Services and Active checks disabled

Post by CFT6Server »

So I cleared it down to 7 warnings and no luck. Still the same issue.

I am starting to see this in the mysqld logs:

Code: Select all

150916 15:45:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 15:50:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 15:55:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:00:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:05:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:10:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:15:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:20:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:25:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:30:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:35:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:40:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:40:05 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:45:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:50:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:55:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
Last edited by CFT6Server on Wed Sep 16, 2015 6:56 pm, edited 1 time in total.
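
To see at a glance which tables are affected and how often, a small filter like the one below could summarise the log. This is a minimal sketch: `crashed_tables` is a hypothetical helper, the log path in the usage comment is an assumption that varies by distribution, and the `mysqlcheck` credentials are placeholders.

```shell
# Summarise "marked as crashed" errors from a mysqld error log read on stdin.
# Prints one line per table in the form "<count> <table>".
crashed_tables() {
    grep 'is marked as crashed' \
        | sed "s/.*Table '\([^']*\)' is marked as crashed.*/\1/" \
        | sort | uniq -c | awk '{print $1, $2}'
}

# Example (log location is an assumption):
#   crashed_tables < /var/log/mysqld.log
#
# A flagged table can then be checked, and a repair attempted, with mysqlcheck:
#   mysqlcheck -u root -p --check  nagios nagios_contactnotifications
#   mysqlcheck -u root -p --repair nagios nagios_contactnotifications
```

Because the log says the last automatic repair failed, a manual repair may fail as well, so taking a backup of the database before attempting one is a sensible precaution.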
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Blank Services and Active checks disabled

Post by Box293 »

When you hit Apply Config, it runs the script reconfigure_nagios.sh. You can run this script yourself:

Code: Select all

cd /usr/local/nagiosxi/scripts
./reconfigure_nagios.sh

Can you please run the script and post the output here?

When you run this script at the CLI, does nagios remain running, or does it die as it has been doing?

Can you please upload the file:
/etc/sudoers
Also, any files in /etc/sudoers.d/
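
A minimal sketch for capturing the script output and then checking whether the daemon survived is shown below. `count_nagios_daemons` is a hypothetical helper, the output file name and the 60-second wait are arbitrary choices, and the match pattern is based on the ps output posted earlier in the thread.

```shell
# Count nagios daemon processes in `ps -ef` output read on stdin.
# The [/] bracket keeps the grep process itself from matching the pattern.
count_nagios_daemons() {
    grep -c 'bin[/]nagios -d' || true
}

# Example:
#   cd /usr/local/nagiosxi/scripts
#   ./reconfigure_nagios.sh 2>&1 | tee /tmp/reconfigure.out
#   sleep 60
#   ps -ef | count_nagios_daemons
```

A count of 0 after the wait would mean the daemon died again, and /tmp/reconfigure.out would then hold the script output to post.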