Blank Services and Active checks disabled

Post by **CFT6Server** » Tue Sep 15, 2015 4:09 pm

Memory and CPU don't seem to be an issue here, and reboots no longer bring the service back. This has become a high priority, as we are now losing perf data and basically nothing works until we get the service running.

Code: Select all

top - 14:06:46 up  1:54,  1 user,  load average: 0.12, 0.52, 0.44
Tasks: 312 total,   2 running, 310 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.2%sy,  0.0%ni, 99.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8060788k total,  5003352k used,  3057436k free,    74716k buffers
Swap:  2064380k total,     1360k used,  2063020k free,  3752276k cached

Code: Select all

# free -m
             total       used       free     shared    buffers     cached
Mem:          7871       4905       2966         12         73       3664
-/+ buffers/cache:       1168       6703
Swap:         2015          1       2014

The only messages that come up beforehand are the warning messages regarding the intervals that I PM'd you about. I waited a long time, then manually started the Nagios service, and it is staying running now.

So I think something could be happening on the DB side? When the configuration is applied, something is happening, and until that is done the service will not stay running. I don't know the whole workflow of Nagios XI, so I will look to you guys to comment.

Post by **tgriep** » Tue Sep 15, 2015 4:40 pm

When an apply config is run, it logs the output to the cmdsubsys.log file. Can you run the following tail command, apply the config, and PM me the output of the tail command?

Code: Select all

tail -f /usr/local/nagiosxi/var/cmdsubsys.log

Post by **CFT6Server** » Tue Sep 15, 2015 10:37 pm

Sent via PM. But the logs look clean, I think. Similar outputs.
I tried to tail all logs in /usr/local/nagios/var and /var/log, and I don't see any errors come up when I apply the configuration.

After multiple attempts to restart the nagios service, once it comes back, this is the output of the ps query you wanted.

Code: Select all

# ps -ef | grep nagios.cfg | grep -v grep
nagios   19337     1 19 22:20 ?        00:00:05 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   19363 19337  0 22:20 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
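
To tell whether the daemon actually stays up after a restart, the ps query above can be repeated at intervals. A minimal sketch, assuming a POSIX shell; the function names `count_nagios_procs` and `watch_nagios` are made up for illustration and are not part of Nagios XI:

```shell
# Hypothetical helper (not part of Nagios XI): count Nagios daemon
# processes in `ps -ef` style text read from stdin.
count_nagios_procs() {
    # The [n] bracket keeps this grep from matching its own command line.
    # `|| true` because grep -c exits non-zero when the count is 0.
    grep -c 'bin/[n]agios -d' || true
}

# Hypothetical helper: print a timestamp and the process count
# $1 times, sleeping $2 seconds between samples.
watch_nagios() {
    i=0
    while [ "$i" -lt "$1" ]; do
        printf '%s %s\n' "$(date +%H:%M:%S)" "$(ps -ef | count_nagios_procs)"
        sleep "$2"
        i=$((i + 1))
    done
}
```

For example, `watch_nagios 12 5` samples for roughly a minute; a count that drops to 0 marks the moment the daemon exits, which can then be lined up against timestamps in the logs.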

Post by **tgriep** » Wed Sep 16, 2015 8:48 am

The only thing I can think of is that the verification of the config files is taking a long time with the warnings, and that could cause it to not restart cleanly.
Try resolving the warnings and see if that speeds it up for you.

You can go through this document to maximize XI performance by creating a RAM disk, offloading the MySQL database, and applying the other tips in there.
https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Could you post this file so we can review it?

Code: Select all

/etc/init.d/nagios

Post by **WillemDH** » Wed Sep 16, 2015 11:38 am

When was the last time you were able to apply? Would restoring a backup from before that time be possible? If it's a VM, you could restore to a second VM (nagios_clone) and test.

Post by **CFT6Server** » Wed Sep 16, 2015 12:36 pm

Honestly, I can't remember; we have multiple people adding hosts and services. I just noticed this gradually going downhill. Rolling back might be a bit tough, as I can't track what has been added and changed.

Post by **CFT6Server** » Wed Sep 16, 2015 1:06 pm

tgriep wrote: The only thing I can think of is that the verification of the config files is taking a long time with the warnings, and that could cause it to not restart cleanly.
Try resolving the warnings and see if that speeds it up for you.

You can go through this document to maximize XI performance by creating a RAM disk, offloading the MySQL database, and applying the other tips in there.
https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Could you post this file so we can review it?
Code: Select all
/etc/init.d/nagios

Here it is.

nagios.txt

Post by **CFT6Server** » Wed Sep 16, 2015 1:28 pm

I will go over the performance side, but it doesn't add up. I am not seeing any performance or resource problems; the nagios service simply doesn't stay running and we can't find out why. I basically just keep trying to restart it and eventually it runs, but the logs aren't showing anything. If this were a performance issue, it shouldn't prevent the service from running. I've observed the usage, and the system is not taxed at all when applying the configuration.

Can someone help explain how the whole workflow works when I hit Apply Configuration? Maybe we can go through each step to find out where it is failing or hanging. I reviewed the size of the cache and precache files and they don't seem too big, but perhaps in the Nagios world they are? Without full knowledge of the process, it will be hard to troubleshoot further, especially now that we have exhausted what the logs are showing us.

Post by **CFT6Server** » Wed Sep 16, 2015 6:52 pm

So I cleared it down to 7 warnings and no luck. Still the same issue.

I am starting to see this in the mysqld logs:

Code: Select all

150916 15:45:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 15:50:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 15:55:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:00:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:05:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:10:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:15:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:20:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:25:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:30:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:35:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:40:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:40:05 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:45:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:50:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:55:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
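
A long run of identical errors like the one above is easier to read as a per-table summary, which also shows quickly whether more than one table is affected. A minimal sketch, assuming the message format shown in this log; `crashed_tables` is a hypothetical helper name, and the log location in the usage note is an assumption that varies by distribution:

```shell
# Hypothetical helper: read mysqld error-log text on stdin and print
# each table reported as crashed, followed by how many times it appears.
crashed_tables() {
    # Extract the quoted table path from each matching line, then
    # count duplicates and print "table count" pairs.
    sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" \
        | sort | uniq -c | awk '{print $2, $1}'
}
```

It could be used as `crashed_tables < /var/log/mysqld.log`, assuming that is where mysqld writes its error log on this system.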

Post by **Box293** » Wed Sep 16, 2015 6:54 pm

When you hit Apply Config, it runs the script reconfigure_nagios.sh. You can run this script yourself:

Code: Select all

cd /usr/local/nagiosxi/scripts
./reconfigure_nagios.sh

Can you please run the script and post the output here?

When you run this script at the CLI, does nagios remain running, or does it die like it has normally been?
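
To make the output easy to post, and to see whether the script itself reports a failure, its output and exit status can be captured together. A minimal sketch, assuming a POSIX shell; `run_and_log` is a hypothetical helper and the log file path in the usage note is arbitrary:

```shell
# Hypothetical helper: run a command, save stdout and stderr to a log
# file, echo the log, and return the command's own exit status.
# $1 = log file path, remaining arguments = command to run.
run_and_log() {
    log=$1
    shift
    status=0
    "$@" > "$log" 2>&1 || status=$?
    cat "$log"
    echo "exit status: $status"
    return "$status"
}
```

For example: `run_and_log /tmp/reconfigure.log /usr/local/nagiosxi/scripts/reconfigure_nagios.sh`. A non-zero exit status would point at the reconfigure step rather than at the daemon startup that follows it.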

Can you please upload the file:
/etc/sudoers
Also, any files in /etc/sudoers.d/

Nagios Support Forum