Re: Blank Services and Active checks disabled
Posted: Tue Sep 15, 2015 4:09 pm
by CFT6Server
Memory and CPU don't seem to be an issue here, and reboots are no longer bringing the service back. This has become a high priority: we are now losing perf data, and basically nothing works until the service is running again.
Code:
top - 14:06:46 up 1:54, 1 user, load average: 0.12, 0.52, 0.44
Tasks: 312 total, 2 running, 310 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.2%sy, 0.0%ni, 99.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8060788k total, 5003352k used, 3057436k free, 74716k buffers
Swap: 2064380k total, 1360k used, 2063020k free, 3752276k cached
Code:
# free -m
             total       used       free     shared    buffers     cached
Mem:          7871       4905       2966         12         73       3664
-/+ buffers/cache:       1168       6703
Swap:         2015          1       2014
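A quick way to check the claim that memory is not the issue: Linux counts buffers and page cache as "used", so the figure that matters is used minus buffers minus cached, which is what the `-/+ buffers/cache` row reports. A minimal sketch of that arithmetic, assuming the older procps `free -m` layout shown above (Mem: total used free shared buffers cached); newer procps releases print different columns, so the field positions would need adjusting there.

```shell
#!/usr/bin/env bash
# Compute memory actually held by applications (used - buffers - cached)
# from an old-style `free -m` "Mem:" line.
# ASSUMPTION: column order is "Mem: total used free shared buffers cached",
# as in the output posted above; newer procps versions differ.
real_used_mb() {
  local line=$1
  # $1="Mem:", $2=total, $3=used, $4=free, $5=shared, $6=buffers, $7=cached
  awk '{ print $3 - $6 - $7 }' <<< "$line"
}

# Example with the figures posted above:
real_used_mb "Mem: 7871 4905 2966 12 73 3664"   # prints 1168
```

With the posted figures this gives 4905 - 73 - 3664 = 1168 MB, matching the `-/+ buffers/cache` row, or roughly 15% of the 7871 MB total.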
The only messages that appear beforehand are the interval-related warnings I PM'd you about. So I waited a long time, then manually started the Nagios service, and it is staying up now.
Could something be happening on the DB side? It looks like when the configuration is applied, some process runs, and until it finishes the service will not stay running. I don't know the full Nagios XI workflow, so I'll leave it to you to comment.
Posted: Tue Sep 15, 2015 4:40 pm
by tgriep
When Apply Configuration runs, it logs its output to the cmdsubsys.log file. Can you start the following tail command, apply the configuration while it is running, and PM me the output?
Code:
tail -f /usr/local/nagiosxi/var/cmdsubsys.log
Posted: Tue Sep 15, 2015 10:37 pm
by CFT6Server
Sent via PM, but the logs look clean to me, with output similar to before.
I also tailed all the logs in /usr/local/nagios/var and /var/log, and I don't see any errors appear when I apply the configuration.
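Instead of tailing each file by eye, a search over the relevant logs for error-like keywords can make a quiet failure easier to spot. A rough sketch; the keyword list and the example file paths are assumptions to adapt (the mysqld error log location, for instance, is set in `my.cnf`).

```shell
#!/usr/bin/env bash
# Print lines that look like errors from the given log files, prefixed with
# the file name and line number.
# ASSUMPTION: the keyword list below is illustrative; tune it as needed.
scan_logs() {
  local pattern='error|fail|crash|abort|segfault'
  local f
  for f in "$@"; do
    # Skip missing or unreadable files instead of aborting the whole scan.
    [ -r "$f" ] || continue
    grep -HniE "$pattern" "$f"
  done
  return 0
}

# Example (paths are illustrative; run as root so every file is readable):
# scan_logs /usr/local/nagios/var/nagios.log /usr/local/nagiosxi/var/cmdsubsys.log /var/log/mysqld.log
```

Files that are missing or unreadable are skipped silently, so running it as root avoids an "all clear" that is really a permissions problem.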
After several restart attempts the Nagios service came back and stayed up; this is the output of the ps query you asked for.
Code:
# ps -ef | grep nagios.cfg | grep -v grep
nagios 19337 1 19 22:20 ? 00:00:05 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19363 19337 0 22:20 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
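Since the symptom is a daemon that starts and then disappears, it can help to measure how long it actually survives after a restart rather than spot-checking with `ps`. A minimal sketch, assuming the PID can be read from Nagios's lock file (the `lock_file` directive in `nagios.cfg`; `/usr/local/nagios/var/nagios.lock` is a common default, but check the actual config). `kill -0` only tests whether the process exists; it sends no signal.

```shell
#!/usr/bin/env bash
# Report how long (up to MAX_SECONDS) the process whose PID is stored in
# PIDFILE stays alive, polling once per second with `kill -0`.
# ASSUMPTION: PIDFILE holds a single PID, as Nagios's lock_file does.
# Usage: watch_pidfile PIDFILE MAX_SECONDS
watch_pidfile() {
  local pidfile=$1 max=$2 pid elapsed=0
  pid=$(cat "$pidfile" 2>/dev/null) || { echo "no pid file: $pidfile" >&2; return 2; }
  while [ "$elapsed" -lt "$max" ]; do
    # kill -0 fails once the process no longer exists.
    if ! kill -0 "$pid" 2>/dev/null; then
      echo "process $pid gone after ${elapsed}s"
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "process $pid still running after ${max}s"
  return 0
}

# Example (default path is an assumption; verify lock_file in nagios.cfg):
# service nagios restart && sleep 2 && watch_pidfile /usr/local/nagios/var/nagios.lock 300
```

Running it right after a restart shows whether the daemon survives for a few minutes or dies at a consistent point, which is easier to correlate with log timestamps. Run it as root or as the nagios user, since an unprivileged `kill -0` fails for another user's process.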
Posted: Wed Sep 16, 2015 8:48 am
by tgriep
The only thing I can think of is that verification of the config files is taking a long time because of the warnings, which could keep Nagios from restarting cleanly.
Try resolving the warnings and see if that speeds it up for you.
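To test whether verification time is actually the bottleneck, the pre-flight check can be timed on its own, outside of Apply Configuration. A sketch, assuming the Core binary and config paths from the `ps` output above; `nagios -v <cfg>` verifies the configuration without starting the daemon, and the helper assumes its summary lines read `Total Warnings: N` and `Total Errors: N`.

```shell
#!/usr/bin/env bash
# Extract the warning and error totals from `nagios -v` output on stdin.
# ASSUMPTION: summary lines look like "Total Warnings: N" / "Total Errors: N"
# (extra spaces after the colon are tolerated).
verify_totals() {
  awk -F: '
    /^Total Warnings:/ { gsub(/ /, "", $2); w = $2 }
    /^Total Errors:/   { gsub(/ /, "", $2); e = $2 }
    END { printf "warnings=%s errors=%s\n", w, e }
  '
}

# Time a verification run and summarise it.
# ASSUMPTION: default paths match the ps output in this thread.
timed_verify() {
  local bin=${1:-/usr/local/nagios/bin/nagios}
  local cfg=${2:-/usr/local/nagios/etc/nagios.cfg}
  local start end out
  start=$(date +%s)
  out=$("$bin" -v "$cfg" 2>&1)
  end=$(date +%s)
  echo "verification took $((end - start))s"
  printf '%s\n' "$out" | verify_totals
}

# Example: timed_verify
```

If verification finishes in a few seconds even with the warnings present, slow verification is probably not the main reason the daemon fails to stay up.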
You can also work through this document on maximizing XI performance, which covers creating a RAM disk, offloading the MySQL database, and other tips.
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
Could you post this file so we can review it?
Posted: Wed Sep 16, 2015 11:38 am
by WillemDH
When were you last able to apply the configuration successfully? Would restoring a backup from before that time be possible? If it's a VM, you could restore the backup to a second VM (e.g., nagios_clone) and test there.
Posted: Wed Sep 16, 2015 12:36 pm
by CFT6Server
Honestly, I can't remember; we have multiple people adding hosts and services, and I only noticed the gradual decline. Rolling back might be tough because I can't track what has been added and changed.
Posted: Wed Sep 16, 2015 1:06 pm
by CFT6Server
tgriep wrote:
The only thing I can think of is that the verification of the config files are taking a long time with the warnings and that could cause it to not restart cleanly.
Try resolving the warnings and see if that speeds it up for you.
You can go through this document to maximize XI performance by creating a RAM disk and offloading MYSQL database and the other tips in there.
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
Could you post this file so we can review it?
Here it is.
nagios.txt
Posted: Wed Sep 16, 2015 1:28 pm
by CFT6Server
I will go over the performance side, but it doesn't add up. We are not seeing any performance or resource problems, yet the Nagios service doesn't stay running and we can't find out why. I basically keep restarting it until it eventually stays up, but the logs show nothing. Even if this were a performance issue, it shouldn't prevent the service from running at all. I've watched resource usage, and the server is not even taxed while the configuration is being applied.
Can someone explain the whole workflow when I hit Apply Configuration? Then we could go through each step to find where it is failing or hanging. I reviewed the size of the cache and precache files and they don't seem too big, but perhaps they are by Nagios standards? Without full knowledge of the process it will be hard to troubleshoot further, especially since we have exhausted what the logs are telling us.
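For the cache and precache question, counting definitions per object type gives a more concrete sense of size than the file size alone. A sketch, assuming the default `object_cache_file` path (`/usr/local/nagios/var/objects.cache`; confirm it in `nagios.cfg`) and that each definition starts with `define <type> {`. The same function can be pointed at the precache file if one is in use.

```shell
#!/usr/bin/env bash
# Print a Nagios object cache file's size and the number of object
# definitions per type, most numerous first.
# ASSUMPTION: each definition begins with "define <type> {".
count_objects() {
  local file=$1
  echo "size: $(wc -c < "$file") bytes"
  grep -oE '^[[:space:]]*define[[:space:]]+[a-z]+' "$file" \
    | awk '{ print $2 }' | sort | uniq -c | sort -rn
}

# Example (default path is an assumption; check object_cache_file in nagios.cfg):
# count_objects /usr/local/nagios/var/objects.cache
```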
Posted: Wed Sep 16, 2015 6:52 pm
by CFT6Server
I cleared it down to 7 warnings with no luck; still the same issue.
I am starting to see this in the mysqld logs:
Code:
150916 15:45:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 15:50:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 15:55:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:00:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:05:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:10:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:15:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:20:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:25:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:30:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:35:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:40:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:40:05 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:45:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:50:02 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
150916 16:55:01 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_contactnotifications' is marked as crashed and last (automatic?) repair failed
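The message means mysqld has flagged `nagios_contactnotifications` as crashed (typically a MyISAM table) and its automatic repair keeps failing, so the table likely needs a manual repair. A sketch that lists each distinct crashed table from the error log and prints, without running, a `mysqlcheck --repair` command for each. The log path and the `root` account are assumptions; take a backup before running any repair.

```shell
#!/usr/bin/env bash
# List distinct "db/table" names that mysqld reports as crashed in LOGFILE.
crashed_tables() {
  sed -n "s/.*Table '\.\/\([^']*\)' is marked as crashed.*/\1/p" "$1" | sort -u
}

# Print (do NOT run) a mysqlcheck repair command for each crashed table.
# ASSUMPTION: repairs are done as the MySQL root user; review before running.
repair_commands() {
  local db table
  crashed_tables "$1" | while IFS=/ read -r db table; do
    echo "mysqlcheck --repair -u root -p $db $table"
  done
}

# Example (log path varies; check log-error in my.cnf):
# repair_commands /var/log/mysqld.log
```

If the online repair also fails, an offline `myisamchk --recover` with mysqld stopped is the usual next step for MyISAM tables.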
Posted: Wed Sep 16, 2015 6:54 pm
by Box293
When you hit Apply Config, it runs the script reconfigure_nagios.sh. You can run this script yourself:
Code:
cd /usr/local/nagiosxi/scripts
# The ./ is needed because the scripts directory is not on PATH
./reconfigure_nagios.sh
Can you please run the script and post the output here?
When you run this script from the CLI, does Nagios keep running, or does it die the way it normally has?
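To capture everything the script prints for posting, along with its exit status, a small wrapper like the following sketch can help. The log file name is arbitrary; `PIPESTATUS` is used because a plain `$?` after piping into `tee` reports `tee`'s status rather than the script's.

```shell
#!/usr/bin/env bash
# Run a command, show its combined stdout/stderr live, save a copy to LOGFILE,
# and return the command's own exit status (not tee's).
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
  local log=$1
  shift
  "$@" 2>&1 | tee "$log"
  # PIPESTATUS[0] is the exit status of the first command in the pipeline.
  local rc=${PIPESTATUS[0]}
  echo "exit status: $rc" | tee -a "$log"
  return "$rc"
}

# Example (log path is arbitrary; run as root from the scripts directory):
# cd /usr/local/nagiosxi/scripts && run_logged /tmp/reconfigure_nagios.out ./reconfigure_nagios.sh
```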
Can you please upload the file /etc/sudoers, along with any files in /etc/sudoers.d/?