Nagios XI system status goes red & back to green often
Posted: Tue Sep 15, 2015 1:38 pm
by JakeHatMacys
We're running around 3300 hosts on the box and about 10,000 service checks... Is this too much for XI to keep up with? Cause for concern? I'm not seeing anything out of the ordinary in the /var/log/messages file.
Capture1.JPG
Everything seems to be working ticketing-wise (using the HP BSM Connector), but it's concerning to see.
Re: Nagios XI system status goes red & back to green often
Posted: Tue Sep 15, 2015 4:06 pm
by ssax
Are you seeing any errors in any of these files?
Code:
/usr/local/nagios/var/nagios.log
/usr/local/nagiosxi/var/feedproc.log
/usr/local/nagiosxi/var/cmdsubsys.log
/usr/local/nagiosxi/var/reportengine.log
/usr/local/nagiosxi/var/eventman.log
I'll take a look at the code and see how they're being checked.
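If it helps, here is a rough way to scan all of those logs in one pass. This is only a sketch: `scan_xi_logs` is a made-up helper name (not part of Nagios XI), and the keyword list is a guess at what is worth flagging.

```shell
# Hypothetical helper, not shipped with Nagios XI.
scan_xi_logs() {
    # For each log given, print up to the last 5 lines that look like problems.
    for log in "$@"; do
        if [ -r "$log" ]; then
            echo "== $log"
            grep -iE 'error|warning|fail|unable' "$log" | tail -n 5
        else
            echo "== $log (not readable)"
        fi
    done
}

scan_xi_logs \
    /usr/local/nagios/var/nagios.log \
    /usr/local/nagiosxi/var/feedproc.log \
    /usr/local/nagiosxi/var/cmdsubsys.log \
    /usr/local/nagiosxi/var/reportengine.log \
    /usr/local/nagiosxi/var/eventman.log
```

Run it as a user that can read those files (root or nagios), otherwise everything will show up as not readable.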
Re: Nagios XI system status goes red & back to green often
Posted: Tue Sep 15, 2015 4:16 pm
by ssax
Also, might want to see if you have anything in /var/log/cron
Re: Nagios XI system status goes red & back to green often
Posted: Wed Sep 16, 2015 10:15 am
by JakeHatMacys
Only ones to bring back anything really:
Code:
$ tail -f /usr/local/nagiosxi/var/feedproc.log
..PHP Warning: exec(): Unable to fork [php -q /usr/local/nagiosxi/scripts/parse_core_eventlog.php] in /usr/local/nagiosxi/cron/feedproc.php on line 88
.
PROCESSED 0 COMMANDS
tail -50 /usr/local/nagiosxi/var/eventman.log
- this log is just barking about the trap sender not being enabled.
Code:
tail -f cron
Sep 16 11:04:01 esu4v412 CROND[17733]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17741]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17742]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17738]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17737]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17739]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17740]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)
Sep 16 11:05:01 esu4v412 CROND[19023]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Sep 16 11:10:01 esu4v412 CROND[15256]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Sep 16 11:10:01 esu4v412 CROND[15261]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Code:
tail -50 /usr/local/nagiosxi/var/cmdsubsys.log
............................................................
PROCESSED 0 COMMANDS
We noticed that applying the config pegs the CPU. After that, the server just doesn't seem to settle down. But if I boot the box and don't apply the configuration, it seems fine.
Looking at it with top, MySQL is sporadically eating up a lot of CPU. Just an observation: we have 4 CPU cores currently and are going to increase to 8 by next week.
Re: Nagios XI system status goes red & back to green often
Posted: Wed Sep 16, 2015 4:28 pm
by tgriep
Sounds like you could have a corrupt MySQL database. Take a look at the /var/log/mysqld.log file for any errors.
If there are errors, run this in a shell to repair the database.
Code:
cd /usr/local/nagiosxi/scripts
./repair_databases.sh
See if that helps out.
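For the log check, something along these lines may save some scrolling. It is a sketch only: `find_crashed_tables` is a made-up helper name, and it assumes the usual MyISAM wording ("is marked as crashed and should be repaired") is what ends up in the log.

```shell
# Hypothetical helper, not shipped with Nagios XI or MySQL.
find_crashed_tables() {
    log=${1:-/var/log/mysqld.log}
    # MyISAM normally reports a damaged table as
    # "Table '...' is marked as crashed and should be repaired".
    grep -i 'marked as crashed' "$log" 2>/dev/null | tail -n 20
}

out=$(find_crashed_tables)
if [ -n "$out" ]; then
    echo "$out"
else
    echo "no crashed tables found in the log"
fi

# Assumption: the Nagios data lives in a database named "nagios".
# If so, the tables can also be checked directly:
#   mysqlcheck -u root -p --check nagios
```

An empty result only means nothing was logged, so running the repair script anyway is still reasonable.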
Re: Nagios XI system status goes red & back to green often
Posted: Thu Sep 17, 2015 10:06 am
by JakeHatMacys
Didn't see any errors but ran the maintenance script anyway to see if it'd help.
Rebooted the box and this is the current status:
Capture1.JPG
Re: Nagios XI system status goes red & back to green often
Posted: Thu Sep 17, 2015 10:35 am
by WillemDH
Hello,
Are you monitoring your iowait? It's at 22% in the last screenshot... Could you post a screenshot of your CPU iowait over time (7-30 days)? What storage are you on? I had some issues 6 months ago, and after some troubleshooting it turned out that our SAN had a wrong config which prevented AST from working properly. After fixing the SAN issue, my problems were solved.
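If you don't have an iowait graph yet, a quick spot check from the shell could look like this. It's a sketch under assumptions: `iowait_pct` is a made-up helper name, and it relies on the Linux /proc/stat layout where the "cpu" line lists user, nice, system, idle, iowait, irq, softirq, steal in that order.

```shell
# Hypothetical helper: percentage of CPU time spent in iowait between
# two samples of the aggregate "cpu" line from /proc/stat.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user .. steal
            io = $6                                # iowait column
        }
        NR == 1 { t0 = total; w0 = io }
        NR == 2 {
            dt = total - t0
            if (dt > 0) printf "%.1f\n", 100 * (io - w0) / dt
            else print "0.0"
        }'
}

if [ -r /proc/stat ]; then
    first=$(grep '^cpu ' /proc/stat)
    sleep 1
    second=$(grep '^cpu ' /proc/stat)
    echo "iowait over the last second: $(iowait_pct "$first" "$second")%"
fi
```

Since the cron log shows sa1 running, `sar -u` should also have some %iowait history to look at.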
Grtz
Re: Nagios XI system status goes red & back to green often
Posted: Thu Sep 17, 2015 4:50 pm
by tmcdonald
And to address your original question, a good rule of thumb is to optimize at 10k objects, and split checks onto a new server at 20k. Optimization means a ramdisk, offloaded DB, gearman checks, etc.
Re: Nagios XI system status goes red & back to green often
Posted: Mon Sep 21, 2015 10:50 am
by ssax
Locking and marking as resolved.
This turned out to be a crashed nagios_logentries table. We also upped the limits in /etc/security/limits.conf (open files, nofile, and max processes, nproc) from 1024 to 4096. Make sure that you check if you have anything defined in /etc/security/limits.d/ that may be overriding the values you define in /etc/security/limits.conf.
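For anyone landing here later, a rough way to check the effective limits and look for overrides. This is a sketch, not the exact steps we ran: `find_limit_overrides` is a made-up helper name, the drop-in directory is assumed to be /etc/security/limits.d (the usual location on RHEL/CentOS), and `ulimit -u` assumes bash.

```shell
# Hypothetical helper: list non-comment nofile/nproc entries in the
# limits drop-in directory, prefixed with the file they come from.
find_limit_overrides() {
    dir=${1:-/etc/security/limits.d}
    grep -HE '^[^#]*[[:space:]](nofile|nproc)[[:space:]]' "$dir"/*.conf 2>/dev/null
}

# Limits of the current shell; run this as the nagios user to see
# what the XI cron jobs actually get.
echo "open files (nofile): $(ulimit -n)"
echo "max processes (nproc): $(ulimit -u)"

find_limit_overrides || echo "no nofile/nproc overrides found"

# Illustrative limits.conf entries only; the user name is an assumption:
#   nagios  soft  nofile  4096
#   nagios  hard  nofile  4096
#   nagios  soft  nproc   4096
#   nagios  hard  nproc   4096
```

New limits only apply to new sessions, so log in again (or restart the affected services) before re-checking.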