Nagios XI system status goes red & back to green often
-
JakeHatMacys
- Posts: 281
- Joined: Thu Sep 25, 2014 3:21 pm
Nagios XI system status goes red & back to green often
We're running around 3300 hosts on the box and about 10,000 service checks... Is this too much for XI to keep up with? Cause for concern? I'm not seeing anything out of the ordinary in the /var/log/messages file.
Everything seems to be working ticketing wise (using the HP BSM Connector) but it's concerning to see.
Re: Nagios XI system status goes red & back to green often
Are you seeing any errors in any of these files?
I'll take a look at the code and see how they're being checked.
Code: Select all
/usr/local/nagios/var/nagios.log
/usr/local/nagiosxi/var/feedproc.log
/usr/local/nagiosxi/var/cmdsubsys.log
/usr/local/nagiosxi/var/reportengine.log
/usr/local/nagiosxi/var/eventman.log
Re: Nagios XI system status goes red & back to green often
Also, might want to see if you have anything in /var/log/cron
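For anyone following along, here is a rough sketch of one way to scan those logs in a single pass instead of tailing each file. The function name `count_log_errors` and the match pattern are assumptions made for illustration, not an official Nagios XI list of error strings.

```shell
#!/usr/bin/env bash
# Count lines that look like errors in each log file passed as an argument.
# NOTE: the pattern below is an illustrative guess, not an official list.
count_log_errors() {
    local pattern='error|warning|unable to fork|crashed'
    local f n
    for f in "$@"; do
        if [ -r "$f" ]; then
            # grep -c prints the number of matching lines (0 if none)
            n=$(grep -ciE "$pattern" "$f")
            printf '%s: %s\n' "$f" "$n"
        else
            printf '%s: not readable\n' "$f"
        fi
    done
}

# Example call (paths are the ones listed in this thread):
# count_log_errors /usr/local/nagios/var/nagios.log \
#     /usr/local/nagiosxi/var/feedproc.log \
#     /usr/local/nagiosxi/var/cmdsubsys.log \
#     /var/log/cron
```

A non-zero count only means the file deserves a closer look; the matching lines still need to be read in context.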
-
JakeHatMacys
- Posts: 281
- Joined: Thu Sep 25, 2014 3:21 pm
Re: Nagios XI system status goes red & back to green often
Only ones to bring back anything really:
tail -50 /usr/local/nagiosxi/var/eventman.log
- this log is just barking about the trap sender not being enabled.
We noticed that applying the configuration pegs the CPU, and afterward the server never seems to settle down. If I reboot the box and don't apply the configuration, it seems fine.
It looks like MySQL is sporadically eating a lot of CPU when I watch it in top. As a side note, we currently have 4 CPU cores and are going to increase to 8 by next week.
Code: Select all
$ tail -f /usr/local/nagiosxi/var/feedproc.log
..PHP Warning: exec(): Unable to fork [php -q /usr/local/nagiosxi/scripts/parse_core_eventlog.php] in /usr/local/nagiosxi/cron/feedproc.php on line 88
.
PROCESSED 0 COMMANDS
Code: Select all
tail -f cron
Sep 16 11:04:01 esu4v412 CROND[17733]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17741]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17742]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17738]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17737]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17739]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17740]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)
Sep 16 11:05:01 esu4v412 CROND[19023]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Sep 16 11:10:01 esu4v412 CROND[15256]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Sep 16 11:10:01 esu4v412 CROND[15261]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Code: Select all
tail -50 /usr/local/nagiosxi/var/cmdsubsys.log
............................................................
PROCESSED 0 COMMANDS
Re: Nagios XI system status goes red & back to green often
Sounds like you could have a corrupt MySQL database. Take a look at the /var/log/mysqld.log file for any errors.
If there are errors, run this in a shell to repair the database.
Code: Select all
cd /usr/local/nagiosxi/scripts
./repair_databases.sh
See if that helps out.
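As a hedged aside, a crashed MyISAM table usually shows up in the MySQL error log with the phrase "is marked as crashed and should be repaired". The sketch below pulls the affected table names out of the log; the function name `find_crashed_tables` is hypothetical, and the default log path is simply the one mentioned in this thread.

```shell
#!/usr/bin/env bash
# List the distinct tables that the MySQL error log reports as crashed.
# The default path is the one mentioned in this thread; pass another
# path as the first argument if your log lives elsewhere.
find_crashed_tables() {
    local log="${1:-/var/log/mysqld.log}"
    # -o prints only the matching part of each line; sort -u removes repeats
    grep -oE "Table '[^']+' is marked as crashed" "$log" | sort -u
}

# Example call:
# find_crashed_tables /var/log/mysqld.log
```

Empty output does not prove the tables are healthy; it only means the log has no such message.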
-
JakeHatMacys
- Posts: 281
- Joined: Thu Sep 25, 2014 3:21 pm
Re: Nagios XI system status goes red & back to green often
tgriep wrote: Sounds like you could have a corrupt MySQL database. Take a look at the /var/log/mysqld.log file for any errors.
Didn't see any errors, but ran the maintenance anyway to see if it'd help.
Rebooted the box; the current status was posted as a screenshot attachment (not included here).
Re: Nagios XI system status goes red & back to green often
Hello,
Are you monitoring your iowait? It's at 22% in the last screenshot. Could you post a graph of your CPU iowait over time (7-30 days)? What storage are you on? I had some issues 6 months ago; after some troubleshooting, it turned out that our SAN had a wrong configuration that prevented AST from working properly. After fixing the SAN issue, my problems were solved.
Grtz
Nagios XI 5.8.1
https://outsideit.net
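If no iowait graph is available yet, a quick spot check is possible from /proc/stat on Linux. This is only a sketch: `iowait_percent` is a hypothetical helper that takes two samples of the aggregate "cpu" line and reports the share of elapsed CPU time spent in iowait between them. It counts the user through steal fields and leaves out the guest fields, which the kernel already includes in user time.

```shell
#!/usr/bin/env bash
# Compute the iowait percentage between two aggregate "cpu" lines from
# /proc/stat. Field order: cpu user nice system idle iowait irq softirq steal
iowait_percent() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, " ")
        nb = split(b, y, " ")
        # Sum the deltas of user..steal (array positions 2..9)
        for (i = 2; i <= 9 && i <= nb; i++) total += y[i] - x[i]
        if (total <= 0) { print "0.0"; exit }
        # Position 6 is iowait (position 1 is the literal "cpu")
        printf "%.1f\n", 100 * (y[6] - x[6]) / total
    }'
}

# Example: sample twice, five seconds apart
# first=$(grep '^cpu ' /proc/stat); sleep 5
# second=$(grep '^cpu ' /proc/stat)
# iowait_percent "$first" "$second"
```

A single sample pair is noisy, so a graph over days, as requested above, is still the better evidence.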
Re: Nagios XI system status goes red & back to green often
And to address your original question, a good rule of thumb is to optimize at 10k objects, and split checks onto a new server at 20k. Optimization means a ramdisk, offloaded DB, gearman checks, etc.
Former Nagios employee
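To put numbers on that rule of thumb, here is a small sketch. It assumes "objects" means hosts plus services, which the post does not spell out, and `sizing_advice` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Apply the 10k / 20k rule of thumb from this thread.
# ASSUMPTION: "objects" is taken to mean hosts + service checks.
sizing_advice() {
    local hosts="$1" services="$2"
    local objects=$(( hosts + services ))
    if [ "$objects" -ge 20000 ]; then
        echo "$objects objects: consider splitting checks onto another server"
    elif [ "$objects" -ge 10000 ]; then
        echo "$objects objects: consider optimizing (ramdisk, offloaded DB, gearman)"
    else
        echo "$objects objects: below the optimization threshold"
    fi
}

# Example with the figures from the first post:
# sizing_advice 3300 10000
```

With roughly 3300 hosts and 10,000 service checks, the total under this assumption is 13,300, which lands in the "optimize" band rather than the "split" band.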
Re: Nagios XI system status goes red & back to green often
Locking and marking as resolved.
This turned out to be a crashed nagios_logentries table. We also raised the limits in /etc/security/limits.conf (open files and open processes) from 1024 to 4096. Make sure to check whether anything defined in /etc/security/limits.d/ overrides the values you set in /etc/security/limits.conf.
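For later readers, a minimal sketch of how to spot overriding entries: the helper below prints every uncommented nofile (open files) and nproc (processes) line from the main limits file and from a drop-in directory, prefixed with the file it came from. The name `show_limit_entries` is hypothetical, and the default paths assume the usual pam_limits layout on RHEL/CentOS, where the drop-in directory is /etc/security/limits.d.

```shell
#!/usr/bin/env bash
# Print uncommented nofile/nproc entries from a limits file and a drop-in
# directory so that overriding values are easy to see.
# ASSUMPTION: defaults follow the usual pam_limits layout; pass other
# paths as arguments if your system differs.
show_limit_entries() {
    local conf="${1:-/etc/security/limits.conf}"
    local dropin="${2:-/etc/security/limits.d}"
    local f
    for f in "$conf" "$dropin"/*.conf; do
        [ -r "$f" ] || continue
        # -H prefixes each match with its file name; lines starting with
        # a comment marker before the item name are skipped
        grep -HE '^[^#]*[[:space:]](nofile|nproc)[[:space:]]' "$f"
    done
    return 0
}

# Example call with the default paths:
# show_limit_entries
```

After changing a limit, log in again as the affected user and confirm with `ulimit -n` and `ulimit -u`, since limits are applied at session start.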