Nagios Support Forum

Posted: **Tue Sep 15, 2015 1:38 pm**

We're running around 3300 hosts on the box and about 10,000 service checks... Is this too much for XI to keep up with? Cause for concern? I'm not seeing anything out of the ordinary in the /var/log/messages file.

Capture1.JPG

Everything seems to be working ticketing-wise (using the HP BSM Connector), but it's concerning to see.

Posted: **Tue Sep 15, 2015 4:06 pm**

Are you seeing any errors in any of these files?

Code: Select all

/usr/local/nagios/var/nagios.log
/usr/local/nagiosxi/var/feedproc.log
/usr/local/nagiosxi/var/cmdsubsys.log
/usr/local/nagiosxi/var/reportengine.log
/usr/local/nagiosxi/var/eventman.log

I'll take a look at the code and see how they're being checked.
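For anyone following along, a quick way to go through those files is a small loop that pulls out error-looking lines. This is only a sketch: the function name `scan_logs` and the match pattern are illustrative assumptions, not an official list of Nagios XI error strings.

```shell
#!/bin/sh
# Print the most recent error-looking lines from each log that exists.
# NOTE: scan_logs and the pattern below are illustrative assumptions.
scan_logs() {
    for log in "$@"; do
        if [ -r "$log" ]; then
            # -H prefixes each match with its file name; keep the last 20 matches
            grep -H -i -E 'error|warning|unable to|failed' "$log" | tail -n 20
        else
            echo "skipped (missing or unreadable): $log" >&2
        fi
    done
    return 0
}

scan_logs \
    /usr/local/nagios/var/nagios.log \
    /usr/local/nagiosxi/var/feedproc.log \
    /usr/local/nagiosxi/var/cmdsubsys.log \
    /usr/local/nagiosxi/var/reportengine.log \
    /usr/local/nagiosxi/var/eventman.log
```

Files that are missing are reported on stderr and skipped, so the loop can run unchanged on boxes where some of these logs do not exist.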

Posted: **Tue Sep 15, 2015 4:16 pm**

Also, might want to see if you have anything in /var/log/cron

Posted: **Wed Sep 16, 2015 10:15 am**

Only ones to bring back anything really:

Code: Select all

$ tail -f /usr/local/nagiosxi/var/feedproc.log
..PHP Warning:  exec(): Unable to fork [php -q /usr/local/nagiosxi/scripts/parse_core_eventlog.php] in /usr/local/nagiosxi/cron/feedproc.php on line 88
.
PROCESSED 0 COMMANDS

tail -50 /usr/local/nagiosxi/var/eventman.log

- this log is just barking about the trap sender not being enabled.

Code: Select all

 tail -f cron
Sep 16 11:04:01 esu4v412 CROND[17733]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17741]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17742]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17738]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17737]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17739]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17740]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)
Sep 16 11:05:01 esu4v412 CROND[19023]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Sep 16 11:10:01 esu4v412 CROND[15256]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Sep 16 11:10:01 esu4v412 CROND[15261]: (root) CMD (/usr/lib64/sa/sa1 1 1)

Code: Select all

tail -50 /usr/local/nagiosxi/var/cmdsubsys.log
............................................................
PROCESSED 0 COMMANDS

We noticed that applying the config pegs the CPU. After that, the server doesn't seem to settle down again. But if I reboot the box and don't apply the configuration, it seems fine.

Watching it with top, MySQL looks like it's sporadically eating up a lot of CPU. Just an observation: we have 4 CPU cores currently and are going to increase to 8 by next week.

Posted: **Wed Sep 16, 2015 4:28 pm**

Sounds like you could have a corrupt MySQL database. Take a look at the /var/log/mysqld.log file for any errors.
If there are errors, run this in a shell to repair the database.

Code: Select all

cd /usr/local/nagiosxi/scripts
./repair_databases.sh

See if that helps out.
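A rough way to look for crashed tables in that log is to search for the "marked as crashed" message that MyISAM writes. The sketch below assumes that message format and the default log path from this post; `crashed_tables` is a made-up helper name.

```shell
#!/bin/sh
# List tables that MySQL has reported as crashed in its error log.
# Usage: crashed_tables [logfile]   (defaults to /var/log/mysqld.log)
# NOTE: crashed_tables is a hypothetical helper; it assumes MyISAM-style lines like
#   Table './db/table' is marked as crashed and should be repaired
crashed_tables() {
    log="${1:-/var/log/mysqld.log}"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep -i 'marked as crashed' "$log" \
        | sed -n "s/.*Table '\([^']*\)'.*/\1/p" \
        | sort -u
}
```

Running `crashed_tables` with no arguments reads /var/log/mysqld.log and prints each affected table once. An empty result only means that this particular message was not logged, not that every table is healthy.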

Posted: **Thu Sep 17, 2015 10:06 am**

tgriep wrote: Sounds like you could have a corrupt MySQL database. Take a look at the /var/log/mysqld.log file for any errors.
If there are errors, run this in a shell to repair the database.
Code: Select all
cd /usr/local/nagiosxi/scripts
./repair_databases.sh
See if that helps out.

Didn't see any errors, but ran the maintenance anyway to see if it'd help.

Rebooted the box and this is the current status:

Capture1.JPG

Posted: **Thu Sep 17, 2015 10:35 am**

Hello,

Are you monitoring your iowait? It's at 22% in the last screenshot. Could you post a screenshot of your CPU iowait over time (7-30 days)? What storage are you on? I had some issues 6 months ago; after some troubleshooting, it turned out our SAN had a wrong config that prevented AST from working properly. After fixing the SAN issue, my problems were solved.

Grtz
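If no iowait graph is available yet, the figure can be approximated from two samples of the aggregate `cpu` line in /proc/stat. This is a minimal sketch for Linux only; `iowait_pct` is a hypothetical helper, and it counts the user through steal columns as total CPU time.

```shell
#!/bin/sh
# Compute the iowait share (percent) between two aggregate "cpu" lines from /proc/stat.
# Columns after the "cpu" label: user nice system idle iowait irq softirq steal ...
# NOTE: iowait_pct is a hypothetical helper, not part of Nagios XI.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # sum user..steal (fields 2-9); guest time is already included in user
            total[NR] = 0
            for (i = 2; i <= 9 && i <= NF; i++) total[NR] += $i
            io[NR] = $6
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (io[2] - io[1]) / dt
        }'
}

# Sample the live counters one second apart (only where /proc/stat exists).
if [ -r /proc/stat ]; then
    first=$(grep '^cpu ' /proc/stat)
    sleep 1
    second=$(grep '^cpu ' /proc/stat)
    echo "iowait over the last second: $(iowait_pct "$first" "$second")%"
fi
```

A single one-second sample is noisy. A trend over days, as asked for above, needs a graphing tool or sar history rather than this snippet.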

Posted: **Thu Sep 17, 2015 4:50 pm**

And to address your original question, a good rule of thumb is to optimize at 10k objects, and split checks onto a new server at 20k. Optimization means a ramdisk, offloaded DB, gearman checks, etc.
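Applied to the numbers from the first post, the rule of thumb works out as sketched below. This assumes "objects" means hosts plus service checks and that each threshold applies at or above the stated count; `sizing_advice` is just an illustrative name.

```shell
#!/bin/sh
# Map an object count onto the rule-of-thumb tiers from the post above.
# ASSUMPTIONS: objects = hosts + services; thresholds are inclusive.
sizing_advice() {
    objects=$(( $1 + $2 ))
    if [ "$objects" -ge 20000 ]; then
        echo "$objects objects: split checks onto a new server"
    elif [ "$objects" -ge 10000 ]; then
        echo "$objects objects: optimize (ramdisk, offloaded DB, gearman checks)"
    else
        echo "$objects objects: below the optimization threshold"
    fi
}

# Figures from the original question: about 3300 hosts and 10,000 service checks.
sizing_advice 3300 10000
```

With roughly 13,300 objects, the box in question lands in the "optimize" band rather than the "split" band.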

Posted: **Mon Sep 21, 2015 10:50 am**

Locking and marking as resolved.

This turned out to be a crashed nagios_logentries table. We also raised the limits in /etc/security/limits.conf (open files and max processes) from 1024 to 4096. Make sure you check whether anything defined in /etc/security/limits.d/ is overriding the values you set in /etc/security/limits.conf.
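To see possible overrides side by side, something like the following lists every nofile/nproc entry from the main file and the drop-in directory. It is a sketch that assumes the drop-in directory is /etc/security/limits.d (the usual pam_limits location); `list_limit_entries` is a made-up helper name.

```shell
#!/bin/sh
# Show every nofile/nproc entry from limits.conf and the drop-in directory.
# NOTE: list_limit_entries is a hypothetical helper; the default paths are assumptions.
list_limit_entries() {
    conf="${1:-/etc/security/limits.conf}"
    dropin="${2:-/etc/security/limits.d}"
    for f in "$conf" "$dropin"/*.conf; do
        [ -r "$f" ] || continue
        # skip comment lines; the third column of an entry is the item name
        awk -v file="$f" \
            '!/^[[:space:]]*#/ && ($3 == "nofile" || $3 == "nproc") { print file ": " $0 }' "$f"
    done
}

list_limit_entries

# Limits in effect for the current shell; run as the nagios user to see its values.
echo "open files: $(ulimit -n)"
# -u is a bash option; other shells may not support it, hence the fallback
echo "max user processes: $(ulimit -u 2>/dev/null || echo 'n/a')"
```

An entry in the drop-in directory for the same user and item can win over the one in limits.conf, so the value to trust is what `ulimit` reports in a fresh session for the nagios user.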

Nagios XI system status goes red & back to green often
