Nagios XI system status goes red & back to green often

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
JakeHatMacys
Posts: 281
Joined: Thu Sep 25, 2014 3:21 pm

Post by JakeHatMacys »

We're running around 3,300 hosts and about 10,000 service checks on this box. Is this too much for XI to keep up with? Is it cause for concern? I'm not seeing anything out of the ordinary in /var/log/messages.
[Attachment: Capture1.JPG]
Ticketing still seems to work (we use the HP BSM Connector), but the status flipping is concerning to see.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios XI system status goes red & back to green often

Post by ssax »

Are you seeing any errors in any of these files?

Code:

/usr/local/nagios/var/nagios.log
/usr/local/nagiosxi/var/feedproc.log
/usr/local/nagiosxi/var/cmdsubsys.log
/usr/local/nagiosxi/var/reportengine.log
/usr/local/nagiosxi/var/eventman.log
I'll take a look at the code and see how those statuses are being checked.
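To check all of those files in one pass, here is a minimal sketch. It assumes that a case-insensitive match on words like error, warning, unable, fail and crashed is a reasonable first filter; both that pattern and the `scan_logs` function name are illustrative and not part of Nagios XI.

```shell
# Hypothetical helper (not shipped with Nagios XI): count and show
# error-like lines in the XI log files listed above.
# scan_logs [FILE...]  - with no arguments, checks the default XI log list.
scan_logs() {
  # Assumed pattern; widen or narrow it as needed.
  local pattern='error|warning|unable|fail|crashed' f count
  if [ "$#" -eq 0 ]; then
    set -- /usr/local/nagios/var/nagios.log \
           /usr/local/nagiosxi/var/feedproc.log \
           /usr/local/nagiosxi/var/cmdsubsys.log \
           /usr/local/nagiosxi/var/reportengine.log \
           /usr/local/nagiosxi/var/eventman.log
  fi
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "$f: not readable"
      continue
    fi
    # -c counts matching lines; grep exits 1 on zero matches, which is fine here.
    count=$(grep -ciE "$pattern" "$f")
    echo "$f: $count matching lines"
    # Show the five most recent matches for context.
    grep -iE "$pattern" "$f" | tail -n 5
  done
}
```

Running `scan_logs` with no arguments checks the five default paths. Passing file names instead, e.g. `scan_logs /var/log/cron`, checks any other log.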
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios XI system status goes red & back to green often

Post by ssax »

Also, you might want to check whether there's anything in /var/log/cron.
JakeHatMacys
Posts: 281
Joined: Thu Sep 25, 2014 3:21 pm

Re: Nagios XI system status goes red & back to green often

Post by JakeHatMacys »

These are the only ones that returned anything of note:

Code:

$ tail -f /usr/local/nagiosxi/var/feedproc.log
..PHP Warning:  exec(): Unable to fork [php -q /usr/local/nagiosxi/scripts/parse_core_eventlog.php] in /usr/local/nagiosxi/cron/feedproc.php on line 88
.
PROCESSED 0 COMMANDS
$ tail -50 /usr/local/nagiosxi/var/eventman.log

This log only complains that the trap sender is not enabled.

Code:

$ tail -f cron
Sep 16 11:04:01 esu4v412 CROND[17733]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17741]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17742]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17738]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17737]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17739]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)
Sep 16 11:04:01 esu4v412 CROND[17740]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)
Sep 16 11:05:01 esu4v412 CROND[19023]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Sep 16 11:10:01 esu4v412 CROND[15256]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Sep 16 11:10:01 esu4v412 CROND[15261]: (root) CMD (/usr/lib64/sa/sa1 1 1)

Code:

tail -50 /usr/local/nagiosxi/var/cmdsubsys.log
............................................................
PROCESSED 0 COMMANDS
We noticed that applying the configuration pegs the CPU, and afterwards the server just doesn't seem to settle down. If I reboot the box and don't apply a configuration, it seems fine.

Looking at top, MySQL sporadically uses a lot of CPU. Also, just as an observation: we currently have 4 CPU cores and are going to increase to 8 next week.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios XI system status goes red & back to green often

Post by tgriep »

Sounds like you could have a corrupt MySQL database. Take a look at /var/log/mysqld.log for any errors.
If there are errors, run this in a shell to repair the database.

Code:

cd /usr/local/nagiosxi/scripts
./repair_databases.sh
See if that helps out.
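To make "look for errors" concrete, here is a minimal sketch that pulls out only the tables MySQL has reported as crashed. It assumes the usual MyISAM wording ("Table './db/table' is marked as crashed and should be repaired") and the default /var/log/mysqld.log path. The `crashed_tables` name is hypothetical.

```shell
# Hypothetical helper: list distinct tables that MySQL has logged as crashed.
# Assumes the usual MyISAM wording, e.g.
#   Table './nagios/nagios_logentries' is marked as crashed and should be repaired
# crashed_tables [LOGFILE]  - defaults to /var/log/mysqld.log
crashed_tables() {
  local log="${1:-/var/log/mysqld.log}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  # Keep only the "db/table" part of each crash message, de-duplicated.
  grep -i 'marked as crashed' "$log" \
    | sed -n "s|.*Table '\./\([^']*\)'.*|\1|p" \
    | sort -u
}
```

If `crashed_tables` prints anything, that is the case where running ./repair_databases.sh is most clearly worthwhile. Messages phrased differently will not be caught by the `sed` expression.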
Be sure to check out our Knowledgebase for helpful articles and solutions!
JakeHatMacys
Posts: 281
Joined: Thu Sep 25, 2014 3:21 pm

Re: Nagios XI system status goes red & back to green often

Post by JakeHatMacys »

tgriep wrote:Sounds like you could have a corrupt MYSQL database. Take a look at the /var/log/mysqld.log file for any errors.
If there are errors, run this in a shell to repair the database.

Code:

cd /usr/local/nagiosxi/scripts
./repair_databases.sh
See if that helps out.
I didn't see any errors, but I ran the repair anyway to see if it would help.

I rebooted the box, and this is the current status:
[Attachment: Capture1.JPG]
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Nagios XI system status goes red & back to green often

Post by WillemDH »

Hello,

Are you monitoring your iowait? It's at 22% in the last screenshot. Could you post a screenshot of your CPU iowait over time (7 to 30 days)? What storage are you on? I had some issues six months ago; after some troubleshooting, it turned out that our SAN was misconfigured in a way that prevented AST from working properly. Once we fixed the SAN configuration, my problems were solved.
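To put a quick number on iowait without a graphing tool, here is a minimal sketch. It samples the aggregate `cpu` line of Linux's /proc/stat twice and reports the share of CPU time spent waiting on I/O in between, which is roughly what top shows as `wa`. The function names are hypothetical, and this only gives a point-in-time figure. The 7 to 30 day view asked for above still needs a monitored check graphed over time.

```shell
# iowait_pct "LINE1" "LINE2": given two "cpu ..." lines from /proc/stat taken
# some seconds apart, print the percentage of CPU time spent in iowait
# between them, with one decimal place. (Hypothetical helper.)
iowait_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    # Fields: $2 user, $3 nice, $4 system, $5 idle, $6 iowait,
    # $7 irq, $8 softirq, $9 steal. Guest time is already inside user/nice.
    {
      total = 0
      for (i = 2; i <= 9; i++) total += $i
      t[NR] = total; w[NR] = $6
    }
    END {
      dt = t[2] - t[1]
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (w[2] - w[1]) / dt
    }'
}

# sample_iowait [SECONDS]: read the aggregate cpu line twice and report.
sample_iowait() {
  local a b
  a=$(grep '^cpu ' /proc/stat)
  sleep "${1:-5}"
  b=$(grep '^cpu ' /proc/stat)
  iowait_pct "$a" "$b"
}
```

For example, `sample_iowait 10` averages over a ten-second window. Running it a few times while a configuration is being applied would show whether iowait, rather than raw CPU, is what spikes.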

Grtz
Nagios XI 5.8.1
https://outsideit.net
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios XI system status goes red & back to green often

Post by tmcdonald »

And to address your original question: a good rule of thumb is to start optimizing at around 10k objects and to split checks onto an additional server at around 20k. Optimization means things like a RAM disk, an offloaded database, Gearman-distributed checks, and so on.
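The rule of thumb reduces to simple arithmetic. Here is a minimal sketch, assuming that "objects" means hosts plus services and that other object types are ignored. The `xi_sizing` name is hypothetical.

```shell
# Hypothetical helper applying the 10k / 20k rule of thumb above.
# Assumption: "objects" = hosts + services (other object types ignored).
# xi_sizing HOSTS SERVICES
xi_sizing() {
  local objects=$(( $1 + $2 )) advice
  if [ "$objects" -ge 20000 ]; then
    advice="split checks onto an additional server"
  elif [ "$objects" -ge 10000 ]; then
    advice="optimize (ramdisk, offloaded DB, gearman checks)"
  else
    advice="below the 10k optimization threshold"
  fi
  echo "$objects objects: $advice"
}
```

For the numbers in the opening post, `xi_sizing 3300 10000` prints `13300 objects: optimize (ramdisk, offloaded DB, gearman checks)`. That puts the system past the first threshold but well short of the second.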
Former Nagios employee
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios XI system status goes red & back to green often

Post by ssax »

Locking and marking as resolved.

This turned out to be a crashed nagios_logentries table. We also raised the open files (nofile) and max user processes (nproc) limits in /etc/security/limits.conf from 1024 to 4096. Make sure you check whether anything defined in /etc/security/limits.d/ overrides the values you set in /etc/security/limits.conf.
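Because a file under /etc/security/limits.d/ can quietly override limits.conf, here is a minimal sketch that lists every `nofile` and `nproc` entry together with the file it came from. Entries appear with limits.conf first, then limits.d/*.conf sorted by name; pam_limits parses that directory in C-locale order, and the shell's glob order can differ slightly in other locales. The `limit_entries` name is hypothetical. The function only reports entries and does not decide which one wins for a given user, since user, group and wildcard (`*`) entries interact. Some CentOS/RHEL 6 systems, for instance, ship a 90-nproc.conf that sets a wildcard nproc limit of 1024.

```shell
# Hypothetical helper: print every nofile/nproc line from limits.conf and
# limits.d/*.conf under DIR (default /etc/security), prefixed with its file.
# limit_entries [DIR]
limit_entries() {
  local dir="${1:-/etc/security}" f
  for f in "$dir/limits.conf" "$dir"/limits.d/*.conf; do
    # Skips the unexpanded glob when limits.d has no *.conf files.
    [ -r "$f" ] || continue
    # Ignore comment lines; match the <item> column (3rd field) exactly.
    awk -v file="$f" '$1 !~ /^#/ && ($3 == "nofile" || $3 == "nproc") { print file ": " $0 }' "$f"
  done
}
```

After changing the limits and restarting the Nagios processes, the values actually in effect for a running process can be read on Linux from /proc/PID/limits (the "Max open files" and "Max processes" rows).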