Database trouble

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
cbeattie-unitrends
Posts: 84
Joined: Mon Oct 10, 2016 2:51 pm

Post by cbeattie-unitrends »

Nagios initially seemed to be running okay with 700 hosts, but the drive ran out of space (originally 50GB) shortly after we added about 400 more hosts.
I expanded the drive to 150GB and ran the database repair, but it doesn't seem to have really recovered. We are up to 1250 hosts and 26K services. Nagios won't stay up overnight because the OOM killer goes to work.
Even at 150GB, the drive is nearly out of space again.

What version of Nagios XI are you using?
5.3.0
Linux Distribution and version?
CentOS 6.8
32 or 64bit?
64-bit, 14 CPUs, 32GB of memory
VMware Image or Manual Install of XI?
Manual install
Are there special configurations on your system? For example, is Gnome installed? Are you using a proxy? Are you using SSL?
No special configuration

I have to run /usr/local/nagiosxi/scripts/repair_databases.sh every day. xi_meta in particular has millions more data records each time:

- recovering (with sort) MyISAM-table 'xi_meta.MYI'
Data records: 84161452
And its files are huge:

-rw-rw---- 1 mysql mysql  60G Nov  1 06:42 xi_meta.MYD
-rw-rw---- 1 mysql mysql 1.0K Nov  1 07:00 xi_meta.MYI
-rw-rw---- 1 mysql mysql  54G Nov  1 07:37 xi_meta.TMD
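A listing like this can be narrowed down to just the biggest files with a small helper. This is only a sketch: `largest_files` is a made-up name, and the directory to pass in is wherever your MySQL data files for the database live (that path is not given in this thread, so it is left as a placeholder in the usage line).

```shell
# List the N largest entries in a directory, biggest first.
# Relies on `ls -S` (sort by size, largest first); N defaults to 5.
# `largest_files` is a hypothetical helper name, not part of Nagios XI.
largest_files() {
    dir="$1"
    n="${2:-5}"
    ls -1S -- "$dir" | head -n "$n"
}
```

Usage would be something like `largest_files /path/to/mysql/datadir/nagiosxi 3`, with the path replaced by your actual data directory.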
I found another thread from someone else having a similar-sounding problem, so I tried the advice they got, too. Running "mysqlcheck -f -r -u root -pnagiosxi --all-databases --use_frm" takes a long time and shows a lot of things changing from 0 to some number:

nagiosxi.xi_commands
warning  : Number of rows changed from 0 to 20
status   : OK
nagiosxi.xi_eventqueue
warning  : Number of rows changed from 0 to 823593
status   : OK
nagiosxi.xi_events
warning  : Number of rows changed from 0 to 22528181
status   : OK
nagiosxi.xi_incidents                              OK
I've administered similarly-sized Nagios Core installations (version 3 and version 4) in the past, but Nagios XI's databases are still new to me.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Database trouble

Post by tmcdonald »

Generally speaking, I do not recommend running more than about 20k total checks on a single XI server, so at 27k you are definitely going to take a performance hit. While there are a ton of variables in play here, my advice is:
  • I would allocate about 500GB at a minimum to handle all of the historical data you will have, logs, configurations, etc.
  • If you do not have a RAM disk in place, now would be a good time to set one up: https://assets.nagios.com/downloads/nag ... giosXI.pdf
  • Start looking into tweaking check frequency to cut down on how much is running concurrently (disk checks can generally run every 15 minutes, whereas a ping should run every minute)
  • You might also consider looking into retention settings for the database, shortening these if necessary - look under Admin -> Performance Settings -> Databases
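
To get a rough feel for what changing check intervals buys you, the average check rate is just the number of checks divided by the interval. The sketch below is plain arithmetic, not anything XI provides: `checks_per_minute` is a made-up helper, and the 6,000 / 20,000 split of the 26,000 services is a hypothetical example, not a measurement from this system.

```shell
# Average number of checks started per minute when `count` checks
# each run once every `interval` minutes (integer division, rounds down).
# `checks_per_minute` is a hypothetical helper name.
checks_per_minute() {
    count="$1"
    interval="$2"
    echo $(( count / interval ))
}

# Hypothetical split of the 26,000 services mentioned in this thread:
# 6,000 stay on a 1-minute interval, 20,000 move to a 15-minute interval.
fast=$(checks_per_minute 6000 1)
slow=$(checks_per_minute 20000 15)

echo "all at 1 minute: $(checks_per_minute 26000 1) checks/min"
echo "after the split: $(( fast + slow )) checks/min"
```

Under those assumed numbers, the average load drops from 26000 to 7333 checks per minute. Your real figures depend on which checks you can afford to slow down.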
If you have questions about anything, let us know!
Former Nagios employee