High Load on NagiosXI server
Posted: Mon Mar 15, 2021 1:14 pm
by Dusan.Mandic
Hello,
I am in the process of upgrading the Nagios NRPE agent on all of our monitored hosts to 4.0.2. We recently updated our NagiosXI server to 5.7.1 (NRPE 4.0.3 plugin) and wanted to mitigate all the logging errors. I also recently ran the ramdisk script because we were getting file bloat from the servicedata file (it ballooned to ~130 GB). I emptied that file by redirecting /dev/null into it to reclaim space on our server, but now I am getting some wild LOAD and MAX SERVICE LATENCY values.
I noticed about 117% CPU utilization from mysqld in top on the server. Does it just take a while to reoptimize?
Re: High Load on NagiosXI server
Posted: Tue Mar 16, 2021 1:36 pm
by dchurch
Hi!
Since Nagios XI 5.7.1, we've found and fixed a bug that under-utilized an index, which led to poor MySQL performance, especially on long-running systems and systems that have a lot of service / host checks. If you're not opposed to upgrading yet again, 5.8.2 is out now with some performance improvements.
Nagios XI Change Log:
Nagios XI 5.8.2:
- NDO 3.0.6:
- - Increased performance for queries involving comment history and downtimes on large/long-running systems
Read on only if you don't want to update
In lieu of updating, we can do some things to fix some of the larger tables to help mitigate the database performance hit. Run this command:
Code: Select all
echo "select table_name as 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' from information_schema.TABLES where table_schema in ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
Re: High Load on NagiosXI server
Posted: Wed Mar 17, 2021 9:35 am
by Dusan.Mandic
Here you are
Re: High Load on NagiosXI server
Posted: Wed Mar 17, 2021 4:38 pm
by dchurch
Your xi_auditlog table is over 2GB in size, which could be leading to some slowdown. This could be due to the database maintenance task not automatically running, so let's check on that.
What is the output of the following commands?
Code: Select all
mysql -unagiosxi -pn@gweb nagiosxi <<< 'select min(log_time) from xi_auditlog;'
mysql -unagiosxi -pn@gweb nagiosxi <<< "select * from xi_sysstat where metric = 'dbmaint'"
Re: High Load on NagiosXI server
Posted: Wed Mar 17, 2021 5:08 pm
by Dusan.Mandic
[xxx@xxx~]$ mysql -unagiosxi -pn@gweb nagiosxi <<< "select * from xi_sysstat where metric = 'dbmaint'"
sysstat_id metric value update_time
1 dbmaint a:1:{s:10:"last_check";i:1616018701;} 2021-03-17 17:05:01
[xxx@xxx~]$ sudo mysql -unagiosxi -pn@gweb nagiosxi <<< 'select min(log_time) from xi_auditlog;'
min(log_time)
2020-09-19 02:00:01
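The dbmaint value in the output above is a PHP-serialized array whose last_check field is a Unix timestamp. As a minimal sketch (assuming GNU date is available; the variable names are only illustrative), it can be converted to a readable time like this:

```shell
# Value copied verbatim from the xi_sysstat row shown above.
row='a:1:{s:10:"last_check";i:1616018701;}'

# Extract the integer that follows "last_check";i:
epoch=$(printf '%s\n' "$row" | sed -n 's/.*"last_check";i:\([0-9]*\);.*/\1/p')

# Render it in UTC. This is GNU date syntax; BSD date uses -r instead of -d @.
last_check_utc=$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S')
echo "$last_check_utc"   # prints 2021-03-17 22:05:01
```

That is five hours ahead of the update_time column (17:05:01), which would match a server clock set to UTC-5, though the thread does not state the server's time zone.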
Re: High Load on NagiosXI server
Posted: Thu Mar 18, 2021 1:08 pm
by dchurch
Open /usr/local/nagiosxi/html/config.inc.php and, around line 40, change
"max_auditlog_age" => 180,
to
"max_auditlog_age" => 30,
For example:
Code: Select all
$cfg['db_info'] = array(
    "nagiosxi" => array(
        "dbtype" => 'mysql',
        "dbserver" => '',
        "user" => 'nagiosxi',
        "db" => 'nagiosxi',
        "charset" => "utf8",
        "dbmaint" => array( // variables affecting maintenance of db
            "max_auditlog_age" => 30, // max time (in DAYS) to keep audit log entries
Then save the file.
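To confirm the edit was saved, one option is to grep the file for the setting. This is only a sketch: the path is the one given above, and show_auditlog_age is a hypothetical helper name, not part of Nagios XI.

```shell
# Path taken from the post above; set CFG to point at a copy if you prefer.
CFG="${CFG:-/usr/local/nagiosxi/html/config.inc.php}"

# Print every line (with its line number) that sets max_auditlog_age.
show_auditlog_age() {
    grep -n '"max_auditlog_age"' "$1"
}

if [ -r "$CFG" ]; then
    show_auditlog_age "$CFG"
else
    echo "cannot read $CFG" >&2
fi
```

After the change, the matching line should show 30 rather than 180.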
Re: High Load on NagiosXI server
Posted: Fri Mar 19, 2021 3:28 pm
by Dusan.Mandic
Done.
How long will this take to pare down? It still shows ~2GB:
xi_auditlog | 2111.92
Re: High Load on NagiosXI server
Posted: Mon Mar 22, 2021 10:06 am
by dchurch
It should run daily. You'll know it has run when this command returns a date that is no more than about 30 days in the past.
Code: Select all
mysql -unagiosxi -pn@gweb nagiosxi <<< 'select min(log_time) from xi_auditlog;'
Re: High Load on NagiosXI server
Posted: Mon Mar 22, 2021 11:55 am
by Dusan.Mandic
[xxx@xxx ~]$ mysql -unagiosxi -pn@gweb nagiosxi <<< 'select min(log_time) from xi_auditlog;'
min(log_time)
2020-09-24 02:00:01
Doesn't look like it's run since last September?
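To compare a result against the retention limit, the age of the oldest row can be worked out in days. A rough sketch, assuming GNU date; age_days is a hypothetical helper, and the "now" values are approximations taken from the post timestamps in this thread:

```shell
# Whole days between two timestamps. Both are parsed as UTC so that
# daylight-saving shifts do not skew the difference. Requires GNU date (-d).
age_days() {
    local oldest now
    oldest=$(date -u -d "$1" +%s) || return 1
    now=$(date -u -d "$2" +%s) || return 1
    echo $(( (now - oldest) / 86400 ))
}

age_days '2020-09-19 02:00:01' '2021-03-17 17:08:00'   # reading from Mar 17, prints 179
age_days '2020-09-24 02:00:01' '2021-03-22 11:55:00'   # reading from Mar 22, prints 179
```

Both readings come out at 179 days, and the oldest entry moved forward five days between them. That would be consistent with trimming still happening at the earlier 180-day limit, but the thread does not confirm this.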
Re: High Load on NagiosXI server
Posted: Mon Mar 22, 2021 4:12 pm
by dchurch
Huh. The automatic process to delete old entries from that table doesn't seem to be running. It could be that the database can't run it, but I won't know that until we do some more investigating.
Try running the database repair script, and let me know if that is successful. Run the following as root from the terminal.
Code: Select all
/usr/local/nagiosxi/scripts/repair_databases.sh
See here for complete instructions: run the database repair
If that doesn't fix it, please PM me a profile. Get one by going to Admin (top menu) => System Profile (in the left menu), then clicking the blue button.
If you're unable to generate the profile through the web interface, please try generating it from the command line by running these commands as root:
Code: Select all
rm -rf /usr/local/nagiosxi/var/components/profile*
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT
Then send me the resulting /usr/local/nagiosxi/var/components/profile.zip file.
If the profile script fails, please include the ENTIRE output.