Database optimization causing application hangs.

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
yo_marc
Posts: 83
Joined: Thu Aug 11, 2016 1:56 pm

Database optimization causing application hangs.

Post by yo_marc »

Hello Nagios Support,

I've got an XI server here (5.8.3 on CentOS 7) with 1,200 Hosts and 11,000 Services. Over the last week I have been troubleshooting Nagios UI hangs or freezes. I checked the obvious things like CPU load, memory usage, etc. Nothing stood out besides some iowait spiking into the yellow/red range at times; this is relatively new. We have a high API load against the server, so I revoked all API access to see if it would reduce I/O. No real improvement. I had been investigating the iowait when...

Yesterday I signed in to find the server in a bad state: the DB had crashed and needed repair, and I'm not sure what happened. Once it was repaired (repair_databases.sh), I spent the rest of the day observing and troubleshooting debilitating UI hangs, along with Host and Service Check Latency spikes upwards of 600 seconds.

I was able to trace the issues back to DB Optimization runs, particularly against the Audit Log, Log Entries, and State History tables, and especially the Audit Log (the /var/lib/mysql/nagiosxi/xi_auditlog.ibd file was over 5 GB). During the optimization runs, the UI would hang and command processing seemed to stop: the Monitoring Engine Event Queue would stack up, and Host/Service Check Latency would then report the lag. Optimization was taking about 12 minutes to run on the Audit Log alone.
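One way to spot which InnoDB tablespace files have grown large, as xi_auditlog.ibd did here, is to list the `.ibd` files under the nagiosxi data directory by size. This is a minimal sketch, assuming GNU find (standard on CentOS 7) and the default data directory from the path above; the 500 MB threshold is an arbitrary choice, and `large_ibd_files` is a hypothetical helper name, not part of Nagios XI.

```shell
#!/usr/bin/env bash
# List InnoDB tablespace (.ibd) files above a size threshold, largest first.
# Assumes GNU find (for -printf) and the default MariaDB data directory.
set -euo pipefail

# large_ibd_files DIR MIN_MB  (hypothetical helper, not part of Nagios XI)
large_ibd_files() {
  local dir="$1" min_mb="$2"
  # -size +NM matches files larger than N MiB; print "bytes<TAB>path".
  find "$dir" -type f -name '*.ibd' -size +"${min_mb}"M -printf '%s\t%p\n' \
    | sort -rn \
    | awk -F'\t' '{ printf "%8.1f MB  %s\n", $1 / 1048576, $2 }'
}

DATADIR=/var/lib/mysql/nagiosxi
# Only scan when the directory is present and readable (usually needs root).
if [ -d "$DATADIR" ] && [ -r "$DATADIR" ] && [ -x "$DATADIR" ]; then
  large_ibd_files "$DATADIR" 500
fi
```

Re-running it after trimming retention and optimizing should show whether the files actually shrank.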

I started to trim back the retention period of some of the tables, when I found some related threads here in the support forum:

https://support.nagios.com/forum/viewto ... 5&p=332540
https://support.nagios.com/forum/viewto ... 4&p=332084

I got more aggressive with reducing the retention periods, and ended up with the following on these tables:

Audit Log - 90 to 14 days
Log Entries - 90 to 14 days
State History - 720 to 180 days

This certainly helped, but even though the optimization runs now take only 1m 30s, I am still seeing occasional UI hangs and Host/Service Check Latency spiking into the 30-80 second range.

With all of that said!

- Are there any suggested DB tweaks, in Nagios or in MariaDB itself, that might help with the current occasional UI and command-processing hangs?
- Are there any upcoming improvements in the area of app/UI performance, or to avoid Host/Service Check Latency during optimization runs?
- Is any consideration being given to rolling back to the old, separate ndo2db process, which seemed to provide better DB performance? (See my previous post about latency/performance issues after ndo2db went away: https://support.nagios.com/forum/viewto ... 16&t=60235. I have been seeing that behavior since upgrading off of 5.6.x.)

I appreciate any help or pointers.

Thanks,
-marc
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Database optimization causing application hangs.

Post by benjaminsmith »

Hi Marc,

There could be an issue here with frequently occurring table corruption leading to performance loss. I would recommend scanning the database log to see whether that's the case (for example: grep crashed /var/log/mariadb/mariadb.log).
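To go a step beyond a plain grep, the following sketch pulls the table names out of the log and counts how often each one was flagged, which makes repeat offenders easy to see. It assumes the standard MariaDB/MySQL wording `Table '...' is marked as crashed ...` and the CentOS 7 default log path from the example above; `crashed_tables` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Summarize which tables MariaDB has reported as crashed, most frequent first.
# Assumes the standard "Table '...' is marked as crashed" error wording.
set -euo pipefail

# crashed_tables LOGFILE  (hypothetical helper)
crashed_tables() {
  local log="$1"
  # grep exits 1 when nothing matches; treat that as "no crashed tables".
  { grep -o "Table '[^']*' is marked as crashed" "$log" || true; } \
    | sed -E "s/^Table '([^']*)'.*/\1/" \
    | sort | uniq -c | sort -rn
}

LOG=/var/log/mariadb/mariadb.log
if [ -r "$LOG" ]; then
  crashed_tables "$LOG"
fi
```

Each output line is a count followed by the table path as MariaDB logged it (for example `./nagios/<table>`).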

Let's also check the table sizes; please post the output of the following query.

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
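A variation on that query may be easier to read on a system with many tables: it sorts the largest tables first, limits the result, and adds information_schema's DATA_FREE column, which roughly reflects allocated-but-unused space that an optimize run could reclaim. This is a sketch; the `-uroot -pnagiosxi` credentials are copied from the query above and may differ on your system, the socket path is the CentOS 7 MariaDB default, and `table_size_sql` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Largest Nagios XI tables first, with unused (DATA_FREE) space alongside.
set -euo pipefail

# table_size_sql [LIMIT]  (hypothetical helper; prints the SQL only)
table_size_sql() {
  local limit="${1:-10}"
  # Accept digits only so the value is safe to splice into the query.
  [[ "$limit" =~ ^[0-9]+$ ]] || return 1
  printf '%s\n' "SELECT table_schema AS 'Schema', table_name AS 'Table',
  ROUND((data_length + index_length) / 1024 / 1024, 2) AS 'Size in MB',
  ROUND(data_free / 1024 / 1024, 2) AS 'Free in MB'
FROM information_schema.TABLES
WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi')
ORDER BY (data_length + index_length) DESC
LIMIT ${limit};"
}

# Run it only when a local MariaDB socket is present (CentOS 7 default path).
if command -v mysql >/dev/null 2>&1 && [ -S /var/lib/mysql/mysql.sock ]; then
  table_size_sql 15 | mysql -uroot -pnagiosxi --table
fi
```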
For larger systems with NDO3, I would also recommend increasing the maximum number of database connections (max_connections) if needed. Follow the steps in the KB article below.

Nagios XI - MySQL/MariaDB - Max Connections
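Before raising the limit, it may help to see how close the server has actually come to it. The sketch below reads the max_connections variable and the Max_used_connections status counter (both standard MariaDB names; the counter is the peak since the last restart) and prints peak usage as a percentage. The credentials are the same as in the query above and may differ per system, the 80% warning level is an arbitrary assumption rather than Nagios guidance, and `conn_usage_pct` and `db_value` are hypothetical helpers. Any actual change to the limit should follow the KB article.

```shell
#!/usr/bin/env bash
# Compare peak connection usage with the configured max_connections.
set -euo pipefail

# conn_usage_pct MAX USED -> integer percentage (hypothetical helper)
conn_usage_pct() {
  local max="$1" used="$2"
  [ "$max" -gt 0 ] || return 1
  echo $(( used * 100 / max ))
}

# Fetch a single value; -N drops the header row, -B gives tab-separated output.
db_value() {
  mysql -uroot -pnagiosxi -N -B -e "$1" | awk '{ print $2 }'
}

if command -v mysql >/dev/null 2>&1 && [ -S /var/lib/mysql/mysql.sock ]; then
  max=$(db_value "SHOW GLOBAL VARIABLES LIKE 'max_connections'")
  # Max_used_connections is the peak since the server last started.
  used=$(db_value "SHOW GLOBAL STATUS LIKE 'Max_used_connections'")
  pct=$(conn_usage_pct "$max" "$used")
  echo "max_connections=$max  Max_used_connections=$used  (${pct}% of limit)"
  # 80% is an arbitrary warning level chosen for this sketch.
  if [ "$pct" -ge 80 ]; then
    echo "Peak usage is near the limit; see the KB article on Max Connections."
  fi
fi
```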

Please send over the system profile and we'll check the other logs. If you did not have issues with kernel messages on this system, it may make sense to roll back to ndo2, but let's check the profile first.

To send us your system profile:
Login to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" Menu
Click the "Download Profile" button

Lastly, is this a VM or a physical server?

Thanks,
Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
yo_marc
Posts: 83
Joined: Thu Aug 11, 2016 1:56 pm

Re: Database optimization causing application hangs.

Post by yo_marc »

Thank you for the info and suggestions, and I apologize for the delay. I will follow up with more detailed info soon.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Database optimization causing application hangs.

Post by benjaminsmith »

Hi @yo_marc,

Thanks for the update, we'll wait for your reply.