Performance issue with Nagios XI!!!

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
rajsshah
Posts: 30
Joined: Thu Dec 06, 2018 8:00 am


Post by rajsshah

Hi Team,
Because of poor performance and regular downtime of Nagios XI, we are having doubts about our decision to choose Nagios XI as our monitoring solution. I would appreciate it if you could help us identify the root cause.

Setup:
Nagios Frontend
Machine A (Nagios frontend): RHEL 7.5, 4 vCPUs, 16 GB memory, 128 GB Standard HDD disk attached.
Machine B (Nagios frontend): RHEL 7.5, 4 vCPUs, 16 GB memory, 128 GB Standard HDD disk attached.

Machines A and B run in active-passive mode using DRBD and Pacemaker (as recommended on the official Nagios site).

Nagios Backend
Machine C (Nagios backend, MariaDB): RHEL 7.5, 4 vCPUs, 16 GB memory, 256 GB Premium SSD disk attached for MariaDB.

Total hosts configured: 299
Total services configured: 3119

Problem: We cannot log in to Nagios XI; the page is unresponsive. After rebooting all 3 machines, the issue recurs 15-20 minutes later.

Observations:
1. On the backend machine, I see high I/O wait (sometimes more than 80%). Sometimes we cannot even log in to the machine. All MariaDB connections are exhausted (400 connections).

2. Because of the above behavior, I sometimes see connection-exhausted errors toward MariaDB on the active frontend machine, and sometimes both frontend machines become active (a split-brain scenario). STONITH is not enabled.

3. I checked the DB table sizes: the xi_meta table was 72 GB and nagios_logentries was around 12 GB. After spending many days debugging the issue, we truncated these tables, and for the last 2 days it seems to be working fine.
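To catch this kind of growth before it becomes a 72 GB problem, table sizes can be checked regularly. The sketch below is a minimal illustration, assuming the rows come from a query such as `SELECT TABLE_SCHEMA, TABLE_NAME, DATA_LENGTH, INDEX_LENGTH FROM information_schema.TABLES WHERE TABLE_SCHEMA IN ('nagios', 'nagiosxi');`. The schema names, the 1 GiB threshold, and the function name `oversized_tables` are assumptions for illustration, not Nagios XI settings. Note that InnoDB reports these sizes as estimates.

```python
from typing import Iterable, List, Tuple

GIB = 1024 ** 3

# (schema, table, DATA_LENGTH, INDEX_LENGTH), as read from information_schema.TABLES
Row = Tuple[str, str, int, int]


def oversized_tables(rows: Iterable[Row], threshold_bytes: int = GIB) -> List[Tuple[str, float]]:
    """Return (schema.table, size in GiB) for tables larger than threshold_bytes, largest first."""
    flagged = []
    for schema, table, data_len, index_len in rows:
        total = data_len + index_len  # data plus index size
        if total > threshold_bytes:
            flagged.append((f"{schema}.{table}", round(total / GIB, 1)))
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


if __name__ == "__main__":
    # Hypothetical sample rows, sized to resemble the figures reported above.
    sample = [
        ("nagiosxi", "xi_meta", 70 * GIB, 2 * GIB),
        ("nagios", "nagios_logentries", 11 * GIB, 1 * GIB),
        ("nagios", "nagios_hosts", 5 * 1024 ** 2, 1024 ** 2),
    ]
    for name, size in oversized_tables(sample):
        print(f"{name}: {size} GiB")
```

With the sample rows, this prints `nagiosxi.xi_meta: 72.0 GiB` followed by `nagios.nagios_logentries: 12.0 GiB`. The small `nagios_hosts` table is not flagged.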

Questions:
1. Truncating the tables every time the issue occurs is not a solution. I expect you will say that, because of the DB issue, the Nagios dbmaint jobs that optimize and clean up the tables don't run. However, I should also mention that many times I restarted only the DB, repaired and optimized the complete DB, and then started Nagios, but the issue still recurred after some time.

2. With the default Nagios settings, what is the maximum size you expect the xi_meta table to grow to?

3. I suspect the I/O wait on the DB server was mostly caused by the huge size of the xi_meta and logentries tables.

4. What can be used to enable STONITH in the Nagios frontend DRBD/Pacemaker configuration, or what do you recommend? Please note the machines are Azure virtual machines.

Any other suggestions?
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Performance issue with nagiosxi !!!

Post by tgriep

The xi_meta table is used as a temporary table to store data for recently run commands and their output.
Under normal circumstances, the processes clean out this data when they are done with it, so the table should not get very large.
However, a short loss of connection to the remote MariaDB server could have caused the table to become corrupt, which would prevent the processes from cleaning out the old data and cause the table to grow.
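Because a healthy xi_meta table should stay small, steady growth between two size samples is a useful early sign that the cleanup is not running. The following is a minimal sketch that assumes sizes are sampled periodically (for example from information_schema.TABLES). The function names, the sample figures, and the 20 GiB limit are hypothetical and are not Nagios defaults.

```python
from datetime import datetime
from typing import Optional

GIB = 1024 ** 3


def growth_per_day(t0: datetime, size0: int, t1: datetime, size1: int) -> float:
    """Average change in bytes per day between two size samples (negative if the table shrank)."""
    days = (t1 - t0).total_seconds() / 86400
    if days <= 0:
        raise ValueError("the second sample must be taken after the first")
    return (size1 - size0) / days


def days_until(limit_bytes: int, current_bytes: int, rate_per_day: float) -> Optional[float]:
    """Days until current_bytes reaches limit_bytes at rate_per_day, or None if the table is not growing."""
    if rate_per_day <= 0:
        return None
    return max(0.0, (limit_bytes - current_bytes) / rate_per_day)


if __name__ == "__main__":
    # Hypothetical samples: xi_meta grew from 10 GiB to 12 GiB in two days.
    rate = growth_per_day(datetime(2019, 1, 1), 10 * GIB, datetime(2019, 1, 3), 12 * GIB)
    print(f"growth: {rate / GIB:.1f} GiB/day")
    print(f"days until 20 GiB: {days_until(20 * GIB, 12 * GIB, rate)}")
```

If the rate is flat or negative, the table matches the normal behavior described above. A steady positive rate suggests old rows are no longer being cleared.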

The MariaDB connection issue could also cause this problem, so try increasing the MariaDB max connections setting even higher than its current value.
https://support.nagios.com/kb/article/n ... s-513.html
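Before raising the limit, it helps to compare the configured limit against the peak usage MariaDB has seen since it last started. The sketch below assumes the inputs are the results of `SHOW GLOBAL VARIABLES LIKE 'max_connections';` and `SHOW GLOBAL STATUS LIKE 'Max_used_connections';`, converted to dicts of name to string value. The 80% warning ratio and the function name `connection_headroom` are assumptions for illustration only.

```python
def connection_headroom(variables: dict, status: dict, warn_ratio: float = 0.8) -> dict:
    """Compare peak connection usage (since server start) with max_connections."""
    max_conn = int(variables["max_connections"])
    peak = int(status["Max_used_connections"])
    ratio = peak / max_conn
    return {
        "max_connections": max_conn,
        "peak_used": peak,
        "peak_ratio": round(ratio, 2),
        "warn": ratio >= warn_ratio,  # True when the peak came close to the limit
    }


if __name__ == "__main__":
    # The situation described in this thread: all 400 connections were used.
    print(connection_headroom({"max_connections": "400"}, {"Max_used_connections": "400"}))
```

A peak that keeps reaching the limit after the limit has been raised suggests that connections are being held open by slow queries (for example, against very large tables) rather than that the limit is simply too low.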

We would need to see a system profile from the server so we can check the logs for any other clues about the issues you are having.
Also, the GUI uses PHP scripts, and if the default PHP settings have not been increased, that could cause issues as well.
https://support.nagios.com/kb/article/n ... e-611.html

The logentries table can be set to store less history, which will decrease its size.
Go to the Admin > Performance Settings menu and click on the Databases tab.
There you can adjust the settings for the stored MySQL data.
Decrease "Max Log Entries Age" to store less data in the logentries table.
This will affect the Event Log report.
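To estimate how much a shorter "Max Log Entries Age" would trim, it helps to see the retention arithmetic spelled out. The sketch below assumes the setting is expressed in days and that entries older than `now - age` fall outside the window. It illustrates that arithmetic and is not the Nagios XI maintenance job. The query string is a hypothetical read-only check: the `nagios_logentries.logentry_time` name follows the NDOUtils schema, and the `%s` placeholder follows the DB-API style used by common MySQL drivers. Verify both against your own installation before use.

```python
from datetime import datetime, timedelta
from typing import Iterable

# Hypothetical read-only check: counts rows outside the retention window.
COUNT_PRUNABLE_SQL = "SELECT COUNT(*) FROM nagios_logentries WHERE logentry_time < %s"


def logentry_cutoff(now: datetime, max_age_days: int) -> datetime:
    """Oldest timestamp still inside the retention window."""
    if max_age_days < 0:
        raise ValueError("max_age_days must be non-negative")
    return now - timedelta(days=max_age_days)


def count_prunable(entry_times: Iterable[datetime], now: datetime, max_age_days: int) -> int:
    """Count entries older than the cutoff, i.e. those outside the retention window."""
    cutoff = logentry_cutoff(now, max_age_days)
    return sum(1 for t in entry_times if t < cutoff)


if __name__ == "__main__":
    now = datetime(2019, 1, 31)
    times = [datetime(2018, 12, 31), datetime(2019, 1, 1), datetime(2019, 1, 15)]
    print(logentry_cutoff(now, 30))  # 2019-01-01 00:00:00
    print(count_prunable(times, now, 30))  # 1
```

Passing the value returned by `logentry_cutoff` as the parameter to `COUNT_PRUNABLE_SQL` shows roughly how many rows a given retention age would exclude, without deleting anything.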
Be sure to check out our Knowledgebase for helpful articles and solutions!