XI Load warning 6+ hours

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
AUIInternetOps
Posts: 6
Joined: Fri Jan 11, 2019 8:34 am


Post by AUIInternetOps »

Please forgive anything we might have left out/not done. This is our first "ticket"/support post since switching to Nagios XI in April 2019.

Red Hat Enterprise Linux Server release 7.6 (Maipo)
Nagios XI 5.5.8
4 CPU
64 bit
Manual Install
free -m
              total        used        free      shared  buff/cache   available
Mem:          16046        1611        4454         817        9980       13287
Swap:          4095          98        3997

load average: 14.93, 14.76, 14.23

Profile.zip attached

Last week we noticed that the critical load alert for the Nagios XI localhost was reaching a load of ~15 every few days for 2-3 hours and then dropping back down. We followed KB https://support.nagios.com/kb/article/n ... s-150.html and it seemed to bring it down from ~15 for a few hours to just ~6 for a much shorter time frame over the week.

But today, the load alert of ~15 has been on the board for 6+ hours now. When top is run, there appear to be 5+ httpd processes using anywhere from 30% to 70% CPU each. We have not stopped or restarted httpd, so that we do not cause further issues without knowing what it's doing.
Last edited by scottwilkerson on Mon Jan 13, 2020 3:33 pm, edited 1 time in total.
Reason: removed profile shared with support team
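A note on the numbers in this first post: a load average is not a percentage. On Linux it is roughly the average number of tasks that are running, waiting for a CPU, or in uninterruptible sleep, so it only means something relative to the CPU count. A minimal Python sketch using only the standard library (`per_core_load` is a hypothetical helper, not part of Nagios XI) that divides the reported load by the 4 CPUs:

```python
import os


def per_core_load(loadavg, cpus):
    """Divide each load average (1, 5, 15 minutes) by the number of CPUs.

    A result above 1.0 means that, on average, more tasks wanted CPU time
    (or, on Linux, were in uninterruptible sleep) than there were CPUs.
    """
    if cpus <= 0:
        raise ValueError("cpus must be a positive integer")
    return tuple(load / cpus for load in loadavg)


if __name__ == "__main__":
    # Figures reported in the first post: load average 14.93, 14.76, 14.23 on 4 CPUs.
    reported = per_core_load((14.93, 14.76, 14.23), 4)
    print("reported:", ", ".join(f"{x:.2f}" for x in reported))

    # Live values on the machine running this script (os.getloadavg is Unix-only).
    live = per_core_load(os.getloadavg(), os.cpu_count() or 1)
    print("live:    ", ", ".join(f"{x:.2f}" for x in live))
```

With the posted figures this works out to roughly 3.7 tasks per CPU over the last minute, i.e. demand several times what 4 CPUs can serve at once.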
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: XI Load warning 6+ hours

Post by scottwilkerson »

Welcome to the forum!

Looking over your profile it appears you have some crashed tables in your database.

Please run the following procedure and let's see if it settles down:
https://assets.nagios.com/downloads/nag ... tabase.pdf
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart

Re: XI Load warning 6+ hours

Post by AUIInternetOps »

Thank you for the direction. Luckily, about 20 minutes after I posted, the alarm finally cleared. I will look over the supplied PDF and get to the defined steps tomorrow, as it's almost COB for us today.

Thanks again.

Re: XI Load warning 6+ hours

Post by scottwilkerson »

AUIInternetOps wrote: "Thank you for the direction. Luckily, about 20 minutes after I posted, the alarm finally cleared. I will look over the supplied PDF and get to the defined steps tomorrow, as it's almost COB for us today.

Thanks again."

Sounds good!

Re: XI Load warning 6+ hours

Post by AUIInternetOps »

Logged in as root, I am attempting a backup (/usr/local/nagiosxi/scripts/backup_xi.sh) prior to running the table repair but I am getting the following error:

Code:

"Error backing up MySQL database 'nagios' - check the password in this script!"
I see the "mysqlpass" value listed in xi-sys.cfg, and I am able to log in to MariaDB/MySQL from the command line and access the nagios database.

Code:

[root@nagiosxi-1 ~]# mysql -u root -p                               
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 10923456
Server version: 5.5.60-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use nagios;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
In my KB hunting, I am seeing references to a line with "themysqlpass" in the backup_xi.sh script (though that doc is from 2017), but I do not see that line in our script. Is there another reference to the password that I am missing?

Re: XI Load warning 6+ hours

Post by scottwilkerson »

You are not going to be able to take a backup while the database has crashed tables like this. You will need to run the repair on the database as it is.

Re: XI Load warning 6+ hours

Post by AUIInternetOps »

I was also wondering about that. Thanks for the confirmation.

Re: XI Load warning 6+ hours

Post by scottwilkerson »

AUIInternetOps wrote: "I was also wondering about that. Thanks for the confirmation."

No problem.

Let us know if the load continues to be high after the database repair.

Re: XI Load warning 6+ hours

Post by AUIInternetOps »

The table repair yesterday finished with no unexpected errors, and there haven't been any log lines in /var/log/mariadb/mariadb.log since yesterday. But we are once again showing a high load alert for the server (15.83, 14.53, 12.61). Could it be possible we need to just change the default alert values of the check? Other alerts still seem to come in during these high load times, so the load does not currently seem to be having an overall negative effect on the server. I will throw in that I've noticed that when these high load times occur, the perfdata graphs do not display for the duration, but they do show data for the high load time frame after the load subsides/returns to low.

New profile zip from today attached.
Last edited by scottwilkerson on Wed Jan 15, 2020 9:27 am, edited 1 time in total.
Reason: removed profile from public access
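One quick way to confirm that MariaDB has not flagged a table as crashed again is to scan the log mentioned above for the standard error text "Table '...' is marked as crashed and should be repaired" (or its "...and last (automatic?) repair failed" variant). A minimal Python sketch, assuming the message text appears verbatim in /var/log/mariadb/mariadb.log; the surrounding line layout varies by MariaDB version, so only the message itself is matched. `crashed_tables` is a hypothetical helper:

```python
import re
from pathlib import Path

# MariaDB/MySQL report a crashed MyISAM table with text like
# "Table '<name>' is marked as crashed and should be repaired".
CRASHED_RE = re.compile(r"Table '([^']+)' is marked as crashed")


def crashed_tables(lines):
    """Return distinct table names from crashed-table messages, in first-seen order."""
    seen = []
    for line in lines:
        match = CRASHED_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    log_path = Path("/var/log/mariadb/mariadb.log")  # path mentioned in the post
    try:
        with log_path.open(errors="replace") as log:
            tables = crashed_tables(log)
    except (FileNotFoundError, PermissionError) as exc:
        print(f"could not read {log_path}: {exc}")
    else:
        print("\n".join(tables) if tables else "no crashed-table messages found")
```

An empty result after the repair matches what was reported here, which points away from the database and toward something else driving the load.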

Re: XI Load warning 6+ hours

Post by scottwilkerson »

AUIInternetOps wrote: "Could it be possible we need to just change the default alert values of the check?"

Possibly.
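If you do go the route of changing the alert values: the localhost load check in Nagios XI is typically built on the check_load plugin, which compares the 1-, 5- and 15-minute load averages against separate warning and critical triples. A minimal Python sketch approximating that decision (an illustration, not the plugin's source). The threshold values below are hypothetical and are not the Nagios XI defaults; `evaluate_load` is a hypothetical helper:

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin return codes


def evaluate_load(loads, warn, crit):
    """Approximate check_load's decision for (1, 5, 15)-minute loads.

    Any value above its critical threshold gives CRITICAL; otherwise any
    value above its warning threshold gives WARNING; otherwise OK.
    """
    if any(load > limit for load, limit in zip(loads, crit)):
        return CRITICAL
    if any(load > limit for load, limit in zip(loads, warn)):
        return WARNING
    return OK


if __name__ == "__main__":
    loads = (15.83, 14.53, 12.61)  # load reported in the post
    # Hypothetical (warning, critical) threshold pairs for illustration only.
    for warn, crit in [((5, 4, 3), (10, 6, 4)), ((20, 18, 16), (30, 25, 20))]:
        state = ["OK", "WARNING", "CRITICAL"][evaluate_load(loads, warn, crit)]
        print(warn, crit, "->", state)
```

Newer monitoring-plugins releases of check_load also accept `-r`/`--percpu` to divide the load by the CPU count before comparing; check `check_load --help` on your system. Raising thresholds only quiets the alert, though, so the CPU headroom question still stands.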

I would actually recommend adding more CPUs if this is in a virtual environment, as the 4 CPUs are being gobbled up by httpd processes (not sure what else you may be running on this server).

From the top output:

Code:

13974 apache    20   0  581164  32684   6684 R  83.3  0.2   0:51.65 httpd
18378 apache    20   0  576404  27704   6668 R  61.1  0.2   0:16.45 httpd
20199 apache    20   0  575936  27388   6904 R  61.1  0.2   0:18.90 httpd
17980 apache    20   0  577700  29032   6672 R  55.6  0.2   0:47.17 httpd
 3208 apache    20   0  581532  32972   6716 R  50.0  0.2   0:59.55 httpd
 7607 apache    20   0  584252  35548   6708 R  27.8  0.2   0:42.14 httpd
 3238 apache    20   0       0      0      0 Z  11.1  0.0   0:07.72 httpd
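The "gobbled up" observation can be checked with some arithmetic on this listing. In top's default (Irix) mode, %CPU is relative to a single CPU, so a 4-CPU host tops out at 400%. A minimal Python sketch, assuming top's default column order (it is configurable), that totals the httpd rows above; `cpu_for_command` is a hypothetical helper:

```python
# The top rows quoted above (default column order:
# PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND).
TOP_ROWS = """\
13974 apache    20   0  581164  32684   6684 R  83.3  0.2   0:51.65 httpd
18378 apache    20   0  576404  27704   6668 R  61.1  0.2   0:16.45 httpd
20199 apache    20   0  575936  27388   6904 R  61.1  0.2   0:18.90 httpd
17980 apache    20   0  577700  29032   6672 R  55.6  0.2   0:47.17 httpd
 3208 apache    20   0  581532  32972   6716 R  50.0  0.2   0:59.55 httpd
 7607 apache    20   0  584252  35548   6708 R  27.8  0.2   0:42.14 httpd
 3238 apache    20   0       0      0      0 Z  11.1  0.0   0:07.72 httpd
"""

CPU_COL, CMD_COL = 8, 11  # zero-based field positions in the default layout


def cpu_for_command(top_text, command):
    """Sum %CPU over top rows whose COMMAND equals `command`; return (total, rows)."""
    total, rows = 0.0, 0
    for line in top_text.splitlines():
        fields = line.split()
        if len(fields) > CMD_COL and fields[CMD_COL] == command:
            total += float(fields[CPU_COL])
            rows += 1
    return total, rows


if __name__ == "__main__":
    cpus = 4  # from the first post
    total, rows = cpu_for_command(TOP_ROWS, "httpd")
    print(f"{rows} httpd rows using {total:.1f}% of {cpus * 100}% total CPU")
```

This prints `7 httpd rows using 350.0% of 400% total CPU`, i.e. close to 90% of the machine. The zombie (Z) row is counted as listed; leaving it out still gives about 339%.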
AUIInternetOps wrote: "I will throw in that I've noticed that when these high load times occur, the perfdata graphs do not display for the duration, but they do show data for the high load time frame after the load subsides/returns to low."

This can be adjusted; by default, npcd stops processing performance data when the load goes over 10.
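A toy Python model of the behaviour described here. It assumes (not verified against the npcd source) that npcd compares the current load with `load_threshold` before each processing pass and leaves spooled perfdata in place until the load drops, which matches the observation in this thread that graph data for the high-load window appears once the load subsides. `simulate_npcd` and the load series are hypothetical:

```python
def simulate_npcd(loads, load_threshold):
    """Toy model of the load gate: return how many perfdata batches get processed per interval.

    One new batch is spooled every interval. While the load is above
    load_threshold nothing is processed and batches pile up; once the load
    is back at or below the threshold, the whole backlog is processed.
    """
    pending = 0
    processed = []
    for load in loads:
        pending += 1
        if load > load_threshold:
            processed.append(0)  # over the threshold: leave everything spooled
        else:
            processed.append(pending)  # catch up on the backlog
            pending = 0
    return processed


if __name__ == "__main__":
    # Hypothetical one-minute loads; the middle three are in the range reported in this thread.
    loads = [8.0, 15.8, 14.5, 12.6, 6.0]
    print("threshold 10:", simulate_npcd(loads, 10.0))
    print("threshold 40:", simulate_npcd(loads, 40.0))
```

With a threshold of 10 this prints `[1, 0, 0, 0, 4]` (a gap, then a catch-up burst), and with 40 it prints `[1, 1, 1, 1, 1]`, which is the effect the change below is after.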

Edit /usr/local/nagios/etc/pnp/npcd.cfg and change this:

Code:

load_threshold = 10.0
to this:

Code:

load_threshold = 40.0
Then restart npcd:

Code:

service npcd restart
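For anyone applying this change across several servers, here is a minimal Python sketch that makes the same one-line edit and keeps a `.bak` copy first. It assumes the setting appears as a plain `load_threshold = <number>` line as shown above; commented-out lines and lines with trailing comments are deliberately left alone. `set_load_threshold` is hypothetical, and npcd still needs the restart shown above afterwards:

```python
import re
from pathlib import Path

NPCD_CFG = Path("/usr/local/nagios/etc/pnp/npcd.cfg")  # path given in the post
# Matches an active "load_threshold = <value>" line; [ \t] avoids swallowing newlines.
LOAD_RE = re.compile(r"^([ \t]*load_threshold[ \t]*=[ \t]*)\S+[ \t]*$", re.MULTILINE)


def set_load_threshold(text, value):
    """Return `text` with every active load_threshold line set to `value`."""
    new_text, count = LOAD_RE.subn(lambda m: f"{m.group(1)}{value:.1f}", text)
    if count == 0:
        raise ValueError("no active load_threshold line found")
    return new_text


if __name__ == "__main__":
    if not NPCD_CFG.exists():
        print(f"{NPCD_CFG} not found; nothing changed")
    else:
        original = NPCD_CFG.read_text()
        # Keep a copy next to the original before writing the change.
        NPCD_CFG.with_name(NPCD_CFG.name + ".bak").write_text(original)
        NPCD_CFG.write_text(set_load_threshold(original, 40.0))
        print("updated; restart npcd for the change to take effect")
```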