XI Load warning 6+ hours
-
AUIInternetOps
- Posts: 6
- Joined: Fri Jan 11, 2019 8:34 am
XI Load warning 6+ hours
Please forgive anything we might have left out/not done. This is our first "ticket"/support post since switching to Nagios XI in April 2019.
Red Hat Enterprise Linux Server release 7.6 (Maipo)
Nagios XI 5.5.8
4 CPU
64 bit
Manual Install
free -m
              total        used        free      shared  buff/cache   available
Mem:          16046        1611        4454         817        9980       13287
Swap:          4095          98        3997
load average: 14.93, 14.76, 14.23
Profile.zip attached
Last week we noticed that the critical load warning for the Nagios XI localhost was at ~15 every few days for 2-3 hours and then would drop back down. We followed KB https://support.nagios.com/kb/article/n ... s-150.html and it seemed to reduce it from ~15 for a few hours to just ~6 for a much shorter time over the week.
But today, the load warning of ~15 has been on the board for 6+ hours now. When top is run, there appear to be 5+ httpd processes using anywhere from 30% to 70% CPU each. We have not stopped or restarted httpd, so as not to cause further issues without knowing what it is doing.
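To see which httpd workers are busiest without restarting anything, the process list can be filtered by name and CPU usage. This is only a minimal sketch, assuming procps `ps` is available; `busy_procs` is a hypothetical helper name and not part of Nagios XI. Note that `ps` reports CPU averaged over each process's lifetime, so the figures will not match the instantaneous values shown by top.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "PID %CPU" for processes named $1 whose
# CPU usage is at or above $2 percent.
# Reads "PID %CPU COMMAND" lines on stdin, as produced by:
#   ps -eo pid,pcpu,comm --no-headers
busy_procs() {
    local name="$1" min_cpu="$2"
    awk -v name="$name" -v min="$min_cpu" \
        '$3 == name && $2 + 0 >= min + 0 { print $1, $2 }'
}

# Live usage on the server:
#   ps -eo pid,pcpu,comm --no-headers | busy_procs httpd 25
```

Reading from stdin keeps the filter separate from the `ps` call, so it can also be run against saved output.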
Last edited by scottwilkerson on Mon Jan 13, 2020 3:33 pm, edited 1 time in total.
Reason: removed profile shared with support team
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: XI Load warning 6+ hours
Welcome to the forum!
Looking over your profile it appears you have some crashed tables in your database.
Please run the following procedure and let's see if it settles down:
https://assets.nagios.com/downloads/nag ... tabase.pdf
-
AUIInternetOps
- Posts: 6
- Joined: Fri Jan 11, 2019 8:34 am
Re: XI Load warning 6+ hours
Thank you for the direction. Luckily about 20 minutes after I posted, the alarm finally cleared out. I will look over the supplied PDF and work through the defined steps tomorrow as it's almost COB for us today.
Thanks again.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: XI Load warning 6+ hours
AUIInternetOps wrote:Thank you for the direction. Luckily about 20 minutes after I posted, the alarm finally cleared out. I will look over the supplied PDF and work through the defined steps tomorrow as it's almost COB for us today.
Thanks again.
Sounds good!
-
AUIInternetOps
- Posts: 6
- Joined: Fri Jan 11, 2019 8:34 am
Re: XI Load warning 6+ hours
Logged in as root, I am attempting a backup (/usr/local/nagiosxi/scripts/backup_xi.sh) prior to running the table repair, but I am getting the following error:
Code: Select all
"Error backing up MySQL database 'nagios' - check the password in this script!"
I see the "mysqlpass" listed in xi-sys.cfg, and I am able to log in to MariaDB/MySQL via the command line and access the nagios database.
Code: Select all
[root@nagiosxi-1 ~]# mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 10923456
Server version: 5.5.60-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use nagios;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
In my KB hunting, I am seeing references to a line with "themysqlpass" in the backup_xi.sh script (though that doc is from 2017), but I do not see that in our script. Is there another reference to the password that I am not seeing?
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: XI Load warning 6+ hours
You are not going to be able to take a backup while the DB has crashed tables, as it currently does. You must run the repair on it as is.
-
AUIInternetOps
- Posts: 6
- Joined: Fri Jan 11, 2019 8:34 am
Re: XI Load warning 6+ hours
I was wondering about that as well. Thanks for the confirmation.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: XI Load warning 6+ hours
AUIInternetOps wrote:I was wondering about that as well. Thanks for the confirmation.
No problem.
Let us know if the load continues to be high after the database repair.
-
AUIInternetOps
- Posts: 6
- Joined: Fri Jan 11, 2019 8:34 am
Re: XI Load warning 6+ hours
The table repair yesterday finished with no extraneous errors, and there haven't been any log lines in /var/log/mariadb/mariadb.log since yesterday. But we are once again showing a high load alert for the server (15.83, 14.53, 12.61). Could it be possible we need to just change the default alert values of the check? Other alerts still seem to come in during these high load times, so it does not currently seem to be having an overall negative effect on the server. I will throw in that I've noticed when these high load times occur that the perfdata graphs do not display for the duration, but they show info for the high load time frame after the load subsides/returns to low.
New profile zip from today attached.
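For context on whether the alert values are reasonable, it can help to divide the load average by the CPU count. This is a rough sketch only; `load_per_cpu` is a hypothetical helper name, and the figures are the ones quoted above (a 1-minute load of 15.83 on 4 CPUs, roughly 4 per core).

```shell
#!/usr/bin/env bash
# Hypothetical helper: divide a load average ($1) by the CPU count ($2)
# and print the load per core with two decimal places.
load_per_cpu() {
    local load="$1" cpus="$2"
    awk -v l="$load" -v c="$cpus" 'BEGIN { printf "%.2f\n", l / c }'
}

# 1-minute load from this post, on the 4-CPU server
load_per_cpu 15.83 4

# Live values on Linux:
#   load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A per-core value well above 1 means more runnable work than the CPUs can service at once, which is a capacity question rather than only a threshold question.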
Last edited by scottwilkerson on Wed Jan 15, 2020 9:27 am, edited 1 time in total.
Reason: removed profile from public access
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: XI Load warning 6+ hours
AUIInternetOps wrote:Could it be possible we need to just change the default alert values of the check?
Possibly.
I would actually recommend adding more CPUs if this is in a virtual environment, as the 4 CPUs are being gobbled up by httpd processes (not sure what else you may be running on this server).
From top:
Code: Select all
13974 apache 20 0 581164 32684 6684 R 83.3 0.2 0:51.65 httpd
18378 apache 20 0 576404 27704 6668 R 61.1 0.2 0:16.45 httpd
20199 apache 20 0 575936 27388 6904 R 61.1 0.2 0:18.90 httpd
17980 apache 20 0 577700 29032 6672 R 55.6 0.2 0:47.17 httpd
 3208 apache 20 0 581532 32972 6716 R 50.0 0.2 0:59.55 httpd
 7607 apache 20 0 584252 35548 6708 R 27.8 0.2 0:42.14 httpd
 3238 apache 20 0      0     0    0 Z 11.1 0.0 0:07.72 httpd
AUIInternetOps wrote:I will throw in that I've noticed when these high load times occur that the perfdata graphs do not display for the duration, but they show info for the high load time frame after the load subsides/returns to low.
This can be adjusted; the default stops processing performance data if the load goes over 10.
Edit /usr/local/nagios/etc/pnp/npcd.cfg and change this
Code: Select all
load_threshold = 10.0
to this
Code: Select all
load_threshold = 40.0
then restart npcd
Code: Select all
service npcd restart
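If you would rather script the change than edit the file by hand, something along these lines should work. It is a sketch only, assuming GNU sed (as shipped with RHEL 7); `set_load_threshold` is a hypothetical helper name, and it keeps a `.bak` copy of the file next to the original.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set load_threshold in an npcd.cfg-style file ($1)
# to the given value ($2). GNU sed writes a backup to "$1.bak" first.
set_load_threshold() {
    local cfg="$1" value="$2"
    sed -i.bak -E \
        "s/^[[:space:]]*load_threshold[[:space:]]*=.*/load_threshold = ${value}/" \
        "$cfg"
}

# Usage on the server (path and value from the post above):
#   set_load_threshold /usr/local/nagios/etc/pnp/npcd.cfg 40.0
#   service npcd restart
```

Only the `load_threshold` line is rewritten; every other setting in the file is left untouched.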