Problems after upgrade from 5.6.14 to 5.8.1
Posted: Tue Feb 09, 2021 10:56 am
by cnanational_mark
Hello,
We upgraded from Nagios XI 5.6.14 to 5.8.1 about a week ago. Since then I've seen a few problems that I haven't been able to correct: very high MariaDB activity, and at times service checks stop updating.
Most of the time, MariaDB is using 100-200% CPU as reported by top (see attachment). This does not seem to affect the rest of the system, which stays pretty responsive, and it doesn't appear to be running out of CPU or memory.
Also, Nagios stops processing checks at times. I haven't verified it yet, but it seems to happen when scheduled downtime periods start and when they end. We have a nightly blackout period for some systems, and processing seems to stop when those periods start and stop. Could the high database load be what's stopping check processing?
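One way to verify the downtime theory is to list the downtime start and end entries from the Nagios Core log and compare their times with when checks stalled. A minimal bash sketch, assuming the default Nagios XI log path (/usr/local/nagios/var/nagios.log) and Core's `[<epoch>] HOST DOWNTIME ALERT: ...` / `SERVICE DOWNTIME ALERT: ...` line format; `list_downtime_events` is a hypothetical helper and relies on GNU date:

```shell
#!/usr/bin/env bash
# Sketch: print scheduled-downtime events (STARTED/STOPPED/CANCELLED) from the
# Nagios Core log with a readable UTC timestamp.
# Assumption: default Nagios XI log path; pass another file as $1 if needed.

list_downtime_events() {
    local log="${1:-/usr/local/nagios/var/nagios.log}"
    local re='^\[([0-9]+)] (.*)$'   # "[<epoch seconds>] <message>"
    local line
    grep -E '^\[[0-9]+] (HOST|SERVICE) DOWNTIME ALERT:' "$log" |
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            # GNU date turns "@<epoch>" into a calendar time
            printf '%s %s\n' \
                "$(date -u -d "@${BASH_REMATCH[1]}" '+%Y-%m-%d %H:%M:%S')" \
                "${BASH_REMATCH[2]}"
        fi
    done
}

# Usage:
#   list_downtime_events                       # default log
#   list_downtime_events /path/to/nagios.log   # any other log file
```

Lining these times up against the moments the engine had to be restarted should show fairly quickly whether the stalls follow the blackout window.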
I looked at some of the articles about tuning the database, but they didn't seem to apply, and this wasn't a problem before 5.8.1. I did follow the steps in this article, since the symptoms seem close to what I'm seeing (minus the log file warnings about the message queue):
https://support.nagios.com/kb/article.php?id=139
It didn't seem to help.
At this point, I'm not really sure where to go next. Any ideas?
Thanks,
Mark Maynard
Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.
Re: Problems after upgrade from 5.6.14 to 5.8.1
Posted: Tue Feb 09, 2021 4:51 pm
by dchurch
Please run the following commands and post their output:
mysql -uroot -p <<< 'SHOW ENGINE INNODB STATUS\G'
mysql -unagiosql nagiosql <<< "SHOW WARNINGS"
mysql -unagiosql <<< 'SHOW full processlist\G'
Have you tried running the database repair script? Run the following as root from the terminal:
/usr/local/nagiosxi/scripts/repair_databases.sh
(See here for complete instructions: run the database repair)
Re: Problems after upgrade from 5.6.14 to 5.8.1
Posted: Tue Feb 09, 2021 7:12 pm
by cnanational_mark
I've attached a file with the output from the first command. The other two commands failed with an access error:
[root@mvsc1lx0128 ~]# mysql -unagiosql nagiosql <<< "SHOW WARNINGS"
ERROR 1045 (28000): Access denied for user 'nagiosql'@'localhost' (using password: NO)
[root@mvsc1lx0128 ~]# mysql -unagiosql <<< 'SHOW full processlist\G'
ERROR 1045 (28000): Access denied for user 'nagiosql'@'localhost' (using password: NO)
Also, I did try running a database repair when the problem first showed up. I'm not certain it helped, because I restarted Nagios at the same time, and the restart seems to be what temporarily solved the problem.
Thanks,
Mark
Re: Problems after upgrade from 5.6.14 to 5.8.1
Posted: Wed Feb 10, 2021 12:15 pm
by dchurch
This probably isn't the cause, but I noticed that the worker process executing the host check for DPMS_5th_Flr_Board_rm times out at 30.1 seconds, and the check_ping against that host takes exactly 30 seconds. This may be causing other checks to fail. Try removing the host check for that host, or deactivating the host, to see whether the CPU usage clears up.
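To see how long the host check takes outside the scheduler, the plugin can be run by hand under a hard time limit. A minimal bash sketch; `run_timed` is a hypothetical helper built on GNU coreutils `timeout` (which exits with status 124 when the limit is reached), and the address and thresholds in the usage line are placeholders, not this host's real configuration:

```shell
#!/usr/bin/env bash
# Sketch: run a check command with a hard time limit and report its exit
# status and wall-clock duration in whole seconds.
# run_timed is a hypothetical helper, not part of Nagios.

run_timed() {
    local limit="$1"; shift
    local start rc
    start=$(date +%s)
    timeout "$limit" "$@"   # exit status 124 means the limit was reached
    rc=$?
    echo "exit=$rc elapsed=$(( $(date +%s) - start ))s"
    return "$rc"
}

# Usage (placeholder address and thresholds):
#   run_timed 35 /usr/local/nagios/libexec/check_ping \
#       -H 192.0.2.10 -w 3000.0,80% -c 5000.0,100% -p 5
```

If the plugin alone consistently needs about 30 seconds, that points at the target host or network rather than at the database.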
What are the outputs of the following commands?
mysql -unagiosql -pn@gweb <<< 'show full processlist\G'
mysql -unagiosql -pn@gweb <<< 'show warnings'
Another thing you may want to check is Admin => Unconfigured Objects, to see whether the incoming passive checks are being correctly attributed to the hosts they belong to. If there's a host you expected to be configured, but isn't, then
Re: Problems after upgrade from 5.6.14 to 5.8.1
Posted: Wed Feb 10, 2021 3:52 pm
by cnanational_mark
Output from this command: mysql -unagiosql -pn@gweb <<< 'show full processlist\G' is in the attached file "out2.txt"
There was no output from this command: mysql -unagiosql -pn@gweb <<< 'show warnings'
[root@mvsc1lx0128 ~]# mysql -unagiosql -pn@gweb <<< 'show warnings'
[root@mvsc1lx0128 ~]#
I removed the check for DPMS_5th_Flr_Board_rm, and that did make a difference: CPU load for mysqld dropped from about 200% to about 100%. However, there were a couple of instances this morning where I had to restart the monitoring engine because it had stalled.
When I removed the service check, Nagios stopped checking the service, but the entry still appeared, grayed out, on the service status page. I suspected the database might need another repair, so I ran the database repair script, and after that the service entry went away.
Right now, mysqld is still showing about 100% CPU usage, but service checks seem to be working. If it follows its usual pattern, though, processing will stop sometime in the early evening.
Thanks,
Mark
Re: Problems after upgrade from 5.6.14 to 5.8.1
Posted: Wed Feb 10, 2021 4:45 pm
by dchurch
Would it be possible to use mytop to view a list of current queries?
yum install mytop
mytop -undoutils -pn@gweb
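If mytop isn't available, or a snapshot is easier to paste into the thread than an interactive screen, a similar view can be taken from SHOW FULL PROCESSLIST directly. A minimal bash sketch, assuming the tab-separated output of `mysql -B` and MariaDB's column order (Id, User, Host, db, Command, Time, State, Info, Progress); `active_queries` is a hypothetical helper that drops sleeping connections and sorts by running time:

```shell
#!/usr/bin/env bash
# Sketch: list only non-idle connections from SHOW FULL PROCESSLIST,
# longest-running first. Reads `mysql -B` (tab-separated) output on stdin.
# Assumed column order: Id User Host db Command Time State Info [Progress]

active_queries() {
    awk -F'\t' '
        NR == 1 { next }               # skip the header row
        $5 != "Sleep" {
            # Time, Id, Command, first 80 characters of the statement
            printf "%s\t%s\t%s\t%s\n", $6, $1, $5, substr($8, 1, 80)
        }' | sort -t $'\t' -k1,1nr
}

# Usage:
#   mysql -undoutils -pn@gweb -B -e 'SHOW FULL PROCESSLIST' | active_queries
```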
Re: Problems after upgrade from 5.6.14 to 5.8.1
Posted: Wed Feb 10, 2021 4:51 pm
by cnanational_mark
MySQL on localhost (5.5.60-MariaDB) up 0+01:27:03 [14:50:49]
Queries: 963.0k qps: 189 Slow: 503.0 Se/In/Up/De(%): 124/18/04/10
qps now: 167 Slow qps: 0.2 Threads: 52 ( 2/ 0) 106/22/01/22
Cache Hits: 565.8k Hits/s: 110.9 Hits now: 86.9 Ratio: 47.5% Ratio now: 49.1%
Key Efficiency: 94.6% Bps in/out: 21.2k/31.7k Now in/out: 18.7k/15.3k
Id User Host/IP DB Time Cmd Query or State
-- ---- ------- -- ---- --- ----------
3701 ndoutils localhost nagios 0 Sleep
4084 ndoutils localhost test 0 Query show full processlist
3708 ndoutils localhost nagios 3 Execut UPDATE nagios_commenthistory SET deletion_time = FROM_UNIXTIME(?), deletion_t
4102 ndoutils localhost nagios 7 Sleep
1217 ndoutils localhost nagios 9 Sleep
67 ndoutils localhost nagios 11 Sleep
1314 ndoutils localhost nagios 11 Sleep
198 ndoutils localhost nagios 21 Sleep
1266 ndoutils localhost nagios 21 Sleep
1414 ndoutils localhost nagios 39 Sleep
4094 ndoutils localhost nagios 47 Sleep
4096 ndoutils localhost nagios 47 Sleep
4103 ndoutils localhost nagios 47 Sleep
4112 ndoutils localhost nagios 47 Sleep
4121 ndoutils localhost nagios 47 Sleep
1120 ndoutils localhost nagios 51 Sleep
1509 ndoutils localhost nagios 71 Sleep
3845 ndoutils localhost nagios 71 Sleep
3901 ndoutils localhost nagios 81 Sleep
3702 ndoutils localhost nagios 110 Sleep
Also, it has required another restart since my last post, so it seems to be getting a bit worse.
Thanks,
Mark
Re: Problems after upgrade from 5.6.14 to 5.8.1
Posted: Wed Feb 10, 2021 5:04 pm
by cnanational_mark
Another screenshot from a few minutes later:
Queries: 1.3M qps: 225 Slow: 575.0 Se/In/Up/De(%): 134/15/03/09
qps now: 9 Slow qps: 0.2 Threads: 52 ( 3/ 0) 77/23/17/02
Cache Hits: 848.3k Hits/s: 145.5 Hits now: 2.2 Ratio: 48.2% Ratio now: 30.6%
Key Efficiency: 91.4% Bps in/out: 24.0k/44.8k Now in/out: 996.2/ 5.3k
Id User Host/IP DB Time Cmd Query or State
-- ---- ------- -- ---- --- ----------
4084 ndoutils localhost test 0 Query show full processlist
3701 ndoutils localhost nagios 1 Execut INSERT INTO nagios_commenthistory (instance_id, comment_type, entry_
3708 ndoutils localhost nagios 1 Execut UPDATE nagios_commenthistory SET deletion_time = FROM_UNIXTIME(?), d
67 ndoutils localhost nagios 8 Sleep
4690 ndoutils localhost nagios 16 Sleep
4695 ndoutils localhost nagios 16 Sleep
4696 ndoutils localhost nagios 16 Sleep
4703 ndoutils localhost nagios 16 Sleep
4708 ndoutils localhost nagios 16 Sleep
4714 ndoutils localhost nagios 16 Sleep
1217 ndoutils localhost nagios 20 Sleep
1314 ndoutils localhost nagios 20 Sleep
1509 ndoutils localhost nagios 20 Sleep
3845 ndoutils localhost nagios 38 Sleep
1120 ndoutils localhost nagios 40 Sleep
3901 ndoutils localhost nagios 40 Sleep
1266 ndoutils localhost nagios 50 Sleep
198 ndoutils localhost nagios 80 Sleep
1414 ndoutils localhost nagios 80 Sleep
3702 ndoutils localhost nagios 137 Sleep
Thanks,
Mark
Re: Problems after upgrade from 5.6.14 to 5.8.1
Posted: Wed Feb 10, 2021 5:13 pm
by cnanational_mark
After watching for a while, it looks like when this process showed up:
3701 ndoutils localhost nagios 1 Execut INSERT INTO nagios_commenthistory (instance_id, comment_type, entry_
Nagios stopped updating service checks.
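To line those moments up with the stalls more systematically, one option is to sample the process list periodically and record every statement touching nagios_commenthistory. A minimal bash sketch, assuming the tab-separated `mysql -B` output of SHOW FULL PROCESSLIST with MariaDB's column order (Id, User, Host, db, Command, Time, State, Info, Progress); the function name, polling interval and log path are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: from `mysql -B -e 'SHOW FULL PROCESSLIST'` output on stdin, print a
# timestamped line for every statement that mentions nagios_commenthistory.
# Assumed column order: Id User Host db Command Time State Info [Progress]

commenthistory_activity() {
    awk -F'\t' -v now="$(date '+%Y-%m-%d %H:%M:%S')" '
        NR > 1 && $8 ~ /nagios_commenthistory/ {
            printf "%s id=%s time=%ss %s\n", now, $1, $6, substr($8, 1, 60)
        }'
}

# Usage (hypothetical polling loop, every 5 seconds):
#   while sleep 5; do
#       mysql -undoutils -pn@gweb -B -e 'SHOW FULL PROCESSLIST' |
#           commenthistory_activity >> /tmp/commenthistory.log
#   done
```

A growing `time=` value for the same id across samples would show a single statement holding things up, rather than many short ones.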
Thanks,
Mark
Re: Problems after upgrade from 5.6.14 to 5.8.1
Posted: Thu Feb 11, 2021 1:00 pm
by dchurch
What's the output from the following command?
mysql -undoutils -pn@gweb nagios <<< 'select count(*) from nagios_commenthistory union all select count(*) from nagios_commenthistory where deletion_time > from_unixtime(0);'
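For reading the result at a glance, the same two counts can also be fetched as a single row and labelled. A small bash sketch, assuming batch output (`-N -B`) and a one-row rewrite of the query above using SUM over the same comparison; `summarize_commenthistory` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: label the two nagios_commenthistory counts.
# Reads one tab-separated row "total<TAB>with_deletion_time" on stdin, e.g.:
#   mysql -undoutils -pn@gweb -N -B nagios -e \
#     'SELECT COUNT(*), SUM(deletion_time > FROM_UNIXTIME(0)) FROM nagios_commenthistory'

summarize_commenthistory() {
    local total deleted
    IFS=$'\t' read -r total deleted
    # SUM() over an empty table is NULL, which mysql -B prints as "NULL"
    if [[ -z $deleted || $deleted == NULL ]]; then
        deleted=0
    fi
    echo "total=$total with_deletion_time=$deleted without=$((total - deleted))"
}
```

The single-row form avoids relying on the order in which UNION ALL returns its two rows.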