
successful migration - mysql issues

Posted: Fri Jun 12, 2020 7:16 am
by veehexx
Edit: see post 2. This turned out not to be a migration issue as I first thought.

We've been running the pre-built Hyper-V VM, which runs on CentOS 6, since we bought XI. That's EOL this year, and with XI 5.7.1 now supporting CentOS 8, I'm in the process of migrating.

First up, our CentOS 6 XI install works well: it's on 5.7.1, everything is working, and a backup was taken from the web UI.
I built a new VM running CentOS 8 Stream (minimal install, dnf update) and followed the installation PDF using the install.sh script method, which appeared to be successful. The 5.7.1 XI web UI is accessible and working as expected.

I rsynced the backup tar file between the two hosts and performed a restore as per the instructions. Below is the full output, which suggests all is good. I am assuming the warning is just that and not the cause of the issue. I'm not aware that we use any custom plugins, binaries or special httpd config (other than using a public cert instead of the default self-signed one).
# /usr/local/nagiosxi/scripts/restore_xi.sh /store/backups/nagiosxi/1591959402.tar.gz
WARNING: you are trying to restore a el6 backup on a el8 system
Compiled plugins and other binaries as well as httpd configurations
will NOT be restored.

You will need to re-download the Nagios XI tarball, and re-install
the subcomponents for this system. More info here:
https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Are you sure you want to continue? [y/N] y
TS=1591962456
Extracting backup to /store/backups/nagiosxi/1591962456-restore...
In /store/backups/nagiosxi/1591962456-restore/1591959402...
Backup files look okay. Preparing to restore...
Shutting down services...
Restoring directories to /...
Restoring Nagios Core...
Restoring Nagios XI...
Restoring NRDP backups...
Restoring MRTG...
Restoring SNMP configuration files...
Restoring SNMP MIBs...
Restoring Nagvis backups...
Restoring nagios home dir...
Restoring MySQL databases...
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Restoring Nagios XI MySQL database...
mysql: [Warning] Using a password on the command line interface can be insecure.
Restarting database servers...
Restoring Apache cronjobs...
Restoring logrotate config files...
Skipping Apache config files restoration

===============
RESTORE COMPLETE
===============
This is where I start to see problems, so I'll list what I've found so far.
- mysql has 2-3 processes at 100% CPU time. Current load average is 4.3 3.7 2.0.
- Admin > System Status is not showing CPU stats, but other stats appear correctly.
- Admin > Monitoring Engine Status is not showing info in the Engine check stats, Engine perf or Engine event queue.
- While Nagios email alerts appear to be working, I see no results (0 hosts & services) in any of the Quick View or Details screens. Maps are intermittent: for example, Minemap is empty, but Hypermap and Network Status Map do at least show the hosts.
- Core Config Manager shows the expected hosts & services and appears fine.

It appears something is broken, but I can't see what. Since mysql is showing consistently high CPU time, I'd say it's all related to a DB issue.
SELinux is disabled by the install.sh script, so it's not that getting in the way.

Any ideas how to fix this?

mysql high cpu usage

Posted: Fri Jun 12, 2020 10:06 am
by veehexx
It's closing in on the weekend, so I reverted to our old CentOS 6 Nagios XI install. That now has high CPU time on mysqld too.

It seems the issue isn't related to the upgrade at all; the migration just highlighted it.
A DB repair hasn't fixed the issue, so I'm now looking further.

Re: successful migration - mysql issues

Posted: Fri Jun 12, 2020 1:37 pm
by benjaminsmith
Hi @veehexx,

I'd like to summarize a few items, so I have a good understanding of the issues here. First, you've upgraded your current CentOS 6 system to Nagios XI 5.7.1 and this server is now working OK apart from the CPU load issue.

Next, you installed 5.7.1 on a new VM running CentOS 8 and then performed a backup and restore from the CentOS 6 VM. However, the CentOS 8 system now has the following errors:
- mysql has 2-3 processes at 100% CPU time. Current load average is 4.3 3.7 2.0.
- Admin > System Status is not showing CPU stats, but other stats appear correctly.
- Admin > Monitoring Engine Status is not showing info in the Engine check stats, Engine perf or Engine event queue.
- While Nagios email alerts appear to be working, I see no results (0 hosts & services) in any of the Quick View or Details screens. Maps are intermittent: for example, Minemap is empty, but Hypermap and Network Status Map do at least show the hosts.
- Core Config Manager shows the expected hosts & services and appears fine.
Some of those conditions were fixed in the 5.7.1 release. A few questions: do you have an offloaded database? Also, is this system running Postgres for the nagiosxi database? If the following command returns 2, that would be the case.

Code: Select all

grep mysql /usr/local/nagiosxi/html/config.inc.php | wc -l
Also, could you PM a current profile for the CentOS 8 system so we can review the logs? Thanks, Benjamin

To send us your system profile:
Log in to the Nagios XI GUI using a web browser
Click the "Admin" > "System Profile" menu
Click the "Download Profile" button
Save the profile.zip file, share it in a private message, and then reply to this post to bring it up in the queue.

Re: successful migration - mysql issues

Posted: Sat Jun 13, 2020 7:42 am
by veehexx
benjaminsmith wrote:
Hi @veehexx,

I'd like to summarize a few items, so I have a good understanding of the issues here. First, you've upgraded your current CentOS 6 system to Nagios XI 5.7.1 and this server is now working OK apart from the CPU load issue.

Yes, the CentOS 6 VM was upgraded to 5.7.1. The VM is the Nagios pre-built Hyper-V VHD image that we've run for 4 years or so since we bought XI.

benjaminsmith wrote:
Next, you installed 5.7.1 on a new VM running CentOS 8 and then performed a backup and restore from the CentOS 6 VM. However, the CentOS 8 system now has the following errors:
- mysql has 2-3 processes at 100% CPU time. Current load average is 4.3 3.7 2.0.
- Admin > System Status is not showing CPU stats, but other stats appear correctly.
- Admin > Monitoring Engine Status is not showing info in the Engine check stats, Engine perf or Engine event queue.
- While Nagios email alerts appear to be working, I see no results (0 hosts & services) in any of the Quick View or Details screens. Maps are intermittent: for example, Minemap is empty, but Hypermap and Network Status Map do at least show the hosts.
- Core Config Manager shows the expected hosts & services and appears fine.

While the issues still remain and are accurate, it seems the high CPU time is also present on our CentOS 6/5.7.1 system.

benjaminsmith wrote:
Some of those conditions were fixed in the 5.7.1 release. A few questions: do you have an offloaded database? Also, is this system running Postgres for the nagiosxi database? If the following command returns 2, that would be the case.

Code: Select all

grep mysql /usr/local/nagiosxi/html/config.inc.php | wc -l

Offloaded DB: no. We're at around 107 hosts and 1100 services, so we don't need offloading. I used the Nagios install.sh to deploy 5.7.1, which I believe uses MySQL by default.

benjaminsmith wrote:
Also, could you PM a current profile for the CentOS 8 system so we can review the logs? Thanks, Benjamin

To send us your system profile:
Log in to the Nagios XI GUI using a web browser
Click the "Admin" > "System Profile" menu
Click the "Download Profile" button
Save the profile.zip file, share it in a private message, and then reply to this post to bring it up in the queue.


I assume you want the profile from the restored-backup, high-CPU-load state?

The CentOS 8/5.7.1 VM is snapshotted, and by the end of Friday I had reverted to my pre-restore snapshot to try it as a clean deployment. There are no performance issues with a clean install, so this is definitely triggered by the backup restore.
Also, since my last response, it seems I'm having issues with NSCP (clients are on the latest stable version) on the CentOS 8 deployment. The NSClient logs report an SSLv3 handshake issue, and manually running 'check_nrpe -n ...' for no SSL also requires the client-side nsclient.ini to have 'use ssl = 0' defined. That one is a bit strange, since the only change has been server side.

I then got as far as testing NCPA as a replacement for NSCP, which seems to be the way forward, so that is pending further testing to see whether it is suitable.

Re: successful migration - mysql issues

Posted: Mon Jun 15, 2020 2:56 am
by veehexx
File PM'd to you, benjaminsmith.

Re: successful migration - mysql issues

Posted: Mon Jun 15, 2020 3:12 pm
by benjaminsmith
Hi @veehexx,

Thanks for sending over the profile. I believe the restore may have resulted in some database issues once it completed. For some reason, the database log was not in the profile. Could you fetch this log from the CentOS 8 server and post it to the thread?

Also, I would just go ahead and run the repair script then, to see if the load comes down.

Code: Select all

/usr/local/nagiosxi/scripts/repair_databases.sh
To confirm the issue with the CPU stats, are you able to run the following command successfully on the server?

Code: Select all

/usr/bin/iostat -c 5 2 | tail --lines=2 | head --lines=1 | awk '{ print $1,$2,$3,$4,$5,$6 }'
Regarding the issue with Minemap, can you run the following tail command, load the page, and post the output? Thanks, Benjamin

Code: Select all

tail -f /var/log/httpd/*error.log

Re: successful migration - mysql issues

Posted: Tue Jun 16, 2020 3:14 am
by veehexx
DB repair done: no change, still high CPU.

Your iostat command failed to return anything, but if I run '/usr/bin/iostat -c 5 2' on its own, I get the following:

Code: Select all

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          29.81    0.11   45.70    0.46    0.00   23.92


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15.26    0.00   35.14    0.00    0.00   49.60
'tail -f /var/log/httpd/error_log' (*error.log doesn't exist) has no errors logged.

Minemap shows the correct page layout, but I get the text 'no data to display'.
It's similar with Host Status ('No matching host found') and Service Status ('No matching services found'). I'm assuming this points to a DB issue where it cannot query.
These pages load instantly too, i.e. there is no 30-second wait while something times out. It's almost like a fresh install with no hosts or services defined.
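One possible reason the piped command printed nothing is that `tail --lines=2 | head --lines=1` picks a line by position, and the iostat output above has two blank lines between reports, so the selected line can be empty. A minimal sketch of a filter that does not depend on position is below. The function name `last_cpu_sample` is hypothetical and this is not the command Nagios XI itself runs; it simply keeps the last line whose first field starts with a digit.

```shell
#!/bin/sh
# Hypothetical helper: print the six CPU percentages from the last
# avg-cpu sample found in `iostat -c` output read on stdin.
last_cpu_sample() {
    awk '
        # Value rows are the only lines whose first field starts with a digit.
        $1 ~ /^[0-9]/ { last = $1 " " $2 " " $3 " " $4 " " $5 " " $6 }
        END { if (last != "") print last }
    '
}

# Usage (assumes iostat from the sysstat package is installed):
#   /usr/bin/iostat -c 5 2 | last_cpu_sample
```

If this prints six numbers where the original pipeline printed nothing, the blank-line layout is the likely difference.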

Re: successful migration - mysql issues

Posted: Tue Jun 16, 2020 3:57 am
by veehexx
The high SQL load is triggered by the Monitoring Engine process.
Even with all the Process Settings disabled, if the parent ME process is running then I see high SQL CPU%. Stopping ME will eventually (after about 30 seconds) return the system to idle CPU load. Starting ME again instantly raises CPU usage.
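To see which statements keep mysqld busy while the Monitoring Engine is running, the MySQL process list is one place to look. The sketch below is only an illustration: the function name `long_running` and the 5-second default threshold are hypothetical, and it assumes tab-separated `SHOW FULL PROCESSLIST` output from the `mysql` client in batch mode on stdin, where column 4 is the database, column 5 the command, column 6 the time in seconds and column 8 the statement.

```shell
#!/bin/sh
# Hypothetical helper: list non-idle connections that have been running
# for at least N seconds (default 5), as "TIMEs<TAB>db<TAB>statement".
long_running() {
    threshold="${1:-5}"
    awk -F '\t' -v t="$threshold" '
        NR == 1 { next }                      # skip the header row
        $5 != "Sleep" && $6 + 0 >= t + 0 {    # ignore idle connections
            print $6 "s\t" $4 "\t" $8
        }
    '
}

# Usage (credentials are whatever your install uses):
#   mysql -u root -p -B -e 'SHOW FULL PROCESSLIST' | long_running 5
```

Running it once with ME stopped and once with ME started should show whether the same statements reappear each time.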

Re: successful migration - mysql issues

Posted: Tue Jun 16, 2020 4:40 am
by veehexx
I've done some further poking about on the system, with a focus on /usr/local/nagios/var/nagios.log.

Some hosts had no contact/contactgroup. They were using the generic_host template, which didn't have a contact defined, so I added one, which got rid of a number of warnings.

I'm now seeing NDO issues.
I haven't proved it 100% yet, but the following appears to have some truth in it: restart the Monitoring Engine process and CPU time is high. After a while, the following appears in the nagios.log file above, CPU time returns to idle, and various areas of the Nagios web UI that were not working start working.
...
[1592299800] NDO-3: ndo_return = 1 (Out of range value for column 'duration' at row 1)
[1592299800] NDO-3: ndo_handle_downtime(ndo-handlers.c:589): Unable to execute statement
[1592299800] NDO-3: ndo_return = 1 (Out of range value for column 'duration' at row 1)
[1592299800] NDO-3: ndo_handle_downtime(ndo-handlers.c:589): Unable to execute statement
[1592299800] NDO-3: ndo_return = 1 (Out of range value for column 'duration' at row 1)
[1592299800] NDO-3: ndo_handle_downtime(ndo-handlers.c:589): Unable to execute statement
[1592299800] NDO-3: ndo_return = 1 (Out of range value for column 'duration' at row 1)
[1592299800] NDO-3: ndo_handle_downtime(ndo-handlers.c:589): Unable to execute statement
Log and forum searches aren't being much help with this NDO failure or with what needs to be fixed.

Edit: re-tested. CPU is still high even after NDO has thrown the errors above. Web UI info does start to populate after NDO has failed.

Edit 2: NDO is also having issues on service checks.
...
[1592303730] NDO-3: ndo_return = 1 (Data too long for column 'check_command' at row 1)
[1592303730] NDO-3: ndo_handle_service_status(ndo-handlers.c:953): Unable to execute statement
Edit 3: after a number of hours running, the Monitoring Engine reloaded and Nagios has returned to its broken state, with high CPU and no web UI info in certain areas. A server reboot (10 minutes ago, so far) has not brought it back. It's probably worth saying this is on the migrated CentOS 8 Stream Nagios XI installation, not the old CentOS 6 server.
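With this many repeated NDO lines, it can help to collapse the log into a count per distinct error. A minimal sketch, assuming the log format shown above; the function name `ndo_error_summary` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count each distinct NDO error message found in
# nagios.log-style input on stdin, most frequent first.
ndo_error_summary() {
    # Keep only the text inside the parentheses of "ndo_return = N (...)".
    sed -n 's/.*NDO-3: ndo_return = [0-9]* (\(.*\))$/\1/p' |
        sort | uniq -c | sort -rn
}

# Usage:
#   ndo_error_summary < /usr/local/nagios/var/nagios.log
```

Each output line is a count followed by the message, which makes it easier to see which columns ('duration', 'check_command') are being rejected and how often.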

Re: successful migration - mysql issues

Posted: Tue Jun 16, 2020 4:50 pm
by ssax

Code: Select all

   31: [Fri Jun 12 10:27:33.744020 2020] [core:notice] [pid 103021:tid 140160184473920] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
Do you have SELinux enabled on this system?

Code: Select all

sestatus
getenforce
Also, I see a ton of checks failing which will cause the CPU to climb:

Code: Select all

Jun 15 08:49:46 nagiosxi nagios[1575]: SERVICE ALERT: XXXXXXXXXX;Network counters Connections Reset IPv6;CRITICAL;SOFT;2;CHECK_NRPE: (ssl_err != 5) Error - Could not complete SSL handshake with X.X.243.56: 1
Jun 15 08:49:26 nagiosxi check_nrpe[5807]: Error: (!log_opts) Could not complete SSL handshake with X.X.243.72: dh key too small
Also, are you re-using the IP or is it a new IP address? The agents' configs and firewall rules would likely need to be adjusted to allow the new IP if you are not re-using the original IP.

You may need to adjust your NRPE checks that go against NSClient++ to add a -3 or a -2 (whichever works) onto the checks to see if that resolves them.
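The "whichever works" step can be tried by hand against one client before editing any check definitions. The sketch below is only an illustration: the function name `try_nrpe_flags` is hypothetical, and the plugin path is an assumption that may differ on your system. It runs check_nrpe with no extra flag, then with -2, then with -3, and reports the first variant that exits successfully.

```shell
#!/bin/sh
# Assumed plugin location; override CHECK_NRPE if yours is elsewhere.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# Hypothetical helper: print the first flag variant that succeeds
# against the given host, or "none" if all three fail.
try_nrpe_flags() {
    host="$1"
    for flag in default -2 -3; do
        if [ "$flag" = default ]; then
            "$CHECK_NRPE" -H "$host" >/dev/null 2>&1 && { printf '%s\n' default; return 0; }
        else
            "$CHECK_NRPE" -H "$host" "$flag" >/dev/null 2>&1 && { printf '%s\n' "$flag"; return 0; }
        fi
    done
    printf '%s\n' none
    return 1
}

# Usage (replace the address with one of the failing clients):
#   try_nrpe_flags 192.0.2.10
```

A result of "none" suggests the packet version is not the problem for that client.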

If that doesn't resolve those you may need to do this for the ones with too small of a dh key:

https://help.univention.com/t/problem-n ... ient/11620