Web Interface shuts down on RHEL8 platform
Posted: Thu Aug 05, 2021 8:23 am
by itunixops
We're finalizing our testing on RHEL 8 and have run into an annoying issue. When we log in after a day or two, the web interface comes up with a message telling us to repair the databases. As a workaround, we restart the MySQL database. That seems to work for now, so we've put the restart into a nightly cron job.
Has anybody else seen this type of bug? This is Nagios XI 5.8.5 on RHEL 8.4 at the latest patch level.
Re: Web Interface shuts down on RHEL8 platform
Posted: Thu Aug 05, 2021 3:29 pm
by benjaminsmith
Hi,
That shouldn't be happening so frequently. Converting all the database tables to the InnoDB storage engine will help. However, please send us the latest system profile first so we can take a look at the logs.
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Also, let's check the size of the database tables. Thanks, Benjamin
Code:
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
Re: Web Interface shuts down on RHEL8 platform
Posted: Fri Aug 06, 2021 8:26 am
by itunixops
Profile is now uploaded. Please note this is still in testing.
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
+--------------------------------------------+------------+
| Table | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements | 0.06 |
| nagios_commands | 0.03 |
| nagios_commenthistory | 3.74 |
| nagios_comments | 0.00 |
| nagios_configfiles | 0.01 |
| nagios_configfilevariables | 0.01 |
| nagios_conninfo | 0.68 |
| nagios_contact_addresses | 0.00 |
| nagios_contact_notificationcommands | 0.01 |
| nagios_contactgroup_members | 0.00 |
| nagios_contactgroups | 0.00 |
| nagios_contactnotificationmethods | 67.02 |
| nagios_contactnotifications | 48.96 |
| nagios_contacts | 0.01 |
| nagios_contactstatus | 0.00 |
| nagios_customvariables | 0.19 |
| nagios_customvariablestatus | 0.19 |
| nagios_dbversion | 0.00 |
| nagios_downtimehistory | 0.15 |
| nagios_eventhandlers | 0.51 |
| nagios_externalcommands | 0.00 |
| nagios_flappinghistory | 2.23 |
| nagios_host_contactgroups | 0.02 |
| nagios_host_contacts | 0.02 |
| nagios_host_parenthosts | 0.00 |
| nagios_hostchecks | 4.40 |
| nagios_hostdependencies | 0.00 |
| nagios_hostescalation_contactgroups | 0.00 |
| nagios_hostescalation_contacts | 0.00 |
| nagios_hostescalations | 0.00 |
| nagios_hostgroup_members | 0.01 |
| nagios_hostgroups | 0.00 |
| nagios_hosts | 0.14 |
| nagios_hoststatus | 0.23 |
| nagios_instances | 0.00 |
| nagios_logentries | 108.16 |
| nagios_notifications | 5.91 |
| nagios_objects | 1.50 |
| nagios_processevents | 0.15 |
| nagios_programstatus | 0.00 |
| nagios_runtimevariables | 0.00 |
| nagios_scheduleddowntime | 0.00 |
| nagios_service_contactgroups | 0.17 |
| nagios_service_contacts | 0.07 |
| nagios_service_parentservices | 0.00 |
| nagios_servicechecks | 31.30 |
| nagios_servicedependencies | 0.00 |
| nagios_serviceescalation_contactgroups | 0.00 |
| nagios_serviceescalation_contacts | 0.00 |
| nagios_serviceescalations | 0.00 |
| nagios_servicegroup_members | 0.01 |
| nagios_servicegroups | 0.00 |
| nagios_services | 0.84 |
| nagios_servicestatus | 1.92 |
| nagios_statehistory | 141.97 |
| nagios_systemcommands | 1.70 |
| nagios_timedeventqueue | 0.00 |
| nagios_timedevents | 0.00 |
| nagios_timeperiod_timeranges | 0.02 |
| nagios_timeperiods | 0.01 |
| tbl_command | 0.07 |
| tbl_contact | 0.02 |
| tbl_contactgroup | 0.01 |
| tbl_contacttemplate | 0.01 |
| tbl_domain | 0.01 |
| tbl_host | 0.11 |
| tbl_hostdependency | 0.00 |
| tbl_hostescalation | 0.00 |
| tbl_hostextinfo | 0.00 |
| tbl_hostgroup | 0.01 |
| tbl_hosttemplate | 0.02 |
| tbl_info | 0.27 |
| tbl_lnkContactToCommandHost | 0.00 |
| tbl_lnkContactToCommandService | 0.00 |
| tbl_lnkContactToContactgroup | 0.00 |
| tbl_lnkContactToContacttemplate | 0.00 |
| tbl_lnkContactToVariabledefinition | 0.00 |
| tbl_lnkContactgroupToContact | 0.00 |
| tbl_lnkContactgroupToContactgroup | 0.00 |
| tbl_lnkContacttemplateToCommandHost | 0.00 |
| tbl_lnkContacttemplateToCommandService | 0.00 |
| tbl_lnkContacttemplateToContactgroup | 0.00 |
| tbl_lnkContacttemplateToContacttemplate | 0.00 |
| tbl_lnkContacttemplateToVariabledefinition | 0.00 |
| tbl_lnkHostToContact | 0.01 |
| tbl_lnkHostToContactgroup | 0.01 |
| tbl_lnkHostToHost | 0.00 |
| tbl_lnkHostToHostgroup | 0.00 |
| tbl_lnkHostToHosttemplate | 0.01 |
| tbl_lnkHostToVariabledefinition | 0.01 |
| tbl_lnkHostdependencyToHost_DH | 0.00 |
| tbl_lnkHostdependencyToHost_H | 0.00 |
| tbl_lnkHostdependencyToHostgroup_DH | 0.00 |
| tbl_lnkHostdependencyToHostgroup_H | 0.00 |
| tbl_lnkHostescalationToContact | 0.00 |
| tbl_lnkHostescalationToContactgroup | 0.00 |
| tbl_lnkHostescalationToHost | 0.00 |
| tbl_lnkHostescalationToHostgroup | 0.00 |
| tbl_lnkHostgroupToHost | 0.01 |
| tbl_lnkHostgroupToHostgroup | 0.00 |
| tbl_lnkHosttemplateToContact | 0.00 |
| tbl_lnkHosttemplateToContactgroup | 0.00 |
| tbl_lnkHosttemplateToHost | 0.00 |
| tbl_lnkHosttemplateToHostgroup | 0.00 |
| tbl_lnkHosttemplateToHosttemplate | 0.00 |
| tbl_lnkHosttemplateToVariabledefinition | 0.00 |
| tbl_lnkServiceToContact | 0.04 |
| tbl_lnkServiceToContactgroup | 0.09 |
| tbl_lnkServiceToHost | 0.06 |
| tbl_lnkServiceToHostgroup | 0.00 |
| tbl_lnkServiceToServicegroup | 0.00 |
| tbl_lnkServiceToServicetemplate | 0.04 |
| tbl_lnkServiceToVariabledefinition | 0.05 |
| tbl_lnkServicedependencyToHost_DH | 0.00 |
| tbl_lnkServicedependencyToHost_H | 0.00 |
| tbl_lnkServicedependencyToHostgroup_DH | 0.00 |
| tbl_lnkServicedependencyToHostgroup_H | 0.00 |
| tbl_lnkServicedependencyToService_DS | 0.00 |
| tbl_lnkServicedependencyToService_S | 0.00 |
| tbl_lnkServicedependencyToServicegroup_DS | 0.02 |
| tbl_lnkServicedependencyToServicegroup_S | 0.02 |
| tbl_lnkServiceescalationToContact | 0.00 |
| tbl_lnkServiceescalationToContactgroup | 0.00 |
| tbl_lnkServiceescalationToHost | 0.00 |
| tbl_lnkServiceescalationToHostgroup | 0.00 |
| tbl_lnkServiceescalationToService | 0.00 |
| tbl_lnkServiceescalationToServicegroup | 0.02 |
| tbl_lnkServicegroupToService | 0.01 |
| tbl_lnkServicegroupToServicegroup | 0.00 |
| tbl_lnkServicetemplateToContact | 0.00 |
| tbl_lnkServicetemplateToContactgroup | 0.00 |
| tbl_lnkServicetemplateToHost | 0.00 |
| tbl_lnkServicetemplateToHostgroup | 0.00 |
| tbl_lnkServicetemplateToServicegroup | 0.00 |
| tbl_lnkServicetemplateToServicetemplate | 0.01 |
| tbl_lnkServicetemplateToVariabledefinition | 0.00 |
| tbl_lnkTimeperiodToTimeperiod | 0.00 |
| tbl_logbook | 0.00 |
| tbl_mainmenu | 0.00 |
| tbl_permission | 0.02 |
| tbl_permission_inactive | 0.02 |
| tbl_service | 0.82 |
| tbl_servicedependency | 0.00 |
| tbl_serviceescalation | 0.00 |
| tbl_serviceextinfo | 0.00 |
| tbl_servicegroup | 0.01 |
| tbl_servicetemplate | 0.03 |
| tbl_session | 0.00 |
| tbl_session_locks | 0.00 |
| tbl_settings | 0.00 |
| tbl_submenu | 0.00 |
| tbl_timedefinition | 0.03 |
| tbl_timeperiod | 0.02 |
| tbl_user | 0.01 |
| tbl_variabledefinition | 0.17 |
| xi_auditlog | 0.49 |
| xi_auth_tokens | 0.11 |
| xi_cmp_ccm_backups | 0.02 |
| xi_cmp_favorites | 0.03 |
| xi_cmp_nagiosbpi_backups | 0.34 |
| xi_cmp_trapdata | 0.03 |
| xi_cmp_trapdata_log | 0.03 |
| xi_commands | 0.00 |
| xi_deploy_agents | 0.02 |
| xi_deploy_jobs | 0.02 |
| xi_eventqueue | 23.38 |
| xi_events | 31.02 |
| xi_incidents | 0.02 |
| xi_meta | 1855.79 |
| xi_mibs | 0.05 |
| xi_options | 0.03 |
| xi_sessions | 0.03 |
| xi_sysstat | 0.01 |
| xi_usermeta | 0.27 |
| xi_users | 0.02 |
+--------------------------------------------+------------+
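With a list this long, sorting by size makes the largest tables stand out (xi_meta at 1855.79 MB in the output above). Below is a minimal sketch of a sorted variant of the same query. It assumes the same three schemas; the function name `top_tables_sql` and the default limit of 10 are made up for illustration and are not part of Nagios XI.

```shell
#!/bin/sh
# Hypothetical helper: prints a query that lists the largest tables first.
# The schema names come from the query used earlier in this thread;
# the function name and the default limit are assumptions.
top_tables_sql() {
  limit="${1:-10}"
  cat <<SQL
SELECT table_schema AS 'Schema',
       table_name AS 'Table',
       ROUND((data_length + index_length) / 1024 / 1024, 2) AS 'Size in MB'
FROM information_schema.TABLES
WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi')
ORDER BY (data_length + index_length) DESC
LIMIT ${limit};
SQL
}
```

To run it, pipe the output into the client, for example `top_tables_sql 5 | mysql -uroot -p --table`, which prompts for the password instead of putting it on the command line.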
Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.
Re: Web Interface shuts down on RHEL8 platform
Posted: Fri Aug 06, 2021 3:51 pm
by benjaminsmith
Hi,
Thanks for sending the profile. I'm seeing the errors below. The database log was not included for some reason.
297: Database Error: Could not connect to database
298 Too many connections
The system load is very high and the MySQL process is consuming more CPU than normal. The database tables were likely in a corrupted state when you generated the system profile.
Code:
top - 09:20:57 up 15 min, 0 users, load average: 31.72, 24.83, 14.10
Tasks: 323 total, 2 running, 321 sleeping, 0 stopped, 0 zombie
%Cpu(s): 51.5 us, 48.5 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 15805.9 total, 11924.8 free, 2023.1 used, 1858.0 buff/cache
MiB Swap: 4096.0 total, 4096.0 free, 0.0 used. 13502.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1109 mysql 20 0 1906532 469948 36368 S 193.3 2.9 23:41.48 mysqld
We might want to move this system over to ndo2db and see if that helps. However, let's first try increasing the maximum number of connections on the database. Please follow the steps in the article below to increase it.
Nagios XI - MySQL/MariaDB - Max Connections
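Before raising the limit, it can help to compare the configured ceiling with the highest number of connections the server has seen since it started. The sketch below only reads values. It is not taken from the linked article, and the function name `conn_limit_sql` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints read-only statements that show the configured
# connection limit, the peak number of simultaneous connections since the
# server started, and the number of connections open right now.
conn_limit_sql() {
  cat <<'SQL'
SHOW VARIABLES LIKE 'max_connections';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SQL
}
```

Run it with `conn_limit_sql | mysql -uroot -p --table`. If Max_used_connections is at or close to max_connections, that is consistent with the "Too many connections" error in the profile.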
I would also recommend converting the storage engine on the nagios database to InnoDB. We have a guide for that process as well.
Database Storage Engine and High CPU usage in Nagios XI
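To see how much work the conversion involves, one option is to list the tables that are not yet InnoDB and generate the matching ALTER TABLE statements for review. This is a sketch, not the procedure from the guide above, which should take precedence. The function name is hypothetical, and the schema list is taken from the size query earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: prints a query that emits one ALTER TABLE statement
# for every base table whose engine is not InnoDB.
# The query itself changes nothing; it only generates statements to review.
innodb_convert_sql() {
  cat <<'SQL'
SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name, '` ENGINE=InnoDB;') AS stmt
FROM information_schema.TABLES
WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi')
  AND table_type = 'BASE TABLE'
  AND engine <> 'InnoDB';
SQL
}
```

For example, `innodb_convert_sql | mysql -uroot -p -N > convert.sql` writes the statements to a file. After reviewing the file and taking a full backup, they could be applied with `mysql -uroot -p < convert.sql`.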
Please take a full backup before making any changes and let me know if you see an improvement.
Regards,
Benjamin
Re: Web Interface shuts down on RHEL8 platform
Posted: Mon Aug 09, 2021 8:20 am
by itunixops
That did the trick. Performance has improved significantly.
We've been upgrading this installation repeatedly since 2017, and it seems that over the years we picked up table corruption or similar problems that we didn't address until now. Is there anything we should check in the future before doing upgrades of this kind? We would prefer not to re-enter and regenerate data, because that could take time.
With Nagios XI 6 coming (and I hope it is), we're hoping to migrate directly without issues, but we'd like to plan ahead for what is coming.
Thanks for the help on this. If we have more issues we will be contacting you.
Re: Web Interface shuts down on RHEL8 platform
Posted: Mon Aug 09, 2021 4:37 pm
by benjaminsmith
Hi @itunixops,
Excellent, glad to hear the system is working much better. When you upgrade, the script will take a backup, but any corrupted tables may cause a failure or other issues. I would recommend using the following plugin to monitor the table status so that you get a notification or alert if any corruption appears.
https://exchange.nagios.org/directory/P ... us/details
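Independently of the plugin linked above, a manual check before an upgrade is possible with the `mysqlcheck` client that ships with MySQL and MariaDB. The sketch below is a small filter for its output, assuming the usual format in which each healthy table is reported on one line ending in "OK". The function name `report_problems` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical filter: reads mysqlcheck output on stdin and prints every
# non-empty line that does not end in "OK".
# Returns 1 if any such line was found, 0 otherwise.
report_problems() {
  if grep -Ev 'OK$|^$'; then
    return 1   # grep printed at least one line, so something was not OK
  fi
  return 0
}
```

A possible invocation is `mysqlcheck --check --databases nagios nagiosql nagiosxi -uroot -p | report_problems`, which prints nothing and exits with status 0 when every table is reported as OK.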
Also, if you converted to InnoDB, make sure to set up regular backup jobs. InnoDB tables are more resilient to corruption than MyISAM tables, but they can be harder to repair.
Backing Up And Restoring Your Nagios XI System
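For InnoDB tables, `mysqldump --single-transaction` takes a consistent snapshot without locking the tables for the duration of the dump. The sketch below covers only the databases and is not a replacement for the full Nagios XI backup procedure in the guide above. The function names, the target directory, and the file naming are assumptions, and it expects credentials to come from a client option file such as ~/.my.cnf.

```shell
#!/bin/sh
# Hypothetical helper: builds a dated dump file name.
# $1 = database name, $2 = target directory, $3 = date stamp (YYYYMMDD)
dump_path() {
  printf '%s/%s-%s.sql\n' "$2" "$1" "$3"
}

# Hypothetical job: dumps each Nagios XI database to its own file.
# Nothing runs until backup_databases is called with a target directory.
# Assumes credentials are supplied by an option file such as ~/.my.cnf.
backup_databases() {
  dir="$1"
  stamp="$(date +%Y%m%d)"
  for db in nagios nagiosql nagiosxi; do
    # Stop at the first failed dump so a broken backup does not go unnoticed.
    mysqldump --single-transaction \
      --result-file="$(dump_path "$db" "$dir" "$stamp")" "$db" || return 1
  done
}
```

Called as `backup_databases /path/to/backups` from a cron job, this would leave one dated .sql file per database in that directory.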
Let me know if you need anything else or if it's okay to mark this as resolved.
Regards,
Benjamin