
Web Interface shuts down RHEL8 platform

Posted: Thu Aug 05, 2021 8:23 am
by itunixops
We're finalizing our testing on RHEL8 and have run into an annoying issue. When we log in after a day or two, the web interface comes up with a message telling us to repair the databases. As a workaround we restart the MySQL database, which seems to work for now, and we've put that restart into a nightly cron job.

Has anybody else seen this type of bug? This is Nagios XI 5.8.5 on RHEL 8.4 latest patch level.

Re: Web Interface shuts down RHEL8 platform

Posted: Thu Aug 05, 2021 3:29 pm
by benjaminsmith
Hi,

That shouldn't be happening so frequently. Converting all the database tables to the InnoDB storage engine will help. However, please send us the latest system profile first so we can take a look at the logs.

To send us your system profile:
1. Log in to the Nagios XI GUI using a web browser.
2. Click the "Admin" > "System Profile" menu.
3. Click the "Download Profile" button.

Also, let's check the size of the database tables. Thanks, Benjamin

Code:

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table

Re: Web Interface shuts down RHEL8 platform

Posted: Fri Aug 06, 2021 8:26 am
by itunixops
Profile is now uploaded. Please note this is still in testing.

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table

+--------------------------------------------+------------+
| Table | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements | 0.06 |
| nagios_commands | 0.03 |
| nagios_commenthistory | 3.74 |
| nagios_comments | 0.00 |
| nagios_configfiles | 0.01 |
| nagios_configfilevariables | 0.01 |
| nagios_conninfo | 0.68 |
| nagios_contact_addresses | 0.00 |
| nagios_contact_notificationcommands | 0.01 |
| nagios_contactgroup_members | 0.00 |
| nagios_contactgroups | 0.00 |
| nagios_contactnotificationmethods | 67.02 |
| nagios_contactnotifications | 48.96 |
| nagios_contacts | 0.01 |
| nagios_contactstatus | 0.00 |
| nagios_customvariables | 0.19 |
| nagios_customvariablestatus | 0.19 |
| nagios_dbversion | 0.00 |
| nagios_downtimehistory | 0.15 |
| nagios_eventhandlers | 0.51 |
| nagios_externalcommands | 0.00 |
| nagios_flappinghistory | 2.23 |
| nagios_host_contactgroups | 0.02 |
| nagios_host_contacts | 0.02 |
| nagios_host_parenthosts | 0.00 |
| nagios_hostchecks | 4.40 |
| nagios_hostdependencies | 0.00 |
| nagios_hostescalation_contactgroups | 0.00 |
| nagios_hostescalation_contacts | 0.00 |
| nagios_hostescalations | 0.00 |
| nagios_hostgroup_members | 0.01 |
| nagios_hostgroups | 0.00 |
| nagios_hosts | 0.14 |
| nagios_hoststatus | 0.23 |
| nagios_instances | 0.00 |
| nagios_logentries | 108.16 |
| nagios_notifications | 5.91 |
| nagios_objects | 1.50 |
| nagios_processevents | 0.15 |
| nagios_programstatus | 0.00 |
| nagios_runtimevariables | 0.00 |
| nagios_scheduleddowntime | 0.00 |
| nagios_service_contactgroups | 0.17 |
| nagios_service_contacts | 0.07 |
| nagios_service_parentservices | 0.00 |
| nagios_servicechecks | 31.30 |
| nagios_servicedependencies | 0.00 |
| nagios_serviceescalation_contactgroups | 0.00 |
| nagios_serviceescalation_contacts | 0.00 |
| nagios_serviceescalations | 0.00 |
| nagios_servicegroup_members | 0.01 |
| nagios_servicegroups | 0.00 |
| nagios_services | 0.84 |
| nagios_servicestatus | 1.92 |
| nagios_statehistory | 141.97 |
| nagios_systemcommands | 1.70 |
| nagios_timedeventqueue | 0.00 |
| nagios_timedevents | 0.00 |
| nagios_timeperiod_timeranges | 0.02 |
| nagios_timeperiods | 0.01 |
| tbl_command | 0.07 |
| tbl_contact | 0.02 |
| tbl_contactgroup | 0.01 |
| tbl_contacttemplate | 0.01 |
| tbl_domain | 0.01 |
| tbl_host | 0.11 |
| tbl_hostdependency | 0.00 |
| tbl_hostescalation | 0.00 |
| tbl_hostextinfo | 0.00 |
| tbl_hostgroup | 0.01 |
| tbl_hosttemplate | 0.02 |
| tbl_info | 0.27 |
| tbl_lnkContactToCommandHost | 0.00 |
| tbl_lnkContactToCommandService | 0.00 |
| tbl_lnkContactToContactgroup | 0.00 |
| tbl_lnkContactToContacttemplate | 0.00 |
| tbl_lnkContactToVariabledefinition | 0.00 |
| tbl_lnkContactgroupToContact | 0.00 |
| tbl_lnkContactgroupToContactgroup | 0.00 |
| tbl_lnkContacttemplateToCommandHost | 0.00 |
| tbl_lnkContacttemplateToCommandService | 0.00 |
| tbl_lnkContacttemplateToContactgroup | 0.00 |
| tbl_lnkContacttemplateToContacttemplate | 0.00 |
| tbl_lnkContacttemplateToVariabledefinition | 0.00 |
| tbl_lnkHostToContact | 0.01 |
| tbl_lnkHostToContactgroup | 0.01 |
| tbl_lnkHostToHost | 0.00 |
| tbl_lnkHostToHostgroup | 0.00 |
| tbl_lnkHostToHosttemplate | 0.01 |
| tbl_lnkHostToVariabledefinition | 0.01 |
| tbl_lnkHostdependencyToHost_DH | 0.00 |
| tbl_lnkHostdependencyToHost_H | 0.00 |
| tbl_lnkHostdependencyToHostgroup_DH | 0.00 |
| tbl_lnkHostdependencyToHostgroup_H | 0.00 |
| tbl_lnkHostescalationToContact | 0.00 |
| tbl_lnkHostescalationToContactgroup | 0.00 |
| tbl_lnkHostescalationToHost | 0.00 |
| tbl_lnkHostescalationToHostgroup | 0.00 |
| tbl_lnkHostgroupToHost | 0.01 |
| tbl_lnkHostgroupToHostgroup | 0.00 |
| tbl_lnkHosttemplateToContact | 0.00 |
| tbl_lnkHosttemplateToContactgroup | 0.00 |
| tbl_lnkHosttemplateToHost | 0.00 |
| tbl_lnkHosttemplateToHostgroup | 0.00 |
| tbl_lnkHosttemplateToHosttemplate | 0.00 |
| tbl_lnkHosttemplateToVariabledefinition | 0.00 |
| tbl_lnkServiceToContact | 0.04 |
| tbl_lnkServiceToContactgroup | 0.09 |
| tbl_lnkServiceToHost | 0.06 |
| tbl_lnkServiceToHostgroup | 0.00 |
| tbl_lnkServiceToServicegroup | 0.00 |
| tbl_lnkServiceToServicetemplate | 0.04 |
| tbl_lnkServiceToVariabledefinition | 0.05 |
| tbl_lnkServicedependencyToHost_DH | 0.00 |
| tbl_lnkServicedependencyToHost_H | 0.00 |
| tbl_lnkServicedependencyToHostgroup_DH | 0.00 |
| tbl_lnkServicedependencyToHostgroup_H | 0.00 |
| tbl_lnkServicedependencyToService_DS | 0.00 |
| tbl_lnkServicedependencyToService_S | 0.00 |
| tbl_lnkServicedependencyToServicegroup_DS | 0.02 |
| tbl_lnkServicedependencyToServicegroup_S | 0.02 |
| tbl_lnkServiceescalationToContact | 0.00 |
| tbl_lnkServiceescalationToContactgroup | 0.00 |
| tbl_lnkServiceescalationToHost | 0.00 |
| tbl_lnkServiceescalationToHostgroup | 0.00 |
| tbl_lnkServiceescalationToService | 0.00 |
| tbl_lnkServiceescalationToServicegroup | 0.02 |
| tbl_lnkServicegroupToService | 0.01 |
| tbl_lnkServicegroupToServicegroup | 0.00 |
| tbl_lnkServicetemplateToContact | 0.00 |
| tbl_lnkServicetemplateToContactgroup | 0.00 |
| tbl_lnkServicetemplateToHost | 0.00 |
| tbl_lnkServicetemplateToHostgroup | 0.00 |
| tbl_lnkServicetemplateToServicegroup | 0.00 |
| tbl_lnkServicetemplateToServicetemplate | 0.01 |
| tbl_lnkServicetemplateToVariabledefinition | 0.00 |
| tbl_lnkTimeperiodToTimeperiod | 0.00 |
| tbl_logbook | 0.00 |
| tbl_mainmenu | 0.00 |
| tbl_permission | 0.02 |
| tbl_permission_inactive | 0.02 |
| tbl_service | 0.82 |
| tbl_servicedependency | 0.00 |
| tbl_serviceescalation | 0.00 |
| tbl_serviceextinfo | 0.00 |
| tbl_servicegroup | 0.01 |
| tbl_servicetemplate | 0.03 |
| tbl_session | 0.00 |
| tbl_session_locks | 0.00 |
| tbl_settings | 0.00 |
| tbl_submenu | 0.00 |
| tbl_timedefinition | 0.03 |
| tbl_timeperiod | 0.02 |
| tbl_user | 0.01 |
| tbl_variabledefinition | 0.17 |
| xi_auditlog | 0.49 |
| xi_auth_tokens | 0.11 |
| xi_cmp_ccm_backups | 0.02 |
| xi_cmp_favorites | 0.03 |
| xi_cmp_nagiosbpi_backups | 0.34 |
| xi_cmp_trapdata | 0.03 |
| xi_cmp_trapdata_log | 0.03 |
| xi_commands | 0.00 |
| xi_deploy_agents | 0.02 |
| xi_deploy_jobs | 0.02 |
| xi_eventqueue | 23.38 |
| xi_events | 31.02 |
| xi_incidents | 0.02 |
| xi_meta | 1855.79 |
| xi_mibs | 0.05 |
| xi_options | 0.03 |
| xi_sessions | 0.03 |
| xi_sysstat | 0.01 |
| xi_usermeta | 0.27 |
| xi_users | 0.02 |
+--------------------------------------------+------------+
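
For a quick read of a long listing like the one above, the pasted output can be ranked by size offline. This is only a sketch: `top_tables` is a hypothetical helper name, and it assumes the two-column `mysql --table` layout shown above.

```shell
# Rank tables by size from saved `mysql --table` output read on stdin.
# Assumes rows of the form "| table_name | 12.34 |"; header and border
# rows are skipped because their third field is not a number.
top_tables() {
    local n="${1:-5}"   # how many rows to show (default 5)
    awk -F'|' '
        $3 ~ /^ *[0-9]+(\.[0-9]+)? *$/ {
            gsub(/ /, "", $2)   # strip padding around the table name
            gsub(/ /, "", $3)   # strip padding around the size
            print $3, $2        # size first so sort can key on it
        }' | sort -rn | head -n "$n"
}
```

With the listing saved to a file (say `sizes.txt`, a made-up name), `top_tables 3 < sizes.txt` puts `xi_meta` (1855.79 MB) first, followed by `nagios_statehistory` (141.97 MB) and `nagios_logentries` (108.16 MB).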

Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.

Re: Web Interface shuts down RHEL8 platform

Posted: Fri Aug 06, 2021 3:51 pm
by benjaminsmith
Hi,

Thanks for sending the profile. I'm seeing the errors below; the database log was not included for some reason.
297: Database Error: Could not connect to database
298 Too many connections
The system load is very high and the MySQL process is consuming more CPU than normal. The db tables were likely corrupted when you generated the system profile.

Code:

top - 09:20:57 up 15 min,  0 users,  load average: 31.72, 24.83, 14.10
Tasks: 323 total,   2 running, 321 sleeping,   0 stopped,   0 zombie
%Cpu(s): 51.5 us, 48.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15805.9 total,  11924.8 free,   2023.1 used,   1858.0 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  13502.6 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1109 mysql     20   0 1906532 469948  36368 S 193.3   2.9  23:41.48 mysqld

We might want to move this system over to ndo2db and see if that helps. However, let's first try increasing the maximum number of connections on the database. Please follow the steps in the article below to increase it.

Nagios XI - MySQL/MariaDB - Max Connections
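
The linked article is the authoritative procedure. As a rough illustration of the general MySQL/MariaDB mechanism only: the current limit can be read with `SHOW VARIABLES LIKE 'max_connections';`, and a persistent change is normally a `max_connections` line under `[mysqld]` in the server config. The sketch below rewrites such a file to stdout; `set_max_connections` is a hypothetical helper, and both the config path and the value 500 in the usage note are assumptions, not values from the article.

```shell
# Print a copy of a my.cnf-style file with max_connections set under [mysqld].
# Writes to stdout so the result can be reviewed before replacing anything.
set_max_connections() {
    local file="$1" value="$2"
    awk -v val="$value" '
        /^\[/ {
            # Leaving [mysqld] without having seen the setting: add it now.
            if (in_mysqld && !done) { print "max_connections = " val; done = 1 }
            in_mysqld = ($0 ~ /^\[mysqld\]/)
        }
        in_mysqld && /^[ \t]*max_connections[ \t]*=/ {
            # Replace the existing setting; drop any duplicate lines.
            if (!done) { print "max_connections = " val; done = 1 }
            next
        }
        { print }
        END {
            # File ended inside [mysqld], or had no [mysqld] section at all.
            if (!done) {
                if (!in_mysqld) print "[mysqld]"
                print "max_connections = " val
            }
        }
    ' "$file"
}
```

For example, `set_max_connections /etc/my.cnf 500 > /tmp/my.cnf.new` (path assumed) produces a candidate file to diff against the original. The database service has to be restarted before a config change takes effect.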

I would also recommend converting the storage engine on the nagios database to InnoDB. We have a guide for that process as well.

Database Storage Engine and High CPU usage in Nagios XI
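
The guide above describes the supported conversion process. Purely as a sketch of what such a conversion involves at the SQL level, the helpers below list the tables in the `nagios` schema that still use MyISAM and turn that list into `ALTER TABLE ... ENGINE=InnoDB` statements for review. Both function names are hypothetical, and limiting the query to the `nagios` schema is an assumption based on the sentence above.

```shell
# Query listing MyISAM tables in the nagios schema (schema choice is an assumption).
myisam_tables_sql() {
    cat <<'SQL'
SELECT table_schema, table_name
FROM information_schema.TABLES
WHERE table_schema = 'nagios'
  AND engine = 'MyISAM';
SQL
}

# Turn tab-separated "schema<TAB>table" lines on stdin into ALTER statements.
# Lines that do not have exactly two fields are ignored.
alter_statements() {
    awk -F'\t' 'NF == 2 { printf "ALTER TABLE `%s`.`%s` ENGINE=InnoDB;\n", $1, $2 }'
}

# Usage (prompts for the password; -N drops the header row, -B gives tab-separated output):
#   myisam_tables_sql | mysql -uroot -p -N -B | alter_statements > convert_to_innodb.sql
```

Writing the statements to a file first, rather than piping them straight back into `mysql`, leaves room to check the list and take the full backup mentioned below before anything is changed.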

Please take a full backup before making any changes and let me know if you see an improvement.

Regards,
Benjamin

Re: Web Interface shuts down RHEL8 platform

Posted: Mon Aug 09, 2021 8:20 am
by itunixops
That did the trick. Performance has improved significantly.

We've been upgrading this system repeatedly since 2017, and it seems that over the years we picked up corruption that we didn't address until now. Is there something we should check in the future before we do updates of this nature? We would prefer not to re-enter and regenerate data because that could take time.

With Nagios XI 6 coming (and I hope it is), we're hoping to migrate directly without issues, but we'd like to plan ahead for what will be coming.

Thanks for the help on this. If we have more issues we will be contacting you.

Re: Web Interface shuts down RHEL8 platform

Posted: Mon Aug 09, 2021 4:37 pm
by benjaminsmith
Hi @itunixops,

Excellent, glad to hear the system is working much better. When you upgrade, the script will take a backup, but any corrupted tables may cause it to fail or lead to other issues. I would recommend using the following plugin to monitor the table status for corruption so you get a notification or alert.

https://exchange.nagios.org/directory/P ... us/details
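
Independently of the plugin linked above, the idea can be sketched with the stock `mysqlcheck` client. The function below is a hypothetical wrapper, not that plugin: it reads `mysqlcheck --check` output on stdin and follows the usual Nagios plugin convention of returning 0 for OK and 2 for CRITICAL. It assumes healthy tables are reported as one line ending in `OK`, so any other non-empty line is treated as a problem.

```shell
# Summarize mysqlcheck output in Nagios plugin style.
# Returns 0 if every reported line ends in "OK", otherwise 2 (CRITICAL).
check_table_status() {
    local bad
    # Keep every non-empty line whose last field is not "OK".
    bad="$(awk 'NF > 0 && $NF != "OK"')"
    if [ -z "$bad" ]; then
        echo "OK - all tables passed mysqlcheck"
        return 0
    fi
    echo "CRITICAL - mysqlcheck reported problems:"
    echo "$bad"
    return 2
}

# Usage (prompts for the password):
#   mysqlcheck -uroot -p --check --databases nagios nagiosql nagiosxi | check_table_status
```

Running a check like this before an upgrade gives an early warning if a table is already damaged when the upgrade script takes its backup.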

Also, if you converted to InnoDB, make sure to set up regular backup jobs. The tables are more resilient to corruption but can be harder to repair compared to MyISAM.

Backing Up And Restoring Your Nagios XI System

Let me know if you need anything else or if it's okay to mark this as resolved.

Regards,
Benjamin