nagios xi database crash frequently

yonten
Posts: 6
Joined: Fri Sep 14, 2018 5:21 am

nagios xi database crash frequently

Post by yonten »

Dear Team,

I have noticed several times that the Nagios XI database crashes frequently, and I have to repair it time and again, which is very annoying.

The error message I get is shown below:

Database Error
A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.
Run the following from the CLI as root to attempt to repair the DB:

/usr/local/nagiosxi/scripts/repair_databases.sh

Therefore, please help us resolve the issue permanently.

-warm regards,
Yonten Tshering
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: nagios xi database crash frequently

Post by ssax »

Please PM me a copy of your profile, you can download it from Admin > System Profile > Download Profile.

The usual culprits are high I/O wait or high system load, though other causes are possible.
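
To check those two culprits quickly, something like the following could be used. It is only a sketch: the helper names `load_status` and `iowait_from_vmstat` are made up for illustration, the rule that a load above the CPU count counts as "high" is an assumed rule of thumb, and `vmstat` is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Nagios XI.

# load_status LOAD CPUS
# Prints "high" when the load average exceeds the CPU count, else "ok".
# (Assumed rule of thumb, not an official threshold.)
load_status() {
    awk -v load="$1" -v cpus="$2" \
        'BEGIN { if (load + 0 > cpus + 0) print "high"; else print "ok" }'
}

# iowait_from_vmstat
# Reads vmstat output on stdin and prints the "wa" (I/O wait %) value
# taken from the last sample line.
iowait_from_vmstat() {
    awk '
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
        col       { val = $col }
        END       { print val }
    '
}

# Example use on a live system:
#   load_status "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
#   vmstat 1 5 | iowait_from_vmstat
```

A consistently high I/O wait value alongside a "high" load result would point toward the disk rather than the CPU as the bottleneck.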

Please post a screenshot of Admin > Performance Settings > Databases tab as well.

Thank you
yonten
Posts: 6
Joined: Fri Sep 14, 2018 5:21 am

Re: nagios xi database crash frequently

Post by yonten »

Dear Sir,

As requested, please find the attachment for your information.

-warm regards,
Yonten tshering
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: nagios xi database crash frequently

Post by npolovenko »

@yonten, please follow this article to check the number of open MySQL connections on your system, and double the value you have in your settings.
https://support.nagios.com/kb/article.php?id=513
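
As a rough illustration of the "check and double" step, the sketch below reads a limit from the database server and prints twice that value. It assumes the setting in question is the server's `max_connections` variable (the linked article is the authority on that), uses the root credentials that appear elsewhere in this thread, and the helper names `value_of` and `double_limit` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Nagios XI.

# value_of
# Reads a "name<TAB>value" line (as printed by `mysql -N -B`) on stdin
# and prints only the value column of the first line.
value_of() {
    awk '{ print $2; exit }'
}

# double_limit N
# Prints twice the given connection limit.
double_limit() {
    echo $(( $1 * 2 ))
}

# Example use against a running server (assumes max_connections is the
# setting meant above, and the credentials used in this thread):
#   current=$(mysql -uroot -pnagiosxi -N -B \
#       -e "SHOW VARIABLES LIKE 'max_connections'" | value_of)
#   echo "suggested max_connections: $(double_limit "$current")"
#
# The peak number of connections seen so far can be read the same way:
#   mysql -uroot -pnagiosxi -N -B \
#       -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections'" | value_of
```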

You also have multiple Nagios IPC message queues. To fix that issue, run the following commands:
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mariadb
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
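
Before and after running the commands above, the number of queues owned by the nagios user can be counted with a small helper such as the one sketched here. The function names are hypothetical, and unlike the `grep nagios` in the loop above, this version matches only on the owner column of the `ipcs -q` output.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Nagios XI.

# nagios_queue_ids [OWNER]
# Reads `ipcs -q` output on stdin and prints the msqid of every message
# queue whose owner column equals OWNER (default: nagios).
nagios_queue_ids() {
    awk -v owner="${1:-nagios}" '$3 == owner { print $2 }'
}

# nagios_queue_count [OWNER]
# Same input, but prints only how many such queues exist.
nagios_queue_count() {
    nagios_queue_ids "$@" | wc -l | tr -d ' '
}

# Example use on a live system:
#   ipcs -q | nagios_queue_count
```

A count above 1 would match the "multiple queues" condition described in this post.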
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
yonten
Posts: 6
Joined: Fri Sep 14, 2018 5:21 am

Re: nagios xi database crash frequently

Post by yonten »

Hi,
I have applied the commands you sent and will observe the system for some time.

I also observed that /usr/libexec/mysqld is consuming a lot of CPU. The system status information and htop output are attached for your verification.

-warm regards,
Yonten Tshering
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: nagios xi database crash frequently

Post by npolovenko »

@yonten, Please run the following commands and then upload the nagios.log and nagios1.log files from the /tmp/ folder.
service crond status
su - nagios
/usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php >> /tmp/nagios.log
ps -ef | grep cron >> /tmp/nagios1.log
yonten
Posts: 6
Joined: Fri Sep 14, 2018 5:21 am

Re: nagios xi database crash frequently

Post by yonten »

Hi,

Thank you for your quick response. Please find the nagios.log and nagios1.log files attached for your reference.

[root@ecms ~]# service crond status
Redirecting to /bin/systemctl status crond.service
● crond.service - Command Scheduler
Loaded: loaded (/usr/lib/systemd/system/crond.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2018-12-31 16:03:55 +06; 12min ago
Main PID: 781 (crond)
CGroup: /system.slice/crond.service
└─781 /usr/sbin/crond -n

Dec 31 16:03:55 ecms systemd[1]: Started Command Scheduler.
Dec 31 16:03:55 ecms systemd[1]: Starting Command Scheduler...
Dec 31 16:03:55 ecms crond[781]: (CRON) INFO (RANDOM_DELAY will be scaled w....)
Dec 31 16:03:55 ecms crond[781]: (CRON) INFO (running with inotify support)
Hint: Some lines were ellipsized, use -l to show in full.
[root@ecms ~]#

-warm regards,
Yonten Tshering
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: nagios xi database crash frequently

Post by ssax »

What is the output of this command?

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
Please go to Admin > Performance Settings > Databases tab:
- Set all 3 Optimize Intervals on that page to 300.
- Click the Update Settings button

Overlapping optimize runs can be the cause of crashed tables: if one optimize has not finished before the next one starts, a table can be left marked as crashed. If the output of the requested command shows large tables, making this change should alleviate that issue.
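
To pick the large tables out of that command's output without reading every row, a filter along these lines could help. It is a sketch only: the function name `large_tables` and the 1000 MB threshold in the example are arbitrary choices for illustration, not values from Nagios.

```shell
#!/bin/sh
# Hypothetical helper for illustration; it is not part of Nagios XI.

# large_tables MIN_MB
# Reads the `mysql --table` output of the size query on stdin and prints
# "name size" for each table whose size in MB is greater than MIN_MB.
# Border lines, the header row, and NULL sizes are skipped.
large_tables() {
    awk -F'|' -v min="$1" '
        NF >= 4 {
            name = $2; size = $3
            gsub(/[ \t]/, "", name)
            gsub(/[ \t]/, "", size)
            if (size ~ /^[0-9]+(\.[0-9]+)?$/ && size + 0 > min + 0)
                print name, size
        }'
}

# Example use: save the output of the size query above to a file
# (sizes.txt is a made-up name), then list tables over 1000 MB:
#   large_tables 1000 < sizes.txt
```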
yonten
Posts: 6
Joined: Fri Sep 14, 2018 5:21 am

Re: nagios xi database crash frequently

Post by yonten »

Dear Support team,
Please find the output below for your reference.

[root@ecms mail]# echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
+--------------------------------------------+------------+
| Table | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements | 0.00 |
| nagios_commands | 0.02 |
| nagios_commenthistory | 5.05 |
| nagios_comments | 0.00 |
| nagios_configfiles | 0.00 |
| nagios_configfilevariables | 0.01 |
| nagios_conninfo | 0.22 |
| nagios_contact_addresses | 0.00 |
| nagios_contact_notificationcommands | 0.01 |
| nagios_contactgroup_members | 0.00 |
| nagios_contactgroups | 0.00 |
| nagios_contactnotificationmethods | 110.51 |
| nagios_contactnotifications | 115.36 |
| nagios_contacts | 0.00 |
| nagios_contactstatus | 0.00 |
| nagios_customvariables | 0.41 |
| nagios_customvariablestatus | 0.43 |
| nagios_dbversion | 0.00 |
| nagios_downtimehistory | 0.00 |
| nagios_eventhandlers | 3.52 |
| nagios_externalcommands | 0.00 |
| nagios_flappinghistory | 2.06 |
| nagios_host_contactgroups | 0.00 |
| nagios_host_contacts | 0.01 |
| nagios_host_parenthosts | 0.01 |
| nagios_hostchecks | 0.00 |
| nagios_hostdependencies | 0.00 |
| nagios_hostescalation_contactgroups | 0.01 |
| nagios_hostescalation_contacts | 0.04 |
| nagios_hostescalations | 0.05 |
| nagios_hostgroup_members | 0.01 |
| nagios_hostgroups | 0.00 |
| nagios_hosts | 0.04 |
| nagios_hoststatus | 0.11 |
| nagios_instances | 0.00 |
| nagios_logentries | NULL |
| nagios_notifications | 603.57 |
| nagios_objects | 0.57 |
| nagios_processevents | 0.57 |
| nagios_programstatus | 0.00 |
| nagios_runtimevariables | 0.00 |
| nagios_scheduleddowntime | 0.00 |
| nagios_service_contactgroups | 0.02 |
| nagios_service_contacts | 0.14 |
| nagios_service_parentservices | 0.00 |
| nagios_servicechecks | 0.00 |
| nagios_servicedependencies | 0.00 |
| nagios_serviceescalation_contactgroups | 0.00 |
| nagios_serviceescalation_contacts | 0.00 |
| nagios_serviceescalations | 0.00 |
| nagios_servicegroup_members | 0.08 |
| nagios_servicegroups | 0.00 |
| nagios_services | 1.10 |
| nagios_servicestatus | 2.91 |
| nagios_statehistory | 125.77 |
| nagios_systemcommands | 1.04 |
| nagios_timedeventqueue | 0.00 |
| nagios_timedevents | 0.00 |
| nagios_timeperiod_timeranges | 0.01 |
| nagios_timeperiods | 0.00 |
| tbl_command | 0.06 |
| tbl_contact | 0.03 |
| tbl_contactgroup | 0.03 |
| tbl_contacttemplate | 0.03 |
| tbl_domain | 0.03 |
| tbl_host | 0.08 |
| tbl_hostdependency | 0.03 |
| tbl_hostescalation | 0.03 |
| tbl_hostextinfo | 0.03 |
| tbl_hostgroup | 0.03 |
| tbl_hosttemplate | 0.03 |
| tbl_info | 0.17 |
| tbl_lnkContactToCommandHost | 0.02 |
| tbl_lnkContactToCommandService | 0.02 |
| tbl_lnkContactToContactgroup | 0.02 |
| tbl_lnkContactToContacttemplate | 0.02 |
| tbl_lnkContactToVariabledefinition | 0.02 |
| tbl_lnkContactgroupToContact | 0.02 |
| tbl_lnkContactgroupToContactgroup | 0.02 |
| tbl_lnkContacttemplateToCommandHost | 0.02 |
| tbl_lnkContacttemplateToCommandService | 0.02 |
| tbl_lnkContacttemplateToContactgroup | 0.02 |
| tbl_lnkContacttemplateToContacttemplate | 0.02 |
| tbl_lnkContacttemplateToVariabledefinition | 0.02 |
| tbl_lnkHostToContact | 0.02 |
| tbl_lnkHostToContactgroup | 0.02 |
| tbl_lnkHostToHost | 0.02 |
| tbl_lnkHostToHostgroup | 0.02 |
| tbl_lnkHostToHosttemplate | 0.02 |
| tbl_lnkHostToVariabledefinition | 0.02 |
| tbl_lnkHostdependencyToHost_DH | 0.02 |
| tbl_lnkHostdependencyToHost_H | 0.02 |
| tbl_lnkHostdependencyToHostgroup_DH | 0.02 |
| tbl_lnkHostdependencyToHostgroup_H | 0.02 |
| tbl_lnkHostescalationToContact | 0.02 |
| tbl_lnkHostescalationToContactgroup | 0.02 |
| tbl_lnkHostescalationToHost | 0.02 |
| tbl_lnkHostescalationToHostgroup | 0.02 |
| tbl_lnkHostgroupToHost | 0.02 |
| tbl_lnkHostgroupToHostgroup | 0.02 |
| tbl_lnkHosttemplateToContact | 0.02 |
| tbl_lnkHosttemplateToContactgroup | 0.02 |
| tbl_lnkHosttemplateToHost | 0.02 |
| tbl_lnkHosttemplateToHostgroup | 0.02 |
| tbl_lnkHosttemplateToHosttemplate | 0.02 |
| tbl_lnkHosttemplateToVariabledefinition | 0.02 |
| tbl_lnkServiceToContact | 0.13 |
| tbl_lnkServiceToContactgroup | 0.02 |
| tbl_lnkServiceToHost | 0.17 |
| tbl_lnkServiceToHostgroup | 0.02 |
| tbl_lnkServiceToServicegroup | 0.02 |
| tbl_lnkServiceToServicetemplate | 0.20 |
| tbl_lnkServiceToVariabledefinition | 0.17 |
| tbl_lnkServicedependencyToHost_DH | 0.02 |
| tbl_lnkServicedependencyToHost_H | 0.02 |
| tbl_lnkServicedependencyToHostgroup_DH | 0.02 |
| tbl_lnkServicedependencyToHostgroup_H | 0.02 |
| tbl_lnkServicedependencyToService_DS | 0.02 |
| tbl_lnkServicedependencyToService_S | 0.02 |
| tbl_lnkServiceescalationToContact | 0.02 |
| tbl_lnkServiceescalationToContactgroup | 0.02 |
| tbl_lnkServiceescalationToHost | 0.02 |
| tbl_lnkServiceescalationToHostgroup | 0.02 |
| tbl_lnkServiceescalationToService | 0.02 |
| tbl_lnkServicegroupToService | 0.08 |
| tbl_lnkServicegroupToServicegroup | 0.02 |
| tbl_lnkServicetemplateToContact | 0.02 |
| tbl_lnkServicetemplateToContactgroup | 0.02 |
| tbl_lnkServicetemplateToHost | 0.02 |
| tbl_lnkServicetemplateToHostgroup | 0.02 |
| tbl_lnkServicetemplateToServicegroup | 0.02 |
| tbl_lnkServicetemplateToServicetemplate | 0.02 |
| tbl_lnkServicetemplateToVariabledefinition | 0.02 |
| tbl_lnkTimeperiodToTimeperiod | 0.02 |
| tbl_logbook | 0.02 |
| tbl_mainmenu | 0.02 |
| tbl_permission | 0.02 |
| tbl_permission_inactive | 0.02 |
| tbl_service | 1.52 |
| tbl_servicedependency | 0.03 |
| tbl_serviceescalation | 0.03 |
| tbl_serviceextinfo | 0.03 |
| tbl_servicegroup | 0.03 |
| tbl_servicetemplate | 0.03 |
| tbl_session | 0.02 |
| tbl_session_locks | 0.02 |
| tbl_settings | 0.03 |
| tbl_submenu | 0.02 |
| tbl_timedefinition | 0.02 |
| tbl_timeperiod | 0.03 |
| tbl_user | 0.03 |
| tbl_variabledefinition | 0.30 |
| xi_auditlog | 4.19 |
| xi_auth_tokens | 1.56 |
| xi_cmp_trapdata | 0.03 |
| xi_cmp_trapdata_log | 0.03 |
| xi_commands | 0.02 |
| xi_eventqueue | 368.88 |
| xi_events | 2767.92 |
| xi_incidents | 0.02 |
| xi_meta | 41160.00 |
| xi_options | 0.06 |
| xi_sessions | 0.09 |
| xi_sysstat | 0.03 |
| xi_usermeta | 0.27 |
| xi_users | 0.03 |
+--------------------------------------------+------------+




-warm regards,
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: nagios xi database crash frequently

Post by ssax »

You are hitting a bug. Please run these commands; they should fix your issue. They will likely also stop the tables from crashing all the time, which is probably caused by how large your xi_meta and xi_events tables are:

systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mysqld || systemctl restart mariadb
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -u root -pnagiosxi nagiosxi
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
Once you are done, please send the output of this command again:

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
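
One simple way to compare the output before and after the truncation is to sum the size column. The sketch below does that; the function name `total_mb` and the file names in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for illustration; it is not part of Nagios XI.

# total_mb
# Reads the `mysql --table` output of the size query on stdin and prints
# the sum of the "Size in MB" column with two decimals. Rows whose size
# is not a plain number (the header row, NULL) are ignored.
total_mb() {
    awk -F'|' '
        NF >= 4 {
            size = $3
            gsub(/[ \t]/, "", size)
            if (size ~ /^[0-9]+(\.[0-9]+)?$/) sum += size
        }
        END { printf "%.2f\n", sum }'
}

# Example use, with the query output saved to two made-up file names:
#   total_mb < sizes_before.txt
#   total_mb < sizes_after.txt
```

If the truncation worked, the second total should be far smaller than the first, since xi_meta, xi_events and xi_eventqueue account for most of the space in the listing earlier in this thread.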