
Nagios XI 5.5.11 - Issues

Posted: Thu Aug 15, 2019 12:57 pm
by msmulpuri
Hi,

We have a VM instance of Nagios XI 5.5.11 installed and running at a customer site. We are seeing all kinds of issues, as this instance monitors over 2300 hosts and over 1100 services. The monitoring also includes SNMP traps and polling. A very high number of traps is generated; they keep accumulating, get stuck, and clear up only slowly. We also see the output below in dbmaint.log, even after running the suggested repair. The file ibdata1 (MariaDB) is huge and keeps growing. The VM's performance degrades continuously due to CPU spikes and heavy memory and swap usage. I have attached the System Profile to this post in case it is needed. Please help.

Configuration of the VM:
CentOS 7
Nagios XI 5.5.11
Memory allocated: 24 GB
Swap space: 16 GB

dbmaint.log output:
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' IS OLD - REMOVING
CREATING: /usr/local/nagiosxi/var/dbmaint.lock
<h3>Database Error</h3>A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.<p>Run the following from the CLI as root to attempt to repair the DB:<br><pre>/usr/local/nagiosxi/scripts/repair_databases.sh</pre></p>LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
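
The log above alternates between "EXISTS - EXITING" and "IS OLD - REMOVING", which suggests the maintenance job decides by the lock file's age. A minimal sketch of that kind of check follows, for inspecting the lock by hand. The function names and the 3600-second default are hypothetical; the threshold dbmaint actually uses is not stated in this thread.

```shell
#!/usr/bin/env bash
# Print the age of a lock file in seconds; return 1 if the file is absent.
lock_age_seconds() {
    local lock_file="$1"
    [ -e "$lock_file" ] || return 1
    local now mtime
    now=$(date +%s)
    mtime=$(stat -c %Y "$lock_file")   # GNU stat: modification time in epoch seconds
    echo $(( now - mtime ))
}

# Classify a lock as absent, held, or stale.
# max_age is an assumed threshold, not the value dbmaint itself uses.
lock_state() {
    local lock_file="$1" max_age="${2:-3600}" age
    if ! age=$(lock_age_seconds "$lock_file"); then
        echo "absent"
    elif [ "$age" -gt "$max_age" ]; then
        echo "stale"
    else
        echo "held"
    fi
}

# Read-only example:
# lock_state /usr/local/nagiosxi/var/dbmaint.lock 3600
```

This only reports the state; it never deletes the lock.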

Support Edit: profile.zip has been downloaded and shared with the team.

Re: Nagios XI 5.5.11 - Issues

Posted: Thu Aug 15, 2019 4:53 pm
by benjaminsmith
Hello @msmulpuri,
<h3>Database Error</h3>A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.<p>Run the following from the CLI as root to attempt to repair the DB:<br>
Besides the error above, there is also a fatal PHP database call in the Apache log, and this is most likely causing the CPU/performance issues.

Run through the following commands to stop the processes, clear the message queue, repair the database and restart.

Code: Select all

systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mysqld || systemctl restart mariadb
cd /usr/local/nagiosxi/scripts
./repair_databases.sh
systemctl start npcd
systemctl start crond
systemctl start nagios
systemctl start ndo2db
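
Side note on the message-queue step in the block above: the loop removes every queue whose `ipcs -q` line contains the string "nagios". To see beforehand which queue IDs are affected, a read-only preview along these lines may help. This is a sketch, not part of the repair sequence; the function name is hypothetical, and it is a stricter variant that matches the owner column exactly rather than grepping the whole line.

```shell
#!/usr/bin/env bash
# Read `ipcs -q` output on stdin and print the msqid of every queue whose
# owner column matches exactly.
# Columns on Linux: key msqid owner perms used-bytes messages
queue_ids_for_owner() {
    local owner="$1"
    # Data rows start with a hex key (0x...); header lines are skipped.
    awk -v owner="$owner" '$1 ~ /^0x/ && $3 == owner { print $2 }'
}

# Preview only; nothing is removed:
# ipcs -q | queue_ids_for_owner nagios
```
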
After running the repair commands above, can you send over a fresh system profile along with the database configuration file (/etc/my.cnf), and post the full output of the following commands:

Check for Corrupted Tables

Code: Select all

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table | grep NULL
Check Table Sizes

Code: Select all

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
Thanks.

Re: Nagios XI 5.5.11 - Issues

Posted: Thu Aug 15, 2019 8:09 pm
by msmulpuri
Hello,

First of all, thank you for your quick response to my concern. Please find below the output for the steps that you asked me to follow.

1. Database repair script yielded the below result.
===============
REPAIR COMPLETE
===============
DATABASE: nagiosql
TABLE:
/var/lib/mysql/nagiosql /usr/local/nagiosxi/var
No *.MYI files found, skipping nagiosql...
DATABASE: nagiosxi
TABLE:
/var/lib/mysql/nagiosxi /usr/local/nagiosxi/var
No *.MYI files found, skipping nagiosxi...

=======================
nagios database repair succeeded
nagiosql database repair skipped, no *.MYI files found
nagiosxi database repair skipped, no *.MYI files found

2. echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table | grep NULL
-No output returned

3. echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
+--------------------------------------------+------------+
| Table | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements | 0.00 |
| nagios_commands | 0.02 |
| nagios_commenthistory | 0.12 |
| nagios_comments | 0.00 |
| nagios_configfiles | 0.00 |
| nagios_configfilevariables | 0.01 |
| nagios_conninfo | 0.01 |
| nagios_contact_addresses | 0.00 |
| nagios_contact_notificationcommands | 0.01 |
| nagios_contactgroup_members | 0.00 |
| nagios_contactgroups | 0.00 |
| nagios_contactnotificationmethods | 282.09 |
| nagios_contactnotifications | 298.80 |
| nagios_contacts | 0.00 |
| nagios_contactstatus | 0.00 |
| nagios_customvariables | 0.23 |
| nagios_customvariablestatus | 0.23 |
| nagios_dbversion | 0.00 |
| nagios_downtimehistory | 0.00 |
| nagios_eventhandlers | 324.51 |
| nagios_externalcommands | 706.02 |
| nagios_flappinghistory | 0.05 |
| nagios_host_contactgroups | 0.00 |
| nagios_host_contacts | 0.19 |
| nagios_host_parenthosts | 0.00 |
| nagios_hostchecks | 0.00 |
| nagios_hostdependencies | 0.00 |
| nagios_hostescalation_contactgroups | 0.00 |
| nagios_hostescalation_contacts | 0.00 |
| nagios_hostescalations | 0.00 |
| nagios_hostgroup_members | 0.10 |
| nagios_hostgroups | 0.00 |
| nagios_hosts | 0.55 |
| nagios_hoststatus | 1.28 |
| nagios_instances | 0.00 |
| nagios_logentries | 2440.71 |
| nagios_notifications | 412.47 |
| nagios_objects | 0.32 |
| nagios_processevents | 0.03 |
| nagios_programstatus | 0.00 |
| nagios_runtimevariables | 0.00 |
| nagios_scheduleddowntime | 0.00 |
| nagios_service_contactgroups | 0.01 |
| nagios_service_contacts | 0.19 |
| nagios_service_parentservices | 0.00 |
| nagios_servicechecks | 0.00 |
| nagios_servicedependencies | 0.00 |
| nagios_serviceescalation_contactgroups | 0.00 |
| nagios_serviceescalation_contacts | 0.00 |
| nagios_serviceescalations | 0.00 |
| nagios_servicegroup_members | 0.04 |
| nagios_servicegroups | 0.00 |
| nagios_services | 0.32 |
| nagios_servicestatus | 0.73 |
| nagios_statehistory | 91.80 |
| nagios_systemcommands | 1.46 |
| nagios_timedeventqueue | 0.00 |
| nagios_timedevents | 0.00 |
| nagios_timeperiod_timeranges | 0.01 |
| nagios_timeperiods | 0.00 |
| tbl_command | 0.06 |
| tbl_contact | 0.03 |
| tbl_contactgroup | 0.03 |
| tbl_contacttemplate | 0.03 |
| tbl_domain | 0.03 |
| tbl_host | 0.48 |
| tbl_hostdependency | 0.03 |
| tbl_hostescalation | 0.03 |
| tbl_hostextinfo | 0.03 |
| tbl_hostgroup | 0.03 |
| tbl_hosttemplate | 0.03 |
| tbl_info | 0.17 |
| tbl_lnkContactToCommandHost | 0.02 |
| tbl_lnkContactToCommandService | 0.02 |
| tbl_lnkContactToContactgroup | 0.02 |
| tbl_lnkContactToContacttemplate | 0.02 |
| tbl_lnkContactToVariabledefinition | 0.02 |
| tbl_lnkContactgroupToContact | 0.02 |
| tbl_lnkContactgroupToContactgroup | 0.02 |
| tbl_lnkContacttemplateToCommandHost | 0.02 |
| tbl_lnkContacttemplateToCommandService | 0.02 |
| tbl_lnkContacttemplateToContactgroup | 0.02 |
| tbl_lnkContacttemplateToContacttemplate | 0.02 |
| tbl_lnkContacttemplateToVariabledefinition | 0.02 |
| tbl_lnkHostToContact | 0.16 |
| tbl_lnkHostToContactgroup | 0.02 |
| tbl_lnkHostToHost | 0.02 |
| tbl_lnkHostToHostgroup | 0.02 |
| tbl_lnkHostToHosttemplate | 0.11 |
| tbl_lnkHostToVariabledefinition | 0.09 |
| tbl_lnkHostdependencyToHost_DH | 0.02 |
| tbl_lnkHostdependencyToHost_H | 0.02 |
| tbl_lnkHostdependencyToHostgroup_DH | 0.02 |
| tbl_lnkHostdependencyToHostgroup_H | 0.02 |
| tbl_lnkHostescalationToContact | 0.02 |
| tbl_lnkHostescalationToContactgroup | 0.02 |
| tbl_lnkHostescalationToHost | 0.02 |
| tbl_lnkHostescalationToHostgroup | 0.02 |
| tbl_lnkHostgroupToHost | 0.09 |
| tbl_lnkHostgroupToHostgroup | 0.02 |
| tbl_lnkHosttemplateToContact | 0.02 |
| tbl_lnkHosttemplateToContactgroup | 0.02 |
| tbl_lnkHosttemplateToHost | 0.02 |
| tbl_lnkHosttemplateToHostgroup | 0.02 |
| tbl_lnkHosttemplateToHosttemplate | 0.02 |
| tbl_lnkHosttemplateToVariabledefinition | 0.02 |
| tbl_lnkServiceToContact | 0.23 |
| tbl_lnkServiceToContactgroup | 0.02 |
| tbl_lnkServiceToHost | 0.06 |
| tbl_lnkServiceToHostgroup | 0.02 |
| tbl_lnkServiceToServicegroup | 0.02 |
| tbl_lnkServiceToServicetemplate | 0.06 |
| tbl_lnkServiceToVariabledefinition | 0.05 |
| tbl_lnkServicedependencyToHost_DH | 0.02 |
| tbl_lnkServicedependencyToHost_H | 0.02 |
| tbl_lnkServicedependencyToHostgroup_DH | 0.02 |
| tbl_lnkServicedependencyToHostgroup_H | 0.02 |
| tbl_lnkServicedependencyToService_DS | 0.02 |
| tbl_lnkServicedependencyToService_S | 0.02 |
| tbl_lnkServiceescalationToContact | 0.02 |
| tbl_lnkServiceescalationToContactgroup | 0.02 |
| tbl_lnkServiceescalationToHost | 0.02 |
| tbl_lnkServiceescalationToHostgroup | 0.02 |
| tbl_lnkServiceescalationToService | 0.02 |
| tbl_lnkServicegroupToService | 0.05 |
| tbl_lnkServicegroupToServicegroup | 0.02 |
| tbl_lnkServicetemplateToContact | 0.02 |
| tbl_lnkServicetemplateToContactgroup | 0.02 |
| tbl_lnkServicetemplateToHost | 0.02 |
| tbl_lnkServicetemplateToHostgroup | 0.02 |
| tbl_lnkServicetemplateToServicegroup | 0.02 |
| tbl_lnkServicetemplateToServicetemplate | 0.02 |
| tbl_lnkServicetemplateToVariabledefinition | 0.02 |
| tbl_lnkTimeperiodToTimeperiod | 0.02 |
| tbl_logbook | 0.02 |
| tbl_mainmenu | 0.02 |
| tbl_permission | 0.02 |
| tbl_permission_inactive | 0.02 |
| tbl_service | 0.30 |
| tbl_servicedependency | 0.03 |
| tbl_serviceescalation | 0.03 |
| tbl_serviceextinfo | 0.03 |
| tbl_servicegroup | 0.03 |
| tbl_servicetemplate | 0.03 |
| tbl_session | 0.02 |
| tbl_session_locks | 0.02 |
| tbl_settings | 0.03 |
| tbl_submenu | 0.02 |
| tbl_timedefinition | 0.02 |
| tbl_timeperiod | 0.03 |
| tbl_user | 0.03 |
| tbl_variabledefinition | 0.19 |
| xi_auditlog | 2.06 |
| xi_auth_tokens | 1.03 |
| xi_cmp_trapdata | 0.03 |
| xi_cmp_trapdata_log | 0.03 |
| xi_commands | 0.02 |
| xi_eventqueue | 2348.66 |
| xi_events | 3966.00 |
| xi_incidents | 0.02 |
| xi_meta | 149175.50 |
| xi_options | 0.06 |
| xi_sessions | 0.03 |
| xi_sysstat | 0.03 |
| xi_usermeta | 0.17 |
| xi_users | 0.03 |
+--------------------------------------------+------------+

4. Below is the content of /etc/my.cnf file
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

#
# include all files from the config directory
#
!includedir /etc/my.cnf.d

5. Please find attached fresh system profile after the above steps followed.

Once again, I really appreciate your help in resolving the issue. I have also listed the ibdata1 file with its size below. I need your help in reducing the size of this file as well. It has grown extremely large.
157G Aug 15 20:03 ibdata1
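
The table listing in item 3 already hints at where the space went: xi_meta alone is reported as 149175.50 MB, roughly 146 GB, which is in the same range as the 157G ibdata1 (assuming the InnoDB tables live in the shared ibdata1 tablespace rather than in per-table files). A small sketch for pulling the large tables out of that kind of `mysql --table` output follows; the function names and the 100 MB default are hypothetical.

```shell
#!/usr/bin/env bash
# Read `mysql --table` output (| Table | Size in MB |) on stdin and print
# "size_mb table" for every table at or above the threshold, largest first.
tables_over_mb() {
    local min_mb="${1:-100}"
    awk -F'|' -v min="$min_mb" '
        NF >= 3 {
            name = $2; size = $3
            gsub(/ /, "", name); gsub(/ /, "", size)
            # Skip the header row and anything whose size is not numeric.
            if (size ~ /^[0-9]+(\.[0-9]+)?$/ && size + 0 >= min + 0)
                printf "%.2f %s\n", size, name
        }' | sort -rn
}

# Sum the "Size in MB" column of the same output.
total_mb() {
    awk -F'|' '
        NF >= 3 {
            size = $3; gsub(/ /, "", size)
            if (size ~ /^[0-9]+(\.[0-9]+)?$/) sum += size
        }
        END { printf "%.2f\n", sum }'
}

# Usage, with the size query from this thread saved to sizes.txt:
# tables_over_mb 100 < sizes.txt
# total_mb < sizes.txt
```
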

I have also attached screen capture of Admin
Please let me know if you need anything else.
Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.

Re: Nagios XI 5.5.11 - Issues

Posted: Fri Aug 16, 2019 11:13 am
by benjaminsmith
Hello @msmulpuri,

Some of the tables in the nagiosxi database have grown so large that they are preventing the server from operating correctly. The server may have been shut down incorrectly, corrupting tables and causing them to grow excessively large.

Run the following commands to truncate the tables, and let us know if this resolves the issue for you.

Code: Select all

systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
rm -rf /usr/local/nagios/var/ndo2db.lock
rm -rf /usr/local/nagios/var/ndo2db.pid
rm -rf /usr/local/nagios/var/ndo2db.sock
rm -rf /usr/local/nagios/var/ndo.sock
rm -rf /usr/local/nagiosxi/var/subsys/ndo2db
rm -rf /var/run/nagios.lock
rm -rf /usr/local/nagios/var/nagios.lock
systemctl restart mysqld || systemctl restart mariadb
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -uroot -pnagiosxi -h 127.0.0.1 nagiosxi
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl restart httpd
systemctl restart snmptt
Also, I would recommend increasing the max connections for the database. We have step-by-step instructions for this process in the following knowledge-base article.
Nagios XI - MySQL/MariaDB - Max Connections
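
Before changing anything, it can help to confirm whether max_connections is set at all. The /etc/my.cnf posted earlier in this thread does not set it, so unless a file under /etc/my.cnf.d does, the server's built-in default applies. A minimal sketch for checking a single config file follows; the function name is hypothetical, and the appropriate value should come from the knowledge-base article, not from this snippet.

```shell
#!/usr/bin/env bash
# Print the max_connections value set in a my.cnf-style file, or "unset".
# Only inspects the one file given; it does not follow !includedir.
cnf_max_connections() {
    local cnf="$1" value
    # Accept both max_connections and max-connections; the last setting wins.
    value=$(sed -n 's/^[[:space:]]*max[_-]connections[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cnf" | tail -n 1)
    echo "${value:-unset}"
}

# Example:
# cnf_max_connections /etc/my.cnf
# The running value can be read with the credentials used in this thread:
# echo "SHOW VARIABLES LIKE 'max_connections';" | mysql -uroot -pnagiosxi
```
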

Re: Nagios XI 5.5.11 - Issues

Posted: Tue Aug 20, 2019 2:35 pm
by msmulpuri
Hi,

I have followed the steps to resize the ibdata1 file and then your steps to further troubleshoot the issue. The steps worked and please close the topic. Thank you very much for all your help!

Re: Nagios XI 5.5.11 - Issues

Posted: Tue Aug 20, 2019 3:06 pm
by scottwilkerson
msmulpuri wrote:Hi,

I have followed the steps to resize the ibdata1 file and then your steps to further troubleshoot the issue. The steps worked and please close the topic. Thank you very much for all your help!
Great!

Locking