Nagios XI 5.5.11 - Issues

Post by msmulpuri » Thu Aug 15, 2019 12:57 pm

Hi,

We have a VM instance of Nagios XI 5.5.11 installed and running at a customer site. We are seeing a range of issues, as this instance monitors over 2300 hosts and over 1100 services. Monitoring also includes SNMP traps and polling. A very high number of traps is generated; the number of pending SNMP traps keeps increasing and they get stuck, clearing only slowly. We also see the output below in dbmaint.log, even after running the suggested repair. The ibdata1 file (MariaDB) is huge and keeps growing. The VM's performance degrades continuously, with CPU spikes and heavy memory and swap usage. I have attached the System Profile to this post in case it is needed. Please help.

Configuration of the VM:
CentOS 7
Nagios XI 5.5.11
Memory allocated: 24 GB
Swap space: 16 GB

dbmaint.log output:
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' IS OLD - REMOVING
CREATING: /usr/local/nagiosxi/var/dbmaint.lock
<h3>Database Error</h3>A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.<p>Run the following from the CLI as root to attempt to repair the DB:<br><pre>/usr/local/nagiosxi/scripts/repair_databases.sh</pre></p>LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!
LOCKFILE '/usr/local/nagiosxi/var/dbmaint.lock' EXISTS - EXITING!

Support Edit: profile.zip has been downloaded and shared with the team.
Attachments
nagiosxi_status.png (14.25 KiB) Viewed 175 times
msmulpuri
 
Posts: 26
Joined: Thu Sep 22, 2016 7:40 am

Re: Nagios XI 5.5.11 - Issues

Post by benjaminsmith » Thu Aug 15, 2019 4:53 pm

Hello @msmulpuri,
<h3>Database Error</h3>A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.<p>Run the following from the CLI as root to attempt to repair the DB:<br>

Besides the error above, there is also a fatal PHP database call in the Apache log, which is most likely what is causing the CPU/performance issues.

Run through the following commands to stop the processes, clear the message queue, repair the database and restart.
Code:
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mysqld || systemctl restart mariadb
cd /usr/local/nagiosxi/scripts
./repair_databases.sh
systemctl start npcd
systemctl start crond
systemctl start nagios
systemctl start ndo2db
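
Before removing the lock files in the sequence above, it may be worth confirming that dbmaint.lock really is stale rather than held by a job that is still running. A minimal sketch, assuming GNU coreutils `stat` (as on CentOS 7); the helper name `lock_age_minutes` is hypothetical and not part of Nagios XI:

```shell
# Hypothetical helper (not shipped with Nagios XI): print the age of a lock
# file in whole minutes, or return 1 if the file does not exist.
lock_age_minutes() {
    local lockfile="$1"
    [ -e "$lockfile" ] || return 1
    local now mtime
    now=$(date +%s)
    mtime=$(stat -c %Y "$lockfile")   # GNU stat: mtime as seconds since epoch
    echo $(( (now - mtime) / 60 ))
}

# Example usage (path taken from the dbmaint.log output in this thread):
# lock_age_minutes /usr/local/nagiosxi/var/dbmaint.lock
```

A lock that is only a minute or two old may belong to a maintenance run that is still in progress, so checking the age first avoids removing it from under a live process.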

After running the above commands, can you send over a fresh system profile along with the database configuration file ( /etc/my.cnf ), and post the full output of the following commands:

Check for Corrupted Tables
Code:
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table | grep NULL

Check Table Sizes
Code:
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
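
The size query returns one row per table across three schemas, so the few large tables are easy to miss. A minimal sketch that post-processes the `--table` output and prints the largest tables first; the function name `largest_tables` is hypothetical, and it assumes the two-column `| name | size |` layout that this query produces:

```shell
# Hypothetical helper: read `mysql --table` output on stdin and print the
# N largest tables (default 10) as "size_mb<TAB>table", biggest first.
largest_tables() {
    n="${1:-10}"
    # Keep only data rows of the form "| name | number |". Border rows have
    # no "|" separators and the header row has a non-numeric size column,
    # so both are skipped.
    awk -F'|' '
        NF >= 4 {
            name = $2; size = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", size)
            if (size ~ /^[0-9]+(\.[0-9]+)?$/) printf "%s\t%s\n", size, name
        }' | sort -t "$(printf '\t')" -k1,1 -rn | head -n "$n"
}

# Example usage: append the helper to the "Check Table Sizes" pipeline, e.g.
# ... | mysql -uroot -pnagiosxi --table | largest_tables 5
```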

Thanks.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
benjaminsmith
 
Posts: 1473
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios XI 5.5.11 - Issues

Post by msmulpuri » Thu Aug 15, 2019 8:09 pm

Hello,

First of all, thank you for your quick response. Please find below the output of the steps you asked me to follow.

1. Database repair script yielded the below result.
===============
REPAIR COMPLETE
===============
DATABASE: nagiosql
TABLE:
/var/lib/mysql/nagiosql /usr/local/nagiosxi/var
No *.MYI files found, skipping nagiosql...
DATABASE: nagiosxi
TABLE:
/var/lib/mysql/nagiosxi /usr/local/nagiosxi/var
No *.MYI files found, skipping nagiosxi...

=======================
nagios database repair succeeded
nagiosql database repair skipped, no *.MYI files found
nagiosxi database repair skipped, no *.MYI files found

2. echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table | grep NULL
-No output returned

3. echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
+--------------------------------------------+------------+
| Table | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements | 0.00 |
| nagios_commands | 0.02 |
| nagios_commenthistory | 0.12 |
| nagios_comments | 0.00 |
| nagios_configfiles | 0.00 |
| nagios_configfilevariables | 0.01 |
| nagios_conninfo | 0.01 |
| nagios_contact_addresses | 0.00 |
| nagios_contact_notificationcommands | 0.01 |
| nagios_contactgroup_members | 0.00 |
| nagios_contactgroups | 0.00 |
| nagios_contactnotificationmethods | 282.09 |
| nagios_contactnotifications | 298.80 |
| nagios_contacts | 0.00 |
| nagios_contactstatus | 0.00 |
| nagios_customvariables | 0.23 |
| nagios_customvariablestatus | 0.23 |
| nagios_dbversion | 0.00 |
| nagios_downtimehistory | 0.00 |
| nagios_eventhandlers | 324.51 |
| nagios_externalcommands | 706.02 |
| nagios_flappinghistory | 0.05 |
| nagios_host_contactgroups | 0.00 |
| nagios_host_contacts | 0.19 |
| nagios_host_parenthosts | 0.00 |
| nagios_hostchecks | 0.00 |
| nagios_hostdependencies | 0.00 |
| nagios_hostescalation_contactgroups | 0.00 |
| nagios_hostescalation_contacts | 0.00 |
| nagios_hostescalations | 0.00 |
| nagios_hostgroup_members | 0.10 |
| nagios_hostgroups | 0.00 |
| nagios_hosts | 0.55 |
| nagios_hoststatus | 1.28 |
| nagios_instances | 0.00 |
| nagios_logentries | 2440.71 |
| nagios_notifications | 412.47 |
| nagios_objects | 0.32 |
| nagios_processevents | 0.03 |
| nagios_programstatus | 0.00 |
| nagios_runtimevariables | 0.00 |
| nagios_scheduleddowntime | 0.00 |
| nagios_service_contactgroups | 0.01 |
| nagios_service_contacts | 0.19 |
| nagios_service_parentservices | 0.00 |
| nagios_servicechecks | 0.00 |
| nagios_servicedependencies | 0.00 |
| nagios_serviceescalation_contactgroups | 0.00 |
| nagios_serviceescalation_contacts | 0.00 |
| nagios_serviceescalations | 0.00 |
| nagios_servicegroup_members | 0.04 |
| nagios_servicegroups | 0.00 |
| nagios_services | 0.32 |
| nagios_servicestatus | 0.73 |
| nagios_statehistory | 91.80 |
| nagios_systemcommands | 1.46 |
| nagios_timedeventqueue | 0.00 |
| nagios_timedevents | 0.00 |
| nagios_timeperiod_timeranges | 0.01 |
| nagios_timeperiods | 0.00 |
| tbl_command | 0.06 |
| tbl_contact | 0.03 |
| tbl_contactgroup | 0.03 |
| tbl_contacttemplate | 0.03 |
| tbl_domain | 0.03 |
| tbl_host | 0.48 |
| tbl_hostdependency | 0.03 |
| tbl_hostescalation | 0.03 |
| tbl_hostextinfo | 0.03 |
| tbl_hostgroup | 0.03 |
| tbl_hosttemplate | 0.03 |
| tbl_info | 0.17 |
| tbl_lnkContactToCommandHost | 0.02 |
| tbl_lnkContactToCommandService | 0.02 |
| tbl_lnkContactToContactgroup | 0.02 |
| tbl_lnkContactToContacttemplate | 0.02 |
| tbl_lnkContactToVariabledefinition | 0.02 |
| tbl_lnkContactgroupToContact | 0.02 |
| tbl_lnkContactgroupToContactgroup | 0.02 |
| tbl_lnkContacttemplateToCommandHost | 0.02 |
| tbl_lnkContacttemplateToCommandService | 0.02 |
| tbl_lnkContacttemplateToContactgroup | 0.02 |
| tbl_lnkContacttemplateToContacttemplate | 0.02 |
| tbl_lnkContacttemplateToVariabledefinition | 0.02 |
| tbl_lnkHostToContact | 0.16 |
| tbl_lnkHostToContactgroup | 0.02 |
| tbl_lnkHostToHost | 0.02 |
| tbl_lnkHostToHostgroup | 0.02 |
| tbl_lnkHostToHosttemplate | 0.11 |
| tbl_lnkHostToVariabledefinition | 0.09 |
| tbl_lnkHostdependencyToHost_DH | 0.02 |
| tbl_lnkHostdependencyToHost_H | 0.02 |
| tbl_lnkHostdependencyToHostgroup_DH | 0.02 |
| tbl_lnkHostdependencyToHostgroup_H | 0.02 |
| tbl_lnkHostescalationToContact | 0.02 |
| tbl_lnkHostescalationToContactgroup | 0.02 |
| tbl_lnkHostescalationToHost | 0.02 |
| tbl_lnkHostescalationToHostgroup | 0.02 |
| tbl_lnkHostgroupToHost | 0.09 |
| tbl_lnkHostgroupToHostgroup | 0.02 |
| tbl_lnkHosttemplateToContact | 0.02 |
| tbl_lnkHosttemplateToContactgroup | 0.02 |
| tbl_lnkHosttemplateToHost | 0.02 |
| tbl_lnkHosttemplateToHostgroup | 0.02 |
| tbl_lnkHosttemplateToHosttemplate | 0.02 |
| tbl_lnkHosttemplateToVariabledefinition | 0.02 |
| tbl_lnkServiceToContact | 0.23 |
| tbl_lnkServiceToContactgroup | 0.02 |
| tbl_lnkServiceToHost | 0.06 |
| tbl_lnkServiceToHostgroup | 0.02 |
| tbl_lnkServiceToServicegroup | 0.02 |
| tbl_lnkServiceToServicetemplate | 0.06 |
| tbl_lnkServiceToVariabledefinition | 0.05 |
| tbl_lnkServicedependencyToHost_DH | 0.02 |
| tbl_lnkServicedependencyToHost_H | 0.02 |
| tbl_lnkServicedependencyToHostgroup_DH | 0.02 |
| tbl_lnkServicedependencyToHostgroup_H | 0.02 |
| tbl_lnkServicedependencyToService_DS | 0.02 |
| tbl_lnkServicedependencyToService_S | 0.02 |
| tbl_lnkServiceescalationToContact | 0.02 |
| tbl_lnkServiceescalationToContactgroup | 0.02 |
| tbl_lnkServiceescalationToHost | 0.02 |
| tbl_lnkServiceescalationToHostgroup | 0.02 |
| tbl_lnkServiceescalationToService | 0.02 |
| tbl_lnkServicegroupToService | 0.05 |
| tbl_lnkServicegroupToServicegroup | 0.02 |
| tbl_lnkServicetemplateToContact | 0.02 |
| tbl_lnkServicetemplateToContactgroup | 0.02 |
| tbl_lnkServicetemplateToHost | 0.02 |
| tbl_lnkServicetemplateToHostgroup | 0.02 |
| tbl_lnkServicetemplateToServicegroup | 0.02 |
| tbl_lnkServicetemplateToServicetemplate | 0.02 |
| tbl_lnkServicetemplateToVariabledefinition | 0.02 |
| tbl_lnkTimeperiodToTimeperiod | 0.02 |
| tbl_logbook | 0.02 |
| tbl_mainmenu | 0.02 |
| tbl_permission | 0.02 |
| tbl_permission_inactive | 0.02 |
| tbl_service | 0.30 |
| tbl_servicedependency | 0.03 |
| tbl_serviceescalation | 0.03 |
| tbl_serviceextinfo | 0.03 |
| tbl_servicegroup | 0.03 |
| tbl_servicetemplate | 0.03 |
| tbl_session | 0.02 |
| tbl_session_locks | 0.02 |
| tbl_settings | 0.03 |
| tbl_submenu | 0.02 |
| tbl_timedefinition | 0.02 |
| tbl_timeperiod | 0.03 |
| tbl_user | 0.03 |
| tbl_variabledefinition | 0.19 |
| xi_auditlog | 2.06 |
| xi_auth_tokens | 1.03 |
| xi_cmp_trapdata | 0.03 |
| xi_cmp_trapdata_log | 0.03 |
| xi_commands | 0.02 |
| xi_eventqueue | 2348.66 |
| xi_events | 3966.00 |
| xi_incidents | 0.02 |
| xi_meta | 149175.50 |
| xi_options | 0.06 |
| xi_sessions | 0.03 |
| xi_sysstat | 0.03 |
| xi_usermeta | 0.17 |
| xi_users | 0.03 |
+--------------------------------------------+------------+

4. Below is the content of /etc/my.cnf file
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

#
# include all files from the config directory
#
!includedir /etc/my.cnf.d

5. Please find attached fresh system profile after the above steps followed.

Once again, I really appreciate your help in resolving this issue. I have listed the ibdata1 file and its size below. I also need your help reducing the size of this file; it has grown extremely large.
157G Aug 15 20:03 ibdata1

I have also attached a screen capture of Admin
Please let me know if you need anything else.
Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.
Attachments
nagiosxi_capture.png (39.59 KiB) Viewed 152 times

Re: Nagios XI 5.5.11 - Issues

Post by benjaminsmith » Fri Aug 16, 2019 11:13 am

Hello @msmulpuri,

Some of the tables in the nagiosxi database have grown so large that they are preventing the server from operating correctly. The server may have been shut down incorrectly, corrupting tables and causing them to grow excessively large.

Run the following commands to truncate the tables, and let us know if this resolves the issue for you.
Code:
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
rm -rf /usr/local/nagios/var/ndo2db.lock
rm -rf /usr/local/nagios/var/ndo2db.pid
rm -rf /usr/local/nagios/var/ndo2db.sock
rm -rf /usr/local/nagios/var/ndo.sock
rm -rf /usr/local/nagiosxi/var/subsys/ndo2db
rm -rf /var/run/nagios.lock
rm -rf /usr/local/nagios/var/nagios.lock
systemctl restart mysqld || systemctl restart mariadb
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -uroot -pnagiosxi -h 127.0.0.1 nagiosxi
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl restart httpd
systemctl restart snmptt
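
One caveat worth checking afterwards: when InnoDB tables live in the shared system tablespace (`innodb_file_per_table` off), `TRUNCATE TABLE` frees space inside ibdata1 for reuse but does not shrink the file on disk. A minimal sketch for recording the file size before and after; the helper name `file_size_gib` is hypothetical, and the ibdata1 path assumes the `datadir=/var/lib/mysql` setting from the /etc/my.cnf posted in this thread:

```shell
# Hypothetical helper: print a file's size in GiB with two decimals.
file_size_gib() {
    bytes=$(stat -c %s "$1") || return 1   # GNU stat: size in bytes
    awk -v b="$bytes" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# Example usage (run as root; credentials as used elsewhere in this thread):
# file_size_gib /var/lib/mysql/ibdata1
# echo "SHOW VARIABLES LIKE 'innodb_file_per_table';" | mysql -uroot -pnagiosxi
```

If the reported size is unchanged after the truncate, that alone does not mean the truncate failed; the freed space may simply remain allocated inside the tablespace file.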


Also, I would recommend increasing the max connections for the database. We have step-by-step instructions for this process in the following knowledge-base article.
Nagios XI - MySQL/MariaDB - Max Connections
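
Before changing the limit, it can be useful to compare the configured ceiling with the peak actually reached. A minimal sketch using the standard `max_connections` variable and `Max_used_connections` status counter; the function name `show_connection_limits` is hypothetical, the credentials are the ones used elsewhere in this thread, and the knowledge-base article remains the reference for the actual change:

```shell
# Hypothetical helper: print the configured connection limit and the highest
# number of simultaneous connections seen since the server last started.
show_connection_limits() {
    echo "SHOW GLOBAL VARIABLES LIKE 'max_connections'; SHOW GLOBAL STATUS LIKE 'Max_used_connections';" \
        | mysql -uroot -pnagiosxi -h 127.0.0.1 --batch --skip-column-names
}

# Example usage:
# show_connection_limits
```

If `Max_used_connections` is at or near `max_connections`, the limit has probably been hit and raising it is more likely to help.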

Re: Nagios XI 5.5.11 - Issues

Post by msmulpuri » Tue Aug 20, 2019 2:35 pm

Hi,

I have followed the steps to resize the ibdata1 file and then your steps to further troubleshoot the issue. The steps worked and please close the topic. Thank you very much for all your help!

Re: Nagios XI 5.5.11 - Issues

Post by scottwilkerson » Tue Aug 20, 2019 3:06 pm

msmulpuri wrote:Hi,

I have followed the steps to resize the ibdata1 file and then your steps to further troubleshoot the issue. The steps worked and please close the topic. Thank you very much for all your help!

Great!

Locking
scottwilkerson
DevOps Engineer
 
Posts: 15795
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

