ndo2db alternatives
Posted: Wed Jun 19, 2019 9:00 am
by lexisnexis
Nagios XI: 5.2.2
ndo2db: 2.1.2
We have a number of Nagios servers that are overloaded, and we have found a bottleneck: ndo2db is not processing the mqueue fast enough. As a result, the mqueue fills up and causes problems. We are looking into all options, including breaking up the monitors on the Nagios servers.
I wanted to find out if you have an alternative to ndo2db, so the remote database we have can get updated and we are no longer limited by the mqueue?
Server Specs:
Code: Select all
CPU = 32 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Memory = 65 GB
Disk Mount = 322.1 GB
OS = CentOS Linux release 7.2.1511
Code: Select all
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.4.3
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2019-01-15
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Checked 49004 services.
Checked 3273 hosts.
Checked 19016 host groups.
Checked 12167 service groups.
Checked 313 contacts.
Checked 19429 contact groups.
Checked 188 commands.
Checked 11 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 3273 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 11 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
Code: Select all
ipcs
------ Message Queues --------
key msqid owner perms used-bytes messages
0xf2000002 294912 nagios 600 262144000 256000
0x5d000002 2031617 nagios 600 262144000 256000
0xca000002 2621442 nagios 600 259445760 253365
0xd8000002 2654211 nagios 600 261363712 255238
0x09000002 2686980 nagios 600 261277696 255154
0x65000002 2883589 nagios 600 261379072 255253
0x49000002 2916358 nagios 600 0 0
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x011416ef 163840 root 600 1000 11
0x0001610e 196609 nagios 600 4096 225
0x0114338b 131074 root 600 1000 0
------ Semaphore Arrays --------
key semid owner perms nsems
0x00000000 589824 apache 600 1
0x00000000 622593 apache 600 1
0x00000000 229378 apache 600 1
0x00000000 655363 apache 600 1
0x00000000 688132 apache 600 1
0x00000000 720901 apache 600 1
0x00000000 851974 apache 600 1
0x00000000 884743 apache 600 1
0x00000000 819208 apache 600 1
0x00000000 917513 apache 600 1
0x00000000 950282 apache 600 1
0x00000000 983051 apache 600 1
Re: ndo2db alternatives
Posted: Wed Jun 19, 2019 1:40 pm
by benjaminsmith
Hello @lexisnexis,
I wanted to find out if you have an alternative to ndo2db, so the remote database we have can get updated and we are no longer limited by the mqueue?
We don't have an alternative to ndo2db. Generally, we recommend adding an additional XI server when the combined number of hosts and services exceeds 20,000.
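The 20,000 guideline can be compared against the pre-flight output posted at the top of the thread, which reports 49,004 services and 3,273 hosts, a combined 52,277. The following is a minimal sketch for pulling that total out of `nagios -v` output; the function name `combined_objects` is hypothetical, and it assumes the "Checked N services." / "Checked N hosts." wording shown in the original post.

```shell
# combined_objects: reads `nagios -v` pre-flight output on stdin and prints
# the sum of the "Checked N services." and "Checked N hosts." counts.
# Assumption: the line wording matches the Nagios Core 4.4.3 output above.
combined_objects() {
  awk '
    # The trailing period excludes the later "Checked N hosts" line that
    # belongs to the circular-path check, so hosts are not counted twice.
    /^[[:space:]]*Checked [0-9]+ (services|hosts)\.$/ { total += $2 }
    END { print total + 0 }'
}

# Typical use on a live system (not run here):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | combined_objects
```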
1. Can you provide a system profile so we can take a closer look at your system?
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket.
2. Are you using Mod Gearman in your environment?
3. Please post the full output of the following query.
Code: Select all
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
4. Increasing the operating system message queue limits can allow more messages to be processed by NDOUtils. See the guide below:
NDOUtils - Message Queue Exceeded
5. Our general guide for increasing system performance:
Maximizing Performance In Nagios XI
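To see how close each queue is to the per-queue byte limit before and after raising it, the `ipcs -q` output can be reduced to a percentage. This is a minimal sketch and not part of NDOUtils or the linked guide; the function name `queue_fill` is hypothetical, and it assumes the six-column layout (key, msqid, owner, perms, used-bytes, messages) shown in the original post.

```shell
# queue_fill LIMIT_BYTES: reads `ipcs -q` text on stdin and prints one line
# per queue: msqid, used bytes, message count, percent of LIMIT_BYTES in use.
# Assumption: input comes from `ipcs -q` (queues only). Plain `ipcs` also
# lists shared memory rows that start with a hex key and would be miscounted.
queue_fill() {
  local limit="$1"
  awk -v limit="$limit" '
    # Data rows start with a hex key such as 0xf2000002.
    $1 ~ /^0x[0-9a-fA-F]+$/ && NF >= 6 {
      printf "%s %s %s %.1f%%\n", $2, $5, $6, ($5 / limit) * 100
    }'
}

# Typical use on a live system (not run here):
#   ipcs -q | queue_fill "$(sysctl -n kernel.msgmnb)"
```

In the `ipcs` listing at the top of the thread, two queues hold 262144000 bytes, so against a limit of the same size they would show as 100.0% full.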
Re: ndo2db alternatives
Posted: Wed Jun 19, 2019 7:34 pm
by rajasegar
Been there and went through this hell before. Just split it into two instances.
A remote database is sometimes the problem. Try moving it to the local server to test.
Even after splitting and staying below 20,000, we had this problem every once in a while.
Fast storage helps; after moving to an SSD-based SAN we never had this problem again.
Re: ndo2db alternatives
Posted: Thu Jun 20, 2019 6:41 am
by scottwilkerson
rajasegar wrote: Fast storage helps; after moving to an SSD-based SAN we never had this problem again.
This is the best advice here. The impact of having super-fast storage cannot be overstated; it is the best thing you can do for performance on a Nagios XI system.
Re: ndo2db alternatives
Posted: Mon Jun 24, 2019 6:55 am
by WillemDH
IMHO it's not normal for the ipcs queue to be the bottleneck of a Nagios XI server. Our Nagios server has all the suggested performance tuning applied, but we still ran into similar ipcs queuing issues several times, even though CPU load and RAM usage were completely under control. The upgrade to 5.5 made the situation even worse. Hopefully this issue gets resolved in a future release, as it's one of the bigger issues Nagios currently has.
Re: ndo2db alternatives
Posted: Mon Jun 24, 2019 8:11 am
by lexisnexis
Thank you for the information. I can rule out using something other than ndo2db to do the job, then. We are currently using a remote database, and we have been considering a test of moving the database back to the local server. We are currently evaluating the usage and resources needed to do this.
The information you requested is below. Maximizing Performance In Nagios XI: already applied.
Code: Select all
+--------------------------------------------+------------+
| Table | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements | 0.17 |
| nagios_commands | 0.09 |
| nagios_commenthistory | 44.06 |
| nagios_comments | 0.06 |
| nagios_configfiles | 0.03 |
| nagios_configfilevariables | 0.05 |
| nagios_conninfo | 2.52 |
| nagios_contact_addresses | 0.03 |
| nagios_contact_notificationcommands | 0.48 |
| nagios_contactgroup_members | 0.33 |
| nagios_contactgroups | 4.03 |
| nagios_contactnotificationmethods | 32.05 |
| nagios_contactnotifications | 40.09 |
| nagios_contacts | 0.13 |
| nagios_contactstatus | 0.08 |
| nagios_customvariables | 0.09 |
| nagios_customvariablestatus | 0.09 |
| nagios_dbversion | 0.02 |
| nagios_downtimehistory | 2.92 |
| nagios_eventhandlers | 465.50 |
| nagios_externalcommands | 0.11 |
| nagios_flappinghistory | 1.52 |
| nagios_host_contactgroups | 0.39 |
| nagios_host_contacts | 0.03 |
| nagios_host_parenthosts | 0.03 |
| nagios_hostchecks | 0.03 |
| nagios_hostdependencies | 0.03 |
| nagios_hostescalation_contactgroups | 0.03 |
| nagios_hostescalation_contacts | 0.03 |
| nagios_hostescalations | 0.03 |
| nagios_hostgroup_members | 10.03 |
| nagios_hostgroups | 4.03 |
| nagios_hosts | 2.84 |
| nagios_hoststatus | 3.36 |
| nagios_instances | 0.02 |
| nagios_logentries | 97904.52 |
| nagios_notifications | 6547.00 |
| nagios_objects | 51.23 |
| nagios_processevents | 1.02 |
| nagios_programstatus | 0.03 |
| nagios_runtimevariables | 0.03 |
| nagios_scheduleddowntime | 0.03 |
| nagios_service_contactgroups | 3.03 |
| nagios_service_contacts | 0.03 |
| nagios_service_parentservices | 0.03 |
| nagios_servicechecks | 0.06 |
| nagios_servicedependencies | 0.03 |
| nagios_serviceescalation_contactgroups | 0.03 |
| nagios_serviceescalation_contacts | 0.03 |
| nagios_serviceescalations | 0.03 |
| nagios_servicegroup_members | 3.03 |
| nagios_servicegroups | 1.92 |
| nagios_services | 15.22 |
| nagios_servicestatus | 61.36 |
| nagios_statehistory | 2193.98 |
| nagios_systemcommands | 27.08 |
| nagios_timedeventqueue | 0.09 |
| nagios_timedevents | 0.09 |
| nagios_timeperiod_timeranges | 0.03 |
| nagios_timeperiods | 0.03 |
| tbl_command | 0.06 |
| tbl_contact | 0.09 |
| tbl_contactgroup | 4.00 |
| tbl_contacttemplate | 0.03 |
| tbl_domain | 0.03 |
| tbl_host | 1.59 |
| tbl_hostdependency | 0.03 |
| tbl_hostescalation | 0.03 |
| tbl_hostextinfo | 0.03 |
| tbl_hostgroup | 3.95 |
| tbl_hosttemplate | 0.03 |
| tbl_info | 0.17 |
| tbl_lnkcontactgrouptocontact | 0.02 |
| tbl_lnkcontactgrouptocontactgroup | 0.02 |
| tbl_lnkcontacttemplatetocommandhost | 0.02 |
| tbl_lnkcontacttemplatetocommandservice | 0.02 |
| tbl_lnkcontacttemplatetocontactgroup | 0.02 |
| tbl_lnkcontacttemplatetocontacttemplate | 0.02 |
| tbl_lnkcontacttemplatetovariabledefinition | 0.02 |
| tbl_lnkcontacttocommandhost | 0.02 |
| tbl_lnkcontacttocommandservice | 0.02 |
| tbl_lnkcontacttocontactgroup | 1.36 |
| tbl_lnkcontacttocontacttemplate | 0.02 |
| tbl_lnkcontacttovariabledefinition | 0.02 |
| tbl_lnkhostdependencytohost_dh | 0.02 |
| tbl_lnkhostdependencytohost_h | 0.02 |
| tbl_lnkhostdependencytohostgroup_dh | 0.02 |
| tbl_lnkhostdependencytohostgroup_h | 0.02 |
| tbl_lnkhostescalationtocontact | 0.02 |
| tbl_lnkhostescalationtocontactgroup | 0.02 |
| tbl_lnkhostescalationtohost | 0.02 |
| tbl_lnkhostescalationtohostgroup | 0.02 |
| tbl_lnkhostgrouptohost | 0.02 |
| tbl_lnkhostgrouptohostgroup | 0.02 |
| tbl_lnkhosttemplatetocontact | 0.02 |
| tbl_lnkhosttemplatetocontactgroup | 0.02 |
| tbl_lnkhosttemplatetohost | 0.02 |
| tbl_lnkhosttemplatetohostgroup | 0.02 |
| tbl_lnkhosttemplatetohosttemplate | 0.02 |
| tbl_lnkhosttemplatetovariabledefinition | 0.02 |
| tbl_lnkhosttocontact | 0.02 |
| tbl_lnkhosttocontactgroup | 0.02 |
| tbl_lnkhosttohost | 0.02 |
| tbl_lnkhosttohostgroup | 5.05 |
| tbl_lnkhosttohosttemplate | 0.16 |
| tbl_lnkhosttovariabledefinition | 0.02 |
| tbl_lnkservicedependencytohost_dh | 0.02 |
| tbl_lnkservicedependencytohost_h | 0.02 |
| tbl_lnkservicedependencytohostgroup_dh | 0.02 |
| tbl_lnkservicedependencytohostgroup_h | 0.02 |
| tbl_lnkservicedependencytoservice_ds | 0.02 |
| tbl_lnkservicedependencytoservice_s | 0.02 |
| tbl_lnkserviceescalationtocontact | 0.02 |
| tbl_lnkserviceescalationtocontactgroup | 0.02 |
| tbl_lnkserviceescalationtohost | 0.02 |
| tbl_lnkserviceescalationtohostgroup | 0.02 |
| tbl_lnkserviceescalationtoservice | 0.02 |
| tbl_lnkservicegrouptoservice | 0.02 |
| tbl_lnkservicegrouptoservicegroup | 0.02 |
| tbl_lnkservicetemplatetocontact | 0.02 |
| tbl_lnkservicetemplatetocontactgroup | 0.02 |
| tbl_lnkservicetemplatetohost | 0.02 |
| tbl_lnkservicetemplatetohostgroup | 0.02 |
| tbl_lnkservicetemplatetoservicegroup | 0.02 |
| tbl_lnkservicetemplatetoservicetemplate | 0.02 |
| tbl_lnkservicetemplatetovariabledefinition | 0.02 |
| tbl_lnkservicetocontact | 0.02 |
| tbl_lnkservicetocontactgroup | 0.34 |
| tbl_lnkservicetohost | 0.02 |
| tbl_lnkservicetohostgroup | 0.36 |
| tbl_lnkservicetoservicegroup | 0.33 |
| tbl_lnkservicetoservicetemplate | 0.50 |
| tbl_lnkservicetovariabledefinition | 0.02 |
| tbl_lnktimeperiodtotimeperiod | 0.02 |
| tbl_logbook | 0.33 |
| tbl_mainmenu | 0.02 |
| tbl_permission | 0.02 |
| tbl_permission_inactive | 0.02 |
| tbl_service | 3.41 |
| tbl_servicedependency | 0.03 |
| tbl_serviceescalation | 0.03 |
| tbl_serviceextinfo | 0.03 |
| tbl_servicegroup | 3.03 |
| tbl_servicetemplate | 0.03 |
| tbl_session | 0.01 |
| tbl_session_locks | 0.00 |
| tbl_settings | 0.03 |
| tbl_submenu | 0.02 |
| tbl_timedefinition | 0.02 |
| tbl_timeperiod | 0.03 |
| tbl_user | 0.03 |
| tbl_variabledefinition | 1.52 |
| xi_auditlog | 1.55 |
| xi_auth_tokens | 0.03 |
| xi_cmp_trapdata | 0.03 |
| xi_cmp_trapdata_log | 0.03 |
| xi_commands | 0.14 |
| xi_eventqueue | 15.77 |
| xi_events | 780.78 |
| xi_incidents | 0.02 |
| xi_meta | 12969.50 |
| xi_options | 0.03 |
| xi_sessions | 0.27 |
| xi_sysstat | 0.03 |
| xi_usermeta | 1.80 |
| xi_users | 0.08 |
+--------------------------------------------+------------+
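A handful of tables dominate the listing above: nagios_logentries (97904.52 MB), xi_meta (12969.50 MB), nagios_notifications (6547.00 MB), nagios_statehistory (2193.98 MB) and xi_events (780.78 MB). The following is a minimal sketch for picking out the largest rows from this kind of output; the function name `top_tables` is hypothetical, and it assumes the bordered two-column `mysql --table` layout shown here.

```shell
# top_tables N: reads bordered `mysql --table` output on stdin and prints
# the N largest tables as "size_mb table_name", largest first (default 5).
# Assumption: column 1 is the table name and column 2 is the size in MB.
top_tables() {
  local n="${1:-5}"
  awk -F'|' '
    # Data rows split into at least four fields around the pipe characters.
    NF >= 4 {
      name = $2; size = $3
      gsub(/^ +| +$/, "", name)   # trim padding around the name
      gsub(/^ +| +$/, "", size)   # trim padding around the size
      # Skip the header row, whose size column is not numeric.
      if (size ~ /^[0-9]+(\.[0-9]+)?$/) print size, name
    }' | LC_ALL=C sort -rn | head -n "$n"
}

# Typical use: pipe the query output from earlier in the thread into
#   ... | mysql -uroot -p --table | top_tables 5
```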
Code: Select all
# System default settings live in /usr/lib/sysctl.d/00-system.conf.
# To override those settings, enter new settings here, or in an /etc/sysctl.d/<name>.conf file
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv6.conf.default.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.ip_forward = 0
kernel.exec-shield=1
kernel.randomize_va_space = 2
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.default.secure_redirects = 0
kernel.msgmnb = 262144000
kernel.msgmax = 262144000
kernal.msgmni = 512000
kernel.shmmax = 4294967295
kernel.shmall = 268435456
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv6.conf.all.accept_ra = 0
net.ipv6.conf.default.accept_ra = 0
net.ipv6.conf.all.accept_redirects = 0
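One line in the listing above is spelled `kernal.msgmni`, whereas the kernel parameter is named `kernel.msgmni`. If the file on disk matches the post, that line would not take effect. The following is a minimal sketch for catching keys like this by checking each one against `/proc/sys`; the function name `unknown_sysctl_keys` is hypothetical, and it assumes comments start in the first column and that no key contains a dot inside a single path component (for example a VLAN interface name).

```shell
# unknown_sysctl_keys FILE [ROOT]: prints every key in a sysctl-style FILE
# that has no matching entry under ROOT (default /proc/sys), i.e. keys the
# running kernel does not expose.
unknown_sysctl_keys() {
  local file="$1" root="${2:-/proc/sys}" line key
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*|';'*) continue ;;            # skip blank lines and comments
    esac
    key="${line%%=*}"                      # text left of the first "="
    key="$(printf '%s' "$key" | tr -d '[:space:]')"
    [ -n "$key" ] || continue
    # sysctl maps the dots in a key to path separators under /proc/sys.
    [ -e "$root/$(printf '%s' "$key" | tr . /)" ] || printf '%s\n' "$key"
  done < "$file"
}

# Typical use on a live system (not run here):
#   unknown_sysctl_keys /etc/sysctl.conf
```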
Re: ndo2db alternatives
Posted: Mon Jun 24, 2019 12:28 pm
by tgriep
I see that there are some corrupt tables in the MySQL database that could be slowing down the writes.
To clean out the MySQL tables, run the following as root.
Code: Select all
service crond stop
service nagios stop
service ndo2db stop
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -u root -pnagiosxi nagiosxi
mysqlcheck -f -r -u root -pnagiosxi --databases nagiosxi
mysqlcheck -f -o -u root -pnagiosxi --databases nagiosxi
service mysqld restart
service httpd restart
service ndo2db start
service nagios start
service crond start
Let us know if this helps the server process the Kernel Message Queue better.