ndo2db alternatives

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
lexisnexis
Posts: 27
Joined: Wed Dec 30, 2015 3:19 pm

ndo2db alternatives

Post by lexisnexis »

Nagios XI: 5.2.2
ndo2db: 2.1.2

We have a number of Nagios servers that are overloaded, and we have traced a bottleneck to ndo2db not processing the message queue (mqueue) fast enough. The queue fills up and causes problems. We are looking into all options, including breaking up the monitors on the Nagios servers.

I wanted to find out if you have an alternative to ndo2db, so the remote database we have can get updated and we are no longer limited by the mqueue?

Server Specs:

Code: Select all

CPU = 32 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Memory = 65 GB
Disk Mount = 322.1 GB
OS = CentOS Linux release 7.2.1511

Code: Select all

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.4.3
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2019-01-15
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 49004 services.
        Checked 3273 hosts.
        Checked 19016 host groups.
        Checked 12167 service groups.
        Checked 313 contacts.
        Checked 19429 contact groups.
        Checked 188 commands.
        Checked 11 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 3273 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 11 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

Code: Select all

ipcs

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    
0xf2000002 294912     nagios     600        262144000    256000      
0x5d000002 2031617    nagios     600        262144000    256000      
0xca000002 2621442    nagios     600        259445760    253365      
0xd8000002 2654211    nagios     600        261363712    255238      
0x09000002 2686980    nagios     600        261277696    255154      
0x65000002 2883589    nagios     600        261379072    255253      
0x49000002 2916358    nagios     600        0            0           

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x011416ef 163840     root       600        1000       11                      
0x0001610e 196609     nagios     600        4096       225                     
0x0114338b 131074     root       600        1000       0                       

------ Semaphore Arrays --------
key        semid      owner      perms      nsems     
0x00000000 589824     apache     600        1         
0x00000000 622593     apache     600        1         
0x00000000 229378     apache     600        1         
0x00000000 655363     apache     600        1         
0x00000000 688132     apache     600        1         
0x00000000 720901     apache     600        1         
0x00000000 851974     apache     600        1         
0x00000000 884743     apache     600        1         
0x00000000 819208     apache     600        1         
0x00000000 917513     apache     600        1         
0x00000000 950282     apache     600        1         
0x00000000 983051     apache     600        1         
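
In the message queue listing above, six of the seven queues hold between 253,365 and 256,000 messages, and two of them sit at exactly 262144000 bytes, the same figure as the `kernel.msgmnb` value in the sysctl listing posted later in this thread. A minimal sketch for summarising that fill level is below. The function name `summarize_queues` is hypothetical, and it assumes the six-column `ipcs -q` layout shown above (use `ipcs -q` rather than plain `ipcs`, since shared memory rows would also match).

```shell
# Summarise System V message queues from `ipcs -q` output read on stdin.
# Assumed column layout: key msqid owner perms used-bytes messages
# $1 is the per-queue byte limit to compare against (e.g. kernel.msgmnb).
summarize_queues() {
    limit_bytes="$1"
    awk -v limit="$limit_bytes" '
        # Data rows start with a hex key such as 0xf2000002;
        # separator and header rows are skipped.
        $1 ~ /^0x[0-9a-fA-F]+$/ && NF >= 6 {
            pct = (limit > 0) ? ($5 * 100 / limit) : 0
            printf "msqid=%s owner=%s messages=%d full=%.1f%%\n", $2, $3, $6, pct
        }
    '
}
```

On a live system this could be run as `ipcs -q | summarize_queues "$(cat /proc/sys/kernel/msgmnb)"`.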
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: ndo2db alternatives

Post by benjaminsmith »

Hello @lexisnexis,
lexisnexis wrote:I wanted to find out if you have an alternative to ndo2db, so the remote database we have can get updated and we are no longer limited by the mqueue?
We don't have an alternative to ndo2db. Generally, we recommend adding an additional XI server when the combined number of hosts and services exceeds 20,000.
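
For reference, the pre-flight output in the first post reports 49004 services and 3273 hosts, which is 52,277 combined, roughly 2.6 times the 20,000 guideline. A small sketch for pulling that total out of `nagios -v` output is below. The function name `count_checked_objects` is hypothetical, and the parsing assumes the "Checked N services." / "Checked N hosts." wording shown in that output.

```shell
# Print hosts + services from `nagios -v` pre-flight output read on stdin.
count_checked_objects() {
    awk '
        # Match only the "Checking objects" lines, which end in a period.
        # The later circular-path line "Checked N hosts" has no period,
        # and "host groups" / "service groups" do not match either.
        /Checked [0-9]+ services\./ { services = $2 }
        /Checked [0-9]+ hosts\./    { hosts = $2 }
        END { print hosts + services }
    '
}
```

Example: `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | count_checked_objects`.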

1. Can you provide a system profile so we can take a closer look at your system?

Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket.

2. Are you using Mod Gearman in your environment?

3. Please post the full output to the following query.

Code: Select all

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
4. Increasing the operating system message queue limits can allow more messages to be processed by NDOUtils. See the guide below:
NDOUtils - Message Queue Exceeded

5. Our general guide for increasing system performance:
Maximizing Performance In Nagios XI
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: ndo2db alternatives

Post by rajasegar »

Been there and went through this hell before. Just split it into 2 instances.
The remote database is sometimes the problem. Try moving it to the local server to test.

Even after splitting and staying below 20000, we had this problem every once in a while.
Fast storage helps; after moving to an SSD-based SAN we never had this problem again.
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: ndo2db alternatives

Post by scottwilkerson »

rajasegar wrote:Fast storage helps, after moving to SSD based SAN we never had this problem anymore.
This is the best advice here. The impact of super fast storage cannot be overstated; it is the best thing you can do for performance on a Nagios XI system.
Former Nagios employee
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: ndo2db alternatives

Post by WillemDH »

IMHO it's not normal that the ipcs queue is the bottleneck of a Nagios XI server. Our Nagios server has all the suggested performance tuning applied, but we still had similar ipcs queuing issues several times, even though CPU load and RAM usage were completely under control. The upgrade to 5.5 made the situation even worse. Hopefully this issue gets resolved in a future release, as it's one of the bigger issues Nagios currently has.
Nagios XI 5.8.1
https://outsideit.net
lexisnexis
Posts: 27
Joined: Wed Dec 30, 2015 3:19 pm

Re: ndo2db alternatives

Post by lexisnexis »

Thank you for the information. I can rule out using something other than ndo2db to do the job, then. We are currently using a remote database, and we have been considering moving the database back to the local server as a test. We are currently evaluating the usage and resources needed to do this.

The information you requested is below. Maximizing Performance In Nagios XI: we have already applied these.

Code: Select all

+--------------------------------------------+------------+
| Table                                      | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements                    |       0.17 |
| nagios_commands                            |       0.09 |
| nagios_commenthistory                      |      44.06 |
| nagios_comments                            |       0.06 |
| nagios_configfiles                         |       0.03 |
| nagios_configfilevariables                 |       0.05 |
| nagios_conninfo                            |       2.52 |
| nagios_contact_addresses                   |       0.03 |
| nagios_contact_notificationcommands        |       0.48 |
| nagios_contactgroup_members                |       0.33 |
| nagios_contactgroups                       |       4.03 |
| nagios_contactnotificationmethods          |      32.05 |
| nagios_contactnotifications                |      40.09 |
| nagios_contacts                            |       0.13 |
| nagios_contactstatus                       |       0.08 |
| nagios_customvariables                     |       0.09 |
| nagios_customvariablestatus                |       0.09 |
| nagios_dbversion                           |       0.02 |
| nagios_downtimehistory                     |       2.92 |
| nagios_eventhandlers                       |     465.50 |
| nagios_externalcommands                    |       0.11 |
| nagios_flappinghistory                     |       1.52 |
| nagios_host_contactgroups                  |       0.39 |
| nagios_host_contacts                       |       0.03 |
| nagios_host_parenthosts                    |       0.03 |
| nagios_hostchecks                          |       0.03 |
| nagios_hostdependencies                    |       0.03 |
| nagios_hostescalation_contactgroups        |       0.03 |
| nagios_hostescalation_contacts             |       0.03 |
| nagios_hostescalations                     |       0.03 |
| nagios_hostgroup_members                   |      10.03 |
| nagios_hostgroups                          |       4.03 |
| nagios_hosts                               |       2.84 |
| nagios_hoststatus                          |       3.36 |
| nagios_instances                           |       0.02 |
| nagios_logentries                          |   97904.52 |
| nagios_notifications                       |    6547.00 |
| nagios_objects                             |      51.23 |
| nagios_processevents                       |       1.02 |
| nagios_programstatus                       |       0.03 |
| nagios_runtimevariables                    |       0.03 |
| nagios_scheduleddowntime                   |       0.03 |
| nagios_service_contactgroups               |       3.03 |
| nagios_service_contacts                    |       0.03 |
| nagios_service_parentservices              |       0.03 |
| nagios_servicechecks                       |       0.06 |
| nagios_servicedependencies                 |       0.03 |
| nagios_serviceescalation_contactgroups     |       0.03 |
| nagios_serviceescalation_contacts          |       0.03 |
| nagios_serviceescalations                  |       0.03 |
| nagios_servicegroup_members                |       3.03 |
| nagios_servicegroups                       |       1.92 |
| nagios_services                            |      15.22 |
| nagios_servicestatus                       |      61.36 |
| nagios_statehistory                        |    2193.98 |
| nagios_systemcommands                      |      27.08 |
| nagios_timedeventqueue                     |       0.09 |
| nagios_timedevents                         |       0.09 |
| nagios_timeperiod_timeranges               |       0.03 |
| nagios_timeperiods                         |       0.03 |
| tbl_command                                |       0.06 |
| tbl_contact                                |       0.09 |
| tbl_contactgroup                           |       4.00 |
| tbl_contacttemplate                        |       0.03 |
| tbl_domain                                 |       0.03 |
| tbl_host                                   |       1.59 |
| tbl_hostdependency                         |       0.03 |
| tbl_hostescalation                         |       0.03 |
| tbl_hostextinfo                            |       0.03 |
| tbl_hostgroup                              |       3.95 |
| tbl_hosttemplate                           |       0.03 |
| tbl_info                                   |       0.17 |
| tbl_lnkcontactgrouptocontact               |       0.02 |
| tbl_lnkcontactgrouptocontactgroup          |       0.02 |
| tbl_lnkcontacttemplatetocommandhost        |       0.02 |
| tbl_lnkcontacttemplatetocommandservice     |       0.02 |
| tbl_lnkcontacttemplatetocontactgroup       |       0.02 |
| tbl_lnkcontacttemplatetocontacttemplate    |       0.02 |
| tbl_lnkcontacttemplatetovariabledefinition |       0.02 |
| tbl_lnkcontacttocommandhost                |       0.02 |
| tbl_lnkcontacttocommandservice             |       0.02 |
| tbl_lnkcontacttocontactgroup               |       1.36 |
| tbl_lnkcontacttocontacttemplate            |       0.02 |
| tbl_lnkcontacttovariabledefinition         |       0.02 |
| tbl_lnkhostdependencytohost_dh             |       0.02 |
| tbl_lnkhostdependencytohost_h              |       0.02 |
| tbl_lnkhostdependencytohostgroup_dh        |       0.02 |
| tbl_lnkhostdependencytohostgroup_h         |       0.02 |
| tbl_lnkhostescalationtocontact             |       0.02 |
| tbl_lnkhostescalationtocontactgroup        |       0.02 |
| tbl_lnkhostescalationtohost                |       0.02 |
| tbl_lnkhostescalationtohostgroup           |       0.02 |
| tbl_lnkhostgrouptohost                     |       0.02 |
| tbl_lnkhostgrouptohostgroup                |       0.02 |
| tbl_lnkhosttemplatetocontact               |       0.02 |
| tbl_lnkhosttemplatetocontactgroup          |       0.02 |
| tbl_lnkhosttemplatetohost                  |       0.02 |
| tbl_lnkhosttemplatetohostgroup             |       0.02 |
| tbl_lnkhosttemplatetohosttemplate          |       0.02 |
| tbl_lnkhosttemplatetovariabledefinition    |       0.02 |
| tbl_lnkhosttocontact                       |       0.02 |
| tbl_lnkhosttocontactgroup                  |       0.02 |
| tbl_lnkhosttohost                          |       0.02 |
| tbl_lnkhosttohostgroup                     |       5.05 |
| tbl_lnkhosttohosttemplate                  |       0.16 |
| tbl_lnkhosttovariabledefinition            |       0.02 |
| tbl_lnkservicedependencytohost_dh          |       0.02 |
| tbl_lnkservicedependencytohost_h           |       0.02 |
| tbl_lnkservicedependencytohostgroup_dh     |       0.02 |
| tbl_lnkservicedependencytohostgroup_h      |       0.02 |
| tbl_lnkservicedependencytoservice_ds       |       0.02 |
| tbl_lnkservicedependencytoservice_s        |       0.02 |
| tbl_lnkserviceescalationtocontact          |       0.02 |
| tbl_lnkserviceescalationtocontactgroup     |       0.02 |
| tbl_lnkserviceescalationtohost             |       0.02 |
| tbl_lnkserviceescalationtohostgroup        |       0.02 |
| tbl_lnkserviceescalationtoservice          |       0.02 |
| tbl_lnkservicegrouptoservice               |       0.02 |
| tbl_lnkservicegrouptoservicegroup          |       0.02 |
| tbl_lnkservicetemplatetocontact            |       0.02 |
| tbl_lnkservicetemplatetocontactgroup       |       0.02 |
| tbl_lnkservicetemplatetohost               |       0.02 |
| tbl_lnkservicetemplatetohostgroup          |       0.02 |
| tbl_lnkservicetemplatetoservicegroup       |       0.02 |
| tbl_lnkservicetemplatetoservicetemplate    |       0.02 |
| tbl_lnkservicetemplatetovariabledefinition |       0.02 |
| tbl_lnkservicetocontact                    |       0.02 |
| tbl_lnkservicetocontactgroup               |       0.34 |
| tbl_lnkservicetohost                       |       0.02 |
| tbl_lnkservicetohostgroup                  |       0.36 |
| tbl_lnkservicetoservicegroup               |       0.33 |
| tbl_lnkservicetoservicetemplate            |       0.50 |
| tbl_lnkservicetovariabledefinition         |       0.02 |
| tbl_lnktimeperiodtotimeperiod              |       0.02 |
| tbl_logbook                                |       0.33 |
| tbl_mainmenu                               |       0.02 |
| tbl_permission                             |       0.02 |
| tbl_permission_inactive                    |       0.02 |
| tbl_service                                |       3.41 |
| tbl_servicedependency                      |       0.03 |
| tbl_serviceescalation                      |       0.03 |
| tbl_serviceextinfo                         |       0.03 |
| tbl_servicegroup                           |       3.03 |
| tbl_servicetemplate                        |       0.03 |
| tbl_session                                |       0.01 |
| tbl_session_locks                          |       0.00 |
| tbl_settings                               |       0.03 |
| tbl_submenu                                |       0.02 |
| tbl_timedefinition                         |       0.02 |
| tbl_timeperiod                             |       0.03 |
| tbl_user                                   |       0.03 |
| tbl_variabledefinition                     |       1.52 |
| xi_auditlog                                |       1.55 |
| xi_auth_tokens                             |       0.03 |
| xi_cmp_trapdata                            |       0.03 |
| xi_cmp_trapdata_log                        |       0.03 |
| xi_commands                                |       0.14 |
| xi_eventqueue                              |      15.77 |
| xi_events                                  |     780.78 |
| xi_incidents                               |       0.02 |
| xi_meta                                    |   12969.50 |
| xi_options                                 |       0.03 |
| xi_sessions                                |       0.27 |
| xi_sysstat                                 |       0.03 |
| xi_usermeta                                |       1.80 |
| xi_users                                   |       0.08 |
+--------------------------------------------+------------+
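
Most of the space in the listing above sits in a handful of tables: nagios_logentries (97904.52 MB), xi_meta (12969.50 MB), nagios_notifications (6547.00 MB), nagios_statehistory (2193.98 MB) and xi_events (780.78 MB). A minimal sketch for ranking the output of the size query is below. The function name `largest_tables` is hypothetical, and it assumes the two-column `mysql --table` layout shown above.

```shell
# Print the N largest tables (default 5) from `mysql --table` output on stdin.
# Assumed layout: "| Table | Size in MB |" with +---+ border rows.
largest_tables() {
    top_n="${1:-5}"
    awk -F'|' '
        # Border rows contain no "|" and the header has a non-numeric size,
        # so only data rows pass both checks.
        NF >= 4 {
            name = $2; size = $3
            gsub(/ /, "", name); gsub(/ /, "", size)
            if (size ~ /^[0-9]+(\.[0-9]+)?$/) printf "%s %s\n", size, name
        }
    ' | sort -rn | head -n "$top_n"
}
```

The size query from earlier in the thread can be piped straight into it, for example `... | mysql -uroot -pnagiosxi --table | largest_tables 5`.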

Code: Select all

# System default settings live in /usr/lib/sysctl.d/00-system.conf.
# To override those settings, enter new settings here, or in an /etc/sysctl.d/<name>.conf file
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv6.conf.default.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.ip_forward = 0
kernel.exec-shield=1
kernel.randomize_va_space = 2
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.default.secure_redirects = 0
kernel.msgmnb = 262144000
kernel.msgmax = 262144000
kernal.msgmni = 512000
kernel.shmmax = 4294967295
kernel.shmall = 268435456
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv6.conf.all.accept_ra = 0
net.ipv6.conf.default.accept_ra = 0
net.ipv6.conf.all.accept_redirects = 0
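
A setting in sysctl.conf only takes effect if its key maps to an existing entry under /proc/sys. The sketch below flags keys that do not. For example, a key spelled `kernal.msgmni`, as it appears in the listing above, would be reported because there is no /proc/sys/kernal directory. The function name `check_sysctl_keys` and its optional root argument are assumptions added for illustration, not part of any Nagios or sysctl tooling.

```shell
# Report sysctl.conf keys (read from stdin) with no matching entry under
# the given root (default /proc/sys). The root argument is an assumption
# added so the check can be tried against a test directory.
# Keys are mapped by turning dots into slashes, which covers the usual case;
# interface names that themselves contain dots are not handled.
check_sysctl_keys() {
    root="${1:-/proc/sys}"
    # Drop comments and blank lines, then keep only the text before "=".
    sed -e 's/#.*//' \
        -e '/^[[:space:]]*$/d' \
        -e 's/[[:space:]]*=.*//' \
        -e 's/^[[:space:]]*//' |
    while IFS= read -r key; do
        path="$root/$(printf '%s' "$key" | tr . /)"
        [ -e "$path" ] || printf 'unknown key: %s\n' "$key"
    done
}
```

Example: `check_sysctl_keys < /etc/sysctl.conf`. The values the kernel is actually using can then be compared with `sysctl kernel.msgmnb kernel.msgmax kernel.msgmni`.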
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: ndo2db alternatives

Post by tgriep »

I see that there are some corrupt tables in the MySQL database that could be slowing down the writes.
To clean out the MySQL tables, run the following as root.

Code: Select all

service crond stop
service nagios stop
service ndo2db stop

echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -u root -pnagiosxi nagiosxi
mysqlcheck -f -r -u root -pnagiosxi --databases nagiosxi
mysqlcheck -f -o -u root -pnagiosxi --databases nagiosxi

service mysqld restart
service httpd restart
service ndo2db start
service nagios start
service crond start
Let us know if this helps the server process the Kernel Message Queue better.