ndo2db - not processing messages fast enough

delfin
Posts: 4
Joined: Sun Apr 03, 2016 8:40 pm

ndo2db - not processing messages fast enough

Post by delfin »

We have configured our 9 Nagios servers to send data to a MySQL server using NDOUtils. Everything is working well so far, and we are able to collect data from all 9 Nagios servers. The issue is that 4 of the 9 Nagios servers seem to be sending more data than the ndo2db application can handle. This results in the IPC message queue limit being maxed out (kernel.msgmnb = 524288000), causing the Nagios daemon to stall and ultimately stop on those 4 servers.
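To see how close a queue gets to the kernel.msgmnb ceiling before Nagios stalls, the queue depth can be read from /proc/sysvipc/msg on Linux. This is a minimal sketch, assuming that file carries the usual `msqid`, `cbytes` and `qnum` columns; the function names `queue_usage` and `read_live_usage` are illustrative and not part of NDOUtils.

```python
def queue_usage(proc_text, msgmnb):
    """Return (msqid, bytes_used, messages, fraction_of_limit) per queue.

    proc_text is the content of /proc/sysvipc/msg; msgmnb is the value of
    kernel.msgmnb (the per-queue byte limit).
    """
    lines = [ln.split() for ln in proc_text.strip().splitlines()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    # Look columns up by header name instead of hard-coding positions.
    idx = {name: i for i, name in enumerate(header)}
    usage = []
    for row in rows:
        cbytes = int(row[idx["cbytes"]])
        qnum = int(row[idx["qnum"]])
        usage.append((int(row[idx["msqid"]]), cbytes, qnum, cbytes / msgmnb))
    return usage


def read_live_usage():
    """Read the live values on a Linux host (not called automatically)."""
    with open("/proc/sys/kernel/msgmnb") as f:
        limit = int(f.read())
    with open("/proc/sysvipc/msg") as f:
        return queue_usage(f.read(), limit)
```

Calling `read_live_usage()` periodically on the host that holds the queue would show whether the backlog grows steadily (ndo2db cannot keep up at all) or only spikes at certain times.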

We have tried ndoutils v2.0.0 and v2.1b2, but we get the same result.

https://github.com/NagiosEnterprises/nd ... tils-2.1b2

Any help or suggestions to get around this issue would be much appreciated. Thanks guys :)

Below is our setup.

9 nagios servers running the following:

Red Hat : 6.7
Nagios : 4.1.1
Ndomod : 2.1b2


Code:

#####################################################################
# NDOMOD CONFIG FILE
#
# Last Modified: 09-05-2007
#####################################################################
instance_name=MED-2
output_type=tcpsocket
output=<MysqlIPaddress>
tcp_port=5668
use_ssl=0
output_buffer_items=10000
buffer_file=/usr/local/nagios/var/ndomod.tmp
file_rotation_interval=14400
#file_rotation_command=rotate_ndo_log
file_rotation_timeout=60
reconnect_interval=15
reconnect_warning_interval=15
#reconnect_warning_interval=900
acknowledgement_data=1
adaptive_contact_data=0
adaptive_host_data=0
adaptive_program_data=0
adaptive_service_data=0
aggregated_status_data=0
comment_data=1
contact_status_data=0
downtime_data=1
event_handler_data=0
external_command_data=0
flapping_data=0
host_check_data=1
host_status_data=1
log_data=1
main_config_data=0
notification_data=0
object_config_data=0
process_data=0
program_status_data=1
retention_data=0
service_check_data=1
service_status_data=1
statechange_data=1
system_command_data=0
timed_event_data=0
config_output_options=0
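With this many switches it is easy to lose track of what is actually being sent to the database. The sketch below lists the enabled categories, assuming the `*_data` options are simple 0/1 toggles as the file above suggests; the function name `enabled_data_options` is illustrative.

```python
def enabled_data_options(cfg_text):
    """Return the sorted names of *_data options that are set to 1."""
    enabled = []
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.endswith("_data") and value == "1":
            enabled.append(key)
    return sorted(enabled)


example = """
# comment lines are ignored
host_check_data=1
flapping_data=0
service_check_data=1
"""
print(enabled_data_options(example))
```

Running it against the real ndomod.cfg on each server makes it easier to compare the stable and problem servers option by option.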


1 MySQL server running the following:

Red Hat : 6.7
MySQL : 5.1.3
Ndo2db : 2.1b2

Code:

#####################################################################
# NDO2DB DAEMON CONFIG FILE
#
# Last Modified: 01-02-2009
#####################################################################
lock_file=/usr/local/nagios/var/ndo2db.lock
ndo2db_user=nagios
ndo2db_group=nagios
#socket_type=unix
socket_type=tcp
socket_name=/usr/local/nagios/var/rw/ndo.sock
tcp_port=5668
use_ssl=0
db_servertype=mysql
db_host=localhost
db_port=5668
db_name=nagios
db_prefix=nagios_
db_user=<ndo2db_user>
db_pass=<nagiossecret>
max_timedevents_age=1440
max_systemcommands_age=10080
max_servicechecks_age=10080
max_hostchecks_age=10080
max_eventhandlers_age=44640
max_externalcommands_age=44640
max_notifications_age=44640
max_contactnotifications=44640
max_contactnotificationmethods=44640
max_logentries_age=129600
max_acknowledgements_age=44640
debug_level=-1
debug_verbosity=2
debug_file=/usr/local/nagios/var/ndo2db.debug
max_debug_file_size=1000000
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: ndo2db - not processing messages fast enough

Post by tmcdonald »

Moving to NDO 2.1b2 was the right first step, as there were some fixes in there specifically to handle the slow processing.

How many hosts+services are being handled by each of these servers? It honestly might just be too much for the hardware. For that matter, what does the hardware look like on the servers?
Former Nagios employee
delfin
Posts: 4
Joined: Sun Apr 03, 2016 8:40 pm

Re: ndo2db - not processing messages fast enough

Post by delfin »

All the servers are virtual machines running on VMware ESX host(s) with the following specs:
  • Server: Proliant BL660c Gen8
  • Processor: Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz
  • Total CPU : 4
  • CPU Cores: 32
  • Memory: 512 GB
Here are the Nagios and MySQL server VM hardware allocations and host/service check count:

5 Stable Servers

Server	CPU Cores	Memory	Hosts	Service Checks
NAG01	8	8 GB	4646	18200
NAG02	8	8 GB	4291	16606
NAG03	8	8 GB	5114	17874
NAG04	8	8 GB	8511	17224
NAG06	8	8 GB	76	196

4 Problem Servers

Server	CPU Cores	Memory	Hosts	Service Checks
NAG05	8	8 GB	14007	25276
NAG07	8	16 GB	6136	24537
NAG08	8	16 GB	6512	30508
NAG09	8	16 GB	8488	30715

1 MySQL Server

Server	CPU Cores	Memory
MYSQL01	8	16 GB

Do we need to think about offloading some of the hosts/service checks from the 4 servers? Or can you think of other options?

Thanks.
bphl
Posts: 2
Joined: Tue Aug 12, 2014 1:09 am

Re: ndo2db - not processing messages fast enough

Post by bphl »

The 9 Nagios servers are handling ~58000 hosts and ~181000 services, divided as shown below.

All servers have 8 CPUs and 16 GB of memory, running as VM guests.

Code:

server	hosts	services
nag01	4640	18200
nag02	4184	16606
nag03	5092	17874
nag04	8505	17224
nag05	14005	25276
nag06	76	196
nag07	6122	24537
nag08	6474	30508
nag09	8470	30715
The Nagios servers in question are nag05, nag07, nag08 and nag09.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: ndo2db - not processing messages fast enough

Post by tmcdonald »

delfin wrote: Do we need to think about offloading some of the hosts/service checks from the 4 servers? Or can you think of other options?
I always recommend in split setups like this keeping the load as evenly-distributed as possible. Looking at your numbers, the servers are pretty uniform in resources. With that in mind, the breaking point is likely around the 20k service mark. If you can shift some of the load into your barely-used NAG06 that could alleviate some of the stress on 5/7/8/9.

Other options would include spinning up another server, *possibly* increasing the kernel queue limits (though that might just mask/delay the problem), working on adjusting the check/retry intervals (this can actually have a big impact if done properly and thoroughly), and tweaking various performance-related options in nagios.cfg.
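To get a feel for why check intervals matter, here is a rough back-of-the-envelope sketch. The 5-minute and 10-minute intervals are assumptions for illustration only (the thread does not state the actual intervals), and the function name `checks_per_second` is made up for this example.

```python
def checks_per_second(num_checks, interval_minutes):
    """Average check rate if every check runs once per interval."""
    return num_checks / (interval_minutes * 60)


# 30715 is the service check count reported for NAG09 earlier in the thread.
for interval in (5, 10):
    rate = checks_per_second(30715, interval)
    print(f"{interval}-minute interval: about {rate:.0f} service checks per second")
```

Each check result becomes data that ndomod has to ship and ndo2db has to write, so doubling the interval roughly halves the steady-state message rate. Retries and checks in non-OK states would push the real rate above this average.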
delfin
Posts: 4
Joined: Sun Apr 03, 2016 8:40 pm

Re: ndo2db - not processing messages fast enough

Post by delfin »

If a server is monitoring 10k hosts and has 10k service checks, does this mean we have already hit the 20k breaking point? I'm thinking that the monitored hosts generate host checks, and since we are also gathering host check data, do we need to add the number of hosts to the number of service checks to work out whether we are reaching the 20k breaking point?
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: ndo2db - not processing messages fast enough

Post by tmcdonald »

Sorry, yeah, you're correct. Adding the hosts and the services, the new breaking point seems to be between 25k (highest of the stable servers) and 30k (lowest of the problem servers). There are many other factors as well, such as check frequency and the percentage of objects in non-OK states (which causes more frequent checking), but looking at the numbers you provided there is some sort of barrier related to the total number of checks.
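The arithmetic behind that 25k to 30k window can be reproduced from the host and service check counts delfin posted. This is just a sketch of the sum; the dictionary layout and the function names are illustrative.

```python
# (hosts, service checks, status) per server, taken from delfin's post.
SERVERS = {
    "NAG01": (4646, 18200, "stable"),
    "NAG02": (4291, 16606, "stable"),
    "NAG03": (5114, 17874, "stable"),
    "NAG04": (8511, 17224, "stable"),
    "NAG06": (76, 196, "stable"),
    "NAG05": (14007, 25276, "problem"),
    "NAG07": (6136, 24537, "problem"),
    "NAG08": (6512, 30508, "problem"),
    "NAG09": (8488, 30715, "problem"),
}


def combined_load(servers):
    """Hosts plus service checks for each server."""
    return {name: hosts + services for name, (hosts, services, _) in servers.items()}


def threshold_window(servers):
    """Return (highest stable load, lowest problem load)."""
    load = combined_load(servers)
    stable = [load[n] for n, (_, _, status) in servers.items() if status == "stable"]
    problem = [load[n] for n, (_, _, status) in servers.items() if status == "problem"]
    return max(stable), min(problem)


print(threshold_window(SERVERS))
```

The highest stable server is NAG04 and the lowest problem server is NAG07, so the window is set by those two. It is an observation from nine data points, not a documented limit.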
bphl
Posts: 2
Joined: Tue Aug 12, 2014 1:09 am

Re: ndo2db - not processing messages fast enough

Post by bphl »

Hi, are there any documents or best practice guidelines for scaling NDO in large environments?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: ndo2db - not processing messages fast enough

Post by scottwilkerson »

bphl wrote: Hi, are there any documents or best practice guidelines for scaling NDO in large environments?
Not that I know of. First and foremost, move it off of your Nagios server. Secondly, give both your monitoring server and the MySQL servers the fastest disks you can afford. If you need to scale extremely large, you should be thinking about a RAID array of SSD drives.

Finally, you can also limit the information you send to the database via the data_processing_options setting in ndomod.cfg:
# DATA PROCESSING OPTION
# This option determines what data the NDO NEB module will process.
# Do not mess with this option unless you know what you're doing!!!!
# Read the source code (include/ndbxtmod.h) to determine what values
# to use here. Values from source code should be OR'ed to get the
# value to use here. A value of -1 will cause all data to be processed.
# Read the source code (include/ndomod.h) and look for "NDOMOD_PROCESS_"
# to determine what values to use here. Values from source code should
# be OR'ed to get the value to use here. A value of -1 will cause all
# data to be processed.
See: https://github.com/NagiosEnterprises/nd ... e/ndomod.h for details
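For anyone unfamiliar with OR'ing flag values, here is a small sketch of the idea. The names and bit values below are placeholders for illustration only; the real ones are the NDOMOD_PROCESS_* constants in include/ndomod.h and must be looked up there.

```python
# Placeholder flags, NOT the real NDOMOD_PROCESS_* values from ndomod.h.
EXAMPLE_FLAGS = {
    "EXAMPLE_A": 1 << 0,  # 1
    "EXAMPLE_B": 1 << 1,  # 2
    "EXAMPLE_C": 1 << 2,  # 4
    "EXAMPLE_D": 1 << 3,  # 8
}


def combine(names, flags=EXAMPLE_FLAGS):
    """Bitwise-OR the selected flags into one option value."""
    value = 0
    for name in names:
        value |= flags[name]
    return value


def is_enabled(value, name, flags=EXAMPLE_FLAGS):
    """True if the flag's bit is set; -1 has every bit set, so it enables all."""
    return bool(value & flags[name])


value = combine(["EXAMPLE_A", "EXAMPLE_C"])
print(value, is_enabled(value, "EXAMPLE_B"), is_enabled(-1, "EXAMPLE_D"))
```

The resulting integer is what would go on the data_processing_options line, once the placeholders are swapped for the real constants covering only the data you want to keep.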
Former Nagios employee
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: ndo2db - not processing messages fast enough

Post by Box293 »

scottwilkerson wrote: Finally, you can also limit the information you send to the database via the data_processing_options in the ndomod.cfg

This KB article might also help:

https://support.nagios.com/kb/article.php?id=113