Nagios scale out options?

Post by **tonyleatwork** » Mon Jul 18, 2016 3:21 pm

Hi -

I recently tried to add a few dozen ESX hosts, and it caused Nagios to hit our ESX cluster hard. Nagios is currently consuming 30% of the resources of one node, which is surprising since this is a brand new Dell PowerEdge 730xd with 24 cores and 784GB of RAM. So we need to understand what we can do to scale up or scale out.

Is separating the workloads (production and dev environments) the right thing to do? I've run through a bunch of the performance tuning docs, but what else can we do if we need to add, say, 2000 more checks? As you may know, ESX is limited to 8 virtual CPUs, so we're limited there, but going physical would cost us our HA ability.

Code:

Nagios XI Installation Profile
System:
Nagios XI Version : 5.2.5
nwd2ng01.corp.analog.com 2.6.32-504.3.3.el6.x86_64 x86_64
CentOS release 6.6 (Final)
Gnome is not installed
Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0
Server Name: nwd2ng01.corp.analog.com
Server Address: 10.64.52.120
Server Port: 80
Date/Time
PHP Timezone: America/New_York
PHP Time: Mon, 18 Jul 2016 16:16:18 -0400
System Time: Mon, 18 Jul 2016 16:16:18 -0400
Nagios XI Data
License ends in: MSTNQS

nagios (pid 13597) is running...
NPCD running (pid 1999).
ndo2db (pid 2011) is running...
CPU Load 15: 9.79
Total Hosts: 600
Total Services: 4116
Function 'get_base_uri' returns: http://nwd2ng01.corp.analog.com/nagiosxi/
Function 'get_base_url' returns: http://nwd2ng01.corp.analog.com/nagiosxi/
Function 'get_backend_url(internal_call=false)' returns: http://nwd2ng01.corp.analog.com/nagiosxi/includes/components/profile/profile.php
Function 'get_backend_url(internal_call=true)' returns: http://localhost/nagiosxi/backend/
Ping Test localhost
Running:

/bin/ping -c 3 localhost 2>&1 

PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.025 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.037 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.026 ms

--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.025/0.029/0.037/0.007 ms
Test wget To localhost
WGET From URL: http://localhost/nagiosxi/includes/components/ccm/
Running:

/usr/bin/wget http://localhost/nagiosxi/includes/components/ccm/ 

--2016-07-18 16:16:20-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... ::1, 127.0.0.1
Connecting to localhost|::1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: "/usr/local/nagiosxi/tmp/ccm_index.tmp"

0K ......... 511M=0s

2016-07-18 16:16:20 (511 MB/s) - "/usr/local/nagiosxi/tmp/ccm_index.tmp" saved [9836]

Network Settings

1: lo:  mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:9f:52:ef brd ff:ff:ff:ff:ff:ff
    inet 10.64.52.120/24 brd 10.64.52.255 scope global eth0
    inet6 fe80::250:56ff:fe9f:52ef/64 scope link 
       valid_lft forever preferred_lft forever

10.64.52.0/24 dev eth0  proto kernel  scope link  src 10.64.52.120 
169.254.0.0/16 dev eth0  scope link  metric 1002 
default via 10.64.52.1 dev eth0
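
For a rough sense of scale, the figures in the profile above (600 hosts, 4116 services, 15-minute load of 9.79) can be turned into a check rate. This is only a back-of-the-envelope sketch, not anything Nagios itself reports. It assumes every host and service is actively checked once per interval, it assumes a 5-minute interval, and it takes the 8-vCPU figure from the ESX limit mentioned in the post rather than from the profile. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope estimate of check throughput and per-vCPU load.
# Host/service counts and the load average come from the profile;
# everything marked ASSUMED is an illustrative guess, not a measured value.

def checks_per_second(total_checks: int, interval_seconds: float) -> float:
    """Average checks the scheduler must start per second if every
    check runs exactly once per interval."""
    return total_checks / interval_seconds


def load_per_cpu(load_average: float, cpu_count: int) -> float:
    """Load average normalised by CPU count; values above 1.0 suggest
    more runnable work than the CPUs can service."""
    return load_average / cpu_count


HOSTS = 600                 # "Total Hosts" from the profile
SERVICES = 4116             # "Total Services" from the profile
EXTRA_CHECKS = 2000         # the growth figure asked about in the post
LOAD_15 = 9.79              # "CPU Load 15" from the profile
ASSUMED_INTERVAL_S = 300    # ASSUMED: 5-minute check interval
ASSUMED_VCPUS = 8           # ASSUMED: VM sized at the 8-vCPU limit

current_checks = HOSTS + SERVICES            # one check per host and service
planned_checks = current_checks + EXTRA_CHECKS

current_rate = checks_per_second(current_checks, ASSUMED_INTERVAL_S)
planned_rate = checks_per_second(planned_checks, ASSUMED_INTERVAL_S)
per_cpu = load_per_cpu(LOAD_15, ASSUMED_VCPUS)

print(f"current: {current_rate:.2f} checks/s")
print(f"planned: {planned_rate:.2f} checks/s")
print(f"load per vCPU: {per_cpu:.2f}")
```

If the real check intervals or vCPU count differ, substitute them; the point is only that the per-vCPU load is the number to watch when adding checks.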

Post by **tmcdonald** » Mon Jul 18, 2016 4:25 pm

I've always recommended using the @Box293 plugin for ESX monitoring:

https://exchange.nagios.org/directory/P ... re/details

It's very detailed, updated regularly, and has an extensive manual. It has been known to consume far fewer resources than the standard active check methods.

Post by **Box293** » Mon Jul 18, 2016 4:29 pm

The last release also had some quite significant performance improvements for box293_check_vmware. That said, the VMware SDK it uses is quite resource intensive, which is why my plugin is designed to be run from a vMA.

Have a look at the manual, it's all covered in there.

There is also a wizard for it:
http://exchange.nagios.org/directory/Ad ... rd/details
