Switching to distributed monitoring

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
cwscribner
Posts: 316
Joined: Thu Mar 31, 2011 9:54 am
Location: Patten, ME

Switching to distributed monitoring

Post by cwscribner »

Are there any server spec guidelines for switching to a distributed monitoring system? What would be a recommended hardware setup for offloading MySQL from a primary Nagios server and only having two servers running?
Last edited by cwscribner on Tue Aug 09, 2011 11:43 pm, edited 1 time in total.
nscott
Posts: 1040
Joined: Wed May 11, 2011 8:54 am

Re: Switching to distributed monitoring

Post by nscott »

The unfortunate thing is that there isn't a hard-and-fast metric, because different checks make different demands on the machine, so these requirements are largely conjecture, but still worth considering. Monitoring switches is very taxing on the CPU and disk, whereas receiving passive checks is significantly easier on the CPU. That said, I would venture to guess that the 'average' Nagios XI install has about 8 active checks per host. In your situation, that's about 12,800 service checks. Assuming a check interval of 5 minutes, that's 12,800 / (5 × 60 s) ≈ 42.7 checks per second.
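
The arithmetic above generalizes to any install: total checks divided by the check interval in seconds gives the average rate the box must sustain. This is a minimal sketch under the post's own assumptions (every check runs once per interval, evenly spread); the function names are hypothetical.

```python
def total_checks(hosts: int, checks_per_host: float) -> float:
    """Total service checks for a host count and an average checks-per-host figure."""
    return hosts * checks_per_host


def checks_per_second(total: float, interval_minutes: float) -> float:
    """Average active-check rate if each check runs once per check interval."""
    return total / (interval_minutes * 60)


# Figures from the post: ~12,800 service checks on a 5-minute interval.
rate = checks_per_second(12_800, 5)
print(f"{rate:.2f} checks/second")
```

To plug in your own numbers, compute `total_checks(your_host_count, 8)` first and pass the result to `checks_per_second`.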

Now, on our benchmark we reached ~3,500 checks before the UI became sluggish enough to call it borderline. Our benchmark box is a single-core, 32-bit, 3 GHz Pentium 4 with 3 GB of RAM. That boils down to 8.6 checks per second. After offloading the MySQL database, CPU usage halved, so, allowing for diminishing returns, we could conservatively ramp that up to about 13 checks per second. All of this on a single-core, 32-bit Pentium 4 machine.
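
One way to read the jump from 8.6 to ~13 checks per second: halving CPU usage would ideally double the rate (17.2/s), and the conservative figure keeps only part of that gain. The sketch below assumes a 50% "efficiency" on the ideal gain; that value is not from the benchmark, it is picked here only because it reproduces the post's ~13/s.

```python
def projected_rate(base_rate: float, cpu_fraction_remaining: float, efficiency: float) -> float:
    """Project a CPU-bound check rate after part of the CPU load moves off the box.

    base_rate              -- measured checks/second before the change
    cpu_fraction_remaining -- CPU usage afterwards as a fraction of before (0.5 = halved)
    efficiency             -- share of the ideal gain actually realized (1.0 = perfectly linear)
    """
    # Linear scaling: half the CPU per check means the same CPU can run twice the checks.
    ideal = base_rate / cpu_fraction_remaining
    return base_rate + (ideal - base_rate) * efficiency


BENCHMARK_RATE = 8.6  # checks/second on the single-core Pentium 4 box (from the post)

ideal = projected_rate(BENCHMARK_RATE, 0.5, efficiency=1.0)
# efficiency=0.5 is an assumption chosen to match the post's conservative estimate.
conservative = projected_rate(BENCHMARK_RATE, 0.5, efficiency=0.5)
print(f"ideal: {ideal:.1f}/s, conservative: {conservative:.1f}/s")
```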

The MySQL slave would need significantly less horsepower, although I don't have any hard numbers for it. If I had to guess, a dual-core processor with a hefty amount of RAM would more than cover your needs, since the biggest things you'd need to worry about are I/O times and filling the cache.

I am currently in the middle of researching this issue for a presentation, so expect a document on it sometime in mid-September.
Regarding "only having two servers running?": do you mean one Nagios XI server plus a slaved MySQL server?
Nicholas Scott
Former Nagios employee
cwscribner

Re: Switching to distributed monitoring

Post by cwscribner »

Yes. One for checks, one for MySQL.

My system specs are in my signature. It's a quad-core with several GB of RAM, checking ~2,150 hosts and ~750 services. The monitoring widget shows ~300 (±15) checks per minute. Something seems amiss here: your benchmark machine is much more modest, yet it outperforms the server I'm on, and I don't entirely understand why.
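
The widget reports checks per minute while the benchmark figure is in checks per second, so converting puts them on the same footing. A small sketch using only the numbers quoted in the thread (the helper name is hypothetical):

```python
def per_second(per_minute: float) -> float:
    """Convert a checks-per-minute reading to checks per second."""
    return per_minute / 60


BENCHMARK_RATE = 8.6  # checks/second, from the earlier benchmark post

# Widget reading from the post: ~300 checks/minute, +/- 15.
low, mid, high = (per_second(x) for x in (285, 300, 315))
print(f"observed: {low:.2f}-{high:.2f}/s (about {mid:.0f}/s); benchmark: {BENCHMARK_RATE}/s")
print(f"benchmark is roughly {BENCHMARK_RATE / mid:.1f}x the observed rate")
```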
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Switching to distributed monitoring

Post by mguthrie »

Did you increase the reaper frequency in the nagios.cfg to process a higher volume of check results?

Code:

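# How often (in seconds) Nagios runs check result reaper events
check_result_reaper_frequency=3
# Maximum time (in seconds) a single reaper event is allowed to run
max_check_result_reaper_time=10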
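
To confirm those two directives actually ended up in nagios.cfg, a quick check is to parse the file's key=value lines. This is a minimal sketch: it assumes the usual nagios.cfg syntax (one key=value per line, '#' comments), and the default path /usr/local/nagios/etc/nagios.cfg is an assumed typical location; the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Values suggested in the post; both directives are in seconds.
SUGGESTED = {
    "check_result_reaper_frequency": "3",
    "max_check_result_reaper_time": "10",
}


def parse_cfg(text: str) -> dict[str, str]:
    """Parse nagios.cfg-style 'key=value' lines, skipping blank lines and '#' comments.

    Repeated keys (e.g. several cfg_file entries) keep only the last value,
    which is fine for the single-valued reaper directives checked here.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def reaper_report(text: str) -> dict[str, tuple[str | None, str]]:
    """Map each reaper directive to (current value or None if unset, suggested value)."""
    settings = parse_cfg(text)
    return {key: (settings.get(key), want) for key, want in SUGGESTED.items()}


def report_for_file(path: str = "/usr/local/nagios/etc/nagios.cfg") -> dict[str, tuple[str | None, str]]:
    """Run reaper_report on a config file (default path is an assumption)."""
    return reaper_report(Path(path).read_text())
```

Remember that Nagios only picks up nagios.cfg changes after a restart or reload.
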
If you're using checks with SNMP, WMI, or ESX, these take substantially more CPU than the checks used for the benchmark (check_icmp, check_http, check_dns).

Also, make sure you tweak the settings on the Admin->Performance Settings page. Using the unified views and increasing the refresh multiplier can make a substantial difference in performance. If multiple users access XI at once, that also multiplies the number of AJAX calls made to the server and takes its toll on performance.
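
To see why concurrent users and the refresh multiplier matter so much, here is a rough, hypothetical model: request load grows linearly with users and widgets per page, and shrinks as the refresh interval stretches. The post does not describe how XI applies the multiplier internally; this sketch simply assumes it scales each widget's base refresh interval, and all numbers in the example are made up.

```python
def ajax_calls_per_minute(users: int, widgets_per_page: int,
                          base_refresh_seconds: float, refresh_multiplier: float) -> float:
    """Rough AJAX request rate against the XI server (hypothetical model).

    Assumes every open page has `widgets_per_page` widgets, each refreshing on its own
    every base_refresh_seconds * refresh_multiplier seconds.
    """
    interval = base_refresh_seconds * refresh_multiplier
    return users * widgets_per_page * 60 / interval


# Hypothetical numbers: 5 users, 6 widgets each, 30-second base refresh.
for multiplier in (1, 2, 4):
    print(multiplier, ajax_calls_per_minute(5, 6, 30, multiplier))
```

Under this model, doubling the multiplier halves the request rate, while each extra user adds a full page's worth of requests.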
cwscribner

Re: Switching to distributed monitoring

Post by cwscribner »

The reaper changes have been made. I do have a very long list of SNMP checks for UPS devices (~600 services). Is there any way around that to speed things up?
mguthrie

Re: Switching to distributed monitoring

Post by mguthrie »

Hmm, yeah, that's a big CPU grab. What's your average CPU load, and does anything consistently show up in the top 5 by CPU% when you run top? (MySQL should be #1 and httpd #2.)
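
If you want a repeatable snapshot instead of eyeballing top, a small script can grab the load averages and the busiest commands. This is a sketch, not an XI tool: it assumes a Linux box with procps `ps`, and note that `ps` reports %CPU averaged over each process's lifetime, so it can differ from top's instantaneous view. The function names are hypothetical.

```python
from __future__ import annotations

import os
import subprocess
from collections import defaultdict


def top_by_cpu(ps_output: str, n: int = 5) -> list[tuple[str, float]]:
    """Return the n busiest commands from `ps -eo comm,pcpu` output.

    %CPU is summed per command name, so many httpd workers show up as one entry.
    """
    totals: dict[str, float] = defaultdict(float)
    for line in ps_output.strip().splitlines()[1:]:  # skip the "COMMAND %CPU" header
        parts = line.rsplit(None, 1)                  # %CPU is the last column
        if len(parts) != 2:
            continue
        name, pcpu = parts
        try:
            value = float(pcpu)
        except ValueError:
            continue
        totals[name.strip()] += value
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


def live_snapshot(n: int = 5) -> tuple[tuple[float, float, float], list[tuple[str, float]]]:
    """1/5/15-minute load averages plus the top n commands from the local `ps`."""
    result = subprocess.run(["ps", "-eo", "comm,pcpu"],
                            capture_output=True, text=True, check=True)
    return os.getloadavg(), top_by_cpu(result.stdout, n)
```

Calling `live_snapshot()` a few times a minute apart gives a quick feel for whether the same process keeps showing up.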
cwscribner

Re: Switching to distributed monitoring

Post by cwscribner »

MySQL, httpd, nagios, ndo2db, and setroubleshootd are the top 5. When I ran top, setroubleshootd was hogging 50% of the CPU, then bounced around between 30% and 50%.
mguthrie

Re: Switching to distributed monitoring

Post by mguthrie »

What is setroubleshootd? I'm not familiar with that process; it might be your CPU culprit.
cwscribner

Re: Switching to distributed monitoring

Post by cwscribner »

From my brief research, it looks like it's related to web serving. I haven't the slightest idea what to do about it, though. Here's a semi-helpful link: http://www.linuxquestions.org/questions ... ry-634347/

Another: http://www.ezlinuxadmin.com/2011/05/set ... th-cpanel/
mguthrie

Re: Switching to distributed monitoring

Post by mguthrie »

Here's a thought I hadn't considered for your CPU usage, and I don't know why it didn't occur to me earlier: that process may come bundled with GNOME. I would consider uninstalling it. Also, if you want to get back probably 30-40% of your CPU and memory, consider restarting your server at runlevel 3 so that GNOME doesn't load at all. GNOME eats system resources and serves no real purpose on a monitoring server. All of our documentation and instructions are written for systems without it, so there's no real reason to run it.
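
On a SysV-init distribution (for example CentOS 5 or 6, which is an assumption about this server), the default runlevel comes from the `initdefault` entry in /etc/inittab, e.g. `id:5:initdefault:` for a graphical login. A minimal sketch of that one-field edit follows; the function name is hypothetical, and you would back up the file, write the result as root, and reboot (or run `init 3`) for it to take effect.

```python
def set_default_runlevel(inittab: str, runlevel: str = "3") -> str:
    """Return /etc/inittab text with the initdefault entry switched to `runlevel`.

    SysV inittab entries have the form id:runlevels:action:process,
    e.g. 'id:5:initdefault:' boots to runlevel 5 (graphical login).
    Comment lines and all other entries are left untouched.
    """
    out = []
    for line in inittab.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        fields = body.split(":")
        if (not body.lstrip().startswith("#")
                and len(fields) >= 3 and fields[2] == "initdefault"):
            fields[1] = runlevel
            body = ":".join(fields)
        out.append(body + ending)
    return "".join(out)
```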