Performance Problems

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
afitch
Posts: 6
Joined: Wed Aug 25, 2010 7:46 am

Performance Problems

Post by afitch »

I've got Nagios XI monitoring about 800 hosts with 3300 services. Most of the services use the provided Perl SNMP checks, and many others are SSH checks that run Perl or Bash scripts and return information. I've thrown lots of CPU and RAM at the problem, but that hasn't fixed it. The server now has 8 CPUs and 16 GB of RAM, with an upgraded PAE kernel so it can use all of that memory. Even so, mysqld runs at 80-210% CPU all the time, and Nagios runs at 50-100% as well. At times a click in the web interface takes forever to get a response, and I can never view a detail page.

I'm running the SSH checks because of SNMP problems with some hosts. I've made various tweaks to no avail. Any suggestions?
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Performance Problems

Post by mguthrie »

I think a server running that kind of load might be a good candidate for a distributed monitoring setup. There are a few options for that.

DNX is a load distributor for Nagios. I haven't used it myself yet, but we just got it tested and documented.
http://library.nagios.com/library/speci ... ith-nagios

Also, we're coming out with a new product called Nagios Fusion, which is designed for distributed monitoring setups. Here's its info page; I believe the release date is set for Oct. 1st.
http://www.nagios.com/products/nagiosfusion

Our other techs might have some ideas for tuning your existing server, but I'll have to defer to them on performance tweaks.
afitch
Posts: 6
Joined: Wed Aug 25, 2010 7:46 am

Re: Performance Problems

Post by afitch »

Fusion looks like the way to go. So you need an XI license for each server (or node), right? And then one or two Fusion licenses, depending on the level of HA? I agree with your diagnosis: I probably need one server per 400-500 hosts, depending on the number of checks. Eventually our server team will be handing Nagios off to the operations team, so Fusion would help.
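The sizing estimate above is simple division with rounding up, since a partially loaded server still has to exist. A minimal sketch, assuming the 400-500 hosts-per-server rule of thumb from this post holds (it is an estimate, not a measured Nagios XI limit, and the number of checks per host matters too); the function name servers_needed is hypothetical:

```python
import math


def servers_needed(hosts: int, hosts_per_server: int) -> int:
    """Number of servers required, rounding up so no host is left unassigned."""
    if hosts_per_server <= 0:
        raise ValueError("hosts_per_server must be positive")
    return math.ceil(hosts / hosts_per_server)


if __name__ == "__main__":
    current_hosts = 800  # host count from the original post
    # Assumed capacity range: the 400-500 hosts-per-server rule of thumb.
    for capacity in (400, 500):
        print(f"{current_hosts} hosts at {capacity}/server -> "
              f"{servers_needed(current_hosts, capacity)} servers")
```

At either end of the range, 800 hosts works out to two servers; rerunning with a larger host count shows when a third node becomes necessary.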
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Performance Problems

Post by mguthrie »

Yeah, I couldn't tell you offhand what the licensing situation would be, but I do know we offer a discount that increases with the number of licenses you buy. Feel free to send any questions on that to our sales team (sales(at)nagios.com).
mmestnik
Posts: 972
Joined: Mon Feb 15, 2010 2:23 pm

Re: Performance Problems

Post by mmestnik »

You need faster or more disks. Try disabling MySQL syncs and flushes: even with the RAM you have, many applications won't make use of it, because they insist on transactional consistency and force writes out to disk. Nagios can also be told to flush less often.

How are you getting these usage statistics? I'd assume that I/O wait time accounts for most of that CPU usage.
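One way to check the I/O wait theory on Linux is to sample the aggregate "cpu" line of /proc/stat twice and compare the iowait counter with the total elapsed CPU time. A minimal sketch, assuming a Linux host and the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice); the function names and the 5-second interval are illustrative choices. The "wa" column in top or vmstat reports the same figure.

```python
import time


def parse_cpu_line(line: str) -> list[int]:
    """Split the aggregate 'cpu' line of /proc/stat into integer jiffy counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]


def iowait_percent(before: list[int], after: list[int]) -> float:
    """Share of elapsed CPU time spent in iowait between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    # guest and guest_nice (fields 8 and 9) are already counted in user and
    # nice, so only the first eight fields make up the total.
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def read_cpu_counters(path: str = "/proc/stat") -> list[int]:
    """Read the first line of /proc/stat, which is the all-CPU summary."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


if __name__ == "__main__":
    first = read_cpu_counters()
    time.sleep(5)
    second = read_cpu_counters()
    print(f"iowait: {iowait_percent(first, second):.1f}%")
```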

Try putting the data on a ramfs and calling rsync every 5 minutes or so to flush it to an on-disk backing store, which is loaded back into RAM at boot.
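The ramfs-plus-rsync idea amounts to two copies: disk to RAM once at boot, then RAM to disk on a timer. A minimal sketch, assuming the ramfs is already mounted (for example with `mount -t ramfs ramfs /mnt/ramdisk`) and that both paths below are hypothetical placeholders. In practice a cron entry would usually replace the sleep loop.

```python
import subprocess
import time

# Hypothetical paths: RAM_DIR is assumed to be an already-mounted ramfs,
# BACKING_DIR an ordinary directory on persistent disk.
RAM_DIR = "/mnt/ramdisk/"
BACKING_DIR = "/var/lib/ramdisk-backing/"
FLUSH_INTERVAL_SECONDS = 300  # "every 5 minutes or so"


def rsync_cmd(src: str, dst: str) -> list[str]:
    """Build an rsync command that mirrors the contents of src into dst."""
    # -a preserves permissions and timestamps; --delete removes files that no
    # longer exist in src. The trailing slash on src copies its contents.
    return ["rsync", "-a", "--delete", src, dst]


def restore_from_disk() -> None:
    """At boot: load the on-disk backing store into RAM."""
    subprocess.run(rsync_cmd(BACKING_DIR, RAM_DIR), check=True)


def flush_to_disk() -> None:
    """Periodically: write the RAM copy back to the on-disk backing store."""
    subprocess.run(rsync_cmd(RAM_DIR, BACKING_DIR), check=True)


if __name__ == "__main__":
    restore_from_disk()
    while True:
        time.sleep(FLUSH_INTERVAL_SECONDS)
        flush_to_disk()
```

The trade-off is durability: anything written since the last flush is lost on a crash or power failure, and files being written during a flush (a live database, for example) may be copied in an inconsistent state. Note also that ramfs, unlike tmpfs, has no size limit, so it can grow until it consumes all available memory.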
afitch
Posts: 6
Joined: Wed Aug 25, 2010 7:46 am

Re: Performance Problems

Post by afitch »

All of your responses make sense. We have an enormous VMware cluster and petabytes of storage, and VMware ultimately manages all the disk. EMC screwed us on the pricing of their fiber driver, so we only have a single path to all of our Tier 1 storage (so that's the bottleneck). I'll find the syncing and flushing variables and make some adjustments.

I don't understand the Nagios backend well enough to follow the statement below.

"Try using a ramfs and calling rsync every 5min or so to flush that to disk backing store that's loaded into ram on boot."

Does the NDO module automatically import and export the config and state to the database (using rsync)?

Thanks again. -jb
mmestnik
Posts: 972
Joined: Mon Feb 15, 2010 2:23 pm

Re: Performance Problems

Post by mmestnik »

"Try using a ramfs and calling rsync every 5min or so to flush that to disk backing store that's loaded into ram on boot."
This comment was actually unrelated to Nagios; see these links for more information.

http://www.thegeekstuff.com/2008/11/ove ... -on-linux/
http://sial.org/howto/rsync/ OR http://oreilly.com/pub/h/41
afitch
Posts: 6
Joined: Wed Aug 25, 2010 7:46 am

Re: Performance Problems

Post by afitch »

I installed two more XI machines, for a total of three, and installed the DNX extension (http://dnx.sourceforge.net/). I now have one master and two clients. This dropped the load average on the master from 12 to 4, with the second and third machines running at around 2.5. Big help. I'm still implementing the other changes and chasing down plugin timeouts. I currently have 800 hosts and 2300 checks, and I still have 500 Windows servers to add. I'll probably add another two or three DNX clients for that. Thanks, everyone.