
Hardware specification for 40k checks

Posted: Thu Sep 01, 2011 9:02 am
by uniwan_bde
Dear all,

We have a customer who needs to run roughly 40000 checks every 5 minutes. It's a lot, I know.

We contacted the Nagios sales team to get an idea of what hardware specification a server would need to handle this kind of load. The answer was immediate: no server can handle it, but you can use DNX or Fusion or ... We then chose DNX.

One of the customer's requirements is to keep the number of servers in the DNX setup to a minimum.

So the next question was: what are the requirements to perform 10000 checks per DNX slave machine? The answer was a dual-core processor with 2+ GB RAM.
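As a rough sanity check on these figures, the arithmetic can be sketched in Python. The function names are illustrative only, and the 10000-checks-per-slave figure is simply the number quoted above, not a measured limit.

```python
import math


def checks_per_second(total_checks, interval_seconds):
    """Average check rate if the checks are spread evenly over the interval."""
    return total_checks / interval_seconds


def slaves_needed(total_checks, checks_per_slave):
    """Smallest number of DNX slaves that covers the total, rounding up."""
    return math.ceil(total_checks / checks_per_slave)


if __name__ == "__main__":
    # 40000 checks every 5 minutes (300 seconds), as stated in the post.
    print(f"{checks_per_second(40000, 300):.1f} checks/s on average")
    print(f"{slaves_needed(40000, 10000)} slaves at 10000 checks each")
```

This only gives an average; real load is burstier, so some headroom per slave is probably wise.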

The customer is ready to provide a quad-core machine with 6 GB of memory as the DNX slave. How many checks do you think this spec could handle? Would we also need a multi-port NIC to perform link aggregation and raise the throughput on the LAN?
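Whether link aggregation is worth it depends on how much traffic the checks actually generate. Below is a back-of-the-envelope estimate in Python. The 1000 bytes on the wire per check (request plus reply) is a placeholder assumption, not a measurement, and should be replaced with values captured on the real network.

```python
def estimated_mbit_per_second(checks, interval_seconds, bytes_per_check):
    """Average traffic in Mbit/s if checks are spread evenly over the interval."""
    bytes_per_second = checks / interval_seconds * bytes_per_check
    return bytes_per_second * 8 / 1_000_000


if __name__ == "__main__":
    # ASSUMPTION: 1000 bytes per check is a placeholder, not a measured value.
    assumed_bytes_per_check = 1000
    per_slave = estimated_mbit_per_second(10000, 300, assumed_bytes_per_check)
    total = estimated_mbit_per_second(40000, 300, assumed_bytes_per_check)
    print(f"per slave: {per_slave:.2f} Mbit/s, all checks: {total:.2f} Mbit/s")
```

Comparing the result with the speed of a single NIC port should indicate whether throughput, rather than CPU, is likely to be the limit.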

The checks will probably be PING and SNMP.

Or do you have other ideas?

Thank you for your suggestions and best regards,
Bénoni Delfosse.

Re: Hardware specification for 40k checks

Posted: Sun Sep 04, 2011 10:43 am
by crfriend
"We have a customer who needs to check more or less 40000 checks every 5 minutes. It's a lot, I know."
Egad! That's not just a lot, that's an almost insane number. Your customer wouldn't happen to be Google, would it? ;)

First and foremost, I believe that you will ultimately find that ICMP Echo Requests ("pings") are not a good indicator that a machine is either healthy or even up; I have seen plenty of instances where a host has been wedged completely solid and still happily answers pings. It's a valid enough test for comms gear, but for hosts that likely have NICs that offload the protocols from the hosts themselves the check becomes meaningless at best and misinformative at worst.

SNMP, on the other hand, does require the host to actually do something, so that's more useful; however, an SNMP check usually takes slightly more time. There's also the notion of what OIDs the customer is interested in; if the only point of the check is to see if the host can answer a request, then SNMPv2-MIB::sysDescr.0 (.1.3.6.1.2.1.1.1.0) is a decent enough choice. For the number of hosts it sounds like you're dealing with, you'll probably want to custom-compile a check rather than rely on the stock "check_snmp" plugin, which is a wrapper around the NET-SNMP client (at least on UNIX-based monitoring systems).
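To illustrate what a liveness check on sysDescr.0 involves without spawning the NET-SNMP client, here is a minimal Python sketch that hand-builds an SNMPv2c GetRequest and sends it over UDP. It is an assumption-laden illustration, not a replacement for a compiled plugin: the names (build_get_request, host_answers) are hypothetical, the "public" community is a placeholder, only short BER lengths are handled, and the reply is checked only crudely rather than parsed.

```python
import socket

SYS_DESCR_OID = (1, 3, 6, 1, 2, 1, 1, 1, 0)  # SNMPv2-MIB::sysDescr.0


def tlv(tag, payload):
    """BER tag-length-value with short-form length (payload under 128 bytes)."""
    if len(payload) > 127:
        raise ValueError("payload too long for short-form length")
    return bytes([tag, len(payload)]) + payload


def encode_int(value):
    """BER INTEGER for a non-negative value, minimal two's complement."""
    return tlv(0x02, value.to_bytes(value.bit_length() // 8 + 1, "big"))


def encode_oid(oid):
    """BER OBJECT IDENTIFIER; this sketch only supports sub-identifiers < 128."""
    if any(n > 127 for n in oid[2:]):
        raise ValueError("sub-identifier too large for this sketch")
    return tlv(0x06, bytes([oid[0] * 40 + oid[1]]) + bytes(oid[2:]))


def build_get_request(community, request_id, oid=SYS_DESCR_OID):
    """Build an SNMPv2c GetRequest message for a single OID."""
    varbind = tlv(0x30, encode_oid(oid) + tlv(0x05, b""))  # OID with NULL value
    pdu = tlv(0xA0, encode_int(request_id) + encode_int(0) + encode_int(0)
              + tlv(0x30, varbind))  # request-id, error-status, error-index
    version = encode_int(1)  # 1 means SNMPv2c
    return tlv(0x30, version + tlv(0x04, community.encode("ascii")) + pdu)


def host_answers(host, community="public", timeout=1.0):
    """Return True if the host sends back anything that looks like an SNMP message.

    ASSUMPTION: a reply starting with a SEQUENCE tag counts as "alive"; a real
    check would also verify the request-id and the error-status.
    """
    packet = build_get_request(community, request_id=1)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (host, 161))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and unreachable hosts
            return False
    return len(reply) > 0 and reply[0] == 0x30
```

A Nagios plugin built on this would map the boolean to exit code 0 (OK) or 2 (CRITICAL). The point of the sketch is only that the request is a few dozen bytes and needs no external process per check.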

Re: Hardware specification for 40k checks

Posted: Wed Sep 07, 2011 5:02 pm
by mguthrie
I've heard of much larger ; )

SNMP checks are going to be the CPU killer. Those checks eat a lot of CPU power. If you were running all pings, you could get 10k checks per slave without much trouble, but I've never run any kind of benchmark with SNMP on a large scale. Increase your reaper frequency in the main nagios.cfg file (the check_result_reaper_frequency option, where a lower value in seconds means more frequent reaping) so that it processes a higher volume of check results per minute. I've tested 5000 checks every 5 minutes on a single-core 3.0 GHz CPU, but that was on a Nagios XI server (which takes more CPU) and my CPU load ran steadily around 5-6, so it was right on the verge of choking.

Re: Hardware specification for 40k checks

Posted: Fri Sep 09, 2011 3:57 am
by uniwan_bde
Thanks for the replies.

To crfriend: the customer isn't Google, but it distributes it... it's an ISP. I know that pings are not a good indicator, but they can tell you in a simple and basic manner that the host is reachable. And yes, I know that a host can answer pings while being completely dead (a kernel panic under Linux, for instance).

I think we will recommend a quad-core machine with 4 GB of RAM and a dual-port NIC in LAG to try to perform the 10k tests.

Other suggestions are still welcome.

Best regards,
Bénoni Delfosse

Re: Hardware specification for 40k checks

Posted: Fri Sep 09, 2011 9:47 am
by mguthrie
This first doc is for Nagios XI, but the first half of it will also apply to Nagios Core.
http://assets.nagios.com/downloads/nagi ... rmance.pdf

Here's the Core documentation on performance tuning.
http://nagios.sourceforge.net/docs/3_0/tuning.html