Distributed Monitoring Requirements Question

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
axj0187
Posts: 14
Joined: Thu Jul 07, 2011 10:47 am


Post by axj0187

Good day,

How should I go about determining the system requirements for a distributed monitoring setup? Do the "worker nodes" require less resources than the master node, or is it about the same?

Right now our setup is:
Dual Core Processor
4 GB Memory
50 GB Hard Drive Space

Can we get away with 600 servers with 17 service checks every 5 minutes on this setup?
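For a rough sense of scale, the load in this question can be worked out directly. The sketch below is a back-of-the-envelope calculation only. It assumes the scheduler spreads checks evenly across the 5-minute window and treats the ~20 checks/second single-core figure quoted in the replies as a per-core ceiling that scales linearly, which the thread does not confirm. The helper names are hypothetical.

```python
import math

# Figures from the question above.
SERVERS = 600
CHECKS_PER_SERVER = 17
INTERVAL_SECONDS = 5 * 60  # every 5 minutes

# Benchmark quoted in the thread: ~20 active checks/second on one 3 GHz core.
# Treated here as a rough per-core ceiling (assumption), not a guarantee.
CHECKS_PER_SECOND_PER_CORE = 20


def required_rate(servers: int, checks_per_server: int, interval_s: int) -> float:
    """Average checks per second needed if checks are spread evenly."""
    return servers * checks_per_server / interval_s


def cores_needed(rate: float, per_core: float) -> int:
    """Smallest whole number of cores whose combined ceiling covers `rate`.

    Assumes throughput scales linearly with cores (hypothetical).
    """
    return math.ceil(rate / per_core)


rate = required_rate(SERVERS, CHECKS_PER_SERVER, INTERVAL_SECONDS)
print(f"total checks per interval: {SERVERS * CHECKS_PER_SERVER}")  # 10200
print(f"required rate: {rate:.1f} checks/s")  # 34.0
print(f"cores needed at {CHECKS_PER_SECOND_PER_CORE}/s each: "
      f"{cores_needed(rate, CHECKS_PER_SECOND_PER_CORE)}")  # 2
```

Under these assumptions, 10,200 checks every 5 minutes is about 34 checks per second, which is above what a single core at the benchmark rate would handle.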

Also, how much of a performance hit do you take by using a lot of agentless (SNMP and WMI) checks as opposed to only using agents?

Help is much appreciated.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Distributed Monitoring Requirements Question

Post by mguthrie

How should I go about determining the system requirements for a distributed monitoring setup? Do the "worker nodes" require less resources than the master node, or is it about the same?
There's a lot of variation here. I would say you can load up your servers based on their hardware power. For example, a single-core 3 GHz CPU maxes out at around 20 checks per second when everything is localized on a single XI install. I would keep the machine power comparable for all systems.
Also, how much of a performance hit do you take by using a lot of agentless (SNMP and WMI) checks as opposed to only using agents?
There will be a substantial difference in CPU usage for 9500 checks using SNMP/WMI versus agent checks. However, if you've got worker nodes doing the checks, it shouldn't be a problem.

Here are some performance ideas to consider.
http://library.nagios.com/library/produ ... erformance
axj0187
Posts: 14
Joined: Thu Jul 07, 2011 10:47 am

Re: Distributed Monitoring Requirements Question

Post by axj0187

There's a lot of variation here. I would say you can load up your servers based on their hardware power. For example, a single-core 3 GHz CPU maxes out at around 20 checks per second when everything is localized on a single XI install. I would keep the machine power comparable for all systems.
This is good to know. Do you know if Nagios is designed to distribute work properly across multiple cores? In theory, with a dual-core processor, could I come close to doubling the maximum service checks per second?

Based on your benchmark, I've come up with the following:

Assuming that Nagios automatically spaces out checks with its scheduling algorithm, that a system with a single 3 GHz core maxes out at 20 checks per second, and that the hardware bottleneck is CPU power rather than memory, our maximum number of service checks would be:

(6000 service checks / 5 minutes) / 60 seconds = 20 checks per second. So your benchmark works out to 6000 service checks per 5-minute interval on a single 3 GHz core. If Nagios can run checks in parallel across both cores, that could potentially be ~40 checks per second, or 12,000 service checks per 5-minute interval?
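The same arithmetic can be written as a small capacity estimate. This is a hedged sketch: `max_checks_per_interval` and its `scaling_efficiency` parameter are hypothetical, not Nagios settings. A value of 1.0 models the optimistic case asked about above, where each extra core adds a full core's worth of throughput.

```python
def max_checks_per_interval(per_core_rate: float, cores: int, interval_s: int,
                            scaling_efficiency: float = 1.0) -> float:
    """Rough upper bound on checks that fit into one check interval.

    per_core_rate: checks/second one core can sustain (e.g. the ~20/s benchmark).
    cores: number of cores, assumed >= 1.
    scaling_efficiency: hypothetical knob; 1.0 = perfect scaling across cores,
        0.5 = each additional core adds only half a core's throughput.
    """
    effective_cores = 1 + (cores - 1) * scaling_efficiency
    return per_core_rate * effective_cores * interval_s


INTERVAL = 5 * 60  # 5-minute interval in seconds
print(max_checks_per_interval(20, 1, INTERVAL))       # 6000.0
print(max_checks_per_interval(20, 2, INTERVAL))       # 12000.0 (perfect scaling)
print(max_checks_per_interval(20, 2, INTERVAL, 0.5))  # 9000.0 (partial scaling)
```

The 12,000 figure only holds if scaling across cores is perfect; the efficiency parameter makes it easy to see how the estimate shrinks if it is not.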

Another interesting note: I spoke to a sales rep who told me that "With the recommended system setup you can get away with 8,000 – 10,000 service checks per minute." That would be ~150 service checks per second!
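A quick check of the unit conversion behind that "~150 per second" figure, taking the quote literally as a per-minute rate. `per_second` is a hypothetical helper for illustration.

```python
def per_second(checks: float, window_s: float) -> float:
    """Average checks per second for `checks` spread over `window_s` seconds."""
    return checks / window_s


# The figure as quoted: 8,000-10,000 service checks per *minute*.
low = per_second(8_000, 60)
high = per_second(10_000, 60)
print(f"{low:.1f}-{high:.1f} checks/s, midpoint {(low + high) / 2:.0f}")
# 133.3-166.7 checks/s, midpoint 150
```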

I understand that system usage is tough to predict, considering how many variables need to be taken into account, but I REALLY appreciate the advice.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Distributed Monitoring Requirements Question

Post by mguthrie

I'll clarify with the sales team; I think that was a misunderstanding or miscommunication somewhere on our end. We've got customers running 8,000-10,000 checks on a 5-minute interval on a single server (but on a server-class machine, not a desktop). We've developed a few new tricks since my original benchmark test that noticeably improve performance on a single machine.
http://assets.nagios.com/downloads/nagi ... p#boosting

Note that the original benchmark test in that "Maximizing Performance" doc is based purely on active checks.
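Read against a 5-minute interval rather than per minute, the 8,000-10,000 figure works out to a much lower per-second rate. The sketch below compares it with the original single-core benchmark. Keep in mind that the benchmark covered active checks only and that the customers mentioned run server-class hardware, so the ratio is only a rough indicator. The helper names are hypothetical.

```python
INTERVAL_S = 5 * 60         # 5-minute check interval, in seconds
SINGLE_CORE_BENCHMARK = 20  # checks/s from the original benchmark (active checks only)


def rate_over_interval(checks: int, interval_s: int = INTERVAL_S) -> float:
    """Average checks per second when `checks` are spread over one interval."""
    return checks / interval_s


def benchmark_ratio(rate: float, benchmark: float = SINGLE_CORE_BENCHMARK) -> float:
    """How many multiples of the single-core benchmark a given rate represents."""
    return rate / benchmark


for checks in (8_000, 10_000):
    rate = rate_over_interval(checks)
    print(f"{checks} per 5 min -> {rate:.1f} checks/s "
          f"({benchmark_ratio(rate):.2f}x the single-core benchmark)")
# 8000 per 5 min -> 26.7 checks/s (1.33x the single-core benchmark)
# 10000 per 5 min -> 33.3 checks/s (1.67x the single-core benchmark)
```

Read this way, the per-minute and per-5-minute interpretations of the same numbers differ by a factor of five, which is consistent with the miscommunication described above.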

Here are some other options to consider, depending on the size of your environment.
http://assets.nagios.com/downloads/nagi ... istributed