nagios to 5000 nodes

pamuro01
Posts: 30
Joined: Fri Aug 09, 2013 8:16 am

nagios to 5000 nodes

Post by pamuro01 »

Hi.

I plan to expand the use of Nagios within my company to other IT services; the number of nodes could reach 5000.

What hardware setup would you recommend for Nagios?

1. Have a dedicated server with enough hardware to support processing

2. Distribute the load on multiple servers

Thank you for your opinion
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: nagios to 5000 nodes

Post by jolson »

pamuro01 wrote: "the number of nodes could reach 5000."
I assume this means that the number of hosts could reach ~5000, while the number of services could be far greater? How many checks are we talking about total? I assume over 20,000?

I recommend getting yourself familiar with mod_gearman - this is a project that allows you to distribute Nagios checks among several servers to help you handle the type of load that we're talking about here.

We have some mod_gearman installation instructions here:
https://assets.nagios.com/downloads/nag ... ios_XI.pdf

I'd like to know the ballpark number of total checks once you get the time. Thanks!
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
pamuro01
Posts: 30
Joined: Fri Aug 09, 2013 8:16 am

Re: nagios to 5000 nodes

Post by pamuro01 »

Hi

Yes. The number of checks could easily reach 25,000 services, assuming an average of five services per node.
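
As a rough back-of-the-envelope check of those numbers, the sketch below multiplies nodes by services per node and converts the total into an average check rate. The 5-minute check interval is an assumption made for illustration; it is not stated anywhere in this thread.

```python
def total_checks(hosts: int, services_per_host: int) -> int:
    """Total number of service checks for a uniform average per host."""
    return hosts * services_per_host


def checks_per_second(total: int, interval_seconds: float) -> float:
    """Average check rate if every check runs once per interval."""
    return total / interval_seconds


# 5000 nodes and five services per node come from the post above.
services = total_checks(5000, 5)

# ASSUMPTION: a 5-minute (300 s) check interval; adjust to your configuration.
rate = checks_per_second(services, 300)

print(services, round(rate, 1))
```

The rate matters more than the raw count for sizing, because it is what the check executors have to sustain.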

I had planned to use Mod_gearman, but I have been asked to provide high availability for Nagios, and it seemed simpler to have one large Nagios server for that.

This brings me to another question:

I thought of solving HA with LimBit. If I have a distributed installation with mod_gearman, do I synchronize only the Nagios XI server or also the mod_gearman servers?
OptimusB
Posts: 146
Joined: Mon Oct 27, 2014 10:08 pm
Location: Canada

Re: nagios to 5000 nodes

Post by OptimusB »

I would highly recommend setting up a test environment and ensuring that mod_gearman will suit your needs. I have deployed XI in an environment where we have over 20,000 service checks on our main XI box. However, the load is spread across mod_gearman workers.

Out of the box, mod_gearman has some limitations depending on how you plan your service checks and what you are checking. I found that unless you mount all MRTG and perfdata on a shared volume, you will have to limit network bandwidth checks to your main XI server. We are also doing CPU usage checks along with disk I/O checks via WMI, which save data in tmp folders, so you will have to configure and restrict which mod_gearman worker runs which type of service check. Otherwise you might not get proper data into your XI.
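
The idea of restricting which worker runs which type of check can be modelled roughly as follows. This is only an illustrative sketch: the `Check` class, the check kinds, and the queue names are all hypothetical and do not reproduce Mod-Gearman's actual configuration syntax or queue naming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    kind: str  # hypothetical label such as "bandwidth", "wmi", "ping"


# ASSUMPTION: these check kinds keep state on local disk (MRTG/perfdata or
# tmp files), so each is pinned to one place. Queue names are made up.
PINNED_QUEUES = {
    "bandwidth": "xi_main_only",
    "wmi": "wmi_worker_only",
}
DEFAULT_QUEUE = "any_worker"


def queue_for(check: Check) -> str:
    """Return the queue for a check; stateful kinds are pinned, others float."""
    return PINNED_QUEUES.get(check.kind, DEFAULT_QUEUE)


checks = [
    Check("eth0 bandwidth", "bandwidth"),
    Check("CPU usage via WMI", "wmi"),
    Check("ping", "ping"),
]
routing = {c.name: queue_for(c) for c in checks}
print(routing)
```

Stateless checks can go to any worker, while checks that depend on locally stored data always land on the same machine, which is the constraint described above.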

Lastly, with a bigger implementation, you might consider offloading components like the database. These are just a few challenges we ran into; I'm sharing them so you don't run into the same issues midway through. Good luck!
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: nagios to 5000 nodes

Post by jolson »

pamuro01 wrote: "If I have a distributed installation with mod_gearman, do I synchronize only the Nagios XI server or also the mod_gearman servers?"
In this case you would only need to synchronize with the Nagios XI server. Mod_gearman uses a server/worker model where all of the mod_gearman workers will pull checks out of the mod_gearman server. In this case, your mod_gearman server would be your Nagios XI server. The workers execute the checks using their local plugins, and return the results back to Nagios XI.

In the case of high availability, you would need to tell your workers to connect to the secondary Nagios XI instance if the first one were to fail. The easiest way to do this would be by using a shared virtual IP address, or simply a domain name that gets re-pointed after the primary XI server fails.
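
The failover behaviour described here can be pictured as a priority list of servers, where a client uses the first one that answers. The sketch below is an illustration of that idea only, not how Mod-Gearman workers are implemented; the host names are hypothetical, and the port is gearmand's default of 4730. A shared virtual IP or a re-pointed DNS name achieves the same result without the workers needing any such logic.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Endpoint = Tuple[str, int]


def tcp_probe(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


def pick_server(candidates: Sequence[Endpoint],
                probe: Callable[[Endpoint], bool] = tcp_probe) -> Optional[Endpoint]:
    """Return the first reachable endpoint in priority order, or None."""
    for endpoint in candidates:
        if probe(endpoint):
            return endpoint
    return None


# ASSUMPTION: hypothetical host names for the primary and secondary XI servers.
SERVERS = [("xi-primary.example.com", 4730), ("xi-secondary.example.com", 4730)]


def primary_down(endpoint: Endpoint) -> bool:
    """Fake probe simulating a failed primary, so no network is needed."""
    return endpoint[0] != "xi-primary.example.com"


chosen = pick_server(SERVERS, probe=primary_down)
print(chosen)
```

Passing the probe as a parameter keeps the selection logic testable without real servers; in practice the default `tcp_probe` would be used.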

@OptimusB has excellent input - the only thing that I can think to add is the use of a ramdisk. You can find some excellent instructions here:
https://labs.nagios.com/2015/08/14/util ... -easy-way/
https://assets.nagios.com/downloads/nag ... giosXI.pdf
pamuro01
Posts: 30
Joined: Fri Aug 09, 2013 8:16 am

Re: nagios to 5000 nodes

Post by pamuro01 »

Thanks for the answers.

Only one doubt remains: how can I size the hardware that the Mod_Gearman server needs?

I searched for information but could not find any references.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: nagios to 5000 nodes

Post by jolson »

The answer to your question depends on how many checks each individual node will be performing, and what types of checks they are. A server that performs 5,000 Perl script checks will need to be more powerful than one that performs 5,000 C binary checks, for instance.

If you're in a virtual environment, I would say to start conservatively and scale up per your requirements. 4 CPUs and 8 GB of RAM seems like a good starting place.
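
One rough way to reason about this is Little's law: the average number of checks running at once equals the check rate multiplied by the average time each check takes. The sketch below applies it to the 25,000 checks mentioned earlier in the thread. The 300-second interval, the two runtimes, and the headroom factor are assumptions chosen for illustration; measure your own plugins before relying on any of these figures.

```python
import math


def concurrent_checks(checks_per_second: float, avg_runtime_seconds: float) -> float:
    """Little's law: average checks in flight = arrival rate * time per check."""
    return checks_per_second * avg_runtime_seconds


def workers_needed(checks_per_second: float, avg_runtime_seconds: float,
                   headroom: float = 2.0) -> int:
    """Worker processes to provision, with a safety factor for bursts."""
    in_flight = concurrent_checks(checks_per_second, avg_runtime_seconds)
    return math.ceil(in_flight * headroom)


# ASSUMPTION: 25,000 checks, each run once every 300 seconds.
rate = 25000 / 300

# ASSUMPTION: made-up average runtimes for a fast compiled plugin
# and a slower interpreted script.
fast = workers_needed(rate, 0.5)
slow = workers_needed(rate, 2.0)
print(fast, slow)
```

The gap between the two results reflects the point above: slower plugins need proportionally more worker capacity for the same number of checks.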