Hi.
I plan to expand the use of Nagios within my company to other IT services; the number of nodes could reach 5000.
What would you recommend regarding the hardware for Nagios?
1. Have a dedicated server with enough hardware to support processing
2. Distribute the load on multiple servers
Thank you for your opinion
nagios to 5000 nodes
Re: nagios to 5000 nodes
I assume this means that the number of hosts could reach ~5000, while the number of services could be far greater? How many checks are we talking about in total? I assume over 20,000?
I recommend getting yourself familiar with mod_gearman - this is a project that allows you to distribute Nagios checks among several servers to help you handle the type of load that we're talking about here.
We have some mod_gearman installation instructions here:
https://assets.nagios.com/downloads/nag ... ios_XI.pdf
I'd like to know the ballpark number of total checks once you get the time. Thanks!
Re: nagios to 5000 nodes
Hi
Yes. The number of checks could easily reach 25,000 services, assuming an average of five services per node.
I had planned to use Mod_gearman, but I have been asked to provide high availability for Nagios, and it seemed simpler to do that with one large Nagios server.
This brings me to another question:
I was thinking of solving HA with LINBIT. If I have a distributed installation with mod_gearman, do I synchronize only the Nagios XI server, or the mod_gearman servers as well?
Re: nagios to 5000 nodes
I would highly recommend setting up a test environment to ensure mod_gearman will suit your needs. I have deployed XI in an environment where we have over 20,000 service checks on our main XI box. However, the load is spread across mod_gearman workers.
Out of the box, mod_gearman has some limitations depending on how you plan your service checks and what you are checking. I found that unless you mount all MRTG and perfdata on a shared volume, you will have to limit network bandwidth checks to your main XI server. We are also doing CPU usage checks along with disk I/O checks via WMI, which save data in tmp folders, so you will have to configure and restrict which mod_gearman worker runs which type of service check. Otherwise you might not get proper data into your XI.
Lastly, with a bigger implementation, you might consider offloading components like the database. These are just a few challenges we ran into; I'm sharing them so you don't run into the same issues midway through. Good luck!
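To make the "restrict which worker runs which check" idea concrete, here is a minimal Python sketch of the routing decision. It is only a model of the logic: mod_gearman itself handles this through group-based queues in its own configuration, and the group names (`bandwidth`, `wmi`) and queue labels used here are illustrative assumptions, not settings taken from this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Check:
    """A service check and the servicegroups it belongs to (names are hypothetical)."""
    name: str
    servicegroups: set = field(default_factory=set)


# Assumption: checks in these groups keep local state (MRTG data, temp files),
# so they must always run in the same place.
LOCAL_SERVICEGROUPS = {"bandwidth"}   # stays on the main XI server
DEDICATED_SERVICEGROUPS = {"wmi"}     # pinned to one designated worker


def route(check: Check) -> str:
    """Return a label for where this check should run."""
    if check.servicegroups & LOCAL_SERVICEGROUPS:
        return "local"
    pinned = sorted(check.servicegroups & DEDICATED_SERVICEGROUPS)
    if pinned:
        return f"servicegroup_{pinned[0]}"
    return "service"  # shared queue that any worker may pull from


if __name__ == "__main__":
    for c in (
        Check("eth0 bandwidth", {"bandwidth"}),
        Check("CPU via WMI", {"wmi"}),
        Check("HTTP", set()),
    ):
        print(c.name, "->", route(c))
```

The point is simply that stateful checks get a fixed home, while stateless checks can go to whichever worker is free.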
Re: nagios to 5000 nodes
In this case you would only need to synchronize the Nagios XI server. Mod_gearman uses a server/worker model where all of the mod_gearman workers pull checks from the mod_gearman server. In this case, your mod_gearman server would be your Nagios XI server. The workers execute the checks using their local plugins and return the results to Nagios XI.
In the case of high availability, you would need to tell your workers to connect to the secondary Nagios XI instance if the first one were to fail. The easiest way to do this would be by using a shared virtual IP address, or simply a domain name that gets re-pointed after the primary XI server fails.
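As a rough illustration of the failover idea, the Python sketch below probes a list of candidate XI servers in order and reports the first one that accepts a TCP connection on the Gearman port (4730 is the gearmand default). The hostnames are hypothetical, and this is not part of mod_gearman or Nagios XI; it is only the kind of health probe that could decide where a virtual IP or DNS name should point.

```python
import socket

GEARMAN_PORT = 4730  # default gearmand port


def is_reachable(host, port=GEARMAN_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_active(servers, probe=is_reachable):
    """Return the first server in priority order that the probe accepts, or None."""
    for host in servers:
        if probe(host):
            return host
    return None


if __name__ == "__main__":
    # Hypothetical hostnames: primary first, secondary second.
    candidates = ["xi-primary.example.com", "xi-secondary.example.com"]
    print("active server:", pick_active(candidates))
```

With a shared virtual IP or a re-pointed domain name, the workers themselves need no such logic; they keep connecting to the same address.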
@OptimusB has excellent input - the only thing that I can think to add is the use of a ramdisk. You can find some excellent instructions here:
https://labs.nagios.com/2015/08/14/util ... -easy-way/
https://assets.nagios.com/downloads/nag ... giosXI.pdf
Re: nagios to 5000 nodes
Thanks for the answers.
Only one doubt remains: how can I size the hardware that the Mod_Gearman server needs?
I searched for information but found no references.
Re: nagios to 5000 nodes
The answer to your question depends on how many checks each individual node will be performing, and what types of checks they are. A server that performs 5,000 Perl script checks will need to be more powerful than one that performs 5,000 C binary checks, for instance.
If you're in a virtual environment, I would say to start conservatively and scale up per your requirements. (4 CPUs / 8 GB of RAM seems like a good starting place.)
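For a first back-of-the-envelope number, the Python sketch below turns the figures from this thread (5,000 hosts, about five services each) into a check rate and an estimate of how many checks run at the same time. The 5-minute check interval and the 2-second average plugin runtime are assumptions chosen for illustration; measure your own values before sizing anything.

```python
import math


def checks_per_second(hosts, services_per_host, interval_seconds):
    """Average number of checks that must start each second."""
    return hosts * services_per_host / interval_seconds


def concurrent_checks(rate, avg_runtime_seconds):
    """Average checks in flight at once (rate multiplied by runtime, rounded up)."""
    return math.ceil(rate * avg_runtime_seconds)


if __name__ == "__main__":
    # 5,000 hosts x 5 services comes from the thread; the rest is assumed.
    rate = checks_per_second(hosts=5000, services_per_host=5, interval_seconds=300)
    in_flight = concurrent_checks(rate, avg_runtime_seconds=2.0)
    print(f"{rate:.1f} checks/s, about {in_flight} running concurrently")
```

Slower plugins (interpreted scripts, WMI queries) raise the average runtime and therefore the number of concurrent checks the workers must sustain, which is why the type of check matters as much as the count.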