Distributed NagiosXI with central Configuration manager

jnilsson · Post by **jnilsson** » Mon Jan 13, 2014 4:52 am

Hi Guys,
We are looking at implementing Nagios XI across all our data centers, but are having some issues with the design. I know there are already several posts about distributed installations, but as always there are a few differences in the way we want to do it, so I wanted to make a new post.

- Central Nagios XI (Active / Passive Cluster)
- Nagios XI cluster (Active / Passive) in each datacenter.
- All configurations would be made in the central system and then have them apply to the corresponding datacenter installations.
- Status information should also be synchronized upwards to the central system.

I have been looking but cannot find anything that fits 100% for the type of install we want. I have looked at options such as Gearman, but we would like the data centers to keep working if the internal network to a data center goes down, and then get the cached results once the line is working again.

One way I was thinking of doing this would be to grab the systems directly from the central XI's MySQL database and just query for hostgroup: datacenter_%dc_name%. Then we would format that information, send it to each Nagios XI server, and add it to the import folder or try to add it to the DB directly. However, we would prefer to use some already-made tools for this.
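
A minimal sketch of the "format that information" step, assuming the host rows have already been fetched from the central database. The row keys (`host_name`, `address`, `hostgroup`) and the `generic-host` template name are hypothetical placeholders, not the actual Nagios XI schema; only the `define host { ... }` object syntax is standard Nagios configuration.

```python
from collections import defaultdict


def group_hosts_by_datacenter(rows, prefix="datacenter_"):
    """Bucket host rows by the datacenter name encoded in their hostgroup.

    rows: iterable of dicts with hypothetical keys host_name, address, hostgroup.
    Rows whose hostgroup does not start with the prefix are ignored.
    """
    by_dc = defaultdict(list)
    for row in rows:
        group = row["hostgroup"]
        if group.startswith(prefix):
            by_dc[group[len(prefix):]].append(row)
    return dict(by_dc)


def render_host(row, template="generic-host"):
    """Render one host row as a Nagios host object definition."""
    lines = [
        "define host {",
        f"    use         {template}",  # template name is an assumption
        f"    host_name   {row['host_name']}",
        f"    address     {row['address']}",
        f"    hostgroups  {row['hostgroup']}",
        "}",
    ]
    return "\n".join(lines) + "\n"


def render_datacenter_config(rows):
    """Render all hosts of one datacenter, sorted by name for stable output."""
    ordered = sorted(rows, key=lambda r: r["host_name"])
    return "\n".join(render_host(r) for r in ordered)
```

The returned text could then be written to a `.cfg` file in each datacenter server's import folder; how the file is transferred there is left open here.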

In the future, we might also want the system scan functions (auto discovery and service detection) to run on the DC Nagios XI systems and then update the central system as well. That would be something to implement farther down the road, but I believe it's important to keep in mind when designing the system.

We might add a Gearman installation in each DC just to help with the load, with the possibility of adding workers in the future if need be. I have attached an image of the design (just ignore the service now part, as it's the incident manager). I know it's a bit messy, but it helps a little in understanding the data "flow".

Link to image:
https://www.nilsson.so/nagios_design_v1.png

Thanks!!

Jonas

abrist · Post by **abrist** » Mon Jan 13, 2014 12:15 pm

Before we drill down into the configuration, have you considered using Nagios Fusion instead of the Nagios central server?

jnilsson · Post by **jnilsson** » Tue Jan 14, 2014 3:24 am

Hi, yes, we are considering that as a second option, but we would have the issue that there is no centralized configuration option, only links to each site's Nagios server, correct?

tmcdonald · Post by **tmcdonald** » Tue Jan 14, 2014 2:31 pm

That is correct. We've discussed centralized configuration a lot, and there are some good arguments for and against it. Right now it's not in the plans.

slansing · Post by **slansing** » Tue Jan 14, 2014 2:36 pm

While we agree that a built-in central configuration verification and writing tool, from which you could remotely manage multiple Nagios XI installations, would be useful, there are a lot of smaller facets that make it a large undertaking. As tmcdonald mentioned, there are a lot of good reasons to work on adding this, and it will possibly be added in the future, but as of right now there is nothing that we have in house that is built and ready for customer use. We would love to hear any ideas you have, or theories on how you would perceive this being done. You can either post them here or in a feature request at:

tracker.nagios.com

jnilsson · Post by **jnilsson** » Tue Jan 14, 2014 3:23 pm

Hi,
My idea would be to query the SQL tables directly, then use something like JSON to transfer the data to the local Nagios system and generate the needed config files in the import folder. To know what has and has not been synced, I would create a new DB on the SQL server with a few tables (these are still brainstorming ideas) where I would log what has been sent to each local Nagios server. The tables would include both host information and service information. When my process runs, it would scan for differences (I am unsure exactly how yet; it might use timestamps or something similar). I might just take a "snapshot" of the current SQL data, store it in the current table, and then compare for changes. However, I still need to look at the DB. For example, if there is a last-updated column for each service and host, I could just store that and compare for updates.
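
The snapshot-and-compare idea could be sketched roughly as follows. This is only an illustration under stated assumptions: each object (host or service) is represented as a plain dict keyed by a unique name, and a SHA-256 hash of its JSON form stands in for the stored snapshot. None of these names come from the Nagios XI schema.

```python
import hashlib
import json


def fingerprint(obj):
    """Hash a config object in a key-order-independent way."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def take_snapshot(objects):
    """Map each object name to the fingerprint of its current config."""
    return {name: fingerprint(cfg) for name, cfg in objects.items()}


def diff_snapshots(previous, current):
    """Compare two snapshots and list what needs to be synced."""
    prev_names, cur_names = set(previous), set(current)
    return {
        "added": sorted(cur_names - prev_names),
        "removed": sorted(prev_names - cur_names),
        "changed": sorted(
            name for name in prev_names & cur_names
            if previous[name] != current[name]
        ),
    }
```

Storing only fingerprints keeps the tracking table small. If the DB turns out to have a reliable last-updated column, comparing timestamps would avoid hashing altogether.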

I would then read my DB / table and send the info on to the corresponding Nagios server based on its hostgroup_%site% assignment. The checks would be inactive on the central server but activated when imported to the destination server. This would also work in the opposite direction, as the local Nagios servers would have a similar DB / table where I can look for changes and push them to the central system. We will also have integration with our CMDB system, so this would help us keep that data in check and only add what's new. The flow from the CMDB is a bit advanced but would give us a huge advantage. The flow would be something like this:

1. Host is added to the CMDB and assigned to a Nagios server.
2. Services can be added as well by using templates, or none at all.
3. Information would be synced to the central Nagios server.
4. Info would be imported to the central server and assigned to the correct Nagios server.
5. Config info would be synced to the assigned local Nagios server.
6. Host would be scanned for available / discovered services.
7. Service info would be disabled and synced to the central Nagios server.
8. Service info would be synced to our CMDB server, where the client / admin can select which ones to activate.
9. Updated service info would be synced back to Nagios to activate the checks.
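
Steps 6 to 9 of the flow above (discover, store as disabled, let the admin select, then activate) could look roughly like this. It is a sketch only: the `Service` class and the in-memory `inventory` dict are hypothetical stand-ins for whatever tables would actually hold this state.

```python
from dataclasses import dataclass


@dataclass
class Service:
    host: str
    description: str
    active: bool = False  # discovered services start out disabled (step 7)


def record_discovered(inventory, host, descriptions):
    """Steps 6-7: add newly discovered services as disabled, skipping known ones.

    inventory: dict mapping (host, description) -> Service.
    Returns the keys that were newly added, i.e. what still has to be synced.
    """
    new_keys = []
    for description in descriptions:
        key = (host, description)
        if key not in inventory:
            inventory[key] = Service(host, description)
            new_keys.append(key)
    return new_keys


def apply_selection(inventory, selected):
    """Steps 8-9: activate only the services the client / admin selected.

    Unknown keys and already-active services are ignored.
    Returns the keys that were actually switched on.
    """
    activated = []
    for key in selected:
        service = inventory.get(key)
        if service is not None and not service.active:
            service.active = True
            activated.append(key)
    return activated
```

Because `record_discovered` skips keys that already exist, re-running a scan does not create duplicates, which is the same "only add what's new" check described for the import step.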

I understand that I will need something to keep all the info in check, and as the MySQL DB is available, I was thinking of using that. We had thought that maybe a NoSQL DB would be a better fit, as we would be able to use JSON for most storage and sync checks, but that would just add more overhead.

One clarification: before the info is added to the central or local Nagios servers using the import process, it would be checked against the SQL DB to make sure it's not already there. Is there a way to add a host directly to the DB, bypassing the import script? I'm considering having a look at the import script to see how it does it, and maybe copying that and doing it directly from my code.

So, what are your thoughts on this process? It's a bit complicated, as you both say, but it gives a lot more possibilities for personalization in the future.

slansing · Post by **slansing** » Wed Jan 15, 2014 2:39 pm

jnilsson wrote:
1. Host is added to the CMDB and assigned to a Nagios server.
2. Services can be added as well by using templates, or none at all.
3. Information would be synced to the central Nagios server.
4. Info would be imported to the central server and assigned to the correct Nagios server.
5. Config info would be synced to the assigned local Nagios server.
6. Host would be scanned for available / discovered services.
7. Service info would be disabled and synced to the central Nagios server.
8. Service info would be synced to our CMDB server, where the client / admin can select which ones to activate.
9. Updated service info would be synced back to Nagios to activate the checks.

This is about what we were thinking, as far as the logic flow goes.

jnilsson wrote:
One clarification: before the info is added to the central or local Nagios servers using the import process, it would be checked against the SQL DB to make sure it's not already there. Is there a way to add a host directly to the DB, bypassing the import script? I'm considering having a look at the import script to see how it does it, and maybe copying that and doing it directly from my code.

I was going to mention that. You might just be able to pick the core functions out of the script and hack together your own solution to make this work. You would need some way to verify configuration changes from the distributed Nagios servers as well. I am going to add this to a feature request so that the devs can take a look at your ideas.

jnilsson · Post by **jnilsson** » Thu Jan 16, 2014 10:15 am

Yeah, that would be great. It would not be hard to see config changes: just add 2 or 3 columns to the tables: last_modified_date, last_modified_by, and maybe synced. That way, if the local Nagios modifies something, it marks the row as local_internal, for example. You can then easily make a query to get only the changed items to sync, or no items at all if there are none. This can also be used later to add information from other sources and find only the updates that did not come from them. The last_modified_date column is used to verify when the row was updated, and synced means that it has been sent to the central server or to the local server, depending on where we are working.

So on central I would select * where last_modified_by = 'local' and synced = 'false'. Once the data is selected and ready to send, the synced column is updated to true. On the remote host, when you are updating, first compare the last_modified_date field and make sure the values are different. If so, then update with the settings from central, set last_modified_by to remote, for example, and mark synced as true.

By playing with these columns, most sync directions etc. should be manageable without too much overhead from checking all central configs against the local ones.
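
A rough sketch of the column-based sync described above. An in-memory SQLite database stands in for the MySQL server, and the `host` table with its `address` column is a hypothetical example, not the real Nagios XI schema. Only the three tracking columns (last_modified_date, last_modified_by, synced) come from the proposal.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in table carrying the three tracking columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE host (
               host_name          TEXT PRIMARY KEY,
               address            TEXT NOT NULL,
               last_modified_date TEXT NOT NULL,
               last_modified_by   TEXT NOT NULL,
               synced             INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def pending_rows(conn, modified_by):
    """Select only rows changed by the given source that are not yet synced."""
    cur = conn.execute(
        "SELECT host_name, address, last_modified_date FROM host "
        "WHERE last_modified_by = ? AND synced = 0 ORDER BY host_name",
        (modified_by,),
    )
    return cur.fetchall()


def mark_synced(conn, host_names):
    """Flag rows as synced once they have been sent to the other side."""
    conn.executemany(
        "UPDATE host SET synced = 1 WHERE host_name = ?",
        [(name,) for name in host_names],
    )
    conn.commit()


def apply_incoming(conn, host_name, address, modified_date, source):
    """Apply a row from the other side only if its timestamp differs.

    Returns True if the local row was created or updated, False if skipped.
    """
    row = conn.execute(
        "SELECT last_modified_date FROM host WHERE host_name = ?",
        (host_name,),
    ).fetchone()
    if row is not None and row[0] == modified_date:
        return False  # same timestamp, nothing to do
    conn.execute(
        "INSERT OR REPLACE INTO host "
        "(host_name, address, last_modified_date, last_modified_by, synced) "
        "VALUES (?, ?, ?, ?, 1)",
        (host_name, address, modified_date, source),
    )
    conn.commit()
    return True
```

Marking incoming rows as already synced is what keeps a change from bouncing back and forth between the central and local servers.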

lmiltchev · Post by **lmiltchev** » Thu Jan 16, 2014 3:23 pm

These are all great ideas! Thanks for sharing!

jnilsson · Post by **jnilsson** » Tue Jan 21, 2014 12:10 pm

Hi, a quick question:

If we go with a central XI system and an external MySQL cluster for the DB, what's the limit on the server? I mean, if I have 25 datacenters and each local Nagios XI server is producing at least 50,000 checks every 5 min, will there be a limit to what the central server can handle? How much data can the database handle (I know this depends on what I set up) before the application starts to slow down when doing queries and reports?

Do you have any numbers for these questions? I don't want to set up a system only to have it crash or be way too slow to use because the database is overloaded, or to have to delete historic data every two weeks instead of keeping at least 6 months to 1 year (or more).
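
For a sense of scale, the figures above can be turned into a rough volume estimate. This is back-of-the-envelope arithmetic only, assuming every check result from every datacenter reaches the central server and is kept as one record for the whole retention period. What Nagios XI actually stores per check is not covered here.

```python
def check_volume(datacenters, checks_per_interval, interval_seconds, retention_days):
    """Estimate check-result volume arriving at the central server."""
    per_interval = datacenters * checks_per_interval
    intervals_per_day = 86_400 // interval_seconds
    per_day = per_interval * intervals_per_day
    return {
        "per_interval": per_interval,
        "per_second": per_interval / interval_seconds,
        "per_day": per_day,
        "retained": per_day * retention_days,
    }


# 25 datacenters, 50,000 checks every 5 minutes, kept for one year
estimate = check_volume(25, 50_000, 5 * 60, 365)
print(estimate["per_interval"])       # 1250000
print(round(estimate["per_second"]))  # 4167
print(estimate["per_day"])            # 360000000
print(estimate["retained"])           # 131400000000
```

That is roughly 4,167 results per second, 360 million per day, and about 131 billion over a year, so the retention policy and how much of each result is stored centrally would matter a great deal.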

Thanks!

Nagios Support Forum