Distributed Nagios Architecture

Posted: Wed Jun 04, 2014 1:08 pm
by Smark
We have multiple data centers across the globe. Does Nagios provide a solution for distributing checks to local Nagios servers in each location, with a central management server coordinating the work?

I could imagine it working like this:
* Central Nagios server that includes the web interface, CCM, and some sort of a coordination agent
* Local Nagios server at each DataCenter that checks just the hosts "near" it and reports back to the central server.
* The Local Nagios server would continue to check the local hosts and collect data even if it could not connect to the central server.
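
The third bullet amounts to store-and-forward buffering. As a rough illustration only, here is a minimal Python sketch of that behavior. The names `LocalPoller` and `send` are hypothetical and not part of any Nagios API; the sketch just shows results piling up locally and draining in order once the central server is reachable again.

```python
from collections import deque
from typing import Callable, Deque, Tuple

# A check result as (host, service, state, output). Illustrative only.
Result = Tuple[str, str, int, str]


class LocalPoller:
    """Hypothetical local server that buffers results while the WAN link is down."""

    def __init__(self, send: Callable[[Result], bool]) -> None:
        # `send` must return True when the central server accepted the result.
        self.send = send
        self.backlog: Deque[Result] = deque()

    def record(self, result: Result) -> None:
        """Queue a new result, then try to drain the backlog in order."""
        self.backlog.append(result)
        self.flush()

    def flush(self) -> int:
        """Forward buffered results oldest first; stop at the first failure."""
        sent = 0
        while self.backlog and self.send(self.backlog[0]):
            self.backlog.popleft()
            sent += 1
        return sent
```

Checks keep running and `record` keeps accepting results regardless of link state; only delivery is deferred.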

I looked at DNX, but it prefers all of its nodes to be in the same data center and is meant only for distributing the load of the checks, not for reducing WAN traffic or reliance on a WAN connection for checks.

Any help you could provide here would definitely be beneficial!

Thanks!

Re: Distributed Nagios Architecture

Posted: Wed Jun 04, 2014 3:25 pm
by scottwilkerson
In Nagios XI, many clients use Inbound / Outbound data transfer to send the distributed items between Nagios servers.

Re: Distributed Nagios Architecture

Posted: Wed Jun 04, 2014 5:21 pm
by Smark
scottwilkerson wrote:In Nagios XI, many clients use Inbound / Outbound data transfer to send the distributed items between Nagios servers.
Hi Scott,

Can you clarify this at all? Maybe a doc link or a screenshot?

Thanks for your prompt response.

Re: Distributed Nagios Architecture

Posted: Thu Jun 05, 2014 10:44 am
by abrist
XI servers can send and receive checks to/from other XI servers. Using the mechanisms outlined below, you can essentially use local Nagios servers for regional offices and then have them all push their check results to a central server (or multiple central servers).
Inbound
Outbound
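
For context, a forwarded result is handled on the receiving server as a passive check. Below is a minimal Python sketch of the line the central server ends up processing, assuming the standard Nagios Core external command format `PROCESS_SERVICE_CHECK_RESULT`. The helper name and the sample host and service are made up for illustration, and the transport that carries the result between servers is not shown.

```python
import time
from typing import Optional


def passive_service_result(host: str, service: str, return_code: int,
                           output: str, timestamp: Optional[int] = None) -> str:
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):  # OK, WARNING, CRITICAL, UNKNOWN
        raise ValueError("return_code must be 0, 1, 2 or 3")
    ts = int(time.time()) if timestamp is None else timestamp
    # A newline ends the command, so flatten any newlines in the plugin output.
    clean = output.replace("\n", " ")
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{clean}"


# Hypothetical result from a regional server, with a fixed timestamp for readability.
print(passive_service_result("nyc-web01", "HTTP", 0, "HTTP OK - 0.012s",
                             timestamp=1401979440))
```

The host and service named in the line must already be defined on the receiving server, which is why each server in this model carries its own configuration.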

Re: Distributed Nagios Architecture

Posted: Thu Jun 05, 2014 11:15 am
by BanditBBS
I of course have to throw this out there... I use mod_gearman for this exact reason. That way I only have to configure one XI server, and I have the host/service checks going to specific workers where the machines are located.

Re: Distributed Nagios Architecture

Posted: Thu Jun 05, 2014 11:33 am
by Smark
abrist wrote:XI servers can send and receive checks to/from other XI servers. Using the mechanisms outlined below, you can essentially use local Nagios servers for regional offices and then have them all push their check results to a central server (or multiple central servers).
Inbound
Outbound
This is perfect! For some reason I had it in my head that Passive Checks had to do with the Nagios Agent sending results to the server. I wasn't aware it was also used for server-to-server communication.

I've read through those documents and a few of the ones they reference. I'm still a little confused about how the configuration is managed. Based on those documents, each Nagios server has its own independent host/service/etc. config and is set to forward results to another server. I assume there is no way to have central management of all of the configuration? How are configs generally managed in these types of environments?

Typically, how is a solution like this architected? One master server and N Nagios nodes that all forward their check results to the master server?
BanditBBS wrote:I of course have to throw this out there... I use mod_gearman for this exact reason. That way I only have to configure one XI server, and I have the host/service checks going to specific workers where the machines are located.
So I'm looking at their documentation and, as I understand it, you only need one Nagios server and then a bunch of Gearman Job Servers, one in each location, to actually execute the checks, right? What drawbacks have you seen? Right now this looks like a good direction to pursue.

Re: Distributed Nagios Architecture

Posted: Thu Jun 05, 2014 11:57 am
by BanditBBS
Smark wrote:So I'm looking at their documentation and, as I understand it, you only need one Nagios server and then a bunch of Gearman Job Servers, one in each location, to actually execute the checks, right? What drawbacks have you seen? Right now this looks like a good direction to pursue.
Compatibility with Nagios Core 4.x (XI 2014) requires a special version, and I haven't fully tested it myself yet as I am still running 2012 here. Other than that, I can't think of any drawbacks; it works great in my environment.

Re: Distributed Nagios Architecture

Posted: Thu Jun 05, 2014 11:58 am
by Smark
BanditBBS wrote:
Smark wrote:So I'm looking at their documentation and, as I understand it, you only need one Nagios server and then a bunch of Gearman Job Servers, one in each location, to actually execute the checks, right? What drawbacks have you seen? Right now this looks like a good direction to pursue.
Compatibility with Nagios Core 4.x (XI 2014) requires a special version, and I haven't fully tested it myself yet as I am still running 2012 here. Other than that, I can't think of any drawbacks; it works great in my environment.
Awesome! I'm spinning up some servers now to play with it. I'll report back soon.

Re: Distributed Nagios Architecture

Posted: Thu Jun 05, 2014 12:13 pm
by slansing
You can run local Gearman workers as well, but if you want a truly distributed checking environment then yes, you will want to have a handful of job servers. If you are using XI 2014/Core 4, you will want to follow my post at the bottom of the second page here:

http://support.nagios.com/forum/viewtop ... n&start=10

Re: Distributed Nagios Architecture

Posted: Mon Jun 09, 2014 11:20 am
by Smark
So I have everything working, sort of. In mod_gearman_worker.conf, the hostgroups and servicegroups options let you specify which hostgroups and which servicegroups should be executed by which workers.

In our environment we have servers dispersed around the world, so it makes more sense to say "any services on hosts in this hostgroup should be checked by this worker". Does that functionality exist?
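
To make the question concrete, here is a small Python sketch of the routing rule being asked for: every check on a host, including its service checks, inherits a queue from the host's hostgroup. The `hostgroup_<name>` queue naming, the fall-back queues `host` and `service`, and the sample inventory are assumptions for illustration; whether Mod-Gearman actually routes this way is the open question.

```python
from typing import Dict, Iterable, List, Optional


def pick_queue(host_groups: Iterable[str], routed_groups: List[str],
               service: Optional[str] = None) -> str:
    """Return the queue for a host check (service=None) or a service check."""
    member = set(host_groups)
    # The first configured hostgroup that the host belongs to wins.
    for group in routed_groups:
        if group in member:
            return f"hostgroup_{group}"
    # No match: fall back to a shared default queue (assumed names).
    return "host" if service is None else "service"


# Hypothetical inventory: host name -> hostgroups it belongs to.
hosts: Dict[str, List[str]] = {
    "lon-db01": ["dc-london", "databases"],
    "syd-web01": ["dc-sydney"],
    "hq-mail01": ["mail"],
}
routed = ["dc-london", "dc-sydney"]

for name, groups in hosts.items():
    print(name, "->", pick_queue(groups, routed, service="PING"))
```

Under this rule a worker in London would subscribe only to the `dc-london` queue and would pick up both host and service checks for hosts in that group.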