I note there is still no documentation available for Fusion - when can we expect this to be ready? Will you offer trial extensions?
Also, your overview page states:
Distributed Monitoring Made Easy: Alleviates the need for complex configurations, data transfer problems, and having to manage changes on both central and distributed nodes.
Can I get more information on that? Is this a plugin/component to add to Fusion? I have not seen any options for configuring distributed monitoring via the Fusion GUI.
Currently, Fusion acts as a central viewer for all of the servers. The monitoring configurations are still managed by the distributed servers, but the unified view lets you easily click into any of them.
Ah, I see. Could you point me in the direction of what we should be using to have a distributed Nagios system then?
Our goal is for Nagios to tolerate failure of the primary Nagios box, so we'd need something that could sync the config across multiple physical/virtual boxes, ideally across multiple sites. Another option would be two redundant Nagios boxes acting as parents, with child Nagios boxes at other sites reporting back to them.
Oh, OK. I think what you are looking for is more along the lines of "High Availability" options. Do you want to use Nagios XI (commercial) or Nagios Core (community) for this? We're currently working on a streamlined way to sync two XI servers, but there are external options like vMotion (VMware), and there are probably a few community options out there as well. Take a look at this and see if it points you in a better direction.
We were going to use XI. The idea was to have two Nagios servers - one at DC1 the other at DC2, and have the system remain online should one DC go down.
I have seen that HA document. The VMware HA is not really what we're looking for as it would only take care of hardware failure.
The DR licensing option intrigues me - could you provide more info on that?
If it works how I think it does, I think we could do this:
DC1: Nagios XI monitoring everything
DC2: Nagios XI DR install on a VM/spare server, Nagios Core monitoring DC1/primary Nagios XI box
EDIT: Ah, does the 'Backing up and Restoring' library document cover this? Essentially just another XI install that is inactive until the latest backup is restored to it?
The "Backing Up and Restoring XI" document is written and tested for backing up a single server, not necessarily for making a DR copy of a system. It might work, but to be honest I don't think we've tested it. Another possibility would be to import the object configuration files to the second server. The redundancy setup that you're describing sounds fairly solid. You could even just run a cron job that checks the main server and then turns on Active Checks if it detects a problem.
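As a rough sketch of the "turn on Active Checks" half of that cron job: Nagios accepts external commands written to its command file as lines of the form "[epoch] COMMAND". The Python below is only an illustration, not a tested XI procedure. The command file path is an assumption (it is the usual default for a source install and may differ on your system), and the function names are made up for this example. It also assumes external commands are enabled on the standby server.

```python
import time

# Assumed default path on a source install; XI or package installs may differ.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

# Nagios external commands that re-enable active checking and notifications.
ACTIVATE_COMMANDS = (
    "START_EXECUTING_HOST_CHECKS",
    "START_EXECUTING_SVC_CHECKS",
    "ENABLE_NOTIFICATIONS",
)


def format_external_command(command, now=None):
    """Return one line in the '[epoch] COMMAND' format Nagios expects."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] {command}\n"


def activate_standby(command_file=COMMAND_FILE, now=None):
    """Write the activation commands to the standby's external command file."""
    lines = [format_external_command(c, now) for c in ACTIVATE_COMMANDS]
    # The command file is normally a named pipe that the Nagios daemon reads.
    with open(command_file, "w") as pipe:
        pipe.writelines(lines)
    return lines
```

The cron job on the DR box would call activate_standby() only after its own check of the primary has failed. The matching STOP_EXECUTING_HOST_CHECKS, STOP_EXECUTING_SVC_CHECKS and DISABLE_NOTIFICATIONS commands can be sent the same way to put the standby back to sleep.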
Okay, management is sounding receptive to all this, but I'll need more information on the DR stuff. Is it DR how I've described, or is there a specific method to setting up the DR host?
Currently we don't have a documented and streamlined way to set up a DR system. Our lead developer has begun a solution that will help with this process, but our challenge in the past has been the huge variance in people's monitoring environments. The main concepts include:
-Create a mirrored monitoring server that remains dormant
-Have some form of check in place to detect when the primary server goes down
-Initialize the secondary server once the primary is detected as down
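To make the detection step more concrete, here is a minimal Python sketch of one way the secondary could decide when to wake up. It is not the solution our developer is working on, just an illustration under some assumptions: the primary is considered "up" if a TCP connection to its web port succeeds, and the secondary only activates after several consecutive failed checks so that a single network blip does not trigger a failover. The names primary_is_reachable and FailoverMonitor, the port, and the threshold of 3 are all hypothetical choices.

```python
import socket


def primary_is_reachable(host, port=443, timeout=5.0):
    """Return True if a TCP connection to the primary succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverMonitor:
    """Track consecutive failed checks and signal when to activate."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # failed checks in a row before failover
        self.failures = 0
        self.active = False         # True once the secondary has taken over

    def record(self, primary_up):
        """Feed one check result; return True only when failover should start."""
        if primary_up:
            self.failures = 0
            return False
        self.failures += 1
        if not self.active and self.failures >= self.threshold:
            self.active = True
            return True
        return False
```

A scheduled job on the secondary would call monitor.record(primary_is_reachable("primary.example.com")) on each run and initialize the dormant server when it returns True. In this sketch, failing back to the primary is left as a manual step, since the two servers would need to be reconciled first. Note that the monitor's state has to persist between runs, for example by keeping the process running in a loop rather than starting it fresh from cron each time.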