Documentation & distributed monitoring question
Posted: Wed Sep 07, 2011 8:44 am
by chris.trotter
Hello,
I note there is still no documentation available for Fusion - when can we expect this to be ready? Will you offer trial extensions?
Also, your overview page states:
Distributed Monitoring Made Easy: Alleviates the need for complex configurations, data transfer problems, and having to manage changes on both central and distributed nodes.
Can I get more information on that? Is this a plugin/component to add to Fusion? I have not seen any options for configuring distributed monitoring via the Fusion GUI.
Thanks!
Re: Documentation & distributed monitoring question
Posted: Wed Sep 07, 2011 9:41 am
by mguthrie
Currently Fusion acts as a central viewer for all of the servers. The monitoring configurations are still managed on the distributed servers, but the unified view lets you easily click into any of them.
Re: Documentation & distributed monitoring question
Posted: Wed Sep 07, 2011 10:48 am
by chris.trotter
Ah, I see. Could you point me in the direction of what we should be using to have a distributed Nagios system then?
Our goal is for Nagios to tolerate failure of the primary Nagios box, so we'd need something that could sync the config across multiple physical/virtual boxes, ideally across multiple sites. Another option would be two redundant Nagios boxes acting as parents, with child Nagios boxes at other sites reporting back to the parents.
Re: Documentation & distributed monitoring question
Posted: Wed Sep 07, 2011 11:55 am
by mguthrie
Oh, OK. I think what you are looking for is more along the lines of "High Availability" options. Do you want to use Nagios XI (commercial) or Nagios Core (community) for this? We're currently working on a streamlined way to sync two XI servers, but there are external options like vMotion (VMware), and there are probably a few community options out there as well. Take a look at this and see if it points you in a better direction.
http://library.nagios.com/library/produ ... ty-options
It's probably worth mentioning that a single XI license covers a production install, a test install, and a disaster recovery install.
Re: Documentation & distributed monitoring question
Posted: Thu Sep 08, 2011 7:45 am
by chris.trotter
We were going to use XI. The idea was to have two Nagios servers, one at DC1 and the other at DC2, and have the system remain online should one DC go down.
I have seen that HA document. The VMware HA is not really what we're looking for as it would only take care of hardware failure.
The DR licensing option intrigues me - could you provide more info on that?
If it works how I think it does, I think we could do this:
DC1: Nagios XI monitoring everything
DC2: Nagios XI DR install on a VM/spare server, Nagios Core monitoring DC1/primary Nagios XI box
EDIT: Ah, does the 'Backing up and Restoring' library document cover this? Essentially just another XI install that is inactive until the latest backup is restored to it?
Re: Documentation & distributed monitoring question
Posted: Thu Sep 08, 2011 9:38 am
by mguthrie
The "Backing Up and Restoring XI" document is written and tested for backing up a single server, not necessarily for making a DR copy of a system. Restoring a backup to a second server might work, but to be honest I don't think we've tested it. Another possibility would be to import the object configuration files to the second server. The redundancy setup that you're describing sounds fairly solid. You could even just run a cron job on the second server that checks the main server and turns on Active Checks if it detects a problem.
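As a rough illustration of the cron job idea, here is a minimal Python sketch, not a tested or official procedure. It assumes the standby server runs Nagios with active checks disabled, and that reachability of the primary can be judged by a plain TCP connection. The host name, port, and command-file path are assumptions to adjust for your install. `START_EXECUTING_HOST_CHECKS` and `START_EXECUTING_SVC_CHECKS` are standard Nagios external commands; issuing the `STOP_` variants when the primary is back is an extra step beyond what was described above and can be dropped.

```python
import socket
import time

# Assumptions (not from this thread): host name, port, and command-file path.
PRIMARY_HOST = "nagios-primary.example.com"  # hypothetical primary XI server
PRIMARY_PORT = 80                            # assumed: primary's web interface
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"  # common default; verify locally


def primary_is_up(host, port, timeout=5.0):
    """Return True if a TCP connection to the primary succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def external_commands(primary_up, now=None):
    """Build the external command lines to send to the standby's Nagios."""
    ts = int(time.time() if now is None else now)
    # Primary down -> start active checks here; primary up -> stay dormant.
    verb = "STOP" if primary_up else "START"
    return [
        f"[{ts}] {verb}_EXECUTING_HOST_CHECKS\n",
        f"[{ts}] {verb}_EXECUTING_SVC_CHECKS\n",
    ]


def main():
    up = primary_is_up(PRIMARY_HOST, PRIMARY_PORT)
    # The command file is a named pipe read by the local Nagios process.
    with open(COMMAND_FILE, "w") as pipe:
        pipe.writelines(external_commands(up))

# Call main() from the script that cron runs on the standby server.
```

A single TCP check is a coarse signal. A real setup would likely check more than one service on the primary before switching.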
Re: Documentation & distributed monitoring question
Posted: Thu Sep 08, 2011 9:53 am
by chris.trotter
Okay, management is sounding receptive to all this, but I'll need more information on the DR stuff. Does DR work the way I've described, or is there a specific method for setting up the DR host?
Re: Documentation & distributed monitoring question
Posted: Thu Sep 08, 2011 4:57 pm
by mguthrie
Currently we don't have a documented and streamlined way to set up a DR system. Our lead developer has begun work on a solution that will help with this process, but our challenge in the past has been the huge variance in people's monitoring environments. The main concepts include:
-Create a mirrored monitoring server that remains dormant
-Have some form of check in place to detect when the primary server goes down
-Initialize the secondary server once the primary is detected to be down
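The three concepts above can be sketched as a small state machine. This is only an illustration under stated assumptions: the `StandbyState` class, the `step` function, and the threshold of three consecutive failures are all hypothetical and not part of any Nagios product. The idea is that the dormant secondary takes over only after several failed checks in a row, so a single dropped check does not trigger a failover.

```python
from dataclasses import dataclass


@dataclass
class StandbyState:
    """Tracks the dormant secondary server's view of the primary."""
    failures: int = 0      # consecutive failed checks of the primary
    active: bool = False   # True once the secondary has taken over


def step(state, primary_ok, threshold=3):
    """Advance the state by one check result and return the new state.

    The secondary stays dormant until `threshold` consecutive checks fail.
    Once active it stays active; failing back to the primary is left as a
    manual decision in this sketch.
    """
    if state.active:
        return state
    failures = 0 if primary_ok else state.failures + 1
    return StandbyState(failures=failures, active=failures >= threshold)


# Example run: one blip, a recovery, then a sustained outage.
state = StandbyState()
for ok in [True, False, True, False, False, False, True]:
    state = step(state, ok)
print(state.active)  # prints True
```

Keeping the decision separate from the actual check and from the activation step makes it easier to adapt to different monitoring environments, which is the variance mentioned above.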
Re: Documentation & distributed monitoring question
Posted: Mon Sep 12, 2011 7:31 am
by chris.trotter
Ok, thanks a lot, I'll post in the XI forum about my other licensing questions.
Re: Documentation & distributed monitoring question
Posted: Mon Sep 12, 2011 11:04 am
by mguthrie
Ok, sounds good.