High Availability

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

High Availability

Post by BanditBBS »

For HA, which does everyone think is better: the approach in Mike Weber's talk here: http://www.slideshare.net/nagiosinc/mike-weber-failover

or using VMware HA/FT? We want to do this between two datacenters so that if the storage, the host, or the DC itself goes down, we can have a working Nagios up and running quickly.

Thanks!
Last edited by BanditBBS on Tue Sep 30, 2014 2:18 pm, edited 1 time in total.
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
mikew
Posts: 243
Joined: Sun Feb 05, 2012 7:05 pm

Re: High Availability

Post by mikew »

Since I did that talk on failover, much has changed. Many companies now use clustered VMware instances, which reduces the chance of problems. Larger companies are now opting to replicate their master instead of failing over to a running machine, as there is far less maintenance and downtime is minimal.

Advantages of Failover:
Relatively Simple Setup
The setup described is relatively simple and works well for a mature system that will not change much. The actual time to fail over is usually less than 7 minutes, which is a very short time to be blind.

Disadvantages to a Failover:
Passive Checks
One thing that will affect your decision is whether you use passive checks. With passive checks you run into a problem: you do not want to be sending output to two servers. So the failover option that I used in the example does not work well with passive checks.

Constant Changes
If you make significant changes to the Nagios master you will need those replicated on the slave. If you add a new plugin with dependencies, you have to add it to the slave.
Mike Weber

Nagios Training/Consulting
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: High Availability

Post by abrist »

mikew wrote:Disadvantages to a Failover:
Passive Checks
One thing that will affect your decision is whether you use passive checks. With passive checks you run into a problem: you do not want to be sending output to two servers. So the failover option that I used in the example does not work well with passive checks.
This problem can be mitigated with a virtual IP through ucarp, pacemaker, keepalived, etc.
mikew wrote: Constant Changes
If you make significant changes to the Nagios master you will need those replicated on the slave. If you add a new plugin with dependencies, you have to add it to the slave.
This can be reduced with a shared volume. You will still need to install packages on the secondary, but all of the Nagios data/plugin locations can be moved to a shared volume (DRBD, NFS, etc.).

I would suggest VMware HA if it is an option. Just make sure your SAN/disk I/O is *fast*. Otherwise I would suggest looking at the Linux HA stack, specifically DRBD/Pacemaker.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: High Availability

Post by Box293 »

BanditBBS wrote:using VMware HA/FT? We want to do this between two datacenters so if storage and/or host and/or dc itself goes down we can have a working Nagios up and running quickly.
This is actually a good use of VMware technology; however, you need some very low-latency links between the datacenters for this to work. This is because the hosts at each datacenter need access to the same storage for HA/FT to work, which kinda makes you wonder what location the storage is in ...
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: High Availability

Post by BanditBBS »

Yeah, I'm trying to think of ways to do this without the same-storage access issue.

The reason: what if our Chicago DC goes down (cable broken, tornado, volcano, etc.)? That is where our Nagios server and its storage are located. Customers that have servers in our other DCs, or our managed/not-hosted customers, will want monitoring to be active until the Chicago DC comes back online. If the San Fran Nagios was using the same storage, we wouldn't be able to fail over in that case. I'm starting to lean towards having the daily XI backup ssh'd to the SF server, and if CHI ever goes down we can spend the few minutes to run the XI restore script on the other server. Sure, we won't have history and stuff, but we'd have active monitoring. We'd have the history back once CHI came back online (as long as it wasn't a volcano).
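
A rough sketch of that plan, just to make the moving parts concrete. Everything named here is an assumption for illustration and not something confirmed in this thread: the backup directory `/store/backups/nagiosxi`, the standby host name `nagios-sf.example.com`, the archive names, and the restore script path. The functions only build the commands; actually running them (cron on CHI, by hand on SF) is left out.

```python
from pathlib import PurePosixPath

# Assumptions, not confirmed in the thread: backup directory,
# standby host name, and restore script location.
BACKUP_DIR = PurePosixPath("/store/backups/nagiosxi")
STANDBY = "nagios-sf.example.com"
RESTORE_SCRIPT = "/usr/local/nagiosxi/scripts/restore_xi.sh"


def newest_backup(archives):
    """Return the name of the most recent archive from (name, mtime) pairs."""
    if not archives:
        raise ValueError("no backup archives found")
    return max(archives, key=lambda item: item[1])[0]


def ship_command(archive_name):
    """Build the scp command that copies one archive to the standby."""
    src = str(BACKUP_DIR / archive_name)
    return ["scp", src, f"{STANDBY}:{BACKUP_DIR}/"]


def restore_command(archive_name):
    """Build the command to run on the standby once CHI is declared down."""
    return [RESTORE_SCRIPT, str(BACKUP_DIR / archive_name)]


# Hypothetical archive names and timestamps, for illustration only.
archives = [("1412000000.tar.gz", 1412000000), ("1412086400.tar.gz", 1412086400)]
latest = newest_backup(archives)
print(" ".join(ship_command(latest)))
print(" ".join(restore_command(latest)))
```

The daily copy would run on the CHI side after the backup finishes; the restore command is the part you would only run on SF during an outage.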

Thoughts?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: High Availability

Post by abrist »

BanditBBS wrote:Thoughts?
This is a decent method. You could use rsync instead of (or in combination with) the backup script to keep things more up to date. The big questions concern the databases: you could offload them and then replicate them to the other location, or just replicate the most important tables for monitoring (the ql tables) and use a daily backup for the rest.
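
A minimal sketch of the rsync side, assuming a standby reachable over ssh as `nagios-sf.example.com` and a typical source-install layout under `/usr/local`. Both the host name and the directory list are assumptions, so adjust them to your install. The databases are deliberately left out here, since they need replication or a dump rather than a file copy.

```python
# Assumed paths and host name; adjust to the actual install.
SYNC_DIRS = [
    "/usr/local/nagiosxi/",
    "/usr/local/nagios/etc/",
    "/usr/local/nagios/libexec/",
]
STANDBY = "nagios-sf.example.com"


def rsync_command(directory, host=STANDBY, dry_run=False):
    """Build an rsync-over-ssh command mirroring one directory to the standby."""
    cmd = ["rsync", "-az", "--delete", "-e", "ssh"]
    if dry_run:
        cmd.append("--dry-run")
    # Trailing slash on the source syncs the directory contents in place.
    cmd += [directory, f"{host}:{directory}"]
    return cmd


for d in SYNC_DIRS:
    print(" ".join(rsync_command(d, dry_run=True)))
```

Running with `--dry-run` first is worth it, since `--delete` will remove files on the secondary that no longer exist on the primary.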
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: High Availability

Post by BanditBBS »

I've never had to use the restore script, so I'm not sure how well it operates.

What all is included when doing the backup in XI?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: High Availability

Post by abrist »

Everything important:
databases, nagiosxi dir, core configs, libexec, MRTG configs, RRDs, MRTG RRDs.
You just need to pay attention to any third-party additions like the Oracle/VMware SDKs, Java, etc.
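
One way to keep track of those extras is a small pre-restore check on the secondary. This is only a sketch: the tool list below is hypothetical and should be replaced with whatever your own plugins depend on, and it only looks for executables on `PATH`, so SDKs installed as libraries would need their own checks.

```python
import shutil

# Hypothetical list of extra tools your plugins rely on; not from the thread.
REQUIRED_TOOLS = ["java", "perl"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Install on the secondary before restoring:", ", ".join(missing))
    else:
        print("All listed tools are present.")
```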
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: High Availability

Post by eloyd »

BanditBBS wrote:I've never had to use the restore script, so not sure on how well it operates.

What all is included when doing the backup in XI?
Backing up and restoring XI is a piece of cake with the backup/restore script, so long as you're just using XI. As Andy said, you need to watch for add-ons and so forth. We just did this for a customer to prove that our disaster recovery worked (snapshotted the machine first, of course). Even installed it to a different box. Easy peasy, "one and done" kind of operation.
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic! • Join our public Nagios Discord Server!
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: High Availability

Post by abrist »

eloyd wrote: Easy peasy, "one and done" kind of operation.
The light side of disaster recovery is very simple. It gets much more complicated in HA/minimal downtime configurations, and even more difficult in large federated models.