For HA, which approach does everyone think is better: the failover setup from Mike Weber's talk (http://www.slideshare.net/nagiosinc/mike-weber-failover) or VMware HA/FT?
We want to do this between two datacenters, so that if the storage, the host, or the DC itself goes down we can have a working Nagios up and running quickly.
Thanks!
High Availability
Last edited by BanditBBS on Tue Sep 30, 2014 2:18 pm, edited 1 time in total.
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
Re: High Availability
Much has changed since I did that talk on failover. Many companies now use clustered VMware instances, which reduces the chance of problems. Larger companies are now opting to replicate their master instead of failing over to a running machine, as there is far less maintenance and downtime is minimal.
Advantages of Failover:
Relatively Simple Set Up
The set up described is relatively simple and works well for a mature system that will not change much. The actual time to failover is usually less than 7 minutes. This is a very short amount of time to be blind.
Disadvantages to a Failover:
Passive Checks
One thing that will impact your decision is whether you use passive checks. With passive checks you run into a problem: you do not want to be sending output to two servers. So the failover option that I used in the example does not work well with passive checks.
Constant Changes
If you make significant changes to the Nagios master you will need those replicated on the slave. If you add a new plugin with dependencies, you have to add it to the slave.
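As a rough illustration of the failover idea (not the setup from the talk), a standby could require several consecutive failed probes of the master before taking over, so that a single dropped check does not trigger a failover. The class name, the threshold of 3, and the automatic fail-back when the master answers again are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class FailoverWatchdog:
    """Tracks probe results for the master and decides if the standby is active."""

    threshold: int = 3   # assumed number of consecutive failures before takeover
    failures: int = 0    # current run of consecutive failed probes
    active: bool = False  # True while the standby should be doing the monitoring

    def record(self, master_ok: bool) -> bool:
        """Record one probe result and return whether the standby should be active."""
        if master_ok:
            # Master answered: reset the counter and hand control back.
            self.failures = 0
            self.active = False
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.active = True
        return self.active


watchdog = FailoverWatchdog(threshold=3)
for result in (False, False, False):
    standby_active = watchdog.record(result)
```

How the probe itself is done (ping, NRPE, an HTTP request) and what "active" means on the standby (enabling checks and notifications, for example) are left open here.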
Mike Weber
Nagios Training/Consulting
Re: High Availability
mikew wrote: Disadvantages to a Failover:
Passive Checks
One thing that will impact your decision is if you use passive. With passive checks you end up with a problem in that you do not want to be sending output to two servers. So in the failover option that I used in the example, does not work well with passive.
This problem can be mitigated with a virtual IP through ucarp, Pacemaker, keepalived, etc. Passive senders then only ever target one address, and that address follows whichever node is active.
mikew wrote: Constant Changes
If you make significant changes to the Nagios master you will need those replicated on the slave. If you add a new plugin with dependencies, you have to add it to the slave.
This can be reduced with a shared volume. You will still need to install packages on the secondary, but all of the Nagios data/plugin locations can be moved to a shared volume (DRBD, NFS, etc.).
I would suggest VMware HA if it is an option. Just make sure your SAN/disk I/O is *fast*. Otherwise I would suggest looking at the Linux HA stack, specifically DRBD/Pacemaker.
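Tools such as keepalived can run a health-check script and use its result to decide which node should hold the virtual IP. One possible signal, sketched below, is whether the Nagios status file is still being rewritten. The 60-second limit and the idea of using file freshness at all are assumptions for illustration, and the status file path depends on your install.

```python
import os
import time
from typing import Optional


def status_is_fresh(path: str, max_age_seconds: float = 60.0,
                    now: Optional[float] = None) -> bool:
    """Return True if the file at `path` was modified within `max_age_seconds`.

    A missing or unreadable file counts as not fresh, so the node would be
    treated as unhealthy.
    """
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return False
    current = time.time() if now is None else now
    return (current - mtime) <= max_age_seconds
```

A small wrapper script could call this on the status file and exit with 0 when it returns True and 1 otherwise, which is the convention check scripts generally follow.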
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
- Box293
Re: High Availability
BanditBBS wrote: using VMware HA/FT? We want to do this between two datacenters so if storage and/or host and/or dc itself goes down we can have a working Nagios up and running quickly.
This is actually a good use of VMware technology; however, you need some very low latency links between the datacenters for this to work. This is because the hosts at each datacenter need access to the same storage for HA/FT to work. Which kinda makes you wonder what location the storage is in ...
Re: High Availability
Yeah, I'm trying to think of ways to do this without the same-storage access issue.
The reason: what if our Chicago DC goes down (cable broken, tornado, volcano, etc.)? That is where our Nagios server and its storage are located. Customers that have servers in our other DCs, or our managed/not hosted customers, will want monitoring to be active until the Chicago DC comes back online. If the San Fran Nagios was using the same storage we wouldn't be able to fail over in that case. I'm starting to lean towards the daily XI backup that is ssh'd to the SF server, and if CHI ever goes down we can spend the few minutes to run the XI restore script on the other server. Sure, we won't have history and stuff, but we'd have active monitoring. We'd have the history back once CHI came back online (as long as it wasn't a volcano).
Thoughts?
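A minimal sketch of the "ship the daily backup to the other DC" step could look like the following. It assumes the backups are `.tar.gz` archives sitting in one directory; the host name, user, and directory shown in the usage note are placeholders, not values from this thread.

```python
from pathlib import Path
from typing import List, Optional


def newest_backup(backup_dir: str, pattern: str = "*.tar.gz") -> Optional[Path]:
    """Return the most recently modified archive in backup_dir, or None if empty."""
    candidates = [p for p in Path(backup_dir).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def scp_command(archive: Path, remote: str, remote_dir: str) -> List[str]:
    """Build an scp command that copies the archive over ssh.

    -p preserves the modification time, so "newest" means the same thing
    on the receiving side.
    """
    return ["scp", "-p", str(archive), f"{remote}:{remote_dir}/"]
```

From a daily cron job this could be used as `subprocess.run(scp_command(archive, "nagios@nagios-sf.example.com", "/path/to/backups"), check=True)` after checking that `newest_backup(...)` did not return `None`. The restore on the SF side would still be a manual step, as described above.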
Re: High Availability
BanditBBS wrote: Thoughts?
This is a decent method. You could use rsync instead of (or in combination with) the backup script to keep things more up to date. The big questions deal with the databases: you could offload them and then replicate them to the other location. Or just replicate the most important tables for monitoring (the ql tables), and use a daily backup for the rest.
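For the rsync part, a hedged sketch of building the command from Python is shown below. The directories and host name in the usage note are examples only (adjust to your install), and the databases are deliberately not covered, since copying live database files this way is not safe.

```python
from typing import List


def rsync_command(source_dir: str, remote: str, dest_dir: str,
                  delete: bool = False, dry_run: bool = False) -> List[str]:
    """Build an rsync-over-ssh command that mirrors source_dir to remote:dest_dir.

    -a keeps permissions, ownership and timestamps; -z compresses in transit.
    The trailing slash on the source means "the contents of this directory".
    """
    cmd = ["rsync", "-az", "-e", "ssh"]
    if delete:
        # Remove files on the destination that no longer exist on the source.
        cmd.append("--delete")
    if dry_run:
        # Show what would be transferred without changing anything.
        cmd.append("--dry-run")
    source = source_dir.rstrip("/") + "/"
    cmd += [source, f"{remote}:{dest_dir}"]
    return cmd
```

For example, `rsync_command("/usr/local/nagios/etc", "nagios-sf.example.com", "/usr/local/nagios/etc/", dry_run=True)` gives a command that can be passed to `subprocess.run(..., check=True)`. Starting with `dry_run=True` and leaving `delete` off is the cautious choice until the result looks right.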
Former Nagios employee
Re: High Availability
I've never had to use the restore script, so not sure on how well it operates.
What all is included when doing the backup in XI?
Re: High Availability
Everything important:
Databases, the nagiosxi dir, core configs, libexec, MRTG configs, RRDs, MRTG RRDs.
You just need to pay attention to any third-party additions like the Oracle/VMware SDKs, Java, etc.
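Since those third-party additions are not part of the backup, it may help to check the standby for them before you ever need it. The following is only a sketch: the function name is made up, and the commands and paths you pass in are whatever your own plugins depend on.

```python
import os
import shutil
from typing import Iterable, List


def missing_prerequisites(commands: Iterable[str],
                          paths: Iterable[str]) -> List[str]:
    """Return the commands not found on PATH and the paths that do not exist.

    Meant to be run on the standby server; an empty list means every
    listed dependency was found.
    """
    missing = [c for c in commands if shutil.which(c) is None]
    missing += [p for p in paths if not os.path.exists(p)]
    return missing
```

For instance, `missing_prerequisites(["java", "perl"], ["/opt/example-sdk"])` would report whichever of those are absent on the box it runs on (`/opt/example-sdk` is a placeholder path).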
Former Nagios employee
Re: High Availability
BanditBBS wrote: I've never had to use the restore script, so not sure on how well it operates.
What all is included when doing the backup in XI?
Backing up XI and restoring XI is a piece of cake with the backup/restore script, so long as you're just using XI. As Andy said, you need to watch for add-ons and so forth. We just did this for a customer to prove that our disaster recovery worked (snapshotted the machine first, of course). Even installed it to a different box. Easy peasy, "one and done" kind of operation.
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
Re: High Availability
eloyd wrote: Easy peasy, "one and done" kind of operation.
The light side of disaster recovery is very simple. It gets much more complicated in HA/minimal downtime configurations, and even more difficult in large federated models.
Former Nagios employee