XI installation and failover design

gormank · Post by **gormank** » Fri Apr 03, 2015 1:03 pm

It doesn't look like there's a simple answer.

I'd guess I can live w/o passive checks.
As for having a single live system and a failover, the failover would be on another network in another state. It seems like the configs may be different, but maybe not. I guess the agents would be set up with the address of both systems.
Andy Brist sent links to a few presentations. After watching those, a DRBD cluster looks attractive, but I don't have the luxury of a lab or a ton of time to develop something.

jdalrymple · Post by **jdalrymple** » Fri Apr 03, 2015 1:29 pm

gormank - you're right, there is no simple answer.

The solutions I've proposed assume that you want two Nagios servers, each monitoring its own geographical site, with a standby for each server at the opposite site purely for DR purposes. Is that what you'd like, or do you really want only one monitoring server?

If you think you can go on for eternity without passive checks then your configs will indeed be mostly identical.

gormank · Post by **gormank** » Fri Apr 03, 2015 1:50 pm

I'm trying to determine the best way to reliably monitor my systems without adding my preconceptions. Since Nagios is not HA, or fault tolerant, I see no way but to have 2 or more systems.

The Nagios documentation says it can be a good idea to have a second Nagios system monitoring the first, so I decided that at least 2 systems were needed. Because we have more licenses than we need, I came up with the idea of having 4 systems. I've since realized that 4 may be too many because of the user and config sync needed.

The systems to be monitored are designed with what is essentially a hot standby in the secondary location. Failover is manual.

What I certainly don't want is to have a single monitoring system fail over the weekend, and have no one notice.
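One way to picture the "second system watching the first" idea is a small check that each Nagios server runs against its peer. The following is only a sketch: the function name `check_peer`, the default port 443, and the timeout are assumptions for illustration, not anything from XI itself. The only fixed convention used is the standard plugin exit codes (0 for OK, 2 for CRITICAL).

```python
import socket

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_peer(host, port=443, timeout=5.0):
    """Return (exit_code, message) for a TCP reachability check of the peer.

    `host` and `port` are whatever the peer monitoring server listens on;
    443 is an assumed default for its web interface.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} is accepting connections"
    except OSError as exc:
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"
```

To use it as a plugin, a wrapper would print the message and exit with the returned code, for example `code, msg = check_peer("nagios-b.example.com"); print(msg); sys.exit(code)`, where the hostname is a placeholder. A TCP connect only shows that the web server is up; it does not prove that checks are being scheduled on the peer.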

I read a minute ago that passive checks were for asynchronous monitoring of devices that may be behind a firewall, that can't be monitored synchronously. I don't see a real need for this.

abrist · Post by **abrist** » Fri Apr 03, 2015 2:03 pm

I have found virtual IPs and a VIP manager service (like uCarp) to be extremely valuable when creating failover solutions. You can bypass DRBD, shared volumes, etc., and just go with pushed backups to restore, and still minimize the config on the agents (even passives) with a virtual IP. Just a thought.
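With a virtual IP, each node needs some way to know whether it is currently the one holding the address, for example before deciding to restore the latest pushed backup or start the monitoring services. A rough sketch of that test follows. The function names are made up for illustration, and the approach assumes the Linux `net.ipv4.ip_nonlocal_bind` setting is off (its default), since otherwise a bind to any address succeeds.

```python
import socket


def holds_address(ip):
    """Return True if this host currently has `ip` configured locally.

    Binding a socket to an address the host does not own fails with
    OSError, which is used here as the ownership test.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((ip, 0))  # port 0: let the OS pick any free port
            return True
        except OSError:
            return False


def node_role(vip):
    """Map VIP ownership to a role label (labels are illustrative only)."""
    return "active" if holds_address(vip) else "standby"
```

In practice a VIP manager can usually run a script when the address moves, so a poll like this is more of a sanity check than a replacement for those hooks.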

mp4783 · Post by **mp4783** » Fri Apr 03, 2015 5:36 pm

The backup could no doubt be pared down significantly, but in our case, safety first. You just need to identify dynamically changing files and send them over. Most likely just the following:

- All files in /usr/local/nagios/etc
- MySQL database dump
- PostgreSQL database dump

The standard backup utility is what my stuff is based on and it just backs up everything. However, it is intended to be "restored" to a dead system, not used as a DR backup. Our circumstances are different from most because we used modified installations and service controls (which sucks).
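As a rough illustration of the pared-down approach (config directory plus both database dumps, pushed to the standby), the commands for one sync run might be assembled like this. Everything beyond `/usr/local/nagios/etc` is an assumption: the staging directory, the standby hostname, the `postgres` user, and the choice to dump all databases rather than named ones. Database credentials are assumed to come from the usual client config files.

```python
import shlex


def build_sync_commands(standby, stage_dir="/tmp/nagios-sync"):
    """Return the commands (as argv lists) for one push to the standby.

    `standby` is the SSH-reachable name of the failover server and
    `stage_dir` is a scratch directory; both are placeholders.
    """
    return [
        ["mkdir", "-p", stage_dir],
        # Archive the Nagios config directory named in the post above.
        ["tar", "-czf", f"{stage_dir}/nagios-etc.tar.gz",
         "-C", "/usr/local/nagios", "etc"],
        # Dump every MySQL database to a single file.
        ["mysqldump", "--all-databases",
         f"--result-file={stage_dir}/mysql.sql"],
        # Dump every PostgreSQL database to a single file.
        ["pg_dumpall", "-U", "postgres", "-f", f"{stage_dir}/postgres.sql"],
        # Copy the staged files to the same path on the standby.
        ["rsync", "-az", f"{stage_dir}/", f"{standby}:{stage_dir}/"],
    ]


def render(commands):
    """Render the argv lists as shell lines, e.g. for a dry run."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Printing `render(build_sync_commands("standby.example.com"))` shows what would run; executing each list with `subprocess.run(cmd, check=True)` stops at the first failure so a broken dump is not pushed silently.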

abrist · Post by **abrist** » Mon Apr 06, 2015 10:51 am

mp4783 wrote: The backup could no doubt be pared down significantly, but in our case, safety first. You just need to identify dynamically changing files and send them over. Most likely just the following:

- All files in /usr/local/nagios/etc
- MySQL database dump
- PostgreSQL database dump

I would add retention.dat to this list, as your acknowledgements, comments, and runtime state options are stored in that file. You may also want to consider the perfdata RRDs, MRTG RRDs, and MRTG config if you want to minimize losses in your performance/bandwidth data. But as always, it all depends on your requirements. This cat can be skinned in so many ways.
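To keep each push small, the extra state files could be filtered down to those that changed since the last sync. The sketch below assumes typical default locations for retention.dat, the perfdata RRDs, and MRTG; these paths are guesses that should be checked against the actual installation, and `changed_since` is a hypothetical helper name.

```python
import os

# Assumed default locations; verify these on the real system.
EXTRA_PATHS = [
    "/usr/local/nagios/var/retention.dat",
    "/usr/local/nagios/share/perfdata",
    "/var/lib/mrtg",
    "/etc/mrtg",
]


def changed_since(paths, last_sync):
    """Return files under `paths` modified after `last_sync` (epoch seconds).

    Each entry may be a file or a directory; directories are walked
    recursively and missing paths are skipped.
    """
    changed = []
    for path in paths:
        if os.path.isfile(path):
            candidates = [path]
        else:
            candidates = [
                os.path.join(root, name)
                for root, _dirs, files in os.walk(path)
                for name in files
            ]
        for candidate in candidates:
            try:
                if os.path.getmtime(candidate) > last_sync:
                    changed.append(candidate)
            except OSError:
                continue  # file vanished between listing and stat
    return sorted(changed)
```

Calling `changed_since(EXTRA_PATHS, last_sync)` with the timestamp of the previous successful push gives the list to hand to the copy step. Note that retention.dat is rewritten periodically by Nagios, so it will usually appear in every run.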

mp4783 · Post by **mp4783** » Wed Apr 08, 2015 8:14 am

Good points. With those files, you get a near lossless failover.

The only concerns I have would be files that are rigidly tied to the original host. I solved most of those issues by substituting hostnames where necessary.
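One reading of "substituting hostnames" is rewriting references to the original server's name in the copied files so that they point at a name shared by both nodes. A minimal sketch of that rewrite is below; the function name and the example hostnames are invented for illustration, and the matching rules (whole-name matches only) are an assumption about what is safe to replace.

```python
import re


def substitute_host(text, old_host, new_host):
    """Replace whole-name occurrences of `old_host` in `text` with `new_host`.

    The lookarounds stop the pattern from matching inside a longer name,
    so replacing "nagios-a" leaves "nagios-a2" and "xnagios-a" untouched.
    """
    pattern = re.compile(
        r"(?<![\w.-])" + re.escape(old_host) + r"(?![\w-])"
    )
    # A function replacement avoids backslash interpretation in new_host.
    return pattern.sub(lambda _match: new_host, text)
```

For example, `substitute_host("url=http://nagios-a/nagiosxi/", "nagios-a", "nagios-vip")` rewrites the URL to use `nagios-vip`. Applying it to a copy of each file and diffing before replacing the original is a sensible precaution.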

abrist · Post by **abrist** » Wed Apr 08, 2015 10:41 am

Hostnames or virtual ips will indeed solve that issue.
Let me know if you have other questions. If you open a ticket, I am also willing to chat over the phone or in a remote session about this.

mp4783 · Post by **mp4783** » Sat Apr 11, 2015 5:21 pm

I'm not allowed to open tickets per my company's agreement with Nagios LLC.

For what it's worth, my long term goal will be to provide a truly automated, fault tolerant configuration conceptually similar to a MySQL cluster, but distributed over the network. If I could stop time for about 6 months, I could get around to building it. Architecturally, I think I know how to do it.

WillemDH · Post by **WillemDH** » Sun Apr 12, 2015 3:55 pm

abrist wrote: I found virtual ips and a vip manager service (like uCarp) to be extremely valuable when creating failover solutions. You can bypass drbd, shared volumes, etc, and just go with pushed backups to restore, and still minimize the config on the agents (even passives) with a virtual ip. Just a thought

I agree with Andy. An F5 load balancer, or even DNS load balancing with, for example, Infoblox (https://www.infoblox.com/products/netwo ... er-manager), would certainly do the trick as well.

Personally, I have not yet had to implement this for Nagios, as we have an HA VMware solution which is sufficient for now.

Nagios Support Forum
