XI installation and failover design

gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: XI installation and failover design

Post by gormank »

It doesn't look like there's a simple answer.

I'd guess I can live w/o passive checks.
As for having a single live system and a failover, the failover would be on another network in another state. It seems like the configs may be different, but maybe not. I guess the agents would be set up with the address of both systems.
Andy Brist sent links to a few presentations. After watching those, a DRBD cluster looks attractive, but I don't have the luxury of a lab or a ton of time to develop something.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: XI installation and failover design

Post by jdalrymple »

gormank - you're right, there is no simple answer.

The solutions I've proposed assume that you want 2 Nagios servers, each monitoring its own geographical site, with a standby for each server in the opposite site purely for DR-type purposes. Is that what you'd like, or do you really only want one monitoring server?

If you think you can go on for eternity without passive checks then your configs will indeed be mostly identical.
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: XI installation and failover design

Post by gormank »

I'm trying to determine the best way to reliably monitor my systems without adding my preconceptions. Since Nagios is not HA, or fault tolerant, I see no way but to have 2 or more systems.

The Nagios documentation says it can be a good idea to have a 2nd Nagios system monitoring the first, so I decided that at least 2 systems were needed. Since we have more licenses than we need, I came up with the idea of having 4 systems. I've since realized that, at least in my mind, 4 may be too many because of the user and config sync needed.

The systems to be monitored are designed with what is essentially a hot standby in the secondary location. Failover is manual.

What I certainly don't want is to have a single monitoring system fail over the weekend, and have no one notice.
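One way to picture the "second system watching the first" idea is a small reachability check written in the style of a Nagios plugin. This is only a sketch under assumptions: the function name `check_peer` and the default port 443 (the XI web interface over HTTPS) are illustrative choices, not something taken from this thread. The exit codes follow the usual plugin convention (0 = OK, 2 = CRITICAL).

```python
import socket
from typing import Tuple

# Conventional Nagios plugin exit codes.
OK = 0
CRITICAL = 2


def check_peer(host: str, port: int = 443, timeout: float = 5.0) -> Tuple[int, str]:
    """Try a TCP connection to the peer monitoring server.

    Returns (exit_code, message). The port is an assumption: adjust it to
    whatever service you consider proof that the peer is alive.
    """
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, "OK - %s:%d is accepting connections" % (host, port)
    except OSError as exc:
        return CRITICAL, "CRITICAL - %s:%d unreachable (%s)" % (host, port, exc)
```

Wrapped as a plugin, the script would print the message and exit with the returned code, and each server would run it against the other so a weekend outage of one gets alerted by its peer.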

I read a minute ago that passive checks were for asynchronous monitoring of devices that may be behind a firewall, that can't be monitored synchronously. I don't see a real need for this.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: XI installation and failover design

Post by abrist »

I found virtual ips and a vip manager service (like uCarp) to be extremely valuable when creating failover solutions. You can bypass drbd, shared volumes, etc, and just go with pushed backups to restore, and still minimize the config on the agents (even passives) with a virtual ip. Just a thought :)
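For the "pushed backups to restore" part, the standby needs some way to pick which pushed archive to restore from. A minimal sketch, assuming the backups land in one directory as `*.tar.gz` files (the directory layout, file pattern, and the helper name `newest_backup` are all hypothetical):

```python
from pathlib import Path
from typing import Optional


def newest_backup(backup_dir: str, pattern: str = "*.tar.gz") -> Optional[Path]:
    """Return the most recently modified archive in backup_dir, or None."""
    # Only regular files matching the pattern are considered.
    candidates = [p for p in Path(backup_dir).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Modification time decides which pushed backup is the latest.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The restore itself and the virtual IP takeover are left out here; this only covers choosing the archive.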
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
mp4783
Posts: 116
Joined: Wed May 14, 2014 11:11 am

Re: XI installation and failover design

Post by mp4783 »

The backup could no doubt be pared down significantly, but in our case, safety first. You just need to identify dynamically changing files and send them over. Most likely just the following:

- All files in /usr/local/nagios/etc
- MySQL database dump
- PostgreSQL database dump

The standard backup utility is what my stuff is based on and it just backs up everything. However, it is intended to be "restored" to a dead system, not used as a DR backup. Our circumstances are different from most because we used modified installations and service controls (which sucks).
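To make the "identify the files and send them over" step concrete, here is a rough sketch that bundles a list of paths into one gzip tarball. It assumes the database dumps have already been written to files by whatever dump tool you use; the function name `archive_paths` and any file names passed to it are illustrative only, and this is not how the standard backup utility works internally.

```python
import tarfile
from pathlib import Path
from typing import Iterable, List


def archive_paths(paths: Iterable[str], dest: str) -> List[str]:
    """Write every existing path into a gzip tarball at dest.

    Directories are added recursively. Returns the paths that were
    skipped because they do not exist, so the caller can log them.
    """
    missing = []
    with tarfile.open(dest, "w:gz") as tar:
        for raw in paths:
            if Path(raw).exists():
                tar.add(raw)  # tarfile strips the leading "/" from member names
            else:
                missing.append(raw)
    return missing
```

Called with something like `["/usr/local/nagios/etc", "<mysql dump file>", "<postgresql dump file>"]`, it produces a single archive that can then be pushed to the standby.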
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: XI installation and failover design

Post by abrist »

mp4783 wrote:
The backup could no doubt be pared down significantly, but in our case, safety first. You just need to identify dynamically changing files and send them over. Most likely just the following:

- All files in /usr/local/nagios/etc
- MySQL database dump
- PostgreSQL database dump

I would like to add retention.dat to this list, as your acknowledgements, comments, and runtime state options are stored in that file. You may also want to consider the perfdata rrds/mrtg rrds/mrtg config if you want to minimize losses in your performance/bandwidth data. But as always, it all depends on your requirements etc. This cat can be skinned in so many ways.
mp4783
Posts: 116
Joined: Wed May 14, 2014 11:11 am

Re: XI installation and failover design

Post by mp4783 »

Good points. With those files, you get a near lossless failover.

The only concerns I have would be files that are rigidly tied to the original host. I solved most of those issues by substituting hostnames where necessary.
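As a rough illustration of the hostname substitution (not my actual scripts, just a sketch with a made-up helper name), something like this rewrites the primary's hostname in a restored file's text while leaving longer names that merely start with it alone:

```python
import re


def substitute_hostname(text: str, old: str, new: str) -> str:
    """Replace occurrences of hostname old with new.

    A match must not be preceded by a word character, "." or "-", and must
    not be followed by a word character or "-". So "nagios1" matches in
    "http://nagios1/" and "nagios1.example.com", but not in "nagios10".
    """
    pattern = re.compile(r"(?<![\w.-])" + re.escape(old) + r"(?![\w-])")
    # A function replacement avoids any backslash interpretation in new.
    return pattern.sub(lambda _match: new, text)
```

For example, `substitute_hostname("address=nagios1", "nagios1", "nagios2")` gives `"address=nagios2"`.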
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: XI installation and failover design

Post by abrist »

Hostnames or virtual ips will indeed solve that issue.
Let me know if you have other questions. If you open a ticket, I am also willing to chat over the phone or in a remote session about this stuff.
mp4783
Posts: 116
Joined: Wed May 14, 2014 11:11 am

Re: XI installation and failover design

Post by mp4783 »

I'm not allowed to open tickets per my company's agreement with Nagios LLC.

For what it's worth, my long term goal will be to provide a truly automated, fault tolerant configuration conceptually similar to a MySQL cluster, but distributed over the network. If I could stop time for about 6 months, I could get around to building it. Architecturally, I think I know how to do it.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: XI installation and failover design

Post by WillemDH »

abrist wrote:
I found virtual ips and a vip manager service (like uCarp) to be extremely valuable when creating failover solutions. You can bypass drbd, shared volumes, etc, and just go with pushed backups to restore, and still minimize the config on the agents (even passives) with a virtual ip. Just a thought :)

I agree with Andy. An F5 load balancer, or even DNS load balancing with, for example, Infoblox (https://www.infoblox.com/products/netwo ... er-manager), would certainly do the trick as well.

Personally, I have not yet had to implement this for Nagios, as we have an HA VMware solution which is sufficient for now.
Nagios XI 5.8.1
https://outsideit.net