Nagios XI failover modeled on backup_xi.sh

Posted: Thu Sep 04, 2014 9:23 pm
by gurkakrieg
I'm trying to set up failover for Nagios XI. The general plan is to imitate what backup_xi.sh does. Does anyone see any potential pitfalls with the following plan?

1. Use rsync to keep the various flat files sync'd between active and failover servers (/etc/nagiosql, /usr/local/nagios, etc.)
2. Use pg_dump and mysqldump to back up the databases, then scp the dumps across
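
For illustration, steps 1 and 2 could be sketched roughly as below. This is a dry-run sketch, not what backup_xi.sh actually does: it only builds and prints the commands. The standby hostname, the staging directory, and the database names are assumptions made for the example; only the two directories come from the plan above.

```python
import shlex

# Everything below is illustrative. Only the two directories come from the
# plan above; the host, staging directory, and database names are assumptions.
STANDBY = "xi-standby.example.com"   # hypothetical failover host
SYNC_DIRS = ["/etc/nagiosql", "/usr/local/nagios"]
DUMP_DIR = "/tmp/xi-failover"        # hypothetical staging directory
PG_DBS = ["nagiosxi"]                # assumed PostgreSQL database name
MYSQL_DBS = ["nagios", "nagiosql"]   # assumed MySQL database names


def rsync_cmd(path, host=STANDBY):
    """Mirror one directory to the same path on the standby."""
    # -a preserves permissions and times; --delete removes files that
    # no longer exist on the source.
    return ["rsync", "-a", "--delete", f"{path}/", f"{host}:{path}/"]


def pg_dump_cmd(db, dump_dir=DUMP_DIR):
    """Dump one PostgreSQL database to a plain SQL file."""
    return ["pg_dump", "-f", f"{dump_dir}/{db}.pgsql", db]


def mysqldump_cmd(db, dump_dir=DUMP_DIR):
    """Dump one MySQL database to a plain SQL file."""
    return ["mysqldump", f"--result-file={dump_dir}/{db}.mysql", db]


def scp_cmd(filename, host=STANDBY, dump_dir=DUMP_DIR):
    """Copy one dump file to the same staging directory on the standby."""
    return ["scp", f"{dump_dir}/{filename}", f"{host}:{dump_dir}/"]


def plan():
    """Return every command for one sync cycle, in execution order."""
    cmds = [rsync_cmd(d) for d in SYNC_DIRS]
    for db in PG_DBS:
        cmds += [pg_dump_cmd(db), scp_cmd(f"{db}.pgsql")]
    for db in MYSQL_DBS:
        cmds += [mysqldump_cmd(db), scp_cmd(f"{db}.mysql")]
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in plan():
        print(shlex.join(cmd))
```

To run a cycle for real, each list could be passed to subprocess.run(cmd, check=True) so that a failed dump or copy stops the cycle. SSH keys and database credentials are not shown and would need to be in place.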

When failover occurs, the dbs get restored and the failover Nagios goes live.

1 and 2 would run as part of a cron job -- I'm not sure what the cycle time would be, but I'm guessing 15 to 30 minutes between cron runs. I'll have to see how much potential perfdata loss we're willing to live with.
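
As a rough way to reason about that trade-off, here is a small sketch of the worst-case loss window. It assumes the dumps are taken at the start of each run and only become usable on the standby once the run finishes. The 5-minute run duration is a made-up figure, not a measurement.

```python
def worst_case_loss_minutes(interval_min, run_min):
    """Upper bound on how stale the standby's copy can be at failover.

    A snapshot taken at the start of a run only lands on the standby
    run_min later, and is not replaced until the next run finishes.
    """
    return interval_min + run_min


if __name__ == "__main__":
    RUN_MIN = 5  # assumed duration of one dump-and-copy run
    for interval in (15, 30):
        print(interval, worst_case_loss_minutes(interval, RUN_MIN))
```

With the assumed 5-minute run, a 15-minute cycle could lose up to 20 minutes of perfdata and a 30-minute cycle up to 35.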

Alternatively, I could just run backup_xi.sh as a cron job, then copy the resulting backup over to our backup server. The solution above might use less bandwidth between the servers and less CPU on the master, though I haven't benchmarked either approach.

There are other parts involved -- getting some slave Nagios servers to change masters for their passive checks, for example -- but I've got a pretty good idea how to handle those parts.

Thanks

Re: Nagios XI failover modeled on backup_xi.sh

Posted: Fri Sep 05, 2014 9:40 am
by tmcdonald
As Einstein would say:
Albert E wrote: In theory, theory and practice are the same. In practice, they are not.
So in theory, this solution sounds solid. It seems like you have a firm grasp on what moving parts would be involved and how to handle them. My two cents:

Offloading the DB will somewhat reduce the need to do all the rsync - just share the DB between the XI servers. The only problem then is if the DB goes down, so now you need to have a backup for that. I would argue that having the DB offloaded will reduce load on the XI machines themselves, and since the DB will likely be under-utilized anyway the overhead of running the syncs directly between them would not be too bad.