The official document about the HA options for Nagios is available here:
http://assets.nagios.com/downloads/nagi ... ptions.pdf
Unfortunately I didn't find it very informative :(
If you're implementing an HA solution from scratch (i.e. you haven't deployed Nagios yet) and you have the needed resources (shared storage), it is relatively straightforward with Linux-HA and DRBD. These are true "heavy-weight" HA implementations and are well documented, both on the official pages and in additional tutorials.
However, if you already have one Nagios XI server in your environment and want to add another without shared storage, disk drivers, etc., you're on your own :(
When implementing a custom DR/HA solution without shared storage, you have to take at least these things into account when syncing configuration:
- NDO database
- Nagios Core configuration files, state files and plugins
- PNP4Nagios perfdata files
- Nagios XI database
For the NDO database (since it's a MySQL database), it's best to use MySQL's built-in replication. The replication must be dual-master (master-master), since the other node (the slave) can become master at any time. Unfortunately this procedure is not so straightforward and takes some time to get right.
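To give an idea of the per-node settings involved in dual-master replication, here is a minimal Python sketch that renders a `[mysqld]` fragment for each node. The option names (`server-id`, `log-bin`, `auto_increment_increment`, `auto_increment_offset`) are standard MySQL options, but the function name and the chosen values are my own illustration, not necessarily what we ended up with.

```python
def master_master_cnf(server_id: int, node_count: int = 2) -> str:
    """Render a [mysqld] fragment for one node of a master-master pair.

    Hypothetical helper: the values are illustrative assumptions.
    """
    if not 1 <= server_id <= node_count:
        raise ValueError("server_id must be between 1 and node_count")
    lines = [
        "[mysqld]",
        # Every node needs a unique server-id.
        f"server-id = {server_id}",
        # Binary logging must be on so the other node can replicate from this one.
        "log-bin = mysql-bin",
        # Interleave auto-increment values so both nodes can insert
        # without generating the same primary key.
        f"auto_increment_increment = {node_count}",
        f"auto_increment_offset = {server_id}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for node in (1, 2):
        print(master_master_cnf(node))
```

This only covers the config file side. Each node also has to be pointed at the other one as its replication source and needs a replication user, which the sketch does not do.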
For plain files you can set up a cron job and rsync the files between the nodes.
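A rough sketch of such a sync job in Python, wrapping rsync. The paths and the peer host name are assumptions for illustration (check where your install actually keeps its files), and the helper names are made up.

```python
import subprocess

# Assumed locations; adjust to your install. Add state files as needed.
SYNC_PATHS = [
    "/usr/local/nagios/etc/",             # Nagios Core configuration
    "/usr/local/nagios/libexec/",         # plugins
    "/usr/local/nagios/share/perfdata/",  # PNP4Nagios perfdata
]


def build_rsync_command(path, peer, dry_run=False):
    """Build the rsync call that mirrors one directory to the peer node."""
    cmd = ["rsync", "-az", "--delete"]
    if dry_run:
        # Only report what would change, without copying anything.
        cmd.append("--dry-run")
    cmd += [path, f"{peer}:{path}"]
    return cmd


def sync_all(peer, paths=SYNC_PATHS, dry_run=False):
    """Run rsync for every path; raises CalledProcessError on failure."""
    for path in paths:
        subprocess.run(build_rsync_command(path, peer, dry_run), check=True)
```

Calling `sync_all("nagios-standby")` from a script started by cron would push the files to the other node. Note that `--delete` assumes a one-way sync from the active node to the standby; if both nodes can change files, drop it.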
The Nagios XI database is a PostgreSQL database. I found the PostgreSQL replication mechanism rather cumbersome and, since the database is relatively "low-activity", it is easier to dump the database, rsync it to the other node and import it there.
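The dump, copy and import steps could look roughly like this in Python. The database name, user, dump file path and peer host are assumptions on my part, and the function names are hypothetical. It also assumes passwordless SSH to the peer and that the database user can log in without a prompt.

```python
import subprocess


def build_transfer_commands(peer, db="nagiosxi", user="nagiosxi",
                            dump_file="/tmp/nagiosxi.sql"):
    """Return the three commands: dump locally, copy, import on the peer."""
    return [
        # --clean adds DROP statements so the import replaces existing objects.
        ["pg_dump", "-U", user, "--clean", "-f", dump_file, db],
        # Copy the dump to the same path on the other node.
        ["rsync", "-az", dump_file, f"{peer}:{dump_file}"],
        # Import the dump on the other node over SSH.
        ["ssh", peer, "psql", "-U", user, "-d", db, "-f", dump_file],
    ]


def run_transfer(peer):
    """Run the steps in order and stop at the first failure."""
    for cmd in build_transfer_commands(peer):
        subprocess.run(cmd, check=True)
```

Because the steps run in order with `check=True`, a failed dump never gets copied or imported on the other node.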
When you have all relevant files synced, you have to implement a watchdog.
We implemented a robust watchdog/heartbeat service that runs on both nodes and checks the other node. Based on different conditions (for example, host unreachable or Nagios process down) and different logic (for example, Condition1 AND Condition2 OR Condition3), the service can automatically start a failover or perform additional actions.
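This is not our actual watchdog, just a minimal Python sketch of the "Condition1 AND Condition2 OR Condition3" idea. The three conditions (ping failed, SSH port closed, Nagios reported down) and all function names are assumptions chosen for illustration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def should_fail_over(ping_failed, ssh_closed, nagios_down):
    """(Condition1 AND Condition2) OR Condition3.

    Two independent signs that the host is gone, or a direct report
    that the Nagios process on the other node is down.
    """
    return (ping_failed and ssh_closed) or nagios_down
```

Requiring two independent checks before declaring the host dead reduces the chance that a single dropped ping triggers a failover. How `ping_failed` and `nagios_down` are determined is left out here; `port_open(peer, 22)` is one way to get `ssh_closed`.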
Hope this helps.
Luka