High availability Nagios XI

maxwellmiranda
Posts: 113
Joined: Thu Mar 22, 2012 3:24 pm

High availability Nagios XI

Post by maxwellmiranda »

Can you give details on setting up high availability for a Nagios XI server?
We are using a 64-bit Linux server.
lmilkovic
Posts: 15
Joined: Wed Oct 13, 2010 2:31 am

Re: High availability Nagios XI

Post by lmilkovic »

The official document about the HA options for Nagios is available here: http://assets.nagios.com/downloads/nagi ... ptions.pdf
Unfortunately, I didn't find it very informative :(

If you're implementing an HA solution from the beginning (i.e. you haven't deployed Nagios yet) and you have the needed resources (shared storage), it is relatively straightforward with Linux-HA and DRBD. These are true "heavy-weight" HA implementations and are well documented, both on the official pages and in additional tutorials.

However, if you already have one Nagios XI server in your environment and you want to add another, without shared storage, disk drivers, etc., you're on your own :(

When implementing a custom DR/HA solution without shared storage, you have to keep at least the following in sync between the nodes:
  • NDO database
  • Nagios Core configuration files, state files and plugins
  • PNP4Nagios perfdata files
  • Nagios XI database
For the NDO database (a MySQL database), it's best to use the built-in MySQL replication. The replication must be dual-master, since the other node (slave) can become master at any time. Unfortunately, this procedure is not straightforward and takes some time to get right.
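
To give a rough idea of the per-node settings involved in dual-master replication, here is a minimal sketch that prints a my.cnf fragment for each node. The helper names (replication_settings, render_mycnf) and the log name "mysql-bin" are made up for illustration. The MySQL options themselves are real: each node needs a unique server-id and binary logging, and the two auto_increment options keep the nodes from generating the same auto-increment key when both accept writes. This is only part of the job; you still have to create a replication user and point each node at the other.

```python
def replication_settings(node_index: int, node_count: int = 2) -> dict:
    """Per-node [mysqld] options for multi-master replication (illustrative)."""
    if not 1 <= node_index <= node_count:
        raise ValueError("node_index must be between 1 and node_count")
    return {
        "server-id": node_index,                 # must be unique per node
        "log-bin": "mysql-bin",                  # assumed log base name
        "auto_increment_increment": node_count,  # step between generated keys
        "auto_increment_offset": node_index,     # starting point for this node
    }


def render_mycnf(settings: dict) -> str:
    """Render the options as a my.cnf fragment."""
    lines = ["[mysqld]"]
    lines += [f"{key} = {value}" for key, value in settings.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    for node in (1, 2):
        print(f"# node {node}")
        print(render_mycnf(replication_settings(node)))
        print()
```

With two nodes, node 1 generates keys 1, 3, 5, ... and node 2 generates 2, 4, 6, ..., so rows inserted on either side don't collide.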
For plain files, you can set up a cron job that rsyncs the files between the nodes.
The Nagios XI database is a PostgreSQL database. I found the PostgreSQL replication mechanism rather cumbersome and, since the database is relatively "low-activity", it is easier to dump the database, rsync the dump to the other node, and import it there.
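
The file sync and the dump/rsync/import steps could be driven from a small script run by cron. Below is a minimal sketch that only builds and prints the commands, so you can review them before running anything. The paths, the database name and user "nagiosxi", the dump location and the peer name "nagios-standby" are all assumptions; adjust them to your install. It also assumes passwordless SSH between the nodes.

```python
import shlex

# Assumed default locations; verify them on your own server.
FILE_PATHS = [
    "/usr/local/nagios/etc",             # Nagios Core configuration
    "/usr/local/nagios/var",             # state files
    "/usr/local/nagios/libexec",         # plugins
    "/usr/local/nagios/share/perfdata",  # PNP4Nagios perfdata
]


def rsync_command(path: str, peer: str) -> list:
    """Mirror one directory to the same location on the peer."""
    src = path.rstrip("/") + "/"
    # --delete removes files on the peer that no longer exist locally
    return ["rsync", "-az", "--delete", src, f"{peer}:{src}"]


def sync_plan(peer: str, db: str = "nagiosxi", user: str = "nagiosxi",
              dump_file: str = "/tmp/nagiosxi.sql") -> list:
    """Return the commands for one sync run, in order."""
    plan = [rsync_command(path, peer) for path in FILE_PATHS]
    # --clean makes the dump drop existing objects before recreating them
    plan.append(["pg_dump", "-U", user, "--clean", "-f", dump_file, db])
    plan.append(["rsync", "-az", dump_file, f"{peer}:{dump_file}"])
    plan.append(["ssh", peer, "psql", "-U", user, "-d", db, "-f", dump_file])
    return plan


if __name__ == "__main__":
    for command in sync_plan("nagios-standby"):
        print(shlex.join(command))
```

To actually execute the plan, you could pass each command to subprocess.run(command, check=True). Be careful with --delete and with the import: run this only from the currently active node towards the standby, never in both directions.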

When you have all relevant files synced, you have to implement a watchdog.
We implemented a robust watchdog/heartbeat service that runs on both nodes, with each node checking the other. Based on different conditions (for example, host unreachable or Nagios process down) and different logic (for example, Condition1 AND Condition2 OR Condition3), the service can automatically start a failover or perform additional actions.
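
This is not our actual watchdog, just a minimal sketch of the decision logic described above. The three conditions (ping_failed, nagios_check_failed, manual_failover) and the "N bad checks in a row" threshold are assumptions for illustration. The rule is read as (Condition1 AND Condition2) OR Condition3, which is also how Python evaluates it.

```python
from dataclasses import dataclass


@dataclass
class PeerStatus:
    """Result of one check of the other node (all fields hypothetical)."""
    ping_failed: bool             # Condition1: host unreachable
    nagios_check_failed: bool     # Condition2: Nagios process not responding
    manual_failover: bool = False  # Condition3: e.g. an operator-set flag


def should_fail_over(status: PeerStatus) -> bool:
    """(Condition1 AND Condition2) OR Condition3."""
    return (status.ping_failed and status.nagios_check_failed) or status.manual_failover


class Watchdog:
    """Triggers only after `threshold` consecutive bad checks, to avoid
    failing over on a single lost packet."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def observe(self, status: PeerStatus) -> bool:
        """Record one check result; return True when failover should start."""
        if should_fail_over(status):
            self.failures += 1
        else:
            self.failures = 0  # any good check resets the counter
        return self.failures >= self.threshold


if __name__ == "__main__":
    watchdog = Watchdog(threshold=2)
    for status in [PeerStatus(True, True), PeerStatus(True, True)]:
        print(watchdog.observe(status))
```

In a real service, the checks that fill PeerStatus (ping, port check, process check) would run in a loop on each node, and a True result from observe would start whatever failover action you have defined.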

Hope this helps.

Luka
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: High availability Nagios XI

Post by scottwilkerson »

Thanks for chiming in, Luka.