FYI, I have been working towards a fail-over process for a Nagios XI instance (APP node + off-box dB pair) to another, geographically separate, APP node / off-box dB pair.
While I have a process that has worked in a PROD context, it is very manual and rather inefficient (un-tar / tar accounts for the bulk of the total execution time).
Specifically, I face a number of challenges. Some I expect are unavoidable (due to the nature of my ENV), but I hope some are addressable.
For context, I am using `/usr/local/nagiosxi/scripts/backup_xi.sh` and `/usr/local/nagiosxi/scripts/restore_xi.sh`, and I try to minimize downtime by:
* disabling notifications on the SOURCE (via the WebGUI)
* running an "Apply Config" to ensure `enable_notifications` is disabled in the subsequent backup
* taking the "backup" from the SOURCE (APP & off-box dB) node pair
* rsyncing the backup over to the RESTORE (APP & off-box dB) node pair
* "restoring" the tar-ball
* fiddling with some bits of the nested tar-balls:
  * `nagiosxi.tar.gz`
    * `cfg__db_info__nagiosxi__dbserver`
    * `cfg__db_info__ndoutils__dbserver`
    * `cfg__db_info__nagiosql__dbserver`
    * `dbserver`
  * `nagios.tar.gz`
    * `db_host`
* updating the `mysqlpass` in `/usr/local/nagiosxi/scripts/restore_xi.sh`
* making sure the RESTORE XI instance is alive & stable
* enabling notifications on the RESTORE node pair
* living another day
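The "fiddling" steps above mostly come down to rewriting `key=value` lines (`dbserver` and the `cfg__db_info__*__dbserver` entries on the XI side, `db_host` in `ndo2db.cfg`). A minimal sketch of a reusable helper, assuming plain, unquoted `key=value` lines and GNU sed as shipped with RHEL 7; `set_cfg_value` is a hypothetical name, not part of Nagios XI, so check how your own copies of these files are written before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nagios XI): set KEY=VALUE in a plain
# key=value config file (e.g. xi-sys.cfg or ndo2db.cfg) in place.
# Assumes GNU sed and that KEY contains no regex metacharacters.
set_cfg_value() {
    local file=$1 key=$2 value=$3
    local re='^[A-Za-z0-9._:-]+$'
    # Guard: only accept hostname/IP-like values so they are safe inside sed.
    if [[ ! $value =~ $re ]]; then
        echo "refusing unsafe value: ${value}" >&2
        return 2
    fi
    # Fail loudly if the key is absent rather than silently doing nothing.
    if ! grep -q "^${key}=" "$file"; then
        echo "${key} not found in ${file}" >&2
        return 1
    fi
    # Anchored at line start, so commented lines such as "#db_host=" are untouched.
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
}
```

For example, `set_cfg_value /usr/local/nagiosxi/var/xi-sys.cfg cfg__db_info__nagiosxi__dbserver 10.0.0.5` (the IP is a placeholder). Once this works by hand, the same edit maps naturally onto Ansible's `lineinfile` module.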
I aim to (eventually) drive this solution via Ansible, but I am of the mind-set that I need to understand my options with "the guts" before I can automate successfully (or with operational confidence).
0.) Is there a method / API call to disable `notifications` for XI (i.e., notifications to non-human / non-XI-login contacts sent via the postfix facility, not the XI notification sub-system for "human" users)?
I realize I could run:
```shell
sed -i 's/^enable_notifications=1/enable_notifications=0/' /usr/local/nagios/etc/nagios.cfg
```
However, this does not appear to "survive" the obligatory recycle of the nagios sub-service (triggered by the mysqldump part of `backup_xi.sh`), as this setting "persists" based on what was previously recorded in `/usr/local/nagios/var/retention.dat`.
At present I have to manually disable the notifications flag within the XI WebGUI, wait until it "takes", then proceed (less than ideal for automation).
Given the warning at the top of `retention.dat`, I expect it would be unwise to "fiddle" with this file (prior to the nagios sub-service recycle).
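One avenue that avoids both the WebGUI and hand-editing `retention.dat` is the Nagios Core external command file: `DISABLE_NOTIFICATIONS` / `ENABLE_NOTIFICATIONS` toggle program-wide notifications at runtime, and Nagios then writes that runtime state into `retention.dat` itself, so it should survive the recycle when `use_retained_program_state=1`. The sketch below assumes `check_external_commands=1` and the usual XI paths (`/usr/local/nagios/var/rw/nagios.cmd`, `/usr/local/nagios/var/status.dat`); the function names are hypothetical. It also polls the `programstatus` block of `status.dat` in place of the manual "wait until it takes" step.

```shell
#!/usr/bin/env bash
# Sketch only: toggle program-wide notifications through the Nagios Core
# external command file instead of the XI WebGUI. Assumes
# check_external_commands=1 and the default XI paths below (overridable).
NAGIOS_CMD_FILE=${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}
NAGIOS_STATUS_FILE=${NAGIOS_STATUS_FILE:-/usr/local/nagios/var/status.dat}

# Hypothetical helper: write one command (e.g. DISABLE_NOTIFICATIONS) in the
# "[epoch] COMMAND" format. On a live system the command file is a FIFO, so
# this write blocks until Nagios reads it (if Nagios is down, it hangs).
submit_nagios_cmd() {
    local cmd=$1 cmd_file=${2:-$NAGIOS_CMD_FILE}
    printf '[%s] %s\n' "$(date +%s)" "$cmd" > "$cmd_file"
}

# Hypothetical helper: poll status.dat until the programstatus block shows
# KEY=WANT (e.g. enable_notifications=0); give up after TIMEOUT seconds.
wait_for_program_status() {
    local key=$1 want=$2 timeout=${3:-120} status_file=${4:-$NAGIOS_STATUS_FILE}
    local deadline=$(( SECONDS + timeout ))
    while (( SECONDS <= deadline )); do
        # Only lines inside "programstatus { ... }" are considered; host and
        # service blocks carry their own per-object notification flags.
        if awk -v k="$key" -v v="$want" '
            $1 == "programstatus"    { inblk = 1; next }
            inblk && $1 == "}"       { inblk = 0; next }
            inblk && $1 == (k "=" v) { found = 1 }
            END { exit !found }' "$status_file" 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    echo "timed out waiting for ${key}=${want} in ${status_file}" >&2
    return 1
}
```

On the SOURCE this would be `submit_nagios_cmd DISABLE_NOTIFICATIONS && wait_for_program_status enable_notifications 0 120` before running `backup_xi.sh`, and the `ENABLE_NOTIFICATIONS` / `1` pair on the RESTORE node pair at the end. Because the switch is program-wide, it covers the postfix-delivered notifications as well.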
1.) Decompression / Compression of `/usr/local/nagiosxi/`
This is a "low pain" section, but a required step nonetheless.
Is there any way to pre-define a target IP for `cfg__db_info__nagiosxi__dbserver`, `cfg__db_info__ndoutils__dbserver`, and `cfg__db_info__nagiosql__dbserver` in `/usr/local/nagiosxi/var/xi-sys.cfg`?
2.) Decompression / Compression times for `/usr/local/nagios/`
Within the context of my PROD ENV, I have what I would consider a large number of HOST monitors (~4.3K, active) and SERVICE monitors (~35K, 95%+ passive).
Decompressing / re-compressing the `/usr/local/nagios/` tar-ball therefore takes an "extended period" for my purposes (due to the serial nature of tar in this context; I am not expecting a re-write of tar).
Specifically, the "land-mine" is the ~4.3K separate HOST files in `/usr/local/nagios/etc/hosts/` and the ~35K separate SERVICE files in `/usr/local/nagios/etc/services/`, all of which are part of this "sub" tar-ball.
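If gzip rather than tar is the bottleneck, one option is pigz, a parallel gzip implementation (packaged in EPEL for RHEL 7) that produces a standard gzip stream, so a plain `tar -xzf` can still read the result. A sketch with hypothetical function names that falls back to gzip when pigz is absent. Note that pigz mainly speeds up compression (decompression stays largely single-threaded), and with ~40K small files the filesystem metadata work may dominate, in which case the gain will be smaller. Using this inside `backup_xi.sh` / `restore_xi.sh` would mean modifying those scripts, which is hypothetical here.

```shell
#!/usr/bin/env bash
# Sketch: create / unpack a .tar.gz using pigz (parallel gzip) when present,
# otherwise plain gzip. The output is an ordinary gzip stream either way.
pick_gzip() {
    if command -v pigz >/dev/null 2>&1; then echo pigz; else echo gzip; fi
}

# Hypothetical: make_tgz ARCHIVE PARENT_DIR DIR_NAME
# e.g. make_tgz /tmp/nagios.tar.gz /usr/local nagios
make_tgz() {
    local archive=$1 parent=$2 dir=$3 gz
    local -a rc
    gz=$(pick_gzip)
    tar -C "$parent" -cf - "$dir" | "$gz" > "$archive"
    rc=("${PIPESTATUS[@]}")
    # Fail if either tar or the compressor failed (no reliance on pipefail).
    (( rc[0] == 0 && rc[1] == 0 ))
}

# Hypothetical: extract_tgz ARCHIVE DEST_DIR
extract_tgz() {
    local archive=$1 dest=$2 gz
    local -a rc
    gz=$(pick_gzip)
    "$gz" -dc "$archive" | tar -C "$dest" -xpf -
    rc=("${PIPESTATUS[@]}")
    (( rc[0] == 0 && rc[1] == 0 ))
}
```

Timing `make_tgz` against the current step on a copy of `/usr/local/nagios/` should show quickly whether compression or file I/O is the real cost.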
Is there any way to update / stage `/usr/local/nagios/etc/ndo2db.cfg`?
Constructive feedback and analysis are welcome.
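For a single file such as `ndo2db.cfg`, GNU tar can delete and re-append one member of an uncompressed archive, which avoids writing the ~40K host/service files to disk. The gzip layer still has to be decompressed and recompressed once, since a .tar.gz cannot be edited in place. A minimal sketch, assuming GNU tar (for `--delete`); `replace_tgz_member` is a hypothetical helper, and the member path `nagios/etc/ndo2db.cfg` in the example comments is a guess, so list the archive first to find the exact stored path.

```shell
#!/usr/bin/env bash
# Sketch: replace one member of a .tar.gz without extracting the rest to disk.
# Requires GNU tar (--delete works only on uncompressed archives, hence the
# temporary .tar). Ownership/mode of the new member come from NEW_FILE.
# Hypothetical usage: replace_tgz_member ARCHIVE MEMBER_PATH NEW_FILE
replace_tgz_member() {
    local archive=$1 member=$2 new_file=$3 work rc
    work=$(mktemp -d) || return 1
    # Stage the replacement under the same relative path it has in the archive,
    # then: gunzip -> drop old member -> append new one -> gzip -> swap in.
    mkdir -p "$work/stage/$(dirname "$member")" &&
        cp -p "$new_file" "$work/stage/$member" &&
        gzip -dc "$archive" > "$work/a.tar" &&
        tar --delete -f "$work/a.tar" "$member" &&
        tar -rf "$work/a.tar" -C "$work/stage" "$member" &&
        gzip -c "$work/a.tar" > "$archive.new" &&
        mv "$archive.new" "$archive"
    rc=$?
    rm -rf "$work" "$archive.new"
    return "$rc"
}

# Example (member path and IP are placeholders; confirm the path with
#   tar -tzf nagios.tar.gz | grep 'ndo2db.cfg$'):
#   tar -xzf nagios.tar.gz -O nagios/etc/ndo2db.cfg > /tmp/ndo2db.cfg
#   sed -i 's/^db_host=.*/db_host=10.0.0.5/' /tmp/ndo2db.cfg
#   replace_tgz_member nagios.tar.gz nagios/etc/ndo2db.cfg /tmp/ndo2db.cfg
```

If the member is not in the archive, `tar --delete` fails and the original .tar.gz is left untouched.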
Have a great day!
# ENV details:
OS: all involved nodes (APP & dB) run RHEL 7.x on VMware VMs
XI: 5.6.xx
dB: MariaDB 5.x (will go to 10.x once fail-over / fail-back works well)
* NOTE: the SOURCE and TARGET dBs do not replicate yet; this will be rectified soon
mod_gearman: no (exploring, but not in the equation at this time)
# REF:
[Backing Up And Restoring Your Nagios XI System](https://assets.nagios.com/downloads/nag ... ios-XI.pdf)
P.S. Apologies for the "janky" read; I had this all formatted in markdown and converted it as best I could.