
XI: queries regarding backup & restore (off-site)

Posted: Tue Jul 14, 2020 5:08 am
by inversecow
Ahoy folks,

FYI, I have been working towards a process of fail-over for a Nagios XI instance (APP node + off-box dB pair), to another (geographically separate) APP node / off-box dB pair.
While I have a process that has worked in a PROD context, it is very manual and rather inefficient (un-tar / tar takes the bulk of the total execution time).
Specifically, I face a number of challenges; some I expect are unavoidable (due to the nature of my ENV), but I hope others are addressable.

For context, I am using

Code: Select all

/usr/local/nagiosxi/scripts/backup_xi.sh
and

Code: Select all

/usr/local/nagiosxi/scripts/restore_xi.sh
for the purposes of this activity (open to suggestions if a better option exists.)
The process is arranged to minimize downtime, as follows:

* disable notifications on the SOURCE (via the WebGUI)
* run an "Apply Config" to ensure enable_notifications is disabled in the subsequent backup
* take the "backup" from the SOURCE (APP & off-box dB) node pair
* rsync the backup over to the RESTORE (APP & off-box dB) node pair
* "restore" (unpack) the backup tar-ball
* fiddle with some bits of the nested tar-balls:
  * nagiosxi.tar.gz
    * cfg__db_info__nagiosxi__dbserver
    * cfg__db_info__ndoutils__dbserver
    * cfg__db_info__nagiosql__dbserver
    * dbserver
  * nagios.tar.gz
    * db_host
* update the mysqlpass value in /usr/local/nagiosxi/scripts/restore_xi.sh
* run /usr/local/nagiosxi/scripts/restore_xi.sh with my "adjusted" tar-ball
* make sure the RESTORE XI instance is alive & stable
* enable notifications on the RESTORE node pair
* live another day

I aim to (eventually) drive this solution via ansible, but am of the mind-set that I need to understand my options with "the guts" before I can automate successfully (or with operational confidence).

0.) Is there a method / API call to disable `notifications` for XI (using non-human / non-XI login users via the postfix facility, not the XI notifications sub-system for "human" users)?

I realize I could

Code: Select all

sed -i 's/enable_notifications=1/enable_notifications=0/' /usr/local/nagios/etc/nagios.cfg
(to set enable_notifications=0).
This, however, does not appear to "survive" the obligatory recycle of the nagios sub-service (triggered by the mysqldump part of backup_xi.sh), as this setting "persists" based on what was previously recorded in

Code: Select all

/usr/local/nagios/var/retention.dat
.
At present I am required to manually disable the flag for notifications within the XI WebGUI, wait until it "takes", then proceed (less than ideal for automation).
Given the warning at the top of the retention.dat file, I expect it would be unwise to "fiddle" with this file (prior to the nagios sub-service recycle).
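A read-only check can still confirm what Core has retained before handing off to backup_xi.sh, which respects the warning in the file. The following is a minimal sketch, assuming the Nagios Core retention.dat layout in which program-wide state sits in a `program {` block; the helper name `retained_flag` is hypothetical. Core typically rewrites retention.dat on shutdown/restart and every retention_update_interval, so the value reflects the last write, not necessarily the live state.

```shell
#!/usr/bin/env bash
# Read-only: print the value of KEY from the "program {" block of a
# Nagios Core retention file. Hypothetical helper name; layout assumed.
retained_flag() {
  local file="$1" key="$2"
  awk -v key="$key" '
    $1 == "program" && $2 == "{" { inblk = 1; next }   # enter program-wide block
    inblk && $1 == "}"           { exit }               # block ended, key not found
    inblk {
      line = $0
      sub(/^[ \t]+/, "", line)                          # tolerate indentation
      eq = index(line, "=")
      if (eq > 0 && substr(line, 1, eq - 1) == key) {
        print substr(line, eq + 1)
        exit
      }
    }' "$file"
}

# Example (path from this thread):
# [ "$(retained_flag /usr/local/nagios/var/retention.dat enable_notifications)" = "0" ] \
#   && echo "notifications disabled in retained state"
```

Scoping the match to the `program` block matters because host and service blocks carry their own enable_notifications lines.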

1.) Decompression / Compression of

Code: Select all

/usr/local/nagiosxi/
directory?

This is a "low pain" section, but a required step nonetheless.
Any way to pre-define a target IP for cfg__db_info__nagiosxi__dbserver, cfg__db_info__ndoutils__dbserver, and cfg__db_info__nagiosql__dbserver in

Code: Select all

/usr/local/nagiosxi/var/xi-sys.cfg
?

2.) Decompression / Compression times for

Code: Select all

/usr/local/nagios/
directory?

Within the context of my PROD ENV, I have what I would consider a large quantity of HOST monitors (active, ~4.3K) and SERVICE monitors (~95%+ passive, ~35K).
Decompressing / re-compressing the /usr/local/nagios/ tar-ball therefore takes an "extended period" (due to the serial nature of tar; I'm not expecting a rewrite of tar).
Specifically, the "land-mine" is the ~4.3K separate HOST records in /usr/local/nagios/etc/hosts/ and ~35K separate SERVICE records in /usr/local/nagios/etc/services/, which are part of this "sub" tar-ball.
Is there any way to update / stage

Code: Select all

/usr/local/nagios/etc/ndo2db.cfg
(update value for db_host=IP) in such a way that I do not need to perform a decompress / re-compress on the (nested) tar-ball?

Constructive feed-back & analysis is invited.

Have a great day! :-)

# ENV details:

OS: all involved nodes (APP & dB) run RHEL 7.x, on VMware VMs
XI: 5.6.xx
dB: MariaDB 5.x (will go to 10.x once fail-over / fail-back works well)
* NOTE: the SOURCE and TARGET dBs do not yet replicate; this will soon be rectified
mod_gearman: no (exploring, but not in the equation at this time)

# REF:

[Backing Up And Restoring Your Nagios XI System](https://assets.nagios.com/downloads/nag ... ios-XI.pdf)

P.S. Apologies for the "janky" read; I had this all formatted in markdown and converted it as best I could.

Re: XI: queries regarding backup & restore (off-site)

Posted: Wed Jul 15, 2020 10:58 am
by benjaminsmith
Hi,

Thanks for sharing your process! Lots of information here, so just let me know if I didn't catch something.

> Is there a method / API call to disable `notifications` for XI (using non-human / non-XI login users via the postfix facility, not the XI notifications sub-system for "human" users)?

You can globally disable notifications by sending the following Nagios Core external command.
DISABLE_NOTIFICATIONS

Additionally, this same command can be sent using the REST API; check out the System Reference API by going to Help > API Docs in the Nagios XI user interface. If you're setting up a script, you can run this command and then sleep for some period until it fully takes effect.
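For a script running on the XI APP node itself, the same external command can also be written straight to Core's command file. This is a minimal sketch, assuming the command file lives at /usr/local/nagios/var/rw/nagios.cmd (the path referenced elsewhere in this thread) and that the calling user may write to it; `submit_core_command` is a hypothetical name. The `[epoch] COMMAND` line format is the standard Nagios Core external command syntax.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one external command to the Core command file.
# Core's syntax is "[<epoch seconds>] COMMAND_NAME;arg1;arg2...".
submit_core_command() {
  local cmd="$1"
  local cmd_file="${2:-/usr/local/nagios/var/rw/nagios.cmd}"   # assumed XI path
  if [ ! -w "$cmd_file" ]; then
    echo "cannot write to ${cmd_file}" >&2
    return 1
  fi
  # The real command file is a FIFO, so this write can block if Core
  # is not running to read it.
  printf '[%s] %s\n' "$(date +%s)" "$cmd" > "$cmd_file"
}

# Example: submit_core_command DISABLE_NOTIFICATIONS
```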

As far as not compressing and decompressing those folders, there are some trade-offs here between data size and the time required to transfer. This is out of scope for product support, but it's certainly possible to modify the backup and restore scripts to skip that part.

> This is a "low pain" section, but a required step nonetheless.
> Any way to pre-define a target IP for cfg__db_info__nagiosxi__dbserver, cfg__db_info__ndoutils__dbserver, and cfg__db_info__nagiosql__dbserver in

Not currently, but I could submit a feature request if you'd like.

> Is there any way to update / stage /usr/local/nagios/etc/ndo2db.cfg?
Can you clarify this step: what information is involved, and is this for the backup or the restore?

Thanks,
Benjamin

Re: XI: queries regarding backup & restore (off-site)

Posted: Mon Jul 20, 2020 1:15 pm
by inversecow
Ahoy Benjamin,

Thanks for your response and apologies for the delay.
I went down a rabbit hole and figured out answers to some of my queries.

> You can globally disable notifications by sending the following Nagios Core external command.
> DISABLE_NOTIFICATIONS
>
> Additionally, this same command can be sent using the REST API; check out the System Reference API by going to Help > API Docs in the Nagios XI user interface. If you're setting up a script, you can run this command and then sleep for some period until it fully takes effect.


Excellent, the API route is preferable given my eventual goal of automation (EG: via ansible).
I looked over my API docs on-system, but failed to note anything specific for

Code: Select all

DISABLE_NOTIFICATIONS
.
I also checked in the Commands section of XI, and failed to observe anything there.
Is this (sort of) what you had in mind in terms of this assertion (with different URL & API key of course):

Code: Select all

curl -XPOST "https://nagios.domain.fqdn/nagiosxi/api/v1/config/command?apikey=4p14dm1nk3y&pretty=1" -d "command_name=DISABLE_NOTIFICATIONS&command_line=/usr/local/nagios/var/rw/nagios.cmd something_here&applyconfig=1"
Building a sleep in for 30 - 60 seconds does seem prudent, based on what I have observed with manual operations of this type.
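Rather than a fixed 30 - 60 second sleep, a script could poll Core's status file until it reports notifications as disabled, with an upper bound on the wait. This is a minimal sketch, assuming XI's status file is /usr/local/nagios/var/status.dat and that program-wide state appears in its `programstatus {` block (Core rewrites this file every status_update_interval seconds); the function name and the default timeout / interval are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until status.dat reports enable_notifications=0.
# usage: wait_for_notifications_off [STATUS_FILE] [TIMEOUT_S] [INTERVAL_S]
wait_for_notifications_off() {
  local status_file="${1:-/usr/local/nagios/var/status.dat}"   # assumed XI path
  local timeout="${2:-120}" interval="${3:-5}" waited=0 state
  while :; do
    # Pull enable_notifications from the programstatus block only.
    state="$(awk '
      $1 == "programstatus" && $2 == "{" { inblk = 1; next }
      inblk && $1 == "}"                 { exit }
      inblk && $1 ~ /^enable_notifications=/ {
        sub(/^[^=]*=/, "", $1); print $1; exit
      }' "$status_file" 2>/dev/null || true)"
    if [ "$state" = "0" ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "still not disabled after ${timeout}s (last value: '${state}')" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Example: send DISABLE_NOTIFICATIONS via the API, then
# wait_for_notifications_off || exit 1
```

Failing loudly after the timeout gives an automation tool such as ansible a clear error to stop on, instead of proceeding after a blind sleep.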

---

On the compression / decompression matter, I realized I was "soft in the head" (to use the local dialect of my home region) on this matter, and was not thinking of the "UNIX Way" of doing things.
Specifically,

Code: Select all

tar
has long possessed the ability to "update" archives without unpacking / re-packing the whole lot; the catch is that the tar-ball must be decompressed first.
Put another way, I can keep the files together in the tarball (EG: xi_big_backup.tar) for the update; I just have to take it out of the compressed state (EG: xi_big_backup.tar.gz) first.
Realizing this, I was able to come up with some basic shell code to handle this, on the premise that all I care about is replacing a known IP value with another (source vs fail-over dB IP).
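To illustrate the mechanism on throwaway files: with GNU tar, `-u` on an uncompressed archive appends a newer copy of a member to the end rather than rewriting it in place, and on extraction the later copy overwrites the earlier one. GNU tar refuses to update compressed archives, hence the gunzip / gzip around the real runs. The directory and file contents below are made up for the demo.

```shell
#!/usr/bin/env bash
# Demo in a scratch directory: update one member of a plain .tar in place.
set -eu
demo_dir="$(mktemp -d)"
cd "$demo_dir"

mkdir -p etc
echo 'db_host=10.0.0.1' > etc/ndo2db.cfg
touch -d '2020-01-01 00:00' etc/ndo2db.cfg   # pretend this is the old backup copy
tar -cf demo.tar etc/ndo2db.cfg

echo 'db_host=10.9.9.9' > etc/ndo2db.cfg     # edited copy, newer mtime
tar -uf demo.tar etc/ndo2db.cfg              # appends a second copy of the member

tar -tf demo.tar                             # lists etc/ndo2db.cfg twice
rm etc/ndo2db.cfg
tar -xf demo.tar                             # later copy overwrites the earlier one
cat etc/ndo2db.cfg                           # prints db_host=10.9.9.9
```

Because `-u` only appends members whose file is newer than the archived copy, an edit that leaves the timestamp unchanged would be silently skipped.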

This works rather well in the three instances I noted having to make such changes (using control variables to easily change the target tarball and the target config files).
It assumes one has already decompressed / unpacked the master backup file (EG: xi_big_backup.tar.gz) and changed into the unpacked base directory.
The sudo references could also be dropped if this were wrapped inside a script run as root, which would be a logical next step given the runtimes involved versus the sudo timeout before re-authentication is enforced.

# nagiosxi.tar.gz: decompress / update / recompress (skip unpack / repack)

Code: Select all

# Run from the unpacked master backup directory; only the two edited files are re-added
SOURCE_DB_IP='XXX.XXX.XXX.XXX'; \
TARGET_DB_IP='YYY.YYY.YYY.YYY'; \
TARGET_ARCHIVE='nagiosxi.tar'; \
sudo gunzip "$TARGET_ARCHIVE".gz; \
TARGET_CONFIG='usr/local/nagiosxi/var/xi-sys.cfg'; \
sudo tar -xf "$TARGET_ARCHIVE" "$TARGET_CONFIG"; \
sudo sed -i "s/${SOURCE_DB_IP}/${TARGET_DB_IP}/" "$TARGET_CONFIG"; \
TARGET_CONFIG='usr/local/nagiosxi/html/config.inc.php'; \
sudo tar -xf "$TARGET_ARCHIVE" "$TARGET_CONFIG"; \
sudo sed -i "s/${SOURCE_DB_IP}/${TARGET_DB_IP}/" "$TARGET_CONFIG"; \
sudo tar -uf "$TARGET_ARCHIVE" usr/local/nagiosxi/var/xi-sys.cfg usr/local/nagiosxi/html/config.inc.php; \
sudo gzip "$TARGET_ARCHIVE"; sudo rm -rf usr/;

# nagios.tar.gz: decompress / update / recompress (skip unpack / repack)

Code: Select all

# Run from the unpacked master backup directory; only ndo2db.cfg is re-added
SOURCE_DB_IP='XXX.XXX.XXX.XXX'; \
TARGET_DB_IP='YYY.YYY.YYY.YYY'; \
TARGET_ARCHIVE='nagios.tar'; \
sudo gunzip "$TARGET_ARCHIVE".gz; \
TARGET_CONFIG='usr/local/nagios/etc/ndo2db.cfg'; \
sudo tar -xf "$TARGET_ARCHIVE" "$TARGET_CONFIG"; \
sudo sed -i "s/${SOURCE_DB_IP}/${TARGET_DB_IP}/" "$TARGET_CONFIG"; \
sudo tar -uf "$TARGET_ARCHIVE" "$TARGET_CONFIG"; \
sudo gzip "$TARGET_ARCHIVE"; sudo rm -rf usr/;

This saved me approx 10 mins for the [nagios.tar.gz] updates (normally takes ~30 mins in my large installation, due to the serial nature of tar).
Of course this lacks any manner of error handling or related coding best practices, given the "prototyping" nature of the solution at present.
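One way the prototype could be hardened, as a minimal sketch rather than tested tooling: a function (hypothetical name `retarget_archive`) that takes the archive, the SOURCE and FAILOVER IPs, and the member paths as arguments, and stops on the first error. It also escapes the dots in the IP so sed matches them literally, and anchors on word boundaries so e.g. 10.0.0.1 does not also rewrite 10.0.0.10. It assumes GNU tar / sed (as on RHEL 7) and should be run as root, as the original does via sudo, so extracted members keep their owner and mode when re-archived. It appends with `-r` rather than `-u`, so the edited copy is always added even if its timestamp happens to equal the archived one; extraction still takes the last copy.

```shell
#!/usr/bin/env bash
# Sketch: replace one dB IP with another inside selected members of a
# nested .tar.gz, without unpacking/repacking the whole archive.
# usage: retarget_archive ARCHIVE.tar.gz SOURCE_IP TARGET_IP MEMBER...
set -euo pipefail

retarget_archive() {
  local archive_gz="$1" source_ip="$2" target_ip="$3"
  shift 3
  [ "$#" -gt 0 ] || { echo "no members given" >&2; return 1; }
  [ -f "$archive_gz" ] || { echo "missing ${archive_gz}" >&2; return 1; }

  local archive="${archive_gz%.gz}" workdir member source_re
  # Escape dots so sed treats the IP literally (10.0.0.1, not 10x0x0x1).
  source_re="$(printf '%s' "$source_ip" | sed 's/\./\\./g')"
  workdir="$(mktemp -d)"

  gunzip "$archive_gz"                              # leaves the plain .tar
  for member in "$@"; do
    tar -xf "$archive" -C "$workdir" "$member"      # extract only this member
    # \b (GNU sed) stops 10.0.0.1 from also matching inside 10.0.0.10
    sed -i "s/\\b${source_re}\\b/${target_ip}/g" "$workdir/$member"
  done
  tar -rf "$archive" -C "$workdir" "$@"             # append edited copies
  gzip "$archive"
  rm -rf "$workdir"
}

# Example, from the unpacked backup directory:
# retarget_archive nagios.tar.gz XXX.XXX.XXX.XXX YYY.YYY.YYY.YYY \
#   usr/local/nagios/etc/ndo2db.cfg
```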

Beyond augmenting / writing better code for the scenario, I would expect the SOURCE and FAILOVER dB IPs to be taken as arguments at the outset.
Debatably, one might "harvest" the SOURCE IPs from the system, but one cannot count on this (the SOURCE system may well be inaccessible, which is what leads to the need for the fail-over).
In my view, there are only 3 config files that would need to be updated, and those are known values.
Perhaps something useful to pass along in the form of a feature request?
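Along the lines of the "known values" point, replacing by key rather than by SOURCE IP would remove the need to know (or harvest) the SOURCE IP at all. A minimal sketch with a hypothetical `set_cfg_value` helper, assuming the setting is a bare `key=value` line, which matches db_host in ndo2db.cfg; whether dbserver in xi-sys.cfg has the same shape should be checked first, and config.inc.php uses PHP array syntax, so it is not covered. The helper refuses to act unless the key appears exactly once.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in a flat config file by key.
# usage: set_cfg_value FILE KEY VALUE
set_cfg_value() {
  local file="$1" key="$2" value="$3" count
  # grep -c prints 0 (and exits 1) when nothing matches; keep the count.
  count="$(grep -c "^${key}=" "$file" 2>/dev/null || true)"
  count="${count:-0}"
  if [ "$count" -ne 1 ]; then
    echo "expected one '${key}=' line in ${file}, found ${count}" >&2
    return 1
  fi
  # '|' as the sed delimiter, so values containing '/' need no escaping.
  sed -i "s|^${key}=.*|${key}=${value}|" "$file"
}

# Example (RESTORE side, after extracting the member):
# set_cfg_value usr/local/nagios/etc/ndo2db.cfg db_host YYY.YYY.YYY.YYY
```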

---

The ask regarding /usr/local/nagios/etc/ndo2db.cfg was to update the one config file in that sub-tarball (which in my ENV rates around 5.4 GB at present).
However, as noted above I figured out a solution.
Alternately, a copy of this file could be made and staged on the FS with the value updated, named something like /usr/local/nagios/etc/ndo2db.cfg.failover (before the archive is taken).
Then, on the other side, have the script swap that file in place of the original.
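The staged-file alternative could be sketched as two small helpers (hypothetical names): one run on the SOURCE before the backup to create ndo2db.cfg.failover with the fail-over value, and one run on the RESTORE side after restore_xi.sh to swap it in, keeping the restored original as a fall-back copy. It assumes the same flat `key=value` line as db_host, and that backup_xi.sh archives the whole /usr/local/nagios/etc directory, so the .failover file travels with the backup.

```shell
#!/usr/bin/env bash
# SOURCE side, before the backup. usage: stage_failover CFG KEY VALUE
stage_failover() {
  local cfg="$1" key="$2" value="$3"
  cp -p "$cfg" "${cfg}.failover"                   # keep mode/ownership
  sed -i "s|^${key}=.*|${key}=${value}|" "${cfg}.failover"
}

# RESTORE side, after restore. usage: activate_failover CFG
activate_failover() {
  local cfg="$1"
  [ -f "${cfg}.failover" ] || { echo "no ${cfg}.failover to activate" >&2; return 1; }
  cp -p "$cfg" "${cfg}.pre-failover"               # restored original, for fail-back
  cp -p "${cfg}.failover" "$cfg"
}

# Example:
# stage_failover /usr/local/nagios/etc/ndo2db.cfg db_host YYY.YYY.YYY.YYY
# activate_failover /usr/local/nagios/etc/ndo2db.cfg
```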

Have a great day!

Re: XI: queries regarding backup & restore (off-site)

Posted: Tue Jul 21, 2020 1:24 pm
by benjaminsmith
Hi,

Some good stuff here! Thanks for the update. To disable the notification using the Core command from the REST API, it would look like this:

Code: Select all

 curl -XPOST "https://192.168.23.112/nagiosxi/api/v1/system/corecommand?apikey=XYZ" -d  "cmd=DISABLE_NOTIFICATIONS" 
To enable them again:

Code: Select all

 curl -XPOST "https://192.168.23.112/nagiosxi/api/v1/system/corecommand?apikey=XYZ" -d  "cmd=ENABLE_NOTIFICATIONS"
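Wrapped for a script or an ansible task, the two calls above might look like the following minimal sketch. The helper name, the `CURL` override (used only so the call can be dry-run), and the choice of `--fail` are assumptions; the endpoint and `cmd=` parameter are the ones shown above. If the XI instance uses a self-signed certificate, curl will reject it unless its CA is trusted.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the XI REST endpoint shown above.
# usage: xi_core_command BASE_URL APIKEY COMMAND
#   e.g. xi_core_command https://nagios.domain.fqdn/nagiosxi "$XI_APIKEY" DISABLE_NOTIFICATIONS
xi_core_command() {
  local base_url="$1" apikey="$2" cmd="$3"
  # --fail: non-zero exit on HTTP errors; --silent --show-error: quiet unless it fails.
  # CURL can be overridden (e.g. CURL=echo) to print the call instead of sending it.
  "${CURL:-curl}" --fail --silent --show-error -XPOST \
    "${base_url}/api/v1/system/corecommand?apikey=${apikey}" \
    -d "cmd=${cmd}"
}
```

A non-zero exit from `--fail` lets the caller stop before taking a backup while notifications may still be enabled.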