XI: adjustments to support large deployments & APP backups
Posted: Tue Nov 09, 2021 4:45 pm
Ahoy folks,
We maintain several Nagios XI instances within our ENVs (some large deployments, some small).
For most of our ENVs, we can rely on the built-in XI level "Scheduled Backups (Local Backups)" facility.
This works well (for the most part), in that it seems to have a built-in "logrotate" type retention facility (so as to keep the backup store from becoming burdensome).
That said, it would be nice to be able to schedule more than one backup per day (EG: ideally twice daily, before start of business & after close of business).
Anyway, while all this works well for our small and medium sized deployments, it does not work for our large deployment.
Specifically, the XI level backup facility hits some sort of built-in "backup timeout" value and bails.
Most likely this is due to the quantity of data being backed up.
Interestingly, this does not appear to be a problem for manually initiated CLI level backups (via `/usr/local/nagiosxi/scripts/backup_xi.sh`).
It of course just takes a really long time (again due to size of data being backed up).
In reviewing the forums, I see mention of some tuning parameters available (EG: backup_timeout).
I also recall seeing a note that for large deployments, additional tuning was required (not sure if config.inc.php, or php.ini)?
For sake of calibration, `/usr/local/` in our large deployment tends to average around 100GB at present.
This covers ~4.2K ACTIVE checks (host up/down via check_icmp), and ~42,300 PASSIVE checks (via NRDP and check_dummy).
That mount is on its own SAN disk, and we are in the process of introducing RAMDisks in our ENVs.
Thoughts on how we might tune the APP parameters to permit backups against ~100GB+ data mount for XI installations?
Perhaps it would be better to switch to initiating CLI level backups (EG: via cron or scheduled ansible execution), with some sort of built-in clean-up (purge anything older than X days)?
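For reference, here is a rough sketch of what I have in mind for the CLI route: call the backup script, then purge archives older than X days. This is only a sketch under assumptions, not anything official. The script path is the one mentioned above; the backup directory (`/store/backups/nagiosxi`) and the `.tar.gz` archive suffix are my assumptions about where and how the archives land, and the function names are made up for illustration.

```python
import subprocess
import time
from pathlib import Path

# Script path as referenced in this post.
BACKUP_SCRIPT = "/usr/local/nagiosxi/scripts/backup_xi.sh"
# ASSUMPTION: location of the backup archives; adjust for your install.
BACKUP_DIR = "/store/backups/nagiosxi"


def purge_old_backups(backup_dir, max_age_days, now=None, suffix=".tar.gz"):
    """Delete archives in backup_dir whose mtime is older than max_age_days.

    Only regular files ending in `suffix` are considered (the suffix is an
    assumption about the archive naming). Returns the removed file names.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for path in sorted(Path(backup_dir).glob("*" + suffix)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed


def run_backup_then_purge(max_age_days=7):
    """Run the CLI backup, and purge old archives only if it succeeded."""
    # check=True raises CalledProcessError on a non-zero exit status,
    # so a failed backup never triggers the purge step.
    subprocess.run([BACKUP_SCRIPT], check=True)
    return purge_old_backups(BACKUP_DIR, max_age_days)
```

Something like this could be driven from cron (EG: a `0 6,19 * * *` schedule for the twice daily case) or from a scheduled ansible play. Purging only after a successful backup is deliberate, so a run of failed backups does not slowly empty the backup store.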
Nagios XI 5.8.x release series (5.8.6 as of tonight)
MariaDB 5.5.64 (off-box for this ENV)
Thank you,
- Rowan