POSTGRES_RELATION_SIZE WARNING: DB "nagiosxi"

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
TBT
Posts: 625
Joined: Wed May 18, 2011 1:26 pm

POSTGRES_RELATION_SIZE WARNING: DB "nagiosxi"

Post by TBT »

In the past we've had this same POSTGRES_RELATION_SIZE WARNING, but this time around the vacuum solution didn't work. One change to the system is that we recently upgraded from XI 5.6.7 to 5.7.5 (Still on CentOS 6), today one of the XI hosts started to warn:
POSTGRES_RELATION_SIZE WARNING: DB "nagiosxi" (host:127.0.0.1) largest relation is table "public.xi_auditlog": 51 MB
There are no errors in /usr/local/nagiosxi/var/dbmaint.log and ran a vacuum on the postgres database with no resolve, please advise.
Nagios XI 2024R2.2.1 (8 Servers)
Nagios Fusion 2024R1.0.2
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: POSTGRES_RELATION_SIZE WARNING: DB "nagiosxi"

Post by dchurch »

It's normal for xi_auditlog to increase in size as the number of hosts and services increases and Nagios XI monitors more and more things. It's not unheard of that a system monitoring around 1000 hosts to have xi_auditlog balloon out to gigabytes, at which point it becomes a problem.

Decreasing the amount of time the audit logs stay in the database can help decrease the size of "paper trail" type data tables such as xi_auditlog.
The Max Audit Log Age setting is controlled through the admin screens. You can get to it through Admin => System Config -> Performance Settings, then click on the Database tab. The default in Nagios XI 5.7.3 and later is 180 days.

Here's some good defaults for keeping larger systems from becoming bogged down with excessive "paper trail" type data:
- Max Log Entries Age: change to 10
- Max Audit Log Age: change to 10
- Max State History Age: change to 30

It might take up to a day for the "cleaner" process to run depending on how your system is configured, but it'll eventually run and clean your database of all these for you.

See this document: Nagios XI Database Optimization
If you didn't get an 8% raise over the course of the pandemic, you took a pay cut.

Discussion of wages is protected speech under the National Labor Relations Act, and no employer can tell you you can't disclose your pay with your fellow employees.
TBT
Posts: 625
Joined: Wed May 18, 2011 1:26 pm

Re: POSTGRES_RELATION_SIZE WARNING: DB "nagiosxi"

Post by TBT »

dchurch wrote:It's normal for xi_auditlog to increase in size as the number of hosts and services increases and Nagios XI monitors more and more things. It's not unheard of that a system monitoring around 1000 hosts to have xi_auditlog balloon out to gigabytes, at which point it becomes a problem.

Decreasing the amount of time the audit logs stay in the database can help decrease the size of "paper trail" type data tables such as xi_auditlog.
The Max Audit Log Age setting is controlled through the admin screens. You can get to it through Admin => System Config -> Performance Settings, then click on the Database tab. The default in Nagios XI 5.7.3 and later is 180 days.

Here's some good defaults for keeping larger systems from becoming bogged down with excessive "paper trail" type data:
- Max Log Entries Age: change to 10
- Max Audit Log Age: change to 10
- Max State History Age: change to 30

It might take up to a day for the "cleaner" process to run depending on how your system is configured, but it'll eventually run and clean your database of all these for you.

See this document: Nagios XI Database Optimization
1. This particular server is low utilization with only 43 Hosts / 51 Services but it's hitting 56MB. Comparing to another server with 1613 Hosts / 7215 Services I'm seeing 25MB.

2. Max Audit Log Age is currently set to 30. Below are the current settings which are applied to all our XI servers:
  • Max Audit Log Age: 30
    Max Log Entries Age: 90
    Max State History Age: 730
3. Keeping in mind the vacuum has always worked in the past, this status has been in warning for 23h 26m 57s, how much longer should we wait?

4. This Postgres Table Size service check is set for --warning=50MB --critical=100MB, are these values to low?
Nagios XI 2024R2.2.1 (8 Servers)
Nagios Fusion 2024R1.0.2
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: POSTGRES_RELATION_SIZE WARNING: DB "nagiosxi"

Post by dchurch »

Did you make changes to your monitoring configuration recently? Did you add a bunch of hosts and/or services to the mix?

The audit log, in addition to storing security audits, also stores metadata about when a host goes down and why; if you added a lot of hosts it would balloon up the size of xi_auditlog.

You could try increasing the threshold to meet with your current monitoring load.

See this video for more on what the audit log is used for: https://www.youtube.com/watch?v=diDDm3BTQoQ
If you didn't get an 8% raise over the course of the pandemic, you took a pay cut.

Discussion of wages is protected speech under the National Labor Relations Act, and no employer can tell you you can't disclose your pay with your fellow employees.
TBT
Posts: 625
Joined: Wed May 18, 2011 1:26 pm

Re: POSTGRES_RELATION_SIZE WARNING: DB "nagiosxi"

Post by TBT »

dchurch wrote:Did you make changes to your monitoring configuration recently? Did you add a bunch of hosts and/or services to the mix?

The audit log, in addition to storing security audits, also stores metadata about when a host goes down and why; if you added a lot of hosts it would balloon up the size of xi_auditlog.

You could try increasing the threshold to meet with your current monitoring load.

See this video for more on what the audit log is used for: https://www.youtube.com/watch?v=diDDm3BTQoQ
No recent config changes, however, as indicated in the initial post we upgraded from XI 5.6.7 to 5.7.5. A few days later the warning started to appear on 1 of 9 XI Hosts.

Comparing this affected low utilization XI host with to another high usage host. The Audit Log has 25,426 total records (62MB) vs 52,189 total records (32MB).

What would you recommend the Postgres Table Sizes service check values (--warning=50MB --critical=100MB) be set to?
Nagios XI 2024R2.2.1 (8 Servers)
Nagios Fusion 2024R1.0.2
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: POSTGRES_RELATION_SIZE WARNING: DB "nagiosxi"

Post by dchurch »

TBT wrote:Comparing this affected low utilization XI host with to another high usage host. The Audit Log has 25,426 total records (62MB) vs 52,189 total records (32MB).
This makes me think that perhaps the database maintenance isn't running.

Try running these commands as root and post the output:

Code: Select all

rm -f /usr/local/nagiosxi/var/dbmaint.lock
time php /usr/local/nagiosxi/cron/dbmaint.php
If you didn't get an 8% raise over the course of the pandemic, you took a pay cut.

Discussion of wages is protected speech under the National Labor Relations Act, and no employer can tell you you can't disclose your pay with your fellow employees.
TBT
Posts: 625
Joined: Wed May 18, 2011 1:26 pm

Re: POSTGRES_RELATION_SIZE WARNING: DB "nagiosxi"

Post by TBT »

dchurch wrote:
TBT wrote:Comparing this affected low utilization XI host with to another high usage host. The Audit Log has 25,426 total records (62MB) vs 52,189 total records (32MB).
This makes me think that perhaps the database maintenance isn't running.

Try running these commands as root and post the output:

Code: Select all

rm -f /usr/local/nagiosxi/var/dbmaint.lock
time php /usr/local/nagiosxi/cron/dbmaint.php
I ask again, what would you recommend the Postgres Table Sizes service check values (--warning=50MB --critical=100MB) be set to, perhaps these values are too low?

Over the weekend it jumped from 62MB to 80MB, and the high usage host we're comparing to is now in warning up from 32MB to 76MB.

No dbmaint.lock present, below is the output.

Code: Select all

CREATING: /usr/local/nagiosxi/var/dbmaint.lock
CLEANING ndoutils TABLE 'commenthistory'...
SQL: DELETE FROM nagios_commenthistory WHERE entry_time < FROM_UNIXTIME(1561207658)
CLEANING ndoutils TABLE 'processevents'...
SQL: DELETE FROM nagios_processevents WHERE event_time < FROM_UNIXTIME(1592743658)
CLEANING ndoutils TABLE 'externalcommands'...
SQL: DELETE FROM nagios_externalcommands WHERE entry_time < FROM_UNIXTIME(1623674858)
CLEANING ndoutils TABLE 'logentries'...
SQL: DELETE FROM nagios_logentries WHERE logentry_time < FROM_UNIXTIME(1616503658)
CLEANING ndoutils TABLE 'notifications'...
SQL: DELETE FROM nagios_notifications WHERE start_time < FROM_UNIXTIME(1616503658)
CLEANING ndoutils TABLE 'contactnotifications'...
SQL: DELETE FROM nagios_contactnotifications WHERE start_time < FROM_UNIXTIME(1616503658)
CLEANING ndoutils TABLE 'contactnotificationmethods'...
SQL: DELETE FROM nagios_contactnotificationmethods WHERE start_time < FROM_UNIXTIME(1616503658)
CLEANING ndoutils TABLE 'statehistory'...
SQL: DELETE FROM nagios_statehistory WHERE state_time < FROM_UNIXTIME(1561207658)
CLEANING ndoutils TABLE 'timedevents'...
SQL: DELETE FROM nagios_timedevents WHERE event_time < FROM_UNIXTIME(1624279358)
CLEANING ndoutils TABLE 'systemcommands'...
SQL: DELETE FROM nagios_systemcommands WHERE start_time < FROM_UNIXTIME(1624279358)
CLEANING ndoutils TABLE 'servicechecks'...
SQL: DELETE FROM nagios_servicechecks WHERE start_time < FROM_UNIXTIME(1624279358)
CLEANING ndoutils TABLE 'hostchecks'...
SQL: DELETE FROM nagios_hostchecks WHERE start_time < FROM_UNIXTIME(1624279358)
CLEANING ndoutils TABLE 'eventhandlers'...
SQL: DELETE FROM nagios_eventhandlers WHERE start_time < FROM_UNIXTIME(1624279358)
LASTOPT:  1624276201
INTERVAL: 60
NOW:      1624279658
OPTTIME:  1624279801
CLEANING nagiosxi TABLE 'commands'...
SQL: DELETE FROM xi_commands WHERE processing_time < 1624250858::abstime::timestamp without time zone AND status_code = 2
CLEANING nagiosxi TABLE 'events'...
SQL: DELETE FROM xi_events WHERE processing_time < 1624250858::abstime::timestamp without time zone AND status_code = 2
CLEANING nagiosxi TABLE 'auth_tokens'...
SQL: DELETE FROM xi_auth_tokens WHERE auth_valid_until < 1624193258::abstime::timestamp without time zone
CLEANING nagiosxi TABLE 'cmp_trapdata_log'...
SQL: DELETE FROM xi_cmp_trapdata_log WHERE trapdata_log_datetime < 1616503658::abstime::timestamp without time zone
SQL1: SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL
SQL2: Deleted 0 (DELETE FROM xi_meta WHERE meta_id IN (SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL))
CLEANING nagiosxi TABLE 'auditlog'...
SQL: DELETE FROM xi_auditlog WHERE log_time < 1621687658::abstime::timestamp without time zone
CLEANING nagiosql TABLE 'logbook'...
SQL: DELETE FROM tbl_logbook WHERE time < FROM_UNIXTIME(1624250858)




Repair Complete: Removing Lock File

real    0m0.207s
user    0m0.137s
sys     0m0.038s
Nagios XI 2024R2.2.1 (8 Servers)
Nagios Fusion 2024R1.0.2
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: POSTGRES_RELATION_SIZE WARNING: DB "nagiosxi"

Post by ssax »

Your xi_auditlog table could get pretty large depending on how active your system is and how long you're storing the logs for in Admin > Performance Settings > Database, you would need to increase the --warning=50MB --critical=100MB thresholds to be higher than however large your xi_auditlog table gets to be based on the retention settings.

In my opinion this check isn't really that useful as the auditlog and trap tables could grow to multiple GB so a single threshold to check all DB table sizes could be upwards of 10GB (again, depending on how active system is and your retention settings).

I would probably modify your service check and do this:

Code: Select all

--exclude=xi_auditlog,xi_cmp_trapdata,xi_cmp_trapdata_log
Then I would adjust the thresholds to:

Code: Select all

--warning=450MB --critical=500MB
Then you could setup another service to monitor those other large tables and do like:
- This would need to be adjusted to whatever your retention settings allow the size to grow to

Code: Select all

--warning=2GB --critical=3GB
TBT
Posts: 625
Joined: Wed May 18, 2011 1:26 pm

Re: POSTGRES_RELATION_SIZE WARNING: DB "nagiosxi"

Post by TBT »

ssax wrote:Your xi_auditlog table could get pretty large depending on how active your system is and how long you're storing the logs for in Admin > Performance Settings > Database, you would need to increase the --warning=50MB --critical=100MB thresholds to be higher than however large your xi_auditlog table gets to be based on the retention settings.

In my opinion this check isn't really that useful as the auditlog and trap tables could grow to multiple GB so a single threshold to check all DB table sizes could be upwards of 10GB (again, depending on how active system is and your retention settings).

I would probably modify your service check and do this:

Code: Select all

--exclude=xi_auditlog,xi_cmp_trapdata,xi_cmp_trapdata_log
Then I would adjust the thresholds to:

Code: Select all

--warning=450MB --critical=500MB
Then you could setup another service to monitor those other large tables and do like:
- This would need to be adjusted to whatever your retention settings allow the size to grow to

Code: Select all

--warning=2GB --critical=3GB
I've gone ahead and modified the current check.

Before:

Code: Select all

check_xi_postgres_db!-H 127.0.0.1 --port=5432 --dbuser=nagiosxi --dbname=nagiosxi --action=relation_size --warning=50MB --critical=100MB
After:

Code: Select all

check_xi_postgres_db!-H 127.0.0.1 --port=5432 --dbuser=nagiosxi --dbname=nagiosxi --action=relation_size --warning=450MB --critical=500MB --exclude=xi_auditlog,xi_cmp_trapdata,xi_cmp_trapdata_log
Moving forward we'll also consider an additional check as per your suggestion. Thank you.
Nagios XI 2024R2.2.1 (8 Servers)
Nagios Fusion 2024R1.0.2
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: POSTGRES_RELATION_SIZE WARNING: DB "nagiosxi"

Post by dchurch »

Let us know if you have any related questions or if we're okay to lock this up and mark it as resolved.
If you didn't get an 8% raise over the course of the pandemic, you took a pay cut.

Discussion of wages is protected speech under the National Labor Relations Act, and no employer can tell you you can't disclose your pay with your fellow employees.
Locked