Enormous Daily files in /usr/local/nagios/var/archives (~1G)

wrobj0
Posts: 17
Joined: Fri Dec 20, 2019 2:47 pm

Post by wrobj0 »

I have an issue with files in /usr/local/nagios/var/archives consuming an enormous amount of disk space.

In October, the files were around 275M each day. By May, they had grown to nearly 1G, and they are now 1.1G per day.

Installed Version: 5.6.14
Red Hat Enterprise Linux Server release 7.8 (Maipo) on VMWare

We have 320 hosts, with 6520 services being checked. We have not had a threefold increase in hosts or services, nor have we increased the frequency of the checks being performed.

I'm not really sure how to begin troubleshooting this issue, but I've looked in the log file, and I've found a number of hosts that have long since been removed from Nagios that appear in each day's log file. The last timestamp on one of the hosts is 1607061600 (very annoying that this logs using the epoch instead of standard times, or, preferably, ISO8601), and that converts to Fri Dec 04 2020 00:00:00 GMT-0600 (Central Standard Time).
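
For reading these logs, a minimal sketch that rewrites the leading `[epoch]` stamp of each line as ISO 8601 in UTC. It assumes GNU `date` (as shipped with RHEL 7), and `epoch2iso` is a hypothetical helper name, not part of Nagios.

```shell
# Rewrite the leading [epoch] stamp of Nagios log lines as ISO 8601 (UTC).
# Assumption: GNU date, which accepts "-d @SECONDS".
epoch2iso() {
    while IFS= read -r line; do
        case "$line" in
            \[[0-9]*\]*)
                ts=${line#\[}       # drop the leading "["
                ts=${ts%%\]*}       # keep what precedes the first "]"
                rest=${line#*\]}    # remainder of the line after "]"
                printf '[%s]%s\n' "$(date -u -d "@$ts" +%Y-%m-%dT%H:%M:%SZ)" "$rest"
                ;;
            *)
                # Lines without a leading stamp pass through unchanged.
                printf '%s\n' "$line"
                ;;
        esac
    done
}
```

Usage would look like `epoch2iso < /usr/local/nagios/var/archives/nagios-06-15-2021-00.log | less`. The stamp 1607061600 comes out as 2020-12-04T06:00:00Z, which is midnight Central Standard Time. The loop starts one `date` process per line, so on a 1G file it is better applied to a filtered excerpt.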

If I search the system for files with the names of these systems, I find them in /usr/local/nagios/share/perfdata/. Is this log rewriting all of the data from that directory every day? If so, why is the entry in perfdata not removed when we delete the host and its services from the system? Is there an additional step we should be taking?
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Post by ssax »

Please PM me a copy of your profile, you can download it from Admin > System Profile by clicking the Download Profile button.

Please compress one of the large archive files so I can see what's consuming the space:

Code: Select all

GZIP=-9 tar czvf /tmp/archive.tar.gz /usr/local/nagios/var/archives/nagios-06-15-2021-00.log
Then PM me the resulting /tmp/archive.tar.gz file.

The files in /usr/local/nagios/share/perfdata are where the graphing data is stored (in the RRD files). They are not cleaned up automatically when a host or service is deleted.

You could do this to clean them up:

https://support.nagios.com/kb/article/n ... s-854.html
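
Before cleaning anything up, it may help to list which perfdata entries no longer match a configured host. The following is a minimal sketch, assuming the perfdata directory holds one subdirectory per host name and that a plain-text list of current host names (one per line) is available. `orphaned_perfdata` and the list file are hypothetical names; the function only prints candidates and deletes nothing.

```shell
# Print perfdata subdirectories whose name is not in the list of known hosts.
# $1 = perfdata directory (assumed layout: one subdirectory per host)
# $2 = file with one current host name per line (hypothetical export)
orphaned_perfdata() {
    perfdata_dir=$1
    known_hosts=$2
    for dir in "$perfdata_dir"/*/; do
        [ -d "$dir" ] || continue          # skip if the glob matched nothing
        host=$(basename "$dir")
        # -x: whole-line match, -F: fixed string, so dots in names stay literal
        grep -qxF -- "$host" "$known_hosts" || printf '%s\n' "$host"
    done
}
```

For example, `orphaned_perfdata /usr/local/nagios/share/perfdata /tmp/current_hosts.txt` prints one host name per line for review.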
wrobj0
Posts: 17
Joined: Fri Dec 20, 2019 2:47 pm

Post by wrobj0 »

Any progress on identifying the cause? Were you able to download the files from the links I provided?
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Post by ssax »

A lot of your checks are failing because they are timing out. A failed check is rechecked at the shorter retry_interval, so it runs more often, and that generates a ton of log entries.
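
To see which kinds of entries dominate an archive file, a rough sketch like the following can help. It assumes each line starts with an `[epoch]` stamp followed by a message whose type precedes the first colon (for example `Warning` or `wproc`). `log_types` is a hypothetical helper name.

```shell
# Count log lines per message type, most frequent first.
# $1 = archive log file
log_types() {
    awk '{
        sub(/^\[[0-9]+\] /, "")    # drop the leading [epoch] stamp
        sub(/:.*/, "")             # keep only the text before the first colon
        count[$0]++
    }
    END { for (t in count) print count[t], t }' "$1" | sort -rn
}
```

Running `log_types /usr/local/nagios/var/archives/nagios-06-15-2021-00.log | head` would show the ten most common message types with their line counts.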

What is the output of these commands as root:

Code: Select all

ulimit -a
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql
sysctl -p
chage -l nagios
grep standard /var/lib/pgsql/data/postgresql.conf
Additionally, please send the output of these commands:
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the first command if your DB is offloaded to another server and/or you've changed the root mysql password

Code: Select all

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table

echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
wrobj0
Posts: 17
Joined: Fri Dec 20, 2019 2:47 pm

Post by wrobj0 »

Okay. That was a concern of mine. These are definitely checks on existing hosts only, right? I'm still not sure why we're seeing hosts that have been deleted in the archives files.

Here are the outputs you requested.

Code: Select all

[root@nagiosxi ~]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31192
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31192
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


[root@nagiosxi ~]# su -s /bin/bash -c 'ulimit -a' nagios
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31192
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


[root@nagiosxi ~]# su -s /bin/bash -c 'ulimit -a' mysql
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31192
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


[root@nagiosxi ~]# sysctl -p
kernel.msgmnb = 131072000
kernel.msgmax = 131072000
kernel.shmmax = 4294967295
kernel.shmall = 268435456


[root@nagiosxi ~]# chage -l nagios
Last password change					: May 01, 2020
Password expires					: never
Password inactive					: never
Account expires						: never
Minimum number of days between password change		: 0
Maximum number of days between password change		: 99999
Number of days of warning before password expires	: 7


[root@nagiosxi ~]# grep standard /var/lib/pgsql/data/postgresql.conf
#standard_conforming_strings = on

Code: Select all

# echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
+--------------------------------------------+------------+
| Table                                      | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements                    |       0.04 |
| nagios_commands                            |       0.02 |
| nagios_commenthistory                      |     295.40 |
| nagios_comments                            |       0.00 |
| nagios_configfiles                         |       0.00 |
| nagios_configfilevariables                 |       0.01 |
| nagios_conninfo                            |       0.48 |
| nagios_contact_addresses                   |       0.00 |
| nagios_contact_notificationcommands        |       0.03 |
| nagios_contactgroup_members                |       0.01 |
| nagios_contactgroups                       |       0.00 |
| nagios_contactnotificationmethods          |       8.97 |
| nagios_contactnotifications                |       9.43 |
| nagios_contacts                            |       0.01 |
| nagios_contactstatus                       |       0.00 |
| nagios_customvariables                     |       0.49 |
| nagios_customvariablestatus                |       0.52 |
| nagios_dbversion                           |       0.00 |
| nagios_downtimehistory                     |       9.09 |
| nagios_eventhandlers                       |       0.01 |
| nagios_externalcommands                    |       0.01 |
| nagios_flappinghistory                     |       7.08 |
| nagios_host_contactgroups                  |       0.02 |
| nagios_host_contacts                       |       0.00 |
| nagios_host_parenthosts                    |       0.00 |
| nagios_hostchecks                          |       0.00 |
| nagios_hostdependencies                    |       0.00 |
| nagios_hostescalation_contactgroups        |       0.00 |
| nagios_hostescalation_contacts             |       0.00 |
| nagios_hostescalations                     |       0.00 |
| nagios_hostgroup_members                   |       0.01 |
| nagios_hostgroups                          |       0.00 |
| nagios_hosts                               |       0.07 |
| nagios_hoststatus                          |       0.17 |
| nagios_instances                           |       0.00 |
| nagios_logentries                          |     346.42 |
| nagios_notifications                       |      15.34 |
| nagios_objects                             |       1.24 |
| nagios_processevents                       |       0.26 |
| nagios_programstatus                       |       0.00 |
| nagios_runtimevariables                    |       0.00 |
| nagios_scheduleddowntime                   |       0.00 |
| nagios_service_contactgroups               |       0.27 |
| nagios_service_contacts                    |       0.03 |
| nagios_service_parentservices              |       0.00 |
| nagios_servicechecks                       |       0.00 |
| nagios_servicedependencies                 |       0.00 |
| nagios_serviceescalation_contactgroups     |       0.00 |
| nagios_serviceescalation_contacts          |       0.00 |
| nagios_serviceescalations                  |       0.00 |
| nagios_servicegroup_members                |       0.00 |
| nagios_servicegroups                       |       0.00 |
| nagios_services                            |       1.33 |
| nagios_servicestatus                       |       3.20 |
| nagios_statehistory                        |     469.92 |
| nagios_systemcommands                      |       0.04 |
| nagios_timedeventqueue                     |       0.00 |
| nagios_timedevents                         |       0.00 |
| nagios_timeperiod_timeranges               |       0.01 |
| nagios_timeperiods                         |       0.00 |
| tbl_command                                |       0.04 |
| tbl_contact                                |       0.01 |
| tbl_contactgroup                           |       0.01 |
| tbl_contacttemplate                        |       0.01 |
| tbl_domain                                 |       0.01 |
| tbl_host                                   |       0.07 |
| tbl_hostdependency                         |       0.00 |
| tbl_hostescalation                         |       0.00 |
| tbl_hostextinfo                            |       0.00 |
| tbl_hostgroup                              |       0.01 |
| tbl_hosttemplate                           |       0.02 |
| tbl_info                                   |       0.13 |
| tbl_lnkContactToCommandHost                |       0.00 |
| tbl_lnkContactToCommandService             |       0.00 |
| tbl_lnkContactToContactgroup               |       0.00 |
| tbl_lnkContactToContacttemplate            |       0.00 |
| tbl_lnkContactToVariabledefinition         |       0.00 |
| tbl_lnkContactgroupToContact               |       0.00 |
| tbl_lnkContactgroupToContactgroup          |       0.00 |
| tbl_lnkContacttemplateToCommandHost        |       0.00 |
| tbl_lnkContacttemplateToCommandService     |       0.00 |
| tbl_lnkContacttemplateToContactgroup       |       0.00 |
| tbl_lnkContacttemplateToContacttemplate    |       0.00 |
| tbl_lnkContacttemplateToVariabledefinition |       0.00 |
| tbl_lnkHostToContact                       |       0.00 |
| tbl_lnkHostToContactgroup                  |       0.01 |
| tbl_lnkHostToHost                          |       0.00 |
| tbl_lnkHostToHostgroup                     |       0.00 |
| tbl_lnkHostToHosttemplate                  |       0.01 |
| tbl_lnkHostToVariabledefinition            |       0.01 |
| tbl_lnkHostdependencyToHost_DH             |       0.00 |
| tbl_lnkHostdependencyToHost_H              |       0.00 |
| tbl_lnkHostdependencyToHostgroup_DH        |       0.00 |
| tbl_lnkHostdependencyToHostgroup_H         |       0.00 |
| tbl_lnkHostescalationToContact             |       0.00 |
| tbl_lnkHostescalationToContactgroup        |       0.00 |
| tbl_lnkHostescalationToHost                |       0.00 |
| tbl_lnkHostescalationToHostgroup           |       0.00 |
| tbl_lnkHostgroupToHost                     |       0.01 |
| tbl_lnkHostgroupToHostgroup                |       0.00 |
| tbl_lnkHosttemplateToContact               |       0.00 |
| tbl_lnkHosttemplateToContactgroup          |       0.00 |
| tbl_lnkHosttemplateToHost                  |       0.00 |
| tbl_lnkHosttemplateToHostgroup             |       0.00 |
| tbl_lnkHosttemplateToHosttemplate          |       0.00 |
| tbl_lnkHosttemplateToVariabledefinition    |       0.00 |
| tbl_lnkServiceToContact                    |       0.02 |
| tbl_lnkServiceToContactgroup               |       0.13 |
| tbl_lnkServiceToHost                       |       0.16 |
| tbl_lnkServiceToHostgroup                  |       0.00 |
| tbl_lnkServiceToServicegroup               |       0.00 |
| tbl_lnkServiceToServicetemplate            |       0.22 |
| tbl_lnkServiceToVariabledefinition         |       0.16 |
| tbl_lnkServicedependencyToHost_DH          |       0.00 |
| tbl_lnkServicedependencyToHost_H           |       0.00 |
| tbl_lnkServicedependencyToHostgroup_DH     |       0.00 |
| tbl_lnkServicedependencyToHostgroup_H      |       0.00 |
| tbl_lnkServicedependencyToService_DS       |       0.00 |
| tbl_lnkServicedependencyToService_S        |       0.00 |
| tbl_lnkServicedependencyToServicegroup_DS  |       0.02 |
| tbl_lnkServicedependencyToServicegroup_S   |       0.02 |
| tbl_lnkServiceescalationToContact          |       0.00 |
| tbl_lnkServiceescalationToContactgroup     |       0.00 |
| tbl_lnkServiceescalationToHost             |       0.00 |
| tbl_lnkServiceescalationToHostgroup        |       0.00 |
| tbl_lnkServiceescalationToService          |       0.00 |
| tbl_lnkServiceescalationToServicegroup     |       0.02 |
| tbl_lnkServicegroupToService               |       0.00 |
| tbl_lnkServicegroupToServicegroup          |       0.00 |
| tbl_lnkServicetemplateToContact            |       0.00 |
| tbl_lnkServicetemplateToContactgroup       |       0.00 |
| tbl_lnkServicetemplateToHost               |       0.00 |
| tbl_lnkServicetemplateToHostgroup          |       0.00 |
| tbl_lnkServicetemplateToServicegroup       |       0.00 |
| tbl_lnkServicetemplateToServicetemplate    |       0.01 |
| tbl_lnkServicetemplateToVariabledefinition |       0.00 |
| tbl_lnkTimeperiodToTimeperiod              |       0.00 |
| tbl_logbook                                |       0.00 |
| tbl_mainmenu                               |       0.00 |
| tbl_permission                             |       0.02 |
| tbl_permission_inactive                    |       0.02 |
| tbl_service                                |       1.24 |
| tbl_servicedependency                      |       0.00 |
| tbl_serviceescalation                      |       0.00 |
| tbl_serviceextinfo                         |       0.00 |
| tbl_servicegroup                           |       0.01 |
| tbl_servicetemplate                        |       0.02 |
| tbl_session                                |       0.01 |
| tbl_session_locks                          |       0.00 |
| tbl_settings                               |       0.00 |
| tbl_submenu                                |       0.00 |
| tbl_timedefinition                         |       0.01 |
| tbl_timeperiod                             |       0.01 |
| tbl_user                                   |       0.01 |
| tbl_variabledefinition                     |       0.47 |
| xi_auditlog                                |       0.08 |
| xi_auth_tokens                             |       0.03 |
| xi_cmp_trapdata                            |       0.03 |
| xi_cmp_trapdata_log                        |       0.03 |
| xi_commands                                |       0.02 |
| xi_eventqueue                              |       0.03 |
| xi_events                                  |       0.05 |
| xi_meta                                    |       0.02 |
| xi_mibs                                    |       0.05 |
| xi_options                                 |       0.03 |
| xi_sessions                                |       0.03 |
| xi_sysstat                                 |       0.03 |
| xi_usermeta                                |       0.05 |
| xi_users                                   |       0.03 |
+--------------------------------------------+------------+

Code: Select all

# echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
        table        |  size   | externalsize 
---------------------+---------+--------------
 xi_meta             | 133 MB  | 119 MB
 xi_events           | 60 MB   | 60 MB
 xi_auth_tokens      | 4264 kB | 3864 kB
 xi_auditlog         | 1480 kB | 1000 kB
 xi_usermeta         | 360 kB  | 232 kB
 xi_commands         | 128 kB  | 72 kB
 xi_sysstat          | 104 kB  | 72 kB
 xi_options          | 104 kB  | 72 kB
 xi_users            | 72 kB   | 64 kB
 xi_mibs             | 72 kB   | 64 kB
 xi_sessions         | 40 kB   | 40 kB
 xi_eventqueue       | 32 kB   | 24 kB
 xi_cmp_trapdata     | 24 kB   | 24 kB
 xi_cmp_trapdata_log | 16 kB   | 16 kB
 xi_incidents        | 0 bytes | 0 bytes
(15 rows)
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Post by ssax »

Please edit this file:

Code: Select all

/var/lib/pgsql/data/postgresql.conf
Change this:

Code: Select all

#standard_conforming_strings = on
To this:

Code: Select all

standard_conforming_strings = off
Then restart the postgresql/httpd/crond services:

Code: Select all

systemctl restart postgresql httpd crond
Then run this command:

Code: Select all

echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | psql nagiosxi nagiosxi
Let me know if that resolves the issue with the old hosts showing up.
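
The postgresql.conf change described above could also be scripted. This is only a sketch: `set_scs_off` is a hypothetical helper name, it assumes the setting appears once on its own line (commented out or not), and it replaces the whole line, so any trailing comment on that line is dropped. A copy of the original file is kept with a `.bak` suffix.

```shell
# Set standard_conforming_strings = off in the given postgresql.conf.
# $1 = path to postgresql.conf; the original is saved as "$1.bak".
set_scs_off() {
    conf=$1
    sed -i.bak \
        's/^#\{0,1\}[[:space:]]*standard_conforming_strings[[:space:]]*=.*/standard_conforming_strings = off/' \
        "$conf"
}
```

For example, `set_scs_off /var/lib/pgsql/data/postgresql.conf`, followed by the service restart shown above.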

If it doesn't, try doing this as well:

Please go to Configure > Core Config Manager > Tools > Config File Management:
- Click the Delete Files button (don't worry, it's safe, they will be rewritten)
- Then click the Write Configs button
- Then apply configuration
wrobj0
Posts: 17
Joined: Fri Dec 20, 2019 2:47 pm

Post by wrobj0 »

Unfortunately, even after rewriting the config files, servers that no longer exist are appearing in the archive files. And the files are still the same size, 1.1GB as of today.
gsmith
Posts: 1253
Joined: Tue Mar 02, 2021 11:15 am

Post by gsmith »

Hi

From a command line:

Code: Select all

mysql -u root -p nagios;
select display_name from nagios_hosts;
Do you see the "old" hosts in that list?

Thanks
wrobj0
Posts: 17
Joined: Fri Dec 20, 2019 2:47 pm

Post by wrobj0 »

I do not.

As an example, I see this in the log file:

Code: Select all

[1596036618] Warning: Check of host 'centos3' timed out after 30.00 seconds
[1596036678] wproc:   host=centos3; service=(null);
And that does not exist in the display_name column.

Code: Select all

MariaDB [nagios]> select * from nagios_hosts where display_name like '%centos%' ;
Empty set (0.01 sec)
But the same query syntax will find "localhost," which exists there.

Code: Select all

MariaDB [nagios]> select * from nagios_hosts where display_name like '%local%' ;
...
-------+----------------+------+------+----------------+------+------+------+------------+
1 row in set (0.00 sec)
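
The manual comparison above can be sketched for a whole archive file. This assumes only the two line formats quoted in this post (the host check timeout warning and the `wproc:` line) and a plain-text list of current host names, one per line. `log_hosts`, `stale_log_hosts`, and the list file are hypothetical names.

```shell
# Print the unique host names found in the two line formats quoted above.
# $1 = archive log file
log_hosts() {
    sed -n \
        -e "s/.*Check of host '\([^']*\)' timed out.*/\1/p" \
        -e "s/.*wproc: *host=\([^;]*\);.*/\1/p" \
        "$1" | sort -u
}

# Print hosts that appear in the log but not in the list of known hosts.
# $1 = archive log file, $2 = non-empty file with one current host name per line
stale_log_hosts() {
    log_hosts "$1" | grep -vxF -f "$2"
}
```

The host list could come from the query above, for instance by saving the output of `select display_name from nagios_hosts;` to a file, and then running `stale_log_hosts /usr/local/nagios/var/archives/nagios-06-15-2021-00.log /tmp/current_hosts.txt`.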
gsmith
Posts: 1253
Joined: Tue Mar 02, 2021 11:15 am

Post by gsmith »

Hi

From a command line:

Code: Select all

mysql -u root -p nagios;
select name1 from nagios_objects;
Do you see the "old" hosts in that list?

Thanks