LOG server running out of space, but this is not Indexes

Posted: Mon May 31, 2021 1:17 pm
by dlukinski
Hello Nagios Log team

Our Log Server is running out of space (500 GB volume, only 2 months of rather small indexes, backups forwarded to a separate NFS volume).
We are not sure where all the usage is coming from and need your advice quickly.

Thank you

Re: LOG server running out of space, but this is not Indexes

Posted: Tue Jun 01, 2021 3:37 pm
by ssax
What version of Log Server are you using? You can find it at the bottom left-hand side of the web interface.

Please PM me a copy of your profile. You can download it from Admin > System Status by clicking the Download System Profile button.

Include the output of these commands in that PM:

Code: Select all

df -h
df -i
uname -a
cat /etc/*release

Re: LOG server running out of space, but this is not Indexes

Posted: Thu Jun 03, 2021 10:23 am
by dlukinski
Please see the output as follows:

Code: Select all

[root@fikc-naglsprod11 ~]# df -h
Filesystem                                                      Size  Used Avail Use% Mounted on
devtmpfs                                                         16G     0   16G   0% /dev
tmpfs                                                            16G  220K   16G   1% /dev/shm
tmpfs                                                            16G  1.6G   15G  11% /run
tmpfs                                                            16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/centos-root                                         490G  427G   42G  92% /
/dev/sda1                                                       976M  284M  626M  32% /boot
fikc-isilon01.res.kcg.global:/ifs/data/fikc-nagxiprod01-backup  300G  239G   62G  80% /mnt/nfs/backup
tmpfs                                                           3.2G     0  3.2G   0% /run/user/1000
tmpfs                                                           3.2G     0  3.2G   0% /run/user/48
tmpfs                                                           3.2G     0  3.2G   0% /run/user/0
[root@fikc-naglsprod11 ~]# clear
[root@fikc-naglsprod11 ~]# df -h
Filesystem                                                      Size  Used Avail Use% Mounted on
devtmpfs                                                         16G     0   16G   0% /dev
tmpfs                                                            16G  220K   16G   1% /dev/shm
tmpfs                                                            16G  1.6G   15G  11% /run
tmpfs                                                            16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/centos-root                                         490G  428G   42G  92% /
/dev/sda1                                                       976M  284M  626M  32% /boot
fikc-isilon01.res.kcg.global:/ifs/data/fikc-nagxiprod01-backup  300G  239G   62G  80% /mnt/nfs/backup
tmpfs                                                           3.2G     0  3.2G   0% /run/user/1000
tmpfs                                                           3.2G     0  3.2G   0% /run/user/48
tmpfs                                                           3.2G     0  3.2G   0% /run/user/0
[root@fikc-naglsprod11 ~]# df -i
Filesystem                                                        Inodes  IUsed     IFree IUse% Mounted on
devtmpfs                                                         4093495    397   4093098    1% /dev
tmpfs                                                            4096421     13   4096408    1% /dev/shm
tmpfs                                                            4096421   1298   4095123    1% /run
tmpfs                                                            4096421     16   4096405    1% /sys/fs/cgroup
/dev/mapper/centos-root                                         32571392 131408  32439984    1% /
/dev/sda1                                                          65536    365     65171    1% /boot
fikc-isilon01.res.kcg.global:/ifs/data/fikc-nagxiprod01-backup 629145600    221 629145379    1% /mnt/nfs/backup
tmpfs                                                            4096421      1   4096420    1% /run/user/1000
tmpfs                                                            4096421      1   4096420    1% /run/user/0
[root@fikc-naglsprod11 ~]# uname -a
Linux fikc-naglsprod11 3.10.0-1160.15.2.el7.x86_64 #1 SMP Wed Feb 3 15:06:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@fikc-naglsprod11 ~]# cat /etc/*release
CentOS Linux release 7.9.2009 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.9.2009 (Core)
CentOS Linux release 7.9.2009 (Core)
[root@fikc-naglsprod11 ~]#


Re: LOG server running out of space, but this is not Indexes

Posted: Thu Jun 03, 2021 4:51 pm
by ssax
Please PM me a copy of your profile. You can download it from Admin > System Status by clicking the Download System Profile button.

Send the output of this command as well:

Code: Select all

sudo du -ah /* | sort -rn | head -n 50

Re: LOG server running out of space, but this is not Indexes

Posted: Fri Jun 04, 2021 8:05 am
by dlukinski
Profile attached.

Here is the command output:

Code: Select all

login as: root
[email protected]'s password:
Last login: Thu Jun  3 15:21:27 2021 from pf1rfv2r.res.kcg.global
[root@fikc-naglsprod11 ~]# sudo du -ah /* | sort -rn | head -n 50
du: cannot access ‘/proc/1193/task/1320/fdinfo/2379’: No such file or directory
du: cannot access ‘/proc/1193/task/1320/fdinfo/2386’: No such file or directory
du: cannot access ‘/proc/1193/task/1328/fdinfo/2011’: No such file or directory
du: cannot access ‘/proc/1193/task/1328/fdinfo/2032’: No such file or directory
du: cannot access ‘/proc/1193/task/1570/fdinfo/2388’: No such file or directory
du: cannot access ‘/proc/1193/task/1570/fdinfo/2397’: No such file or directory
du: cannot access ‘/proc/1193/task/1584/fd/2124’: No such file or directory
du: cannot access ‘/proc/1193/task/1630/fdinfo/1958’: No such file or directory
du: cannot access ‘/proc/1193/task/1630/fdinfo/2366’: No such file or directory
du: cannot access ‘/proc/1193/task/1630/fdinfo/2379’: No such file or directory
du: cannot access ‘/proc/1193/task/1630/fdinfo/2397’: No such file or directory
du: cannot access ‘/proc/1193/task/1635/fdinfo/2397’: No such file or directory
du: cannot access ‘/proc/46882/task/46882/fd/3’: No such file or directory
du: cannot access ‘/proc/46882/task/46882/fdinfo/3’: No such file or directory
du: cannot access ‘/proc/46882/fd/3’: No such file or directory
du: cannot access ‘/proc/46882/fdinfo/3’: No such file or directory
du: cannot access ‘/proc/61005/task/54828/fd/627’: No such file or directory
1020K   /var/www/html/nagioslogserver/application/language/pt_PT
1020K   /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/zookeeper-1.4.11-java/ext
1020K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.05.31/4/index/_ev4.cfs
1020K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.05.15/0/index/_eyh_Lucene41_0.tip
1020K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.05.13/0/index/_f0q_Lucene41_0.tip
1020K   /usr/lib/firmware/mellanox/mlxsw_spectrum-13.2000.2714.mfa2
1020K   /usr/lib/firmware/mellanox/mlxsw_spectrum-13.2000.2308.mfa2
1020K   /usr/lib/firmware/iwlwifi-cc-a0-46.ucode
1020K   /usr/lib/firmware/dpaa2/mc/mc_10.16.2_lx2160a.itb
1020K   /opt/puppetlabs/puppet/lib/ruby/gems/2.7.0/gems/ffi-1.13.1/ext/ffi_c/libffi-x86_64-linux
1020K   /mnt/nfs/backups/indices/logstash-2021.05.13/0/__5
1016K   /var/www/html/nagioslogserver/application/language/pt_PT/LC_MESSAGES
1016K   /usr/share/locale/pa
1016K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.05.26/3/index/_gd3.fdx
1016K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.05.26/1/index/_g2x_Lucene410_0.dvd
1016K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.05.26/0/index/_g86_Lucene410_0.dvd
1016K   /usr/lib/firmware/iwlwifi-7265D-29.ucode
1016K   /usr/lib/firmware/iwlwifi-3168-29.ucode
1016K   /usr/lib/firmware/dpaa2/mc/mc_10.16.2_ls2088a.itb
1016K   /usr/bin/grub2-mkrescue
1012K   /usr/share/locale/pa/LC_MESSAGES
1012K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.05.21/0/index/_f1d_Lucene41_0.tip
1012K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.05.15/2/index/_ewa_Lucene41_0.tip
1012K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.05.07/4/index/_cd4_Lucene41_0.tip
1012K   /usr/lib/firmware/iwlwifi-7265D-27.ucode
1012K   /usr/lib/firmware/iwlwifi-3168-27.ucode
1012K   /opt/puppetlabs/puppet/lib/ruby/gems/2.7.0/gems/concurrent-ruby-1.1.5/lib
1012K   /mnt/nfs/backups/nagioslogserver/indices/logstash-2021.05.07/4/__t
1008K   /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/slyphon-zookeeper_jar-3.3.5-java/lib/zookeeper-3.3.5.jar
1008K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.05.13/1/index/_eyu_Lucene41_0.tip
1008K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.04.24/2/index/_ew0_Lucene41_0.tip
1008K   /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-1.el7_9.x86_64/jre/lib/jfr.jar
1008K   /usr/lib/firmware/mediatek/mt8183
1008K   /usr/lib/firmware/iwlwifi-7265D-22.ucode
1008K   /usr/libexec/openssh
1008K   /mnt/nfs/backups/nagioslogserver/indices/logstash-2021.04.24/2/__2k
1008K   /mnt/nfs/backups/indices/logstash-2021.05.13/1/__i
1007M   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/nagioslogserver_history/0/index/_an610_Lucene41_0.pos
1004K   /usr/local/nagioslogserver/logstash/vendor/jruby/lib/ruby/shared/jopenssl.jar
1004K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.06.01/0/index/_f2x.cfs
1004K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.05.21/2/index/_f0t_Lucene41_0.tip
1004K   /usr/lib/firmware/mediatek/mt8183/scp.img
1004K   /usr/lib/firmware/iwlwifi-3168-22.ucode
1004K   /opt/puppetlabs/puppet/lib/ruby/vendor_gems/gems/gettext-3.2.2/samples/locale
1004K   /mnt/nfs/backups/nagioslogserver/indices/logstash-2021.05.08/2/__2e
1000K   /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/edn-1.1.1/spec/exemplars
1000K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.05.02/0/index/_el4_Lucene41_0.tip
1000K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.04.24/4/index/_ew3_Lucene41_0.tip
1000K   /usr/local/nagioslogserver/elasticsearch/data/a62b3b39-5815-4f18-82ab-828b9c557090/nodes/0/indices/logstash-2021.04.12/4/index/_ego_Lucene41_0.tip
1000K   /mnt/nfs/backups/nagioslogserver/indices/logstash-2021.05.02/0/__1e
[root@fikc-naglsprod11 ~]#
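
One likely reason this listing is not useful: du -ah prints human-readable sizes, and sort -rn compares only the leading number, so an entry of 1020K ranks above one of 1007M (both appear in the output above). Below is a minimal sketch of a size-aware variant, assuming GNU coreutils (sort -h), as shipped with CentOS 7. The function name top_usage and the three sample lines are made up for illustration.

```shell
#!/bin/sh
# top_usage DIR [N]: list the N largest files and directories under DIR.
#   -x  stay on DIR's filesystem (do not descend into other mounts)
#   -a  report files as well as directories
#   -h  human-readable sizes, which need sort -h rather than sort -n
top_usage() {
    du -xah "${1:-/}" 2>/dev/null | sort -rh | head -n "${2:-50}"
}

# Illustration with made-up sample lines in the style of du -ah output.
sample=$(printf '1020K\t/a\n1007M\t/b\n2G\t/c\n')
echo "sort -rn puts first: $(echo "$sample" | sort -rn | head -n 1)"
echo "sort -rh puts first: $(echo "$sample" | sort -rh | head -n 1)"
```

Running top_usage / 50 as root would be the size-aware counterpart of the command above. Because of -x, the NFS mount and /proc are not walked.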


Re: LOG server running out of space, but this is not Indexes

Posted: Mon Jun 07, 2021 11:04 am
by ssax
What is the output of this command?

Code: Select all

lsof | grep deleted
You'll need to find out where all the space is being consumed by doing this:

Code: Select all

cd /
du -sh *
# descend into the largest directory reported and repeat
cd /largedirectory
du -sh *
cd /largedirectory/nextlargestdirectory
du -sh *
# ...and so on
Continue doing that for the large directories until you find where all the space is being consumed. I can't tell what is consuming it from the output or from your profile.
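
The drill-down described above can be wrapped in a small helper so that each level comes back already sorted by size. This is a sketch, assuming GNU du and sort. The name largest_children is hypothetical.

```shell
#!/bin/sh
# largest_children DIR [N]: show DIR and its immediate children, largest first.
#   -x             stay on DIR's filesystem
#   --max-depth=1  report only DIR itself and its direct children
largest_children() {
    du -xh --max-depth=1 "${1:-/}" 2>/dev/null | sort -rh | head -n "${2:-10}"
}
```

Call largest_children /, then largest_children on the biggest entry it reports, and so on. The first line of each result is the total for the directory itself.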

Re: LOG server running out of space, but this is not Indexes

Posted: Tue Jun 08, 2021 10:26 am
by dlukinski
This is the problem: approximately 200 GB are "missing" between df -h and du -sh *:
[root@fikc-naglsprod11 data]# cd /
[root@fikc-naglsprod11 /]# du -sh *
0 bin
281M boot
136K dev
40M etc
176K home
0 lib
0 lib64
16K lost+found
4.0K media
531G mnt
468M opt
du: cannot access ‘proc/1193/task/1561/fdinfo/1830’: No such file or directory
du: cannot access ‘proc/1193/task/1561/fdinfo/1866’: No such file or directory
du: cannot access ‘proc/1193/task/1561/fdinfo/1985’: No such file or directory
du: cannot access ‘proc/1193/task/1568/fdinfo/1830’: No such file or directory
du: cannot access ‘proc/1193/task/1569/fdinfo/1866’: No such file or directory
du: cannot access ‘proc/1193/task/1569/fdinfo/2312’: No such file or directory
du: cannot access ‘proc/1193/task/1570/fd/1985’: No such file or directory
du: cannot access ‘proc/1193/task/1577/fdinfo/1985’: No such file or directory
du: cannot access ‘proc/1193/task/1581/fdinfo/1866’: No such file or directory
du: cannot access ‘proc/1193/task/1582/fd/1866’: No such file or directory
du: cannot access ‘proc/1193/task/1881/fd/205’: No such file or directory
du: cannot access ‘proc/1193/task/1881/fd/209’: No such file or directory
du: cannot access ‘proc/1193/task/1881/fd/349’: No such file or directory
du: cannot access ‘proc/1193/task/1881/fd/402’: No such file or directory
du: cannot access ‘proc/1193/task/1895/fd/209’: No such file or directory
du: cannot access ‘proc/1193/task/1895/fd/2211’: No such file or directory
du: cannot access ‘proc/11730’: No such file or directory
du: cannot access ‘proc/11731’: No such file or directory
du: cannot access ‘proc/11736’: No such file or directory
du: cannot access ‘proc/11981/task/11981/fd/3’: No such file or directory
du: cannot access ‘proc/11981/task/11981/fdinfo/3’: No such file or directory
du: cannot access ‘proc/11981/fd/3’: No such file or directory
du: cannot access ‘proc/11981/fdinfo/3’: No such file or directory
du: cannot access ‘proc/61005/task/61079/fdinfo/377’: No such file or directory
du: cannot access ‘proc/61005/task/61123/fd/441’: No such file or directory
du: cannot access ‘proc/61005/task/61169/fdinfo/627’: No such file or directory
0 proc
864K root
1.7G run
0 sbin
4.0K srv
12K store
0 sys
2.2M tmp
200G usr # ------------ > only large directory with ElasticSearch data !
967M var
[root@fikc-naglsprod11 /]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 220K 16G 1% /dev/shm
tmpfs 16G 1.7G 15G 11% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/mapper/centos-root 490G 410G 60G 88% / # ---------------------- > 500 GBs volume !!!
/dev/sda1 976M 284M 626M 32% /boot
fikc-isilon01.res.kcg.global:/ifs/data/fikc-nagxiprod01-backup 300G 241G 60G 81% /mnt/nfs/backup
tmpfs 3.2G 0 3.2G 0% /run/user/1000
tmpfs 3.2G 0 3.2G 0% /run/user/0
tmpfs 3.2G 0 3.2G 0% /run/user/48
[root@fikc-naglsprod11 /]#
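
To put a number on the gap, the used space reported by the filesystem can be compared with what du reaches when it is kept on that one filesystem. Note that du -sh * at / follows mount points, so the 531G shown for mnt includes whatever is on the NFS mount. The sketch below uses POSIX df -P and du -skx. The name fs_gap is illustrative only.

```shell
#!/bin/sh
# fs_gap MOUNTPOINT: compare filesystem-reported usage with du-reachable usage.
# All figures are in KiB.
fs_gap() {
    path=$1
    # Used space according to the filesystem (third column of df -P output).
    df_used=$(df -Pk "$path" | awk 'NR==2 {print $3}')
    # Space reachable by walking the tree without crossing mount points.
    du_used=$(du -skx "$path" 2>/dev/null | cut -f1)
    echo "df_used=$df_used du_used=$du_used gap=$((df_used - du_used))"
}
```

Run it as root, for example fs_gap /. A large positive gap points to space that du cannot reach, such as deleted files still held open or files hidden beneath a mount point.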

Re: LOG server running out of space, but this is not Indexes

Posted: Tue Jun 08, 2021 2:44 pm
by ssax
Did this output anything? Sometimes files have been deleted but still consume space because a process still has them open.

Code: Select all

lsof | grep deleted
Continue down the path:

Code: Select all

cd /usr
du -sh *
cd /usr/nextlargestone
du -sh *
etc
That's the only way I know to find out exactly what is consuming the space.
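
If lsof is not available, or to see sizes directly, the deleted-but-still-open check mentioned above can be approximated by reading /proc. This is a sketch, assuming Linux /proc and GNU stat. The name list_deleted_open is illustrative, and it needs to run as root to see every process.

```shell
#!/bin/sh
# list_deleted_open: print "size<TAB>fd-path<TAB>target" for every open file
# descriptor whose target has been unlinked (the kernel marks it "(deleted)").
list_deleted_open() {
    for fd in /proc/[0-9]*/fd/*; do
        # Descriptors can vanish between the glob and readlink; skip those.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                # Size in bytes of the still-open file behind the descriptor.
                size=$(stat -Lc %s "$fd" 2>/dev/null) || size='?'
                printf '%s\t%s\t%s\n' "$size" "$fd" "$target"
                ;;
        esac
    done
}
```

For example, list_deleted_open | sort -rn | head shows the largest such files first. Restarting the process that holds them releases the space.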

Re: LOG server running out of space, but this is not Indexes

Posted: Tue Jun 08, 2021 6:44 pm
by dlukinski
No deleted files; /usr contains 190 GB of Elasticsearch data.
The other 200 GB are somewhere invisible.
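
One thing the posted output suggests checking: df shows the NFS share mounted at /mnt/nfs/backup, while the du listing earlier in the thread shows files under /mnt/nfs/backups (with an s). Whether that second path sits on the NFS share or on the root volume is not established here. Below is a small sketch to check which filesystem a path lives on, using POSIX df -P. The helper name mount_of is made up.

```shell
#!/bin/sh
# mount_of PATH: print the mount point of the filesystem that holds PATH.
# Takes the last column of df -P output, so it assumes the mount point
# contains no spaces.
mount_of() {
    df -P "$1" | awk 'NR==2 {print $NF}'
}
```

If mount_of /mnt/nfs/backups prints / rather than /mnt/nfs/backup, that data is on the 500 GB root volume. Separately, anything written into /mnt/nfs/backup while the share was not mounted would be hidden beneath the mount: df counts it, but du cannot reach it.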

Re: LOG server running out of space, but this is not Indexes

Posted: Wed Jun 09, 2021 2:05 pm
by ssax
Please create a ticket for this and include a link back to this forum thread so we can get a remote session set up:

https://support.nagios.com/tickets/

Thank you!