
Disk space is filling up rapidly

Posted: Thu Nov 28, 2019 11:18 pm
by snapon_admin
I just got a warning alert that the disk was filling up on our XI server, which is odd because I check it fairly regularly, and when I checked a few weeks ago it wasn't even close. It looks like something changed last Friday around 3:30 that is causing the disk to fill up rapidly (I honestly have no idea what it could be; it's been a week and I don't recall what I did). How do I find out what is causing this and fix it?

Last 30 days chart for root partition:
[Attached image: root filling up.png]

Re: Disk space is filling up rapidly

Posted: Mon Dec 02, 2019 7:50 am
by scottwilkerson
We will likely have to do some rudimentary checking to see which base directories contain the large files, and then work our way in.

Code:

du -hs /*|grep G
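If a scripted version of this first pass is handier, it can be sketched in Python 3 as below. This is a minimal sketch, not a Nagios tool: `allocated_bytes` and `top_level_usage` are hypothetical helper names. It counts allocated blocks (`st_blocks * 512`, as `du` does by default), counts a hard-linked file once per subtree, and, like `du -x`, does not cross into other filesystems, so `/proc` and the ramdisk are skipped instead of producing "cannot access" errors.

```python
import os
import stat


def allocated_bytes(top, dev):
    """Allocated bytes under `top` on filesystem `dev` (hypothetical helper, like du -sx)."""
    total = 0
    seen = set()  # (device, inode) of hard-linked files already counted
    stack = [top]
    while stack:
        path = stack.pop()
        try:
            st = os.lstat(path)  # lstat: never follow symlinks
        except OSError:
            continue  # vanished or unreadable
        if st.st_dev != dev:
            continue  # another filesystem (/proc, tmpfs, /boot, ...)
        if st.st_nlink > 1 and not stat.S_ISDIR(st.st_mode):
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
        total += st.st_blocks * 512  # st_blocks is in 512-byte units
        if stat.S_ISDIR(st.st_mode):
            try:
                with os.scandir(path) as it:
                    stack.extend(entry.path for entry in it)
            except OSError:
                pass  # permission denied, etc.
    return total


def top_level_usage(root="/"):
    """(bytes, path) for each entry directly under `root`, largest first (hypothetical helper)."""
    dev = os.lstat(root).st_dev
    with os.scandir(root) as it:
        paths = [entry.path for entry in it]
    return sorted(((allocated_bytes(p, dev), p) for p in paths), reverse=True)
```

For example, `[(s, p) for s, p in top_level_usage("/") if s >= 1 << 30]` keeps only entries of 1 GiB or more, roughly what `grep G` filters for.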

Re: Disk space is filling up rapidly

Posted: Mon Dec 02, 2019 11:59 am
by snapon_admin
I think I've found the issue, and something really strange is going on with snmptt. On the Friday this all started, I had a problem with snmptt (/var/spool/snmptt was filling up and snmptt wasn't processing the traps), so I restarted it and everything appeared fine. I checked the spool directory, files were being generated and processed normally, so I didn't think anything of it. Last night, to see whether it was a similar issue, I checked the spool directory again, and the number of files still looked normal. The strange thing is that /var/spool/snmptt appears to contain only a few files, yet the directory itself is huge. I restarted snmptt last night and usage dropped from 92% to 53%, but the disk seems to be filling up again, so whatever the issue is has not been resolved.

Code:

[root@lisl-ngos-01-pv nagiosxi]# ll /var/spool/snmptt
total 36
-rw-r--r--. 1 root root 731 Dec  2 10:57 #snmptt-trap-1575305878342456
-rw-r--r--. 1 root root 730 Dec  2 10:57 #snmptt-trap-1575305878687553
-rw-r--r--. 1 root root 730 Dec  2 10:57 #snmptt-trap-1575305878748804
-rw-r--r--. 1 root root 729 Dec  2 10:57 #snmptt-trap-1575305878811611
-rw-r--r--. 1 root root 730 Dec  2 10:57 #snmptt-trap-1575305878871442
-rw-r--r--. 1 root root 730 Dec  2 10:57 #snmptt-trap-1575305878932084
-rw-r--r--. 1 root root 729 Dec  2 10:57 #snmptt-trap-1575305878992901
-rw-r--r--. 1 root root 730 Dec  2 10:57 #snmptt-trap-1575305879052421
-rw-r--r--. 1 root root 725 Dec  2 10:57 #snmptt-trap-1575305879120899
[root@lisl-ngos-01-pv nagiosxi]# ll /var/spool/
total 1037488
drwxr-xr-x.  4 abrt   abrt         4096 Oct 29 12:55 abrt
drwx------.  2 abrt   abrt         4096 Mar 23  2017 abrt-upload
drwxr-xr-x.  2 root   root         4096 Aug 23  2016 anacron
drwx------.  3 daemon daemon       4096 Mar 21  2017 at
drwxrwx---.  2 smmsp  smmsp      122880 Dec  2 10:58 clientmqueue
drwx------.  2 root   root         4096 Apr 26  2019 cron
drwxr-xr-x.  2 root   root         4096 Sep 23  2011 lpd
drwxrwxr-x.  2 root   mail         4096 Dec  2 10:58 mail
drwx------.  2 root   mail         4096 Dec  2 10:58 mqueue
drwxr-xr-x.  2 root   root         4096 Apr 11  2019 plymouth
drwxr-xr-x. 16 root   root         4096 Mar 23  2017 postfix
drwxrwxr-x.  2 snmptt snmptt 1061949440 Dec  2 10:58 snmptt

Re: Disk space is filling up rapidly

Posted: Mon Dec 02, 2019 12:59 pm
by scottwilkerson
Hmm, are there hidden files you cannot see with the ll command?

Code:

ls -al /var/spool/snmptt
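Both questions (are there hidden entries, and how big is the directory's own entry) can be answered in one go. A minimal Python 3 sketch; `inspect_dir` is a hypothetical helper. `os.listdir` returns dotfiles but never `.` or `..`, and `os.stat` on the directory itself reports the same size that `ls -al` prints on the `.` line, so a large `dir_entry_bytes` next to a small `content_bytes` points at the directory's own metadata rather than at hidden files.

```python
import os


def inspect_dir(path):
    """Compare a directory's own size with its contents (hypothetical helper)."""
    names = os.listdir(path)  # includes dotfiles; never '.' or '..'
    content_bytes = 0
    for name in names:
        try:
            content_bytes += os.lstat(os.path.join(path, name)).st_size
        except OSError:
            pass  # entry removed between listdir() and lstat(), e.g. a processed trap
    return {
        "entries": len(names),
        "hidden": sorted(n for n in names if n.startswith(".")),
        "content_bytes": content_bytes,
        "dir_entry_bytes": os.stat(path).st_size,  # what ls -al shows for '.'
    }
```

Usage: `inspect_dir("/var/spool/snmptt")` returns a dict with those four fields.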

Re: Disk space is filling up rapidly

Posted: Mon Dec 02, 2019 3:12 pm
by snapon_admin
No idea what the heck this is (see top 2 items on this list)...

Code:

[root@lisl-ngos-01-pv nagiosxi]# ls -al /var/spool/snmptt
total 1037408
drwxrwxr-x.  2 snmptt snmptt 1061949440 Dec  2 14:11 .
drwxr-xr-x. 14 root   root         4096 Dec 11  2017 ..
-rw-r--r--.  1 root   root          744 Dec  2 14:11 #snmptt-trap-1575317492281657
-rw-r--r--.  1 root   root          749 Dec  2 14:11 #snmptt-trap-1575317492622991
-rw-r--r--.  1 root   root          729 Dec  2 14:11 #snmptt-trap-1575317492713519
-rw-r--r--.  1 root   root          749 Dec  2 14:11 #snmptt-trap-1575317492781575
-rw-r--r--.  1 root   root          731 Dec  2 14:11 #snmptt-trap-1575317492861147
-rw-r--r--.  1 root   root          731 Dec  2 14:11 #snmptt-trap-1575317492937122
-rw-r--r--.  1 root   root          708 Dec  2 14:11 #snmptt-trap-1575317493090880
-rw-r--r--.  1 root   root          730 Dec  2 14:11 #snmptt-trap-1575317493158825
-rw-r--r--.  1 root   root          751 Dec  2 14:11 #snmptt-trap-1575317493233916
-rw-r--r--.  1 root   root          728 Dec  2 14:11 #snmptt-trap-1575317493309132
-rw-r--r--.  1 root   root          729 Dec  2 14:11 #snmptt-trap-1575317493439817
-rw-r--r--.  1 root   root          754 Dec  2 14:11 #snmptt-trap-1575317493505636
-rw-r--r--.  1 root   root          729 Dec  2 14:11 #snmptt-trap-1575317493606173
-rw-r--r--.  1 root   root          751 Dec  2 14:11 #snmptt-trap-1575317493703977
-rw-r--r--.  1 root   root          749 Dec  2 14:11 #snmptt-trap-1575317493858005
-rw-r--r--.  1 root   root          731 Dec  2 14:11 #snmptt-trap-1575317493977368
-rw-r--r--.  1 root   root          749 Dec  2 14:11 #snmptt-trap-1575317494048057
-rw-r--r--.  1 root   root          731 Dec  2 14:11 #snmptt-trap-1575317494116633
-rw-r--r--.  1 root   root          728 Dec  2 14:11 #snmptt-trap-1575317494200180
-rw-r--r--.  1 root   root          749 Dec  2 14:11 #snmptt-trap-1575317494277511

Re: Disk space is filling up rapidly

Posted: Mon Dec 02, 2019 4:09 pm
by ssax
I've seen this before with antivirus/security products not releasing deleted files, but never with anything else. Please see here:

Code:

https://access.redhat.com/solutions/2316
Do this and send the output:

Code:

yum install lsof
lsof | grep deleted
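On Linux, the information `lsof | grep deleted` reports can also be read straight from `/proc`: each open descriptor is a symlink under `/proc/<pid>/fd/`, and the kernel appends ` (deleted)` to the link target once the file has been unlinked. Space held by such a file is released only when the owning process closes it or exits, which is one way a service restart can free disk. A minimal Python 3 sketch, assuming it runs as root (other users' `fd` directories are unreadable otherwise); `deleted_open_files` is a hypothetical helper.

```python
import os


def deleted_open_files(proc="/proc"):
    """Yield (pid, fd, target, size) for descriptors of unlinked files (hypothetical helper)."""
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
            except OSError:
                continue  # descriptor closed meanwhile
            if not target.endswith(" (deleted)"):
                continue
            try:
                size = os.stat(link).st_size  # stat() follows the fd to the open file
            except OSError:
                size = -1
            yield int(pid), int(fd), target, size
```

Usage: `sorted(deleted_open_files(), key=lambda r: r[3], reverse=True)[:10]` lists the ten largest deleted-but-open files.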

Re: Disk space is filling up rapidly

Posted: Mon Dec 02, 2019 4:20 pm
by snapon_admin

Code:

[root@lisl-ngos-01-pv nagiosxi]# lsof | grep deleted
abrt-dump  2152      root  txt       REG              253,1      29320     159022 /usr/bin/abrt-dump-oops (deleted)
mysqld     3201     mysql  txt       REG              253,1    8095048     148859 /usr/libexec/mysqld (deleted)
mysqld     3201     mysql    1w      REG              253,1   15219176       1318 /var/log/mysqld.log (deleted)
mysqld     3201     mysql    2w      REG              253,1   15219176       1318 /var/log/mysqld.log (deleted)
mysqld     3201     mysql    4u      REG              253,1          0     262149 /tmp/ib5ieNSM (deleted)
mysqld     3201     mysql    5u      REG              253,1          0     262169 /tmp/ibTOUw0m (deleted)
mysqld     3201     mysql    6u      REG              253,1          0     262193 /tmp/ibpgJh8W (deleted)
mysqld     3201     mysql    7u      REG              253,1          0     263758 /tmp/ibJlLsgx (deleted)
mysqld     3201     mysql   11u      REG              253,1          0     264542 /tmp/ibPfd1q7 (deleted)
mrtg       7618      root    3w      REG              253,1          0      10439 /var/lib/mrtg/mrtg_l_7618 (deleted)
mrtg       7957      root    3w      REG              253,1          0      10439 /var/lib/mrtg/mrtg_l_7618 (deleted)
nagios    14699    nagios   20w      REG               0,19      34517 4176091499 /var/nagiosramdisk/spool/perfdata/1575320854.perfdata.host-PID-18672 (deleted)
nagios    14699    nagios   21w      REG               0,19     348113 4176091489 /var/nagiosramdisk/spool/perfdata/1575320854.perfdata.service-PID-18673 (deleted)

Re: Disk space is filling up rapidly

Posted: Mon Dec 02, 2019 4:37 pm
by ssax
Nothing relevant is listed there. Please attach this file:

Code:

/usr/local/bin/snmptraphandling.py
What about the output of these:

Code:

df -h
df -i
Additionally, if that directory isn't the one that's currently full, include the output of this:

Code:

du -hs /*|grep G

Re: Disk space is filling up rapidly

Posted: Tue Dec 03, 2019 11:41 am
by snapon_admin

Code:

#!/usr/bin/env python

"""
Written by Francois Meehan (Cedval Info)
First release 2004/09/15
Modified by Nagios Enterprises, LLC.

This script receives input from sec.pl concerning translated snmptraps

*** Important note: sec must send DATA within quotes


Ex: ./services.py <HOST> <SERVICE> <SEVERITY> <TIME> <PERFDATA> <DATA>
"""

import sys, os, stat, signal

signal.alarm(15)


def printusage():
    print "usage: services.py <HOST> <SERVICE> <SEVERITY> <TIME> <PERFDATA> <DATA>"
    sys.exit()


def check_arg():
        try:
                host = sys.argv[1]
        except:
                printusage()
        try:
                service = sys.argv[2]
        except:
                printusage()
        try:
                severity = sys.argv[3]
        except:
                printusage()
        try:
                mytime = sys.argv[4]
        except:
                printusage()
        try:
                if sys.argv[5] == '':
                        mondata_res = sys.argv[6]
                else:
                        mondata_res = sys.argv[6] + " / " + sys.argv[5]
        except:
                printusage()
        return (host, service, severity, mytime, mondata_res)


def get_return_code(severity):
    severity = severity.upper()
    if severity == "INFORMATIONAL":
        return_code = "0"
    elif severity == "NORMAL":
        return_code = "0"
    elif severity == "SEVERE":
        return_code = "2"
    elif severity == "MAJOR":
        return_code = "2"
    elif severity == "CRITICAL":
        return_code = "2"
    elif severity == "WARNING":
        return_code = "1"
    elif severity == "MINOR":
        return_code = "1"
    else:
        printusage()
    return return_code


def post_results(host, service, mytime, mondata_res, return_code):
    if os.path.exists('/usr/local/nagios/var/rw/nagios.cmd') and stat.S_ISFIFO(os.stat('/usr/local/nagios/var/rw/nagios.cmd').st_mode):
        output = open('/usr/local/nagios/var/rw/nagios.cmd', 'w')
        if service == 'PROCESS_HOST_CHECK_RESULT':
            results = "[" + mytime + "] " + "PROCESS_HOST_CHECK_RESULT;" \
                + host + ";" + return_code + ";" + mondata_res + "\n"
            
        else:
            results = "[" + mytime + "] " + "PROCESS_SERVICE_CHECK_RESULT;" \
                + host + ";" + service + ";" \
                + return_code + ";" + mondata_res + "\n"
        output.write(results)


# Main routine...
if __name__ == '__main__':
    (host, service, severity, mytime, mondata_res) = check_arg()  # validating
                                                                  # parameters
    return_code = get_return_code(severity)
    post_results(host, service, mytime, mondata_res, return_code)
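To see exactly what this handler submits, the line that `post_results` writes to the `nagios.cmd` FIFO can be rebuilt without touching Nagios. A minimal Python 3 sketch that mirrors the script's logic; `SEVERITY_CODES` and `format_command` are hypothetical names, and unlike the original, an unknown severity raises `ValueError` instead of printing the usage text and exiting.

```python
# Hypothetical names; mirrors get_return_code() and post_results() in the script above.
SEVERITY_CODES = {
    "INFORMATIONAL": "0", "NORMAL": "0",
    "WARNING": "1", "MINOR": "1",
    "SEVERE": "2", "MAJOR": "2", "CRITICAL": "2",
}


def format_command(host, service, severity, mytime, mondata_res):
    """Return the external command line post_results() would write to nagios.cmd."""
    try:
        code = SEVERITY_CODES[severity.upper()]
    except KeyError:
        raise ValueError("unknown severity: %r" % severity) from None
    if service == "PROCESS_HOST_CHECK_RESULT":
        fields = ["PROCESS_HOST_CHECK_RESULT", host, code, mondata_res]
    else:
        fields = ["PROCESS_SERVICE_CHECK_RESULT", host, service, code, mondata_res]
    return "[" + mytime + "] " + ";".join(fields) + "\n"
```

For example, `format_command("router1", "SNMP Traps", "major", "1575305878", "linkDown")` returns `"[1575305878] PROCESS_SERVICE_CHECK_RESULT;router1;SNMP Traps;2;linkDown\n"`. Going by the script as posted, the handler only writes to that FIFO; it does not create or remove anything in /var/spool/snmptt.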

Code:

[root@lisl-ngos-01-pv nagiosxi]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      295G  176G  105G  63% /
tmpfs                  16G     0   16G   0% /dev/shm
/dev/sda1              97M   37M   56M  40% /boot
tmpfs                 500M   36M  465M   8% /var/nagiosramdisk
[root@lisl-ngos-01-pv nagiosxi]# df -i
Filesystem             Inodes  IUsed    IFree IUse% Mounted on
/dev/mapper/VolGroup00-LogVol00
                     19636224 229578 19406646    2% /
tmpfs                 4110008      1  4110007    1% /dev/shm
/dev/sda1               25688     39    25649    1% /boot
tmpfs                 4110008     14  4109994    1% /var/nagiosramdisk
[root@lisl-ngos-01-pv nagiosxi]# du -hs /*|grep G
du: cannot access `/proc/14904': No such file or directory
du: cannot access `/proc/14905': No such file or directory
du: cannot access `/proc/14987/task/14987/fd/4': No such file or directory
du: cannot access `/proc/14987/task/14987/fdinfo/4': No such file or directory
du: cannot access `/proc/14987/fd/4': No such file or directory
du: cannot access `/proc/14987/fdinfo/4': No such file or directory
du: cannot access `/proc/15281': No such file or directory
du: cannot access `/proc/15334': No such file or directory
du: cannot access `/proc/15336': No such file or directory
du: cannot access `/proc/15344': No such file or directory
du: cannot access `/proc/15345': No such file or directory
du: cannot access `/proc/15348': No such file or directory
du: cannot access `/proc/15349': No such file or directory
du: cannot access `/proc/15350': No such file or directory
du: cannot access `/proc/15355': No such file or directory
du: cannot access `/proc/15356': No such file or directory
35G     /store
71G     /usr
71G     /var
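The percentages in this output can be checked by hand. As far as I understand GNU `df`, Use% is used / (used + available), rounded up, so blocks reserved for root (the roughly 14G gap between Size and Used + Avail on `/`; ext filesystems reserve 5% by default) are left out, and IUse% is computed the same way from IUsed and IFree. A minimal sketch; `df_use_pct` is a hypothetical helper, and the inputs are the figures shown above (the `-h` values are already rounded, so those results are approximate).

```python
def df_use_pct(used, avail):
    """Use% as GNU df reports it: ceil(100 * used / (used + avail)). Hypothetical helper."""
    total = used + avail
    if total == 0:
        return 0  # df prints '-' in this case; 0 keeps the sketch simple
    return -(-100 * used // total)  # integer ceiling division


# Figures taken from the df -h / df -i output above.
print(df_use_pct(176, 105))          # /                   -> 63
print(df_use_pct(37, 56))            # /boot               -> 40
print(df_use_pct(36, 465))           # /var/nagiosramdisk  -> 8
print(df_use_pct(229578, 19406646))  # / (inodes)          -> 2
```

At 2% inode use on `/`, inode exhaustion can be ruled out; the growth is in data blocks.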

Re: Disk space is filling up rapidly

Posted: Tue Dec 03, 2019 5:54 pm
by tgriep
Certain filesystems keep a linked list of the entries in each directory, and the directory grows as entries are added. That is why a directory that has held a very large number of files can get fairly large, and the space is generally not given back when the files are removed.
The simplest way to shrink it is to delete the directory and recreate it, which resets its size.
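The delete-and-recreate approach can be sketched in Python 3 as below. Assumptions: `rebuild_dir` and the `.new`/`.old` sibling names are hypothetical; whatever writes to the directory (snmptt here) is stopped first so no trap files arrive mid-move; it runs as root so ownership can be copied; and SELinux labels (the trailing `.` in the `ls` permissions) are not copied, so `restorecon` may be needed afterwards. It moves the entries into a fresh sibling directory with `rename()` (no data is copied), swaps the two, and removes the old, now-empty directory along with its oversized entry table.

```python
import os


def rebuild_dir(path):
    """Recreate `path` as a fresh directory holding the same entries (hypothetical helper)."""
    path = os.path.abspath(path)
    st = os.stat(path)
    new, old = path + ".new", path + ".old"  # hypothetical temporary sibling names
    os.mkdir(new)  # fails if `new` already exists, rather than clobbering it
    os.chmod(new, st.st_mode & 0o7777)  # same permission bits, ignoring umask
    try:
        os.chown(new, st.st_uid, st.st_gid)  # same owner/group, e.g. snmptt:snmptt
    except PermissionError:
        pass  # not root: keep the current user's ownership
    for name in os.listdir(path):
        # rename() within one filesystem moves the entry without copying data
        os.rename(os.path.join(path, name), os.path.join(new, name))
    os.rename(path, old)  # move the oversized directory aside
    os.rename(new, path)  # put the fresh one in its place
    os.rmdir(old)  # empty now, so this frees its blocks
```

Usage, with snmptt stopped: `rebuild_dir("/var/spool/snmptt")`, then start snmptt again.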

Can you run the following as root on the Nagios server and post the output here?
Find the 10 largest directories by size:

Code:

find / -type d -print0 | xargs -0 du | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}
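A single-pass alternative to the pipeline above can be sketched in Python 3: walk the tree once, record each directory's own allocated size plus its files, then fold child totals into their parents from the deepest level up. Assumptions: `largest_dirs` is a hypothetical helper, it stays on one filesystem like `find -xdev`, and hard-linked files are counted once per link.

```python
import heapq
import os


def largest_dirs(root="/", n=10):
    """n largest directories as (bytes, path) by cumulative allocated size (hypothetical helper)."""
    root = os.path.abspath(root)
    root_dev = os.lstat(root).st_dev
    own = {}  # directory -> bytes of the directory entry plus its files
    for dirpath, dirnames, filenames in os.walk(root):
        kept = []
        for d in dirnames:  # prune other filesystems, like find -xdev
            try:
                if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev:
                    kept.append(d)
            except OSError:
                pass
        dirnames[:] = kept
        size = 0
        for p in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            try:
                size += os.lstat(p).st_blocks * 512
            except OSError:
                pass  # vanished while walking
        own[dirpath] = size
    total = dict(own)
    # Deepest paths first, so a child's total is complete before it reaches its parent.
    for path in sorted(own, key=lambda p: p.count(os.sep), reverse=True):
        parent = os.path.dirname(path)
        if path != root and parent in total:
            total[parent] += total[path]
    return heapq.nlargest(n, ((size, path) for path, size in total.items()))
```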
Find the largest 10 files by size command:

Code:

find / -type f -print0 | xargs -0 du | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}
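The file-level search can be sketched the same way in Python 3, keeping only the top n with `heapq.nlargest` instead of sorting every file. Assumptions: `largest_files` is a hypothetical helper, sizes are allocated bytes (as `du` reports) rather than apparent sizes, and it stays on one filesystem like `find -xdev`.

```python
import heapq
import os
import stat


def largest_files(root="/", n=10):
    """n largest regular files as (bytes, path) on one filesystem (hypothetical helper)."""
    root_dev = os.lstat(root).st_dev

    def candidates():
        for dirpath, dirnames, filenames in os.walk(root):
            kept = []
            for d in dirnames:  # prune other filesystems, like find -xdev
                try:
                    if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev:
                        kept.append(d)
                except OSError:
                    pass
            dirnames[:] = kept
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)
                except OSError:
                    continue  # vanished while walking
                if stat.S_ISREG(st.st_mode):
                    yield st.st_blocks * 512, path  # allocated size, as du reports

    return heapq.nlargest(n, candidates())
```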