Disk space is filling up rapidly
Posted: Thu Nov 28, 2019 11:18 pm
by snapon_admin
I just got a warning alert that disk space is filling up on our XI server, which is odd: I check it fairly regularly, and when I checked a few weeks ago it wasn't even close. Something appears to have changed on Friday of last week around 3:30 (it's been a week and I honestly don't recall what I did) that is causing the disk to fill up rapidly. How do I find out what is causing this and correct it?
Last 30 days chart for root partition:
[Attachment: root filling up.png]
Re: Disk space is filling up rapidly
Posted: Mon Dec 02, 2019 7:50 am
by scottwilkerson
We'll likely have to do some rudimentary checking to see which base directories contain the large files, and then work our way in.
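As a rough illustration of that kind of top-down check, here is a minimal Python 3 sketch. The function names `tree_size` and `dir_usage` are made up for this example and are not part of Nagios XI. It totals file sizes under each immediate subdirectory and stays on one filesystem, so pseudo-filesystems such as /proc are ignored. It counts file contents only, not the size of the directory entries themselves.

```python
import os

def tree_size(top, dev):
    """Sum st_size of non-directory entries under top that live on device dev."""
    total = 0
    stack = [top]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    try:
                        st = entry.stat(follow_symlinks=False)
                    except OSError:
                        continue  # entry vanished or is unreadable
                    if st.st_dev != dev:
                        continue  # other filesystem (e.g. /proc, tmpfs)
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    else:
                        total += st.st_size
        except OSError:
            continue  # directory vanished or permission denied
    return total

def dir_usage(root):
    """Return [(subdirectory, bytes)] for root's immediate subdirectories, largest first."""
    dev = os.lstat(root).st_dev
    usage = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                usage[entry.path] = tree_size(entry.path, dev)
    return sorted(usage.items(), key=lambda item: item[1], reverse=True)
```

Calling `dir_usage("/")` as root and then repeating it on the largest result (for example `dir_usage("/var")`) is one way to work inward one level at a time.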
Re: Disk space is filling up rapidly
Posted: Mon Dec 02, 2019 11:59 am
by snapon_admin
I think I figured out the issue, and something really strange is going on with snmptt. On the Friday this all started I had an issue with snmptt (/var/spool/snmptt was filling up and snmptt wasn't processing those traps), so I restarted it and everything appeared to be fine. I checked the spool directory and files were being generated and processed normally, so I didn't think anything of it. Last night, just to see if it was a similar issue, I checked the spool directory again and everything still seemed normal as far as the number of files goes. The strange thing is that /var/spool/snmptt doesn't appear to contain many files, yet the directory itself seems huge. I restarted snmptt last night and went from 92% full to 53%. Disk space seems to be filling up again, so whatever the issue is has not been resolved.
Code: Select all
[root@lisl-ngos-01-pv nagiosxi]# ll /var/spool/snmptt
total 36
-rw-r--r--. 1 root root 731 Dec 2 10:57 #snmptt-trap-1575305878342456
-rw-r--r--. 1 root root 730 Dec 2 10:57 #snmptt-trap-1575305878687553
-rw-r--r--. 1 root root 730 Dec 2 10:57 #snmptt-trap-1575305878748804
-rw-r--r--. 1 root root 729 Dec 2 10:57 #snmptt-trap-1575305878811611
-rw-r--r--. 1 root root 730 Dec 2 10:57 #snmptt-trap-1575305878871442
-rw-r--r--. 1 root root 730 Dec 2 10:57 #snmptt-trap-1575305878932084
-rw-r--r--. 1 root root 729 Dec 2 10:57 #snmptt-trap-1575305878992901
-rw-r--r--. 1 root root 730 Dec 2 10:57 #snmptt-trap-1575305879052421
-rw-r--r--. 1 root root 725 Dec 2 10:57 #snmptt-trap-1575305879120899
[root@lisl-ngos-01-pv nagiosxi]# ll /var/spool/
total 1037488
drwxr-xr-x. 4 abrt abrt 4096 Oct 29 12:55 abrt
drwx------. 2 abrt abrt 4096 Mar 23 2017 abrt-upload
drwxr-xr-x. 2 root root 4096 Aug 23 2016 anacron
drwx------. 3 daemon daemon 4096 Mar 21 2017 at
drwxrwx---. 2 smmsp smmsp 122880 Dec 2 10:58 clientmqueue
drwx------. 2 root root 4096 Apr 26 2019 cron
drwxr-xr-x. 2 root root 4096 Sep 23 2011 lpd
drwxrwxr-x. 2 root mail 4096 Dec 2 10:58 mail
drwx------. 2 root mail 4096 Dec 2 10:58 mqueue
drwxr-xr-x. 2 root root 4096 Apr 11 2019 plymouth
drwxr-xr-x. 16 root root 4096 Mar 23 2017 postfix
drwxrwxr-x. 2 snmptt snmptt 1061949440 Dec 2 10:58 snmptt
Re: Disk space is filling up rapidly
Posted: Mon Dec 02, 2019 12:59 pm
by scottwilkerson
Hmm, are there hidden files you cannot see with the ll command?
Re: Disk space is filling up rapidly
Posted: Mon Dec 02, 2019 3:12 pm
by snapon_admin
No idea what the heck this is (see top 2 items on this list)...
Code: Select all
[root@lisl-ngos-01-pv nagiosxi]# ls -al /var/spool/snmptt
total 1037408
drwxrwxr-x. 2 snmptt snmptt 1061949440 Dec 2 14:11 .
drwxr-xr-x. 14 root root 4096 Dec 11 2017 ..
-rw-r--r--. 1 root root 744 Dec 2 14:11 #snmptt-trap-1575317492281657
-rw-r--r--. 1 root root 749 Dec 2 14:11 #snmptt-trap-1575317492622991
-rw-r--r--. 1 root root 729 Dec 2 14:11 #snmptt-trap-1575317492713519
-rw-r--r--. 1 root root 749 Dec 2 14:11 #snmptt-trap-1575317492781575
-rw-r--r--. 1 root root 731 Dec 2 14:11 #snmptt-trap-1575317492861147
-rw-r--r--. 1 root root 731 Dec 2 14:11 #snmptt-trap-1575317492937122
-rw-r--r--. 1 root root 708 Dec 2 14:11 #snmptt-trap-1575317493090880
-rw-r--r--. 1 root root 730 Dec 2 14:11 #snmptt-trap-1575317493158825
-rw-r--r--. 1 root root 751 Dec 2 14:11 #snmptt-trap-1575317493233916
-rw-r--r--. 1 root root 728 Dec 2 14:11 #snmptt-trap-1575317493309132
-rw-r--r--. 1 root root 729 Dec 2 14:11 #snmptt-trap-1575317493439817
-rw-r--r--. 1 root root 754 Dec 2 14:11 #snmptt-trap-1575317493505636
-rw-r--r--. 1 root root 729 Dec 2 14:11 #snmptt-trap-1575317493606173
-rw-r--r--. 1 root root 751 Dec 2 14:11 #snmptt-trap-1575317493703977
-rw-r--r--. 1 root root 749 Dec 2 14:11 #snmptt-trap-1575317493858005
-rw-r--r--. 1 root root 731 Dec 2 14:11 #snmptt-trap-1575317493977368
-rw-r--r--. 1 root root 749 Dec 2 14:11 #snmptt-trap-1575317494048057
-rw-r--r--. 1 root root 731 Dec 2 14:11 #snmptt-trap-1575317494116633
-rw-r--r--. 1 root root 728 Dec 2 14:11 #snmptt-trap-1575317494200180
-rw-r--r--. 1 root root 749 Dec 2 14:11 #snmptt-trap-1575317494277511
Re: Disk space is filling up rapidly
Posted: Mon Dec 02, 2019 4:09 pm
by ssax
I've seen this before with antivirus/security products not releasing the files, but never from anything else. Please see here:
Code: Select all
https://access.redhat.com/solutions/2316
Do this and send the output:
Code: Select all
yum install lsof
lsof | grep deleted
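If lsof can't be installed, a rough approximation is possible by reading /proc directly. This is only a sketch that assumes a Linux /proc layout. The function name `deleted_open_files` is made up for illustration, and it only covers open file descriptors, so it will not show the `txt` (mapped executable) entries that lsof reports.

```python
import os

def deleted_open_files(proc_root="/proc"):
    """Return [(pid, fd, size, target)] for open descriptors whose file was unlinked."""
    found = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
            except OSError:
                continue  # descriptor closed while we were looking
            if not target.endswith(" (deleted)"):
                continue
            try:
                size = os.stat(link).st_size  # space still held by the open file
            except OSError:
                size = None
            found.append((int(pid), int(fd), size, target))
    return found
```

Run as root so that every process's fd directory is readable. Any entry with a large size is space that is only released once the owning process closes the file or is restarted.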
Re: Disk space is filling up rapidly
Posted: Mon Dec 02, 2019 4:20 pm
by snapon_admin
Code: Select all
[root@lisl-ngos-01-pv nagiosxi]# lsof | grep deleted
abrt-dump 2152 root txt REG 253,1 29320 159022 /usr/bin/abrt-dump-oops (deleted)
mysqld 3201 mysql txt REG 253,1 8095048 148859 /usr/libexec/mysqld (deleted)
mysqld 3201 mysql 1w REG 253,1 15219176 1318 /var/log/mysqld.log (deleted)
mysqld 3201 mysql 2w REG 253,1 15219176 1318 /var/log/mysqld.log (deleted)
mysqld 3201 mysql 4u REG 253,1 0 262149 /tmp/ib5ieNSM (deleted)
mysqld 3201 mysql 5u REG 253,1 0 262169 /tmp/ibTOUw0m (deleted)
mysqld 3201 mysql 6u REG 253,1 0 262193 /tmp/ibpgJh8W (deleted)
mysqld 3201 mysql 7u REG 253,1 0 263758 /tmp/ibJlLsgx (deleted)
mysqld 3201 mysql 11u REG 253,1 0 264542 /tmp/ibPfd1q7 (deleted)
mrtg 7618 root 3w REG 253,1 0 10439 /var/lib/mrtg/mrtg_l_7618 (deleted)
mrtg 7957 root 3w REG 253,1 0 10439 /var/lib/mrtg/mrtg_l_7618 (deleted)
nagios 14699 nagios 20w REG 0,19 34517 4176091499 /var/nagiosramdisk/spool/perfdata/1575320854.perfdata.host-PID-18672 (deleted)
nagios 14699 nagios 21w REG 0,19 348113 4176091489 /var/nagiosramdisk/spool/perfdata/1575320854.perfdata.service-PID-18673 (deleted)
Re: Disk space is filling up rapidly
Posted: Mon Dec 02, 2019 4:37 pm
by ssax
Nothing is listed for that directory. Please attach this file:
Code: Select all
/usr/local/bin/snmptraphandling.py
What about the output of these:
Additionally, include the output from these if it's not that directory that's currently full:
Re: Disk space is filling up rapidly
Posted: Tue Dec 03, 2019 11:41 am
by snapon_admin
Code: Select all
#!/usr/bin/env python
"""
Written by Francois Meehan (Cedval Info)
First release 2004/09/15
Modified by Nagios Enterprises, LLC.
This script receives input from sec.pl concerning translated snmptraps
*** Important note: sec must send DATA within quotes
Ex: ./services.py <HOST> <SERVICE> <SEVERITY> <TIME> <PERFDATA> <DATA>
"""
import sys, os, stat, signal
signal.alarm(15)

def printusage():
    print "usage: services.py <HOST> <SERVICE> <SEVERITY> <TIME> <PERFDATA> <DATA>"
    sys.exit()

def check_arg():
    try:
        host = sys.argv[1]
    except:
        printusage()
    try:
        service = sys.argv[2]
    except:
        printusage()
    try:
        severity = sys.argv[3]
    except:
        printusage()
    try:
        mytime = sys.argv[4]
    except:
        printusage()
    try:
        if sys.argv[5] == '':
            mondata_res = sys.argv[6]
        else:
            mondata_res = sys.argv[6] + " / " + sys.argv[5]
    except:
        printusage()
    return (host, service, severity, mytime, mondata_res)

def get_return_code(severity):
    severity = severity.upper()
    if severity == "INFORMATIONAL":
        return_code = "0"
    elif severity == "NORMAL":
        return_code = "0"
    elif severity == "SEVERE":
        return_code = "2"
    elif severity == "MAJOR":
        return_code = "2"
    elif severity == "CRITICAL":
        return_code = "2"
    elif severity == "WARNING":
        return_code = "1"
    elif severity == "MINOR":
        return_code = "1"
    else:
        printusage()
    return return_code

def post_results(host, service, mytime, mondata_res, return_code):
    if os.path.exists('/usr/local/nagios/var/rw/nagios.cmd') and stat.S_ISFIFO(os.stat('/usr/local/nagios/var/rw/nagios.cmd').st_mode):
        output = open('/usr/local/nagios/var/rw/nagios.cmd', 'w')
        if service == 'PROCESS_HOST_CHECK_RESULT':
            results = "[" + mytime + "] " + "PROCESS_HOST_CHECK_RESULT;" \
                + host + ";" + return_code + ";" + mondata_res + "\n"
        else:
            results = "[" + mytime + "] " + "PROCESS_SERVICE_CHECK_RESULT;" \
                + host + ";" + service + ";" \
                + return_code + ";" + mondata_res + "\n"
        output.write(results)

# Main routine...
if __name__ == '__main__':
    (host, service, severity, mytime, mondata_res) = check_arg() # validating
    # parameters
    return_code = get_return_code(severity)
    post_results(host, service, mytime, mondata_res, return_code)
Code: Select all
[root@lisl-ngos-01-pv nagiosxi]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
295G 176G 105G 63% /
tmpfs 16G 0 16G 0% /dev/shm
/dev/sda1 97M 37M 56M 40% /boot
tmpfs 500M 36M 465M 8% /var/nagiosramdisk
[root@lisl-ngos-01-pv nagiosxi]# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/VolGroup00-LogVol00
19636224 229578 19406646 2% /
tmpfs 4110008 1 4110007 1% /dev/shm
/dev/sda1 25688 39 25649 1% /boot
tmpfs 4110008 14 4109994 1% /var/nagiosramdisk
[root@lisl-ngos-01-pv nagiosxi]# du -hs /*|grep G
du: cannot access `/proc/14904': No such file or directory
du: cannot access `/proc/14905': No such file or directory
du: cannot access `/proc/14987/task/14987/fd/4': No such file or directory
du: cannot access `/proc/14987/task/14987/fdinfo/4': No such file or directory
du: cannot access `/proc/14987/fd/4': No such file or directory
du: cannot access `/proc/14987/fdinfo/4': No such file or directory
du: cannot access `/proc/15281': No such file or directory
du: cannot access `/proc/15334': No such file or directory
du: cannot access `/proc/15336': No such file or directory
du: cannot access `/proc/15344': No such file or directory
du: cannot access `/proc/15345': No such file or directory
du: cannot access `/proc/15348': No such file or directory
du: cannot access `/proc/15349': No such file or directory
du: cannot access `/proc/15350': No such file or directory
du: cannot access `/proc/15355': No such file or directory
du: cannot access `/proc/15356': No such file or directory
35G /store
71G /usr
71G /var
Re: Disk space is filling up rapidly
Posted: Tue Dec 03, 2019 5:54 pm
by tgriep
Certain filesystems keep a linked list of the files in a folder, which is why the directory itself can get fairly large if there are a lot of files in the folder. On those filesystems the directory typically does not shrink again after the files are removed.
The simplest way to shrink it is to just delete the directory and recreate it, which resets the size of the folder.
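To make the symptom easier to spot, here is a small Python 3 sketch that compares the size of the directory file itself with the total size of what it currently contains. The helper name `dir_entry_report` is hypothetical and this is only an illustration, not a Nagios tool.

```python
import os

def dir_entry_report(path):
    """Compare the size of the directory itself with the files it currently holds."""
    own_size = os.lstat(path).st_size  # size of the directory's own entry list
    entries = 0
    content_bytes = 0
    with os.scandir(path) as listing:
        for entry in listing:
            entries += 1
            try:
                content_bytes += entry.stat(follow_symlinks=False).st_size
            except OSError:
                pass  # spool files can disappear while we iterate
    return {"own_size": own_size, "entries": entries, "content_bytes": content_bytes}
```

For the listing posted earlier in the thread, this kind of check on /var/spool/snmptt would be expected to show an own_size of about 1 GB (1061949440 bytes) against only a few kilobytes of actual trap files.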
Can you run the following as root on the Nagios server and post the output here?
Find largest 10 directories by size command:
Code: Select all
find / -type d -print0 | xargs -0 du | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}
Find the largest 10 files by size command:
Code: Select all
find / -type f -print0 | xargs -0 du | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}
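For comparison, the same "largest files" idea can be sketched in Python 3. This is an assumption-laden illustration rather than a replacement for the commands above: `largest_files` is a made-up name, it ranks by apparent size (st_size) where du reports allocated blocks, and it ignores files on other filesystems so that entries under /proc do not skew the result.

```python
import heapq
import os
import stat

def largest_files(root, n=10):
    """Return the n largest regular files under root as (size_in_bytes, path), largest first."""
    root_dev = os.lstat(root).st_dev

    def candidates():
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)
                except OSError:
                    continue  # file vanished mid-scan or is unreadable
                # Regular files only, and only on the same filesystem as root
                if stat.S_ISREG(st.st_mode) and st.st_dev == root_dev:
                    yield st.st_size, path

    # nlargest keeps only n items in memory while consuming the generator
    return heapq.nlargest(n, candidates())
```

Something like `for size, path in largest_files("/var"): print(size, path)` would print the ten biggest files under /var.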