URGENT: Incident for FS saturation.
Posted: Wed Apr 22, 2015 6:34 pm
by monitoreo1
Hi everybody!
Today we had an incident caused by a file system filling up. The FS was at 100% usage, which meant that users could not access Nagios via the web.
When we saw that /var/ was full, we erased the files in that file system (after backing them up). We also ran the procedure recommended on the Nagios web access page:
"Message: A database connection error has been detected, we are attempting to repair the server, if the repair does not resolve the issue, please contact Nagios support.
Run the following from the CLI as root to attempt to repair the DB
/usr/local/nagiosxi/scripts/repair_databases.sh"
I understand that there is an automatic procedure to back up the Nagios files and databases, but my question is whether the scope of this procedure includes removing all the information that was backed up by the backup_xi.sh procedure.
This is our current FS usage:
[root@XXXXXXX var]# df /var/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VGSYS-LVVAR 6133760 3507444 2626316 58% /var
[root@XXXXXXX var]# df /var/log/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VGSYS-LVVAR 6133760 3507492 2626268 58% /var
[root@XXXXXXX var]# df /usr/local/nagiosxi/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VGSYS-LVUSR 6133760 4459252 1674508 73% /usr
[root@XXXXXXX var]# df /usr/local/nagios/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VGSYS-LVUSR 6133760 4458152 1675608 73% /usr
The question is, what files can we erase to prevent a new file system saturation ?
Thanks for your help !!!!
We are Using:
Nagios XI Version : 2014R2.6
x86_64
Red Hat Enterprise Linux Server release 7.0 (Maipo)
Re: URGENT: Incident for FS saturation.
Posted: Wed Apr 22, 2015 7:22 pm
by Box293
What is the disk usage like in /store/ ?
Re: URGENT: Incident for FS saturation.
Posted: Wed Apr 22, 2015 8:29 pm
by monitoreo1
Here is the information:
708K /store/backups/mysql/daily/mysql
79M /store/backups/mysql/daily/nagios
576K /store/backups/mysql/daily/nagiosql
28K /store/backups/mysql/daily/test
81M /store/backups/mysql/daily
704K /store/backups/mysql/weekly/mysql
53M /store/backups/mysql/weekly/nagios
428K /store/backups/mysql/weekly/nagiosql
24K /store/backups/mysql/weekly/test
54M /store/backups/mysql/weekly
280K /store/backups/mysql/monthly/mysql
9.6M /store/backups/mysql/monthly/nagios
124K /store/backups/mysql/monthly/nagiosql
8.0K /store/backups/mysql/monthly/test
10M /store/backups/mysql/monthly
144M /store/backups/mysql
1.3M /store/backups/postgresql/daily/nagiosxi
28K /store/backups/postgresql/daily/postgres
28K /store/backups/postgresql/daily/template1
1.4M /store/backups/postgresql/daily
1004K /store/backups/postgresql/weekly/nagiosxi
24K /store/backups/postgresql/weekly/postgres
24K /store/backups/postgresql/weekly/template1
1.1M /store/backups/postgresql/weekly
248K /store/backups/postgresql/monthly/nagiosxi
8.0K /store/backups/postgresql/monthly/postgres
8.0K /store/backups/postgresql/monthly/template1
264K /store/backups/postgresql/monthly
2.6M /store/backups/postgresql
0 /store/backups/nagiosxi
147M /store/backups
147M /store/
Thanks for your help !!!!!
Re: URGENT: Incident for FS saturation.
Posted: Wed Apr 22, 2015 8:37 pm
by monitoreo1
And here is the df -h output:
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VGSYS-LVRAIZ 3.0G 203M 2.8G 7% /
devtmpfs 3.8G 0 3.8G 0% /dev
tmpfs 3.8G 0 3.8G 0% /dev/shm
tmpfs 3.8G 57M 3.8G 2% /run
tmpfs 3.8G 0 3.8G 0% /sys/fs/cgroup
/dev/mapper/VGSYS-LVUSR 5.9G 4.3G 1.6G 73% /usr
/dev/mapper/VGSYS-LVVAR 5.9G 2.9G 3.0G 50% /var
/dev/mapper/VGSYS-LVTMP 3.0G 33M 2.9G 2% /tmp
/dev/mapper/VGSYS-LVOPT 3.0G 33M 2.9G 2% /opt
/dev/mapper/VGPROGPROD-LVSEGURIDAD 2.0G 33M 2.0G 2% /seguridad
/dev/mapper/VGPROGPROD-LVNAGIOSTEST 19M 332K 17M 2% /monitorizaciontest
/dev/mapper/VGPROGPROD-LVPRODUCCION 2.0G 33M 2.0G 2% /produccion
/dev/mapper/VGSYS-LVHOME 3.0G 49M 2.9G 2% /home
/dev/vda1 509M 133M 376M 27% /boot
We want to know what we have to erase if the /usr/ FS keeps growing !!!!!
Re: URGENT: Incident for FS saturation.
Posted: Wed Apr 22, 2015 8:39 pm
by Box293
That all looks OK.
monitoreo1 wrote:The question is, what files can we erase to prevent a new file system saturation ?
/var/ is used for a lot of different purposes. Knowing what can be deleted is somewhat complicated as it may be something that is broken which is causing disk space to grow (spooled files might not be processed).
What I suggest is keeping a daily record of this command:
If there is something that is growing daily then it should show up and then we can further look into your problem.
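One way to keep such a daily record is sketched below. This is only an illustration, not necessarily the command meant above: the function name record_usage and the log file path are hypothetical, and the choice of du -xk with the 20 largest entries is an assumption.

```shell
#!/bin/sh
# Sketch: append a dated snapshot of the largest directories under a
# path to a log file, so that day-to-day growth can be compared.
# record_usage and the default log path are hypothetical names.
record_usage() {
    target="${1:-/var}"
    logfile="${2:-/root/var_usage.log}"
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d')"
        # -x stays on one file system; -k reports sizes in KiB so that
        # sort -n orders them correctly. The largest entries come last.
        du -xk "$target" 2>/dev/null | sort -n | tail -n 20
    } >> "$logfile"
}
```

Calling record_usage /var /root/var_usage.log once a day (for example from a root cron job) builds up a file in which a steadily growing directory should stand out. Keep the log file outside the file system being measured.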
monitoreo1 wrote:When we saw that the /var/ was full, erased the files in that file system ( before, we backup them ),
Run the du -h command against the backup you made of /var/ and paste that information here, it might show what the cause of the problem was.
Re: URGENT: Incident for FS saturation.
Posted: Wed Apr 22, 2015 9:20 pm
by monitoreo1
Thanks a lot for your help !!!!
And what about the /usr/ FS ?
Is there something that we can erase if the FS grows?
Re: URGENT: Incident for FS saturation.
Posted: Thu Apr 23, 2015 2:00 pm
by abrist
Most of the /usr usage is probably due to Nagios logs and performance data. Nagios logs are required for availability/SLA reporting, and the performance data RRDs are required for graphs. The good news is that the RRD directory size will not grow unless new checks are added, and the Nagios logs should grow at a predictable rate. What is the output of:
find /usr/local/nagios -type d -print0 | xargs -0 du | sort -n | tail -20 | cut -f2 | xargs -I{} du -sh {}| sort | uniq
Re: URGENT: Incident for FS saturation.
Posted: Thu Apr 23, 2015 2:39 pm
by monitoreo1
Hi !!!!
Here is the output:
2.2G /usr/local/nagios/share
2.2G /usr/local/nagios/share/perfdata
2.3G /usr/local/nagios
72M /usr/local/nagios/share/perfdata/lvmcldaspgnp03
72M /usr/local/nagios/share/perfdata/lvmcodaspgnp01
72M /usr/local/nagios/share/perfdata/lvmcopaspgnp01
72M /usr/local/nagios/share/perfdata/lvmmxcaspgnp01
72M /usr/local/nagios/share/perfdata/lvmmxdaspgnp01
72M /usr/local/nagios/share/perfdata/lvmpecaspgnp01
72M /usr/local/nagios/share/perfdata/lvmpedaspgnp01
72M /usr/local/nagios/share/perfdata/lvmpepaspgnp01
72M /usr/local/nagios/share/perfdata/Semilla_Postgres
73M /usr/local/nagios/share/perfdata/lvmclcaspgnp02
74M /usr/local/nagios/share/perfdata/lvmclcaspgnp03
74M /usr/local/nagios/share/perfdata/lvmclpaspgnp01
74M /usr/local/nagios/share/perfdata/lvmclpaspgnp02
74M /usr/local/nagios/share/perfdata/lvmclpaspgnp03
74M /usr/local/nagios/share/perfdata/Plantilla_Postgres
75M /usr/local/nagios/share/perfdata/lvmclcaspgnp01
77M /usr/local/nagios/var
I'm confused: is the information in /var/log/ or /var/ used to build the availability reports?
Can we erase the information in /var/log/ or /var/?
If I understand correctly, we have to grow the /usr/ FS to allow Nagios to generate the availability/SLA reporting and the performance data.
Is this correct?
Thanks for your patience and help !!!!!
Re: URGENT: Incident for FS saturation.
Posted: Thu Apr 23, 2015 4:45 pm
by jdalrymple
Availability is generated from the Nagios logs and archives, which will continually grow.
By default:
/usr/local/nagios/var/nagios.log & /usr/local/nagios/var/archives/*
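To see how much space those default paths currently take, a small sketch along these lines may help. The helper name nagios_log_usage is hypothetical, and the default directory is simply taken from the paths above.

```shell
#!/bin/sh
# Sketch: report the space used by the Nagios log and its archives,
# and count the archived log files. nagios_log_usage is a hypothetical
# helper name; pass another directory if your install differs.
nagios_log_usage() {
    vardir="${1:-/usr/local/nagios/var}"
    # -s prints one total per argument, -k reports KiB
    du -sk "$vardir/nagios.log" "$vardir/archives" 2>/dev/null
    printf 'archived log files: %s\n' \
        "$(find "$vardir/archives" -type f 2>/dev/null | wc -l | tr -d ' ')"
}
```

Running nagios_log_usage on different days and comparing the numbers gives a rough idea of the growth rate of the logs.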
Re: URGENT: Incident for FS saturation.
Posted: Thu Apr 23, 2015 8:59 pm
by monitoreo1
ok....thanks again !!!!!