URGENT: Incident for FS saturation.

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
monitoreo1
Posts: 124
Joined: Wed Feb 18, 2015 10:41 am


Post by monitoreo1 »

Hi everybody!

Today we had an incident caused by file system saturation. The file system was 100% full, which prevented users from accessing Nagios through the web interface.

When we saw that /var/ was full, we deleted the files in that file system (after backing them up first). We also ran the procedure recommended on the Nagios web access page:

"Message: A database connection error has been detected, we are attempting to repair the server, if the repair does not resolve the issue, please contact Nagios support.
Run the following from the CLI as root to attempt to repair the DB
/usr/local/nagiosxi/scripts/repair_databases.sh"

I understand that there is an automatic procedure to back up the Nagios files and databases, but my question is whether the scope of this procedure includes removing all the information that was backed up with the backup_xi.sh procedure.

This is our current file system usage:

[root@XXXXXXX var]# df /var/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VGSYS-LVVAR 6133760 3507444 2626316 58% /var
[root@XXXXXXX var]# df /var/log/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VGSYS-LVVAR 6133760 3507492 2626268 58% /var
[root@XXXXXXX var]# df /usr/local/nagiosxi/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VGSYS-LVUSR 6133760 4459252 1674508 73% /usr
[root@XXXXXXX var]# df /usr/local/nagios/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VGSYS-LVUSR 6133760 4458152 1675608 73% /usr

The question is: which files can we delete to prevent the file system from filling up again?

Thanks for your help!

We are using:

Nagios XI Version : 2014R2.6
x86_64
Red Hat Enterprise Linux Server release 7.0 (Maipo)
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: URGENT: Incident for FS saturation.

Post by Box293 »

What is the disk usage like in /store/ ?


df /store/
du -h /store/
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
monitoreo1
Posts: 124
Joined: Wed Feb 18, 2015 10:41 am

Re: URGENT: Incident for FS saturation.

Post by monitoreo1 »

Here is the information:

708K /store/backups/mysql/daily/mysql
79M /store/backups/mysql/daily/nagios
576K /store/backups/mysql/daily/nagiosql
28K /store/backups/mysql/daily/test
81M /store/backups/mysql/daily
704K /store/backups/mysql/weekly/mysql
53M /store/backups/mysql/weekly/nagios
428K /store/backups/mysql/weekly/nagiosql
24K /store/backups/mysql/weekly/test
54M /store/backups/mysql/weekly
280K /store/backups/mysql/monthly/mysql
9.6M /store/backups/mysql/monthly/nagios
124K /store/backups/mysql/monthly/nagiosql
8.0K /store/backups/mysql/monthly/test
10M /store/backups/mysql/monthly
144M /store/backups/mysql
1.3M /store/backups/postgresql/daily/nagiosxi
28K /store/backups/postgresql/daily/postgres
28K /store/backups/postgresql/daily/template1
1.4M /store/backups/postgresql/daily
1004K /store/backups/postgresql/weekly/nagiosxi
24K /store/backups/postgresql/weekly/postgres
24K /store/backups/postgresql/weekly/template1
1.1M /store/backups/postgresql/weekly
248K /store/backups/postgresql/monthly/nagiosxi
8.0K /store/backups/postgresql/monthly/postgres
8.0K /store/backups/postgresql/monthly/template1
264K /store/backups/postgresql/monthly
2.6M /store/backups/postgresql
0 /store/backups/nagiosxi
147M /store/backups
147M /store/

Thanks for your help!
monitoreo1
Posts: 124
Joined: Wed Feb 18, 2015 10:41 am

Re: URGENT: Incident for FS saturation.

Post by monitoreo1 »

And here is the df -h output:

Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VGSYS-LVRAIZ 3.0G 203M 2.8G 7% /
devtmpfs 3.8G 0 3.8G 0% /dev
tmpfs 3.8G 0 3.8G 0% /dev/shm
tmpfs 3.8G 57M 3.8G 2% /run
tmpfs 3.8G 0 3.8G 0% /sys/fs/cgroup
/dev/mapper/VGSYS-LVUSR 5.9G 4.3G 1.6G 73% /usr
/dev/mapper/VGSYS-LVVAR 5.9G 2.9G 3.0G 50% /var
/dev/mapper/VGSYS-LVTMP 3.0G 33M 2.9G 2% /tmp
/dev/mapper/VGSYS-LVOPT 3.0G 33M 2.9G 2% /opt
/dev/mapper/VGPROGPROD-LVSEGURIDAD 2.0G 33M 2.0G 2% /seguridad
/dev/mapper/VGPROGPROD-LVNAGIOSTEST 19M 332K 17M 2% /monitorizaciontest
/dev/mapper/VGPROGPROD-LVPRODUCCION 2.0G 33M 2.0G 2% /produccion
/dev/mapper/VGSYS-LVHOME 3.0G 49M 2.9G 2% /home
/dev/vda1 509M 133M 376M 27% /boot

We want to know what we should delete if the /usr/ file system keeps growing.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: URGENT: Incident for FS saturation.

Post by Box293 »

That all looks OK.
monitoreo1 wrote: The question is: which files can we delete to prevent the file system from filling up again?
/var/ is used for many different purposes. Knowing what can be deleted is complicated, because the growth may be caused by something that is broken (for example, spooled files that are not being processed).

What I suggest is keeping a daily record of the output of this command:


du -h /var
If there is something that is growing daily then it should show up and then we can further look into your problem.
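
The daily record suggested above could be kept with a small script. This is only a minimal sketch, not something shipped with Nagios XI: the function names `snapshot_du` and `compare_du` and the output directory are hypothetical, and the snapshots are assumed to be written to a file system other than the one being measured, so that the record itself cannot fill /var.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Nagios XI) for tracking directory growth.

# snapshot_du TARGET_DIR OUTPUT_DIR
# Save one du listing per day, named after the current date.
snapshot_du() {
    target=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # -x stays on one file system; -k reports KiB so sizes compare numerically
    du -xk "$target" 2>/dev/null > "$outdir/du-$(date +%F).txt"
}

# compare_du OLD_FILE NEW_FILE
# Print "growth_in_KiB<TAB>directory" for every directory that grew,
# largest growth first.
compare_du() {
    awk -F'\t' '
        NR == FNR { old[$2] = $1 + 0; next }
        ($2 in old) && ($1 + 0 > old[$2]) {
            printf "%d\t%s\n", $1 - old[$2], $2
        }' "$1" "$2" | sort -rn
}
```

For example, `snapshot_du /var /root/du-history` could be run once a day from cron (the `/root/du-history` path is an assumption), and two of the resulting files could then be passed to `compare_du` to see which directories grew in between.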
monitoreo1 wrote: When we saw that /var/ was full, we deleted the files in that file system (after backing them up first).
Run the du -h command against the backup you made of /var/ and paste the output here; it might show what caused the problem.
monitoreo1
Posts: 124
Joined: Wed Feb 18, 2015 10:41 am

Re: URGENT: Incident for FS saturation.

Post by monitoreo1 »

Thanks a lot for your help!

And what about the /usr/ file system?

Is there something that we can delete if it keeps growing?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: URGENT: Incident for FS saturation.

Post by abrist »

Most of the /usr usage is probably due to Nagios logs and performance data. The Nagios logs are required for availability/SLA reporting, and the performance data RRDs are required for graphs. The good news is that the RRD directory size will not grow unless new checks are added, and the Nagios logs should grow at a predictable rate. What is the output of:


find /usr/local/nagios -type d -print0 | xargs -0 du | sort -n | tail -20 | cut -f2 | xargs -I{} du -sh {}| sort | uniq
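
If that pipeline is awkward to type, roughly the same information can be gathered in a single `du` pass. The following is a sketch under assumptions, not the command from this post: the function name `top_dirs` is hypothetical, and it prints sizes in MiB rather than using `du -h`.

```shell
#!/bin/sh
# Hypothetical helper: list the N largest directories under DIR.
# top_dirs DIR [N]   (N defaults to 20; the largest directory is printed last)
top_dirs() {
    # -x stays on one file system; -k gives KiB so sort -n orders correctly
    du -xk "${1:-.}" 2>/dev/null | sort -n | tail -n "${2:-20}" |
        awk -F'\t' '{ printf "%.1fM\t%s\n", $1 / 1024, $2 }'
}
```

Running `top_dirs /usr/local/nagios` should show whether perfdata or the logs under var account for most of the space.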
monitoreo1
Posts: 124
Joined: Wed Feb 18, 2015 10:41 am

Re: URGENT: Incident for FS saturation.

Post by monitoreo1 »

Hi!


Here is the output:

2.2G /usr/local/nagios/share
2.2G /usr/local/nagios/share/perfdata
2.3G /usr/local/nagios
72M /usr/local/nagios/share/perfdata/lvmcldaspgnp03
72M /usr/local/nagios/share/perfdata/lvmcodaspgnp01
72M /usr/local/nagios/share/perfdata/lvmcopaspgnp01
72M /usr/local/nagios/share/perfdata/lvmmxcaspgnp01
72M /usr/local/nagios/share/perfdata/lvmmxdaspgnp01
72M /usr/local/nagios/share/perfdata/lvmpecaspgnp01
72M /usr/local/nagios/share/perfdata/lvmpedaspgnp01
72M /usr/local/nagios/share/perfdata/lvmpepaspgnp01
72M /usr/local/nagios/share/perfdata/Semilla_Postgres
73M /usr/local/nagios/share/perfdata/lvmclcaspgnp02
74M /usr/local/nagios/share/perfdata/lvmclcaspgnp03
74M /usr/local/nagios/share/perfdata/lvmclpaspgnp01
74M /usr/local/nagios/share/perfdata/lvmclpaspgnp02
74M /usr/local/nagios/share/perfdata/lvmclpaspgnp03
74M /usr/local/nagios/share/perfdata/Plantilla_Postgres
75M /usr/local/nagios/share/perfdata/lvmclcaspgnp01
77M /usr/local/nagios/var


I'm confused: is the availability reporting built from the information in /var/log/ or in /var/?

Can we delete the information in the /var/log/ or /var/ file systems?

If I understand correctly, we have to grow the /usr/ file system so that Nagios can keep generating the availability/SLA reports and the performance data.

Is this correct?


Thanks for your patience and help!
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: URGENT: Incident for FS saturation.

Post by jdalrymple »

Availability is generated from the Nagios logs and archives. They will continually grow.

By default:

/usr/local/nagios/var/nagios.log & /usr/local/nagios/var/archives/*
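
To keep an eye on how much space those logs use, something like the following could help. It is a sketch under assumptions: the `NAGIOS_VAR` variable and the function names `log_usage` and `old_archives` are hypothetical, and the paths are the defaults named above. It only lists files; deleting or moving archives would shorten the period that availability reports can cover.

```shell
#!/bin/sh
# Hypothetical helpers; NAGIOS_VAR defaults to the path mentioned in this thread.
NAGIOS_VAR=${NAGIOS_VAR:-/usr/local/nagios/var}

# log_usage: show the size of the current log and of the archive directory.
log_usage() {
    du -sh "$NAGIOS_VAR/nagios.log" "$NAGIOS_VAR/archives" 2>/dev/null
}

# old_archives [DAYS]: list archived logs last modified more than DAYS days ago
# (default 365). Nothing is deleted.
old_archives() {
    find "$NAGIOS_VAR/archives" -type f -mtime +"${1:-365}" 2>/dev/null | sort
}
```

Calling `log_usage` regularly gives the growth rate, and `old_archives 365` shows which archives fall outside a one-year reporting window.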
monitoreo1
Posts: 124
Joined: Wed Feb 18, 2015 10:41 am

Re: URGENT: Incident for FS saturation.

Post by monitoreo1 »

OK, thanks again!