Super large cleaner.log file made my server crash

Posted: Mon Jul 07, 2014 3:14 am
by WillemDH
Hello,

Apparently our Nagios production server crashed this weekend because a file named cleaner.log in /usr/local/nagiosxi/var took all available disk space. After deleting the file, the disk space was not automatically freed. We tried restarting the httpd, postgresql, mysqld, initd and nagios services, but the used disk space did not become available, so I had to reboot the server (which did seem to free up the used disk space) and then run the MySQL repair script to make Nagios XI work again.
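
A possible explanation for the space not coming back (an assumption, since the thread does not show it directly): on Linux, deleting a file only removes its name. The blocks stay allocated until the last process that has the file open closes it, which would fit the reboot being what finally released the space. A minimal sketch for spotting such files by scanning /proc follows. The function name and the proc_root parameter are made up for illustration, and it needs root to see other users' processes.

```shell
# List files that have been deleted but are still held open by a process.
# proc_root is a parameter only so the function can be pointed at a test
# directory; on a live Linux system the default /proc is what you want.
list_deleted_open_files() (
    proc_root=${1:-/proc}
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Unreadable or vanished entries are skipped silently.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                pid=${fd#"$proc_root"/}   # strip the proc root prefix
                pid=${pid%%/*}            # keep only the PID component
                printf '%s\t%s\n' "$pid" "$target"
                ;;
        esac
    done
)

# Typical use (as root):
#   list_deleted_open_files
```

Each output line is a PID and the path it still holds. Where lsof is installed, `lsof +L1` reports similar information.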

Could I please get some help in finding out why this file grew so excessively? See the screenshot for more details.

df -h after reboot:

Code: Select all

 df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       34G   14G   19G  43% /
tmpfs                 1.9G     0  1.9G   0% /dev/shm
/dev/sda1              97M   28M   65M  31% /boot
I'm not 100 % sure, but it might have something to do with the backup script, as I had to make some changes to it.

As we migrated the backend storage, the location where I rsync the backups to has changed. I used to do:

Code: Select all

rsync --remove-source-files -azv /store/backups/nagiosxi /var/Digipolis/Backup
But this no longer worked, as the newly mounted filesystem (NetApp) works a bit differently than the old one (EMC Celerra): although the Nagios server has write permissions, rsync was not able to get the owner and set permissions, so the new command became:

Code: Select all

rsync --remove-source-files --no-perms -r --no-o --no-g --inplace /store/backups/nagiosxi /var/Digipolis/Backup
I did some test runs of the backup on Friday during the day and these all seemed to work fine... I'm not sure what's going on. I've been trying to move the backups with FTP, but this does not seem to work as expected. I'll make a new thread for this, as the FTP problem is not related.

EDIT 1:
Ok, in the meantime it seems some PHP process is using 100 % CPU; I saw this same process using 100 % CPU this morning. I attached a screenshot. What could be causing this process to use 100 % CPU?

EDIT 2:
Ok, in the meantime I discovered that /usr/local/nagiosxi/cron/cleaner.php is the evil command using up all the server resources (CPU + disk ^^)

Code: Select all

 ps -eo pcpu,pid,user,args | sort -k 1 -r | head -25
%CPU   PID USER     COMMAND
96.8  3418 nagios   /usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php
 8.3  1438 root     [flush-253:1]
 3.4  8777 mysql    /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
 2.1 24163 nagios   /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
 2.0 11126 apache   /usr/sbin/httpd
 1.9 25224 apache   /usr/sbin/httpd
 1.7 25226 apache   /usr/sbin/httpd
 1.7 25225 apache   /usr/sbin/httpd
 1.6    25 root     [ksoftirqd/5]
 1.6 25250 apache   /usr/sbin/httpd
 1.6 25227 apache   /usr/sbin/httpd
 1.6 24161 nagios   /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
 1.5 32662 nagios   /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
 1.5 25248 apache   /usr/sbin/httpd
 1.5 25245 apache   /usr/sbin/httpd
 1.4 25244 apache   /usr/sbin/httpd
 1.4 25243 apache   /usr/sbin/httpd
 1.4 21237 apache   /usr/sbin/httpd
 1.3 24162 nagios   /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
 1.3 24157 nagios   /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
 1.2  3424 apache   /usr/sbin/httpd
 1.2 24155 nagios   /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
 1.1  3188 apache   /usr/sbin/httpd
 1.1  3187 apache   /usr/sbin/httpd
So what is this /usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php doing and how do I stop it from making my server crash again?
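
A side note on the ps command above: `sort -k 1 -r` compares the lines as text, which happened to give the right order here but can misplace values once the widths differ (for example 100 against 96.8). A small sketch of a numeric variant follows. The function name top_cpu is hypothetical, and the behaviour of the `-o` options is assumed to match procps ps.

```shell
# Print the N busiest processes (default 10) from "pcpu pid user args" lines
# read on stdin, sorted numerically by the first column, highest first.
top_cpu() {
    LC_ALL=C sort -k1,1nr | head -n "${1:-10}"
}

# Typical use; the trailing "=" on each column suppresses the header line:
#   ps -e -o pcpu= -o pid= -o user= -o args= | top_cpu 25
```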

EDIT 3:
It seems the cleaner.log file is again 759575031549 bytes! I opened the file and it's full of

Code: Select all

PHP Warning:  readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning:  readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning:  readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning:  readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning:  readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning:  readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
Local scheduled backups are disabled! Only FTP backups are enabled. What can I do to make this stop? Kill the process?
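
For a log this size, opening the whole file is not really needed to see what fills it. A rough sketch that counts the distinct lines near the end of the file follows. The function name and the default of 1000 lines are arbitrary choices, not anything from Nagios XI.

```shell
# Count distinct lines among the last N lines (default 1000) of a log,
# most frequent first. tail works from the end of a regular file, so the
# total size of the log should not matter much.
summarize_log_tail() {
    tail -n "${2:-1000}" "$1" | sort | uniq -c | LC_ALL=C sort -k1,1nr
}

# Typical use:
#   summarize_log_tail /usr/local/nagiosxi/var/cleaner.log
```

If a single warning accounts for nearly all of the sampled lines, as in the excerpt above, that points at one statement being hit over and over rather than at many different problems.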

Thanks..

Willem

Re: Super large cleaner.log file made my server crash

Posted: Mon Jul 07, 2014 10:55 am
by jomann
Hello WillemDH,

I believe this is caused by a specific situation in which the directory read gets stuck in an infinite loop, which would explain why the log file gets that big and never truncates.
I'm creating a new version of the Scheduled Backups component that should no longer cause that warning (and the loop) and will post it for you momentarily once it's done building.

Re: Super large cleaner.log file made my server crash

Posted: Mon Jul 07, 2014 11:02 am
by WillemDH
Hey Jomann,

I really hope you can give me a solution soon. Deleting the file doesn't even help, as the file is locked by the PHP process, so after deleting it I need to reboot the server... :( Is there anything else I can do to stop it as a workaround? Can I kill this process safely?

Grtz

Willem

Re: Super large cleaner.log file made my server crash

Posted: Mon Jul 07, 2014 11:03 am
by jomann
You should be able to kill the process safely; however, another one will most likely start on the next cron run, so you have limited time to remove the file. The fix should be available within the next couple of minutes.
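
As a stopgap until the fix is in place, emptying the log in place may work better than deleting it, because truncation keeps the same file and releases its blocks right away, even while a process still has it open. A minimal sketch follows. The function name and the size limit are hypothetical, and it is an assumption that the cron job appends to the log rather than writing at a fixed offset.

```shell
# Empty a log file in place if it has grown beyond max_bytes.
# Truncating keeps the same inode, so the space is released even while
# another process still has the file open.
truncate_if_larger() (
    log_file=$1
    max_bytes=$2
    [ -f "$log_file" ] || exit 0          # nothing to do if it is missing
    size=$(( $(wc -c < "$log_file") ))    # arithmetic strips any padding
    if [ "$size" -gt "$max_bytes" ]; then
        : > "$log_file"
    fi
)

# Typical use, for example from a root cron entry (100 MB limit):
#   truncate_if_larger /usr/local/nagiosxi/var/cleaner.log 104857600
```

This only limits the damage. The runaway process keeps using CPU until it is killed or the component is fixed.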

Re: Super large cleaner.log file made my server crash

Posted: Mon Jul 07, 2014 11:05 am
by jomann
Here's the new component. You can just install it via the GUI if you'd like, or you can unzip it into /usr/local/nagiosxi/html/includes/components.

Re: Super large cleaner.log file made my server crash

Posted: Mon Jul 07, 2014 11:34 am
by WillemDH
Hey,

I get 'Component installation failed. Uploaded zip file is not a component.'

Grtz

Willem

Re: Super large cleaner.log file made my server crash

Posted: Mon Jul 07, 2014 11:36 am
by jomann
You'll have to do the manual unzip then. I didn't think it checked when it was encrypted, but you can overwrite the "scheduledbackups" folder in the /usr/local/nagiosxi/html/includes/components directory with the file I gave you. The zip contains the main folder "scheduledbackups", so you can extract it directly into the components directory and overwrite the old one. If you want, you can first rename the old scheduledbackups to something like "scheduledbackups_old".
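
The manual steps described above could look roughly like this. It is a sketch only: the function name is made up, the zip file name is not given in the thread and has to be supplied, and it assumes the zip really contains the top-level "scheduledbackups" folder.

```shell
# Back up the installed component folder, then extract the replacement zip.
# zip_file is whatever the attachment was saved as; the "_old" suffix
# follows the suggestion above.
replace_component() (
    zip_file=$1
    components_dir=${2:-/usr/local/nagiosxi/html/includes/components}
    name=${3:-scheduledbackups}

    [ -f "$zip_file" ] || { echo "zip not found: $zip_file" >&2; exit 1; }
    [ -d "$components_dir" ] || { echo "no such directory: $components_dir" >&2; exit 1; }
    [ ! -e "$components_dir/${name}_old" ] || { echo "backup already exists" >&2; exit 1; }

    if [ -d "$components_dir/$name" ]; then
        mv "$components_dir/$name" "$components_dir/${name}_old" || exit 1
    fi
    # -o overwrites without prompting; -d sets the extraction directory.
    unzip -o "$zip_file" -d "$components_dir"
)

# Typical use (the path to the zip is a placeholder):
#   replace_component /path/to/downloaded.zip
```

Nothing is moved or extracted if the zip is missing or a previous "_old" backup is already there, so a failed attempt leaves the installed component as it was.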

Re: Super large cleaner.log file made my server crash

Posted: Mon Jul 07, 2014 11:40 am
by WillemDH
I guess you mean '/usr/local/nagiosxi/html/includes/components/scheduledbackups'. Gonna install it in 5 minutes. I'll keep you posted.

Re: Super large cleaner.log file made my server crash

Posted: Mon Jul 07, 2014 11:44 am
by jomann
Yes, sorry. Let me know how it goes.

Re: Super large cleaner.log file made my server crash

Posted: Mon Jul 07, 2014 11:56 am
by WillemDH
Ok, installed the new component. Then I removed the cleaner.log, which had become gigantic in the meantime, and killed the PHP process. I think the disk space has been freed now. I'll monitor it closely. Did you read my other post about the FTP backup not working as expected? http://support.nagios.com/forum/viewtop ... 16&t=28079 Could you find some time to answer some of my questions there about FTP backup?

As the backups are already 500+ MB, they need to be moved and not copied.

Also, when I re-enable local backups and test permissions, I get "the directory specified does not exist". This is a mounted volume, which I can access perfectly well from the CLI...

Thanks for the fast solution Jake!

Grtz

Willem