Super large cleaner.log file made my server crash

WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Super large cleaner.log file made my server crash

Post by WillemDH »

Hello,

Apparently our Nagios production server crashed this weekend because a file named cleaner.log in /usr/local/nagiosxi/var took up all available disk space. After deleting the file, the disk space was not automatically freed. We tried restarting the httpd, postgresql, mysqld, initd and nagios services, but the used disk space did not become available, so I had to reboot the server (which did seem to free up the used disk space) and then run the mysql repair script to make Nagios XI work again.

Could I please get some help in finding out why this file grew so excessively? See the screenshot for more details.

df -h after reboot:

Code: Select all

 df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       34G   14G   19G  43% /
tmpfs                 1.9G     0  1.9G   0% /dev/shm
/dev/sda1              97M   28M   65M  31% /boot
I'm not 100 % sure, but it might have something to do with the backup script, as I did have to make some changes.

As we migrated the backend storage, the location where I rsync the backups has changed. I used to do

Code: Select all

rsync --remove-source-files -azv /store/backups/nagiosxi /var/Digipolis/Backup
But this was no longer working, as the newly mounted filesystem (NetApp) works a bit differently than the old one (EMC Celerra). Although the Nagios server has write permissions, rsync was not able to set the owner and permissions, so the new command was:

Code: Select all

rsync --remove-source-files --no-perms -r --no-o --no-g --inplace /store/backups/nagiosxi /var/Digipolis/Backup
I did some test runs of the backup on Friday during the day and these all seemed to work fine... I'm not sure what's going on. I've been trying to move the backups with ftp, but this does not seem to work as expected. I'll make a new thread for this, as this ftp problem is not related.

EDIT 1:
Ok, in the meantime it seems some php process is using 100 % cpu. I saw this same process using 100 % cpu this morning. I attached a screenshot. What could be causing this process to use 100 % cpu?

EDIT 2:
Ok, in the meantime I discovered that /usr/local/nagiosxi/cron/cleaner.php is the evil command using up all the server resources (CPU + disk ^^)

Code: Select all

 ps -eo pcpu,pid,user,args | sort -k 1 -r | head -25
%CPU   PID USER     COMMAND
96.8  3418 nagios   /usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php
 8.3  1438 root     [flush-253:1]
 3.4  8777 mysql    /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
 2.1 24163 nagios   /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
 2.0 11126 apache   /usr/sbin/httpd
 1.9 25224 apache   /usr/sbin/httpd
 1.7 25226 apache   /usr/sbin/httpd
 1.7 25225 apache   /usr/sbin/httpd
 1.6    25 root     [ksoftirqd/5]
 1.6 25250 apache   /usr/sbin/httpd
 1.6 25227 apache   /usr/sbin/httpd
 1.6 24161 nagios   /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
 1.5 32662 nagios   /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
 1.5 25248 apache   /usr/sbin/httpd
 1.5 25245 apache   /usr/sbin/httpd
 1.4 25244 apache   /usr/sbin/httpd
 1.4 25243 apache   /usr/sbin/httpd
 1.4 21237 apache   /usr/sbin/httpd
 1.3 24162 nagios   /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
 1.3 24157 nagios   /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
 1.2  3424 apache   /usr/sbin/httpd
 1.2 24155 nagios   /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
 1.1  3188 apache   /usr/sbin/httpd
 1.1  3187 apache   /usr/sbin/httpd
So what is this /usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php doing and how do I stop it from making my server crash again?

EDIT 3:
It seems the cleaner.log file is again 759575031549 bytes! I opened the file and it's full of

Code: Select all

PHP Warning:  readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning:  readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning:  readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning:  readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning:  readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning:  readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
Local scheduled backups are disabled! Only ftp backups are enabled. What can I do to make this stop? Kill the process?

Thanks..

Willem
Nagios XI 5.8.1
https://outsideit.net
jomann
Development Lead
Posts: 611
Joined: Mon Apr 22, 2013 10:06 am
Location: Nagios Enterprises

Re: Super large cleaner.log file made my server crash

Post by jomann »

Hello WillemDH,

I believe this is caused by a specific situation that's causing the file directory read to create an infinite loop ... which might explain why the log file gets that big and never truncates.
I'm creating a new version of the Scheduled Backups component that should no longer cause that warning (and the loop) and will post it for you momentarily once it's done building.
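The readdir() warning in the log means the value passed as the directory handle was false, which is what PHP's opendir() returns when a directory cannot be opened. A loop that only stops when readdir() returns false would presumably never terminate in that case. Below is a hedged shell sketch for checking whether a directory can be listed at all. can_list_dir is a hypothetical helper, and the path is only an example taken from the rsync command earlier in the thread, not necessarily the directory the component reads:

```shell
#!/bin/sh
# can_list_dir DIR: succeed only if DIR exists and the current user
# is allowed to read and enter it (hypothetical helper for illustration).
can_list_dir() {
    [ -d "$1" ] && [ -r "$1" ] && [ -x "$1" ]
}

# Example path only (assumption); substitute the configured backup directory.
backup_dir="/store/backups/nagiosxi"

if can_list_dir "$backup_dir"; then
    echo "ok: $backup_dir can be listed"
else
    echo "cannot list $backup_dir: opening it would fail"
fi
```

The check is only meaningful when run as the user that executes the cron job (nagios, according to the ps output above), since root can read almost any directory.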
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Super large cleaner.log file made my server crash

Post by WillemDH »

Hey Jomann,

I really hope you can give me a solution soon, as deleting the file doesn't even help, as the file is locked by the php process.. After deleting the file, I need to reboot the server... :( Is there anything else I can do to stop it as a workaround? Can I kill this process safely?

Grtz

Willem
jomann
Development Lead
Posts: 611
Joined: Mon Apr 22, 2013 10:06 am
Location: Nagios Enterprises

Re: Super large cleaner.log file made my server crash

Post by jomann »

You should be able to kill the process safely; however, another one will most likely start on the next cron run, so you have limited time to remove the file. The fix should be available within the next couple of minutes.
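Deleting the log while the cleaner process still holds it open only removes the directory entry; the blocks are released once the last open descriptor is closed, which would explain why the reboot freed the space. Here is a minimal shell sketch of that behaviour, using a temporary file as a stand-in for cleaner.log (the descriptor number and sample text are illustrative only):

```shell
#!/bin/sh
# Stand-in for /usr/local/nagiosxi/var/cleaner.log; everything lives in a temp dir.
tmpdir=$(mktemp -d)
log="$tmpdir/cleaner.log"

# Hold the file open in append mode on descriptor 3, like a long-running writer.
exec 3>>"$log"
printf 'warning\n' >&3
size_before=$(wc -c < "$log" | tr -d ' ')

# Truncating in place empties the file even though the writer is still attached.
: > "$log"
size_after=$(wc -c < "$log" | tr -d ' ')

# Removing the file only drops its name; the writer can keep writing to it.
rm "$log"
if printf 'warning\n' >&3; then write_after_rm=ok; else write_after_rm=failed; fi
if [ -e "$log" ]; then visible_after_rm=yes; else visible_after_rm=no; fi

# Closing the descriptor (or killing the process) is what finally frees the blocks.
exec 3>&-
rm -rf "$tmpdir"

echo "before=$size_before after_truncate=$size_after write=$write_after_rm visible=$visible_after_rm"
```

On a live system, `lsof +L1` lists files that are deleted but still open. Truncating the log with `: > /usr/local/nagiosxi/var/cleaner.log` instead of deleting it should release the space without a reboot, assuming the writer opened the log in append mode.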
jomann
Development Lead
Posts: 611
Joined: Mon Apr 22, 2013 10:06 am
Location: Nagios Enterprises

Re: Super large cleaner.log file made my server crash

Post by jomann »

Here's the new component. You can just install it via the GUI if you'd like or you can unzip it into /usr/local/nagiosxi/html/includes/components
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Super large cleaner.log file made my server crash

Post by WillemDH »

Hey,

I get 'Component installation failed. Uploaded zip file is not a component.'

Grtz

Willem
jomann
Development Lead
Posts: 611
Joined: Mon Apr 22, 2013 10:06 am
Location: Nagios Enterprises

Re: Super large cleaner.log file made my server crash

Post by jomann »

You'll have to do the manual unzip then; I didn't think it checked when it was encrypted. You can overwrite the "scheduledbackups" folder in the /usr/local/nagiosxi/html/includes/components directory with the file I gave you. The zip contains the main folder "scheduledbackups", so you can extract it directly into the components directory and overwrite the old one. If you prefer, you can first rename the old scheduledbackups folder to something like "scheduledbackups_old".
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Super large cleaner.log file made my server crash

Post by WillemDH »

I guess you mean '/usr/local/nagiosxi/html/includes/components/scheduledbackups'. Gonna install it in 5 minutes. I'll keep you posted.
jomann
Development Lead
Posts: 611
Joined: Mon Apr 22, 2013 10:06 am
Location: Nagios Enterprises

Re: Super large cleaner.log file made my server crash

Post by jomann »

Yes sorry, let me know how it goes.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Super large cleaner.log file made my server crash

Post by WillemDH »

Ok, I installed the new component. Then I removed the cleaner.log, which had become gigantic in the meantime, and killed the php process. I think the disk space has been freed now. I'll monitor it closely. Did you read my other post about the ftp backup not working as expected? http://support.nagios.com/forum/viewtop ... 16&t=28079 Could you find some time to answer some of my questions there about ftp backup?

As the backups are already 500+ MB, they need to be moved and not copied.

Also when I re-enable local backups and test permissions, I get "the directory specified does not exist". This is a mounted volume, which I can perfectly access with cli...

Thanks for the fast solution Jake!

Grtz

Willem
Locked