Apparently our Nagios production server crashed this weekend because a file named cleaner.log in /usr/local/nagiosxi/var took up all available disk space. After deleting the file, the disk space was not automatically freed. We tried restarting httpd, postgresql, mysqld, initd and the nagios service, but the used disk space did not become available, so I had to reboot the server (which did free up the used disk space) and then run the MySQL repair script to get Nagios XI working again.
Could I please get some help finding out why this file grew so excessively? See the screenshot for more details.
df -h after reboot:
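A likely reason the space was not released, assuming a running process (for example the cleaner cron job) still had cleaner.log open: deleting a file only removes its name, and the blocks stay allocated until the last open handle is closed. Under that assumption, emptying the file in place is usually enough to get the space back without a reboot. The helper name `truncate_in_place` below is hypothetical and not part of Nagios XI; this is a minimal sketch, demonstrated on a throwaway file.

```shell
#!/bin/sh
# Sketch: empty a log file in place instead of deleting it.
# Assumption: some running process may still hold the file open, so
# unlinking it would not release its disk blocks.

truncate_in_place() {
    # $1: path to the log file (hypothetical helper, not part of Nagios XI)
    [ -f "$1" ] || { echo "not a regular file: $1" >&2; return 1; }
    : > "$1"    # truncate to zero bytes; the file itself keeps existing
}

# Demonstration on a temporary file rather than the real cleaner.log
demo=$(mktemp)
printf 'some log line\n' > "$demo"
truncate_in_place "$demo"
wc -c < "$demo"    # size of the file after truncation
rm -f "$demo"
```

If the file has already been deleted, `lsof +L1` lists open files that no longer have a name on disk, which shows the process that is still holding the space.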
Code: Select all
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
34G 14G 19G 43% /
tmpfs 1.9G 0 1.9G 0% /dev/shm
/dev/sda1 97M 28M 65M 31% /boot
Since we migrated the backend storage, the location I rsync the backups to has changed. I used to do
Code: Select all
rsync --remove-source-files -azv /store/backups/nagiosxi /var/Digipolis/Backup
Code: Select all
rsync --remove-source-files --no-perms -r --no-o --no-g --inplace /store/backups/nagiosxi /var/Digipolis/Backup
EDIT 1:
Ok, in the meantime it seems some PHP process is using 100% CPU; I saw this same process using 100% CPU this morning. I attached a screenshot. What could be causing this process to use 100% CPU?
EDIT 2:
Ok, in the meantime I discovered that /usr/local/nagiosxi/cron/cleaner.php is the evil command using up all the server resources (CPU + disk ^^)
Code: Select all
ps -eo pcpu,pid,user,args | sort -k 1 -r | head -25
%CPU PID USER COMMAND
96.8 3418 nagios /usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php
8.3 1438 root [flush-253:1]
3.4 8777 mysql /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
2.1 24163 nagios /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
2.0 11126 apache /usr/sbin/httpd
1.9 25224 apache /usr/sbin/httpd
1.7 25226 apache /usr/sbin/httpd
1.7 25225 apache /usr/sbin/httpd
1.6 25 root [ksoftirqd/5]
1.6 25250 apache /usr/sbin/httpd
1.6 25227 apache /usr/sbin/httpd
1.6 24161 nagios /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
1.5 32662 nagios /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
1.5 25248 apache /usr/sbin/httpd
1.5 25245 apache /usr/sbin/httpd
1.4 25244 apache /usr/sbin/httpd
1.4 25243 apache /usr/sbin/httpd
1.4 21237 apache /usr/sbin/httpd
1.3 24162 nagios /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
1.3 24157 nagios /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
1.2 3424 apache /usr/sbin/httpd
1.2 24155 nagios /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
1.1 3188 apache /usr/sbin/httpd
1.1 3187 apache /usr/sbin/httpd
EDIT 3:
It seems the cleaner.log file has again grown to 759575031549 bytes! I opened the file and it's full of
Code: Select all
PHP Warning: readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning: readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning: readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning: readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning: readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
PHP Warning: readdir() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/includes/components/scheduledbackups/scheduledbackups.inc.php on line 429
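The warning says readdir() was handed `false` instead of a directory handle, which is what opendir() returns when it cannot open a directory. In PHP versions before 8, readdir() called this way logs the warning and returns NULL, so if the loop at line 429 uses the common `while (false !== ($f = readdir($dh)))` idiom (an assumption, the source of that file is not shown here), it would never end. That would account for both the 100% CPU and the runaway log. A first check is whether the directory the scheduled backups component reads still exists and is listable after the storage migration. The sketch below uses a hypothetical helper `check_listable_dir`, and the default path is only a guess taken from the rsync destination above.

```shell
#!/bin/sh
# Sketch: verify that a directory exists and can be listed by the current
# user. Run it as the nagios user to mirror what the cron job sees.

check_listable_dir() {
    # $1: directory to test (hypothetical helper; pass whatever path the
    # scheduled backups component is actually configured to use)
    if [ ! -d "$1" ]; then
        echo "missing: $1"
        return 1
    elif [ ! -r "$1" ] || [ ! -x "$1" ]; then
        echo "no permission: $1"
        return 2
    fi
    echo "ok: $1"
}

# Assumed default path, taken from the rsync destination; adjust as needed.
check_listable_dir "${1:-/var/Digipolis/Backup}" || true
```

Saved as a script (the file name is up to you), it could be run with something like `sudo -u nagios sh check_dir.sh /path/to/backup/dir`, so the test uses the same permissions as cleaner.php.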
Thanks..
Willem