Nagios Server Disk Usage filled by error log
Posted: Mon May 05, 2014 10:29 am
by ecarrasq
System Info
Linux Distribution and version?
Linux fwapp003.wvus.org 2.6.18-371.4.1.el5PAE #1 SMP Thu Jan 30 06:51:58 EST 2014 i686 i686 i386 GNU/Linux
CentOS release 5.10 (Final)
32 or 64bit?
32 bit
VMware Image or Manual Install of XI?
VMware Image
Are there special configurations on your system, i.e., is Gnome installed? Are you using a proxy? Are you using SSL?
Forced SSL on the Nagios XI login web screen
Nagios XI version: Nagios XI 2012R2.8
Issue:
Nagios Server Disk Usage filled and crashed the server. I attached some info from the logs. The logs were manually cleared and the server is back up now.
All server applications halted.
What would have caused this?
Please help with Root Cause Analysis, as we cannot afford for this to happen again.
Re: Nagios Server Disk Usage filled by error log
Posted: Mon May 05, 2014 1:26 pm
by abrist
ecarrasq wrote: What would have caused this?
I presume you answered this question yourself earlier:
ecarrasq wrote: Nagios Server Disk Usage filled and crashed the server
When the disk fills up, services cannot create lock files, halting their startup.
Re: Nagios Server Disk Usage filled by error log
Posted: Mon May 05, 2014 6:07 pm
by ecarrasq
Thanks, but what error would have caused this log file to fill up?
Re: Nagios Server Disk Usage filled by error log
Posted: Mon May 05, 2014 7:21 pm
by Box293
So to resolve your issue, how did you free up disk space? Did you delete some files? Where were they located?
Re: Nagios Server Disk Usage filled by error log
Posted: Tue May 06, 2014 12:15 pm
by ecarrasq
Echoed out all Apache error logs
Purged /usr/local/nagvis/var/nagvis-audit.log
Configured a new cron job to purge the audit log daily at 6am - echo > /usr/local/nagvis/var/nagvis-audit.log
Checked perms on /usr/local/nagiosxi/html/images/locale/.htaccess - file doesn't exist
Created file - touch /usr/local/nagiosxi/html/images/locale/.htaccess
Changed ownership of file - chown nagios:nagios /usr/local/nagiosxi/html/images/locale/.htaccess
Checked apache logs again, /var/www/html/favicon.ico didn't exist
Created file - touch /var/www/html/favicon.ico, default perms of 644 were picked up on creation
We were able to "fix" the current issue, but we need to know the cause, so it doesn't happen again.
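The purge step above can be sketched as a small helper. This is only an illustration, not part of the original fix: the function name truncate_log is hypothetical, and the crontab line is an assumed way of writing the 6am job described above.

```shell
#!/bin/sh
# Empty a log in place. ":" is the shell no-op, so ": > file" truncates the
# file to zero bytes without writing the newline that "echo > file" leaves.
# truncate_log is a hypothetical helper name used for illustration.
truncate_log() {
    [ -f "$1" ] || return 1   # do not create a file that is not already there
    : > "$1"
}

# Assumed crontab entry (added with "crontab -e") for the daily 6am purge:
# 0 6 * * * : > /usr/local/nagvis/var/nagvis-audit.log
```

Truncating in place keeps the same file, so a process that already has the log open continues writing to it. It only hides the symptom, though; whatever is generating the entries still needs to be found.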
Re: Nagios Server Disk Usage filled by error log
Posted: Tue May 06, 2014 12:48 pm
by lmiltchev
Did you check which log filled up the most? This can give you a hint where to look for the issue.
You can clear the biggest log files (if you don't need the info) with "cat /dev/null > <your log file>" and check later how fast (and which ones) they are growing:
Code:
du -a /var/log/ | sort -n -r | head -n 10
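To see how fast the logs grow rather than just how big they are, one option is to take two size snapshots some time apart and compare them. The following is a rough sketch; snapshot_sizes and log_growth are hypothetical helper names, and the snapshot file paths in the usage note are only examples.

```shell
#!/bin/sh
# Record the size in KB of everything under a directory, one
# "<kb><TAB><path>" line per entry, as printed by du.
snapshot_sizes() {
    du -ak "$1"
}

# Compare two snapshot files and print "<growth in KB><TAB><path>" for every
# path present in both, with the largest growth first.
log_growth() {
    awk -F '\t' '
        NR == FNR    { before[$2] = $1; next }   # first file: remember sizes
        ($2 in before) { printf "%d\t%s\n", $1 - before[$2], $2 }
    ' "$1" "$2" | sort -n -r
}
```

For example, run `snapshot_sizes /var/log > /tmp/sizes.before`, wait an hour, run `snapshot_sizes /var/log > /tmp/sizes.after`, then `log_growth /tmp/sizes.before /tmp/sizes.after | head -n 10`.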
Re: Nagios Server Disk Usage filled by error log
Posted: Tue May 06, 2014 12:57 pm
by ecarrasq
Did you check which log filled up the most?
the /var/log/httpd/error.log
Also, what is on line 93? The error.log was full of the following error (we cannot see the code, as coreuiproxy.inc.php is encrypted):
PHP Warning: feof(): supplied argument is not a valid stream resource in /usr/local/nagiosxi/html/includes/components/nagioscore/coreuiproxy.inc.php on line 93, referer: https://fwapp003.wvus.org/nagiosxi/views/
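To confirm which message dominates the log, the entries can be grouped and counted. This is a sketch that assumes the usual Apache 2.2 error log layout, where each line starts with bracketed fields such as [date] [error] [client ...]; top_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Print the most frequent messages in an Apache error log.
# The first sed expression drops the leading bracketed fields (timestamp,
# level, client), the second drops the trailing referer, so identical
# warnings from different clients and pages are counted together.
# $1 = log file, $2 = number of messages to show (default 10).
top_errors() {
    sed -e 's/^\(\[[^]]*] *\)*//' -e 's/, referer: .*$//' "$1" |
        sort | uniq -c | sort -n -r | head -n "${2:-10}"
}
```

Usage would be something like `top_errors /var/log/httpd/error.log 5`. Running the same pipeline without the referer-stripping expression would instead show which pages trigger the warning most often.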
Re: Nagios Server Disk Usage filled by error log
Posted: Tue May 06, 2014 3:59 pm
by slansing
Do you have any views that point to pages that are non-standard in Nagios XI (i.e., not pages you grabbed using the get-permalink icon)? I'd definitely clear some of your old Apache logs if I were you, but keep a few of the most recent ones, as we might need them to help troubleshoot.