
Re: Tactical Overview, Ops Center, Ops Screen Problems

Posted: Fri Mar 01, 2013 12:47 pm
by jbennett
Just to clarify, the box that is out of space is the one I am trying to migrate FROM, for this reason.

Re: Tactical Overview, Ops Center, Ops Screen Problems

Posted: Fri Mar 01, 2013 1:49 pm
by scottwilkerson
OK, let's run this to remove the files we just created in the backup process:

Code: Select all

cd /usr/local/nagios/share/perfdata
rm -rf *.rrd.xml
Then you will likely need to restart the following:

Code: Select all

service postgresql restart
service mysqld restart
Finally, run the commands abrist suggested so we can find files that can be removed to free up space.

Re: Tactical Overview, Ops Center, Ops Screen Problems

Posted: Mon Mar 04, 2013 11:49 am
by jbennett
abrist wrote:If the XI box is out of space, you will have issues with many parts of XI. Run the following commands from the CLI and post the output in a code wrap. This will help us track down where the space went.
I bet the problems are in /usr or /var; they always are.

Code: Select all

cd /
df -i
du -hsx * | sort -rh | head -10
find . -type f -print0 | xargs -0 du -s | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}
find . -type d -print0 | xargs -0 du -s | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}

Code: Select all

[root@nagiosxivm /]# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/VolGroup00-LogVol00
                     7122336  206839 6915497    3% /
tmpfs                 160609       1  160608    1% /dev/shm
/dev/sda1              25688      57   25631    1% /boot
tmpfs                 160609      71  160538    1% /var/nagiosramdisk

[root@nagiosxivm /]# du -hsx * | sort -rh | head -10
du: cannot access `proc/26395/task/26395/fd/4': No such file or directory
du: cannot access `proc/26395/task/26395/fdinfo/4': No such file or directory
du: cannot access `proc/26395/fd/4': No such file or directory
du: cannot access `proc/26395/fdinfo/4': No such file or directory
du: cannot access `proc/26398': No such file or directory
du: cannot access `proc/26399': No such file or directory
du: cannot access `proc/26400': No such file or directory
du: cannot access `proc/26402': No such file or directory
du: cannot access `proc/26404': No such file or directory
du: cannot access `proc/26406': No such file or directory
du: cannot access `proc/26408': No such file or directory
du: cannot access `proc/26409': No such file or directory
du: cannot access `proc/26410': No such file or directory
du: cannot access `proc/26412': No such file or directory
du: cannot access `proc/26413': No such file or directory
22G     usr
16G     var
2.3G    store
290M    lib
209M    tmp
81M     boot
58M     etc
24M     root
9.7M    sbin
6.3M    bin

[root@nagiosxivm /]# find . -type f -print0 | xargs -0 du -s | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}
{a whole list of 'No such file or directory' items in addition to the below:}
1.1G    ./var/lib/mysql/mysql-bin.000003
1.1G    ./var/lib/mysql/mysql-bin.000004
1.1G    ./var/lib/mysql/mysql-bin.000005
1.1G    ./var/lib/mysql/mysql-bin.000006
1.1G    ./var/lib/mysql/mysql-bin.000007
1.1G    ./var/lib/mysql/mysql-bin.000008
1.1G    ./var/lib/mysql/mysql-bin.000010
1.1G    ./var/lib/mysql/mysql-bin.000011
1.1G    ./var/lib/mysql/mysql-bin.000012
1.1G    ./store/backups/nagiosxi/1358874081.tar.gz

[root@nagiosxivm /]# find . -type d -print0 | xargs -0 du -s | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}
{a whole list of 'No such file or directory' items in addition to the below:}
6.7G    ./usr/local/nagios/var
13G     ./var/lib/mysql
13G     ./var/lib
14G     ./usr/local/nagios/share/perfdata
14G     ./usr/local/nagios/share
16G     ./var
20G     ./usr/local/nagios
21G     ./usr/local
22G     ./usr
41G     .
I have added some space via the VM, but I have not yet been able to extend the partition in the OS. It's not entirely straightforward, since it's an LVM volume.
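
The 'No such file or directory' lines in the output above come from du walking into /proc, where process entries disappear while they are being read. A minimal sketch of a quieter variant, assuming GNU du (for --max-depth): starting from the directory itself with -x keeps du on one filesystem, so /proc is skipped. The function name top_dirs is hypothetical.

```shell
#!/bin/sh
# Sketch: show the largest first-level directories on a single filesystem.
# top_dirs is a hypothetical helper name; --max-depth assumes GNU du.

# $1 = directory to inspect, $2 = number of lines to show (default 10).
# -x stays on the filesystem of $1, -k reports sizes in KiB so that
# a plain numeric sort works; errors from unreadable paths are discarded.
top_dirs() {
    du -xk --max-depth=1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

It could be called as `top_dirs / 10`. The first line is normally the total for the directory itself.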

Re: Tactical Overview, Ops Center, Ops Screen Problems

Posted: Mon Mar 04, 2013 12:01 pm
by jbennett
scottwilkerson wrote:OK, let's run this to remove the files we just created in the backup process:

Code: Select all

cd /usr/local/nagios/share/perfdata
rm -rf *.rrd.xml
Then you will likely need to restart the following:

Code: Select all

service postgresql restart
service mysqld restart
Finally, run the commands abrist suggested so we can find files that can be removed to free up space.
When I run the command to remove the .rrd.xml files that were created, it doesn't appear to do anything. When I remove the switch that suppresses prompting, I get the following:
rm: cannot remove `*.rrd.xml': No such file or directory

When I ls the directory, I see a ton of folders and inside each of those folders, I have the .rrd.xml files.

Would I be correct in using the following to remove these files from all sub-directories?

Code: Select all

find /usr/local/nagios/share/perfdata -type f -iname "*.rrd.xml" -exec rm -f {} \;

Re: Tactical Overview, Ops Center, Ops Screen Problems

Posted: Mon Mar 04, 2013 12:30 pm
by abrist
Your command looks right. I just tested it to be sure ;)
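
For anyone who wants to preview the matches before deleting, here is a minimal sketch of the same idea split into a dry run and a delete step. The function names are hypothetical, and it uses -name rather than -iname on the assumption that the dump files all have a lowercase .rrd.xml suffix.

```shell
#!/bin/sh
# Sketch: list, then delete, *.rrd.xml dumps anywhere below a directory.
# list_rrd_xml and purge_rrd_xml are hypothetical helper names.

# Dry run: print every matching file, delete nothing.
list_rrd_xml() {
    find "$1" -type f -name '*.rrd.xml' -print
}

# Delete the same set of files. The '+' terminator passes many paths
# to each rm invocation instead of starting one rm per file.
purge_rrd_xml() {
    find "$1" -type f -name '*.rrd.xml' -exec rm -f {} +
}
```

Running `list_rrd_xml /usr/local/nagios/share/perfdata | wc -l` first shows how many files `purge_rrd_xml` would remove from the same path.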

Re: Tactical Overview, Ops Center, Ops Screen Problems

Posted: Mon Mar 04, 2013 12:32 pm
by jbennett
abrist wrote:Your command looks right. I just tested it to be sure ;)
Thanks! That appears to have done the trick:

Code: Select all

[root@nagiosxivm]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      108G   41G   63G  40% /
tmpfs                 7.4G     0  7.4G   0% /dev/shm
/dev/sda1              97M   82M   11M  89% /boot
tmpfs                  50M   33M   18M  65% /var/nagiosramdisk
One other thing that I've noticed when looking in this directory: I have a number of folders that appear to belong to old hosts that have since been renamed or deleted entirely. The folders for the renamed hosts exist under their new names, as they should. For example, we went to a uniform way of naming hosts. Previously, the admin had some in all caps, some in lower case, etc. When I went back through and changed everything to all caps, it appears the folders for the old names were not removed. I do have the new folders, in all caps. Am I safe to delete these old folders? Or are they actually the host groups?

Re: Tactical Overview, Ops Center, Ops Screen Problems

Posted: Mon Mar 04, 2013 1:05 pm
by jbennett
I have since updated my post above with the output of the commands that display large directories, run after I removed the files mentioned above.

Re: Tactical Overview, Ops Center, Ops Screen Problems

Posted: Mon Mar 04, 2013 1:41 pm
by scottwilkerson
jbennett wrote:Am I safe to delete these folders? Or are they actually the host groups?
You are safe in removing these old folders, and that would really help reduce the disk space used by the export.

Actually, you could find all the files that haven't been updated in 30 days with the following command:

Code: Select all

find /usr/local/nagios/share/perfdata/* -mtime +30 -exec ls -l {} \;
CAREFUL WITH THIS
If the list looks right, remove them with:

Code: Select all

find /usr/local/nagios/share/perfdata/* -mtime +30 -exec rm -rf {} \;
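
Note that the command above also matches directories, and a directory's modification time only changes when entries are added, removed, or renamed inside it, so rm -rf could take out a folder whose files are still being updated. A more conservative sketch, assuming GNU find (for -mindepth, -empty, and -delete), limits the age test to regular files and then removes only the folders left empty. The function names and the default of 30 days are illustrative.

```shell
#!/bin/sh
# Sketch: remove files not modified in N days, then prune empty folders.
# list_stale_files and purge_stale_files are hypothetical helper names.
# -mindepth, -empty and -delete assume GNU find.

# Dry run: $1 = directory, $2 = age in days (default 30).
list_stale_files() {
    find "$1" -type f -mtime +"${2:-30}" -print
}

# Delete stale regular files only, then remove directories that
# ended up empty. The top-level directory itself is never removed.
purge_stale_files() {
    find "$1" -type f -mtime +"${2:-30}" -exec rm -f {} +
    find "$1" -mindepth 1 -type d -empty -delete
}
```

As with the original suggestion, reviewing the output of `list_stale_files /usr/local/nagios/share/perfdata 30` before running the purge is the safer order.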

Re: Tactical Overview, Ops Center, Ops Screen Problems

Posted: Mon Mar 04, 2013 2:28 pm
by jbennett
I will try this for sure. I am running a backup now just to be sure. Once that is finished, I will try this as well. I think I can clear off a bunch of space.

Once this is done, I will copy over the files to assist with the graphing issue on the new Nagios server. However, will this resolve my initial problem of the data being old on the screens I mentioned and/or no data showing up at all?

Re: Tactical Overview, Ops Center, Ops Screen Problems

Posted: Mon Mar 04, 2013 3:11 pm
by abrist
Weren't the initial problems due to a change in architecture causing the RRDs not to be read or written? That, and the timeout/load issues with npcd. Let's get the RRDs handled and then move on to any other problems as they present themselves.