Re: The LUN mapped for Nagios is utilizing huge IO

Posted: Thu Mar 19, 2015 3:45 am
by wiproltdwiv
The file count is:
ls /usr/local/nagios/var/spool/xidpe/ | wc -l
911506


I am trying to tar it so that at least it uses less space, but because of its huge size, both tar and zip get stuck at some point.

One more thing I am not able to understand: when I zipped the xidpe files, I got the comparison results below.

Number of files in /usr/local/nagios/var/spool/xidpe: 911506
Number of files in the zip file (wc -l file.zip): 2993397 file.zip

Size of /usr/local/nagios/var/spool/xidpe: 14G
Size of file.zip (du -mh file.zip): 844M file.zip

Re: The LUN mapped for Nagios is utilizing huge IO

Posted: Thu Mar 19, 2015 9:52 am
by tmcdonald
As abrist mentioned in his last post, you will need to remove those files using the "find" command he posted. There are simply too many of them for most utilities to handle, which causes a lot of the problems you are seeing.

Re: The LUN mapped for Nagios is utilizing huge IO

Posted: Thu Mar 19, 2015 5:15 pm
by Box293
I just wanted to take a step back and explain the purpose of xidpe.

When Nagios receives a host or service check result that contains performance data, it initially gets dumped into the folder /usr/local/nagios/var/spool/xidpe/
Nagios can then continue its job; it really doesn't care what happens to the performance data after that.
Another process will come along and, for each file in /usr/local/nagios/var/spool/xidpe/, take the data and insert it into the round robin database files in /usr/local/nagios/share/perfdata/HOST/SERVICE_NAME. It will then delete the file in /usr/local/nagios/var/spool/xidpe/.
So the /usr/local/nagios/var/spool/xidpe/ folder usually has only a small number of files in it.
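
To check whether the spool is draining normally, something like the sketch below could be used. It counts the files waiting in a directory and how many of them are older than a given number of minutes. The function name spool_backlog and the 15-minute default are my own assumptions, not part of Nagios, and -maxdepth and -mmin are GNU/BSD find extensions rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): report how many files sit in a
# spool directory and how many were last modified more than N minutes ago.
# Counting with find avoids building one huge argument list for ls.
spool_backlog() {
    dir="$1"
    max_age_min="${2:-15}"   # assumed threshold; adjust to taste

    # One output line per file, so wc -l gives the file count
    # (filenames containing newlines would be miscounted).
    total=$(find "$dir" -maxdepth 1 -type f | wc -l)
    stale=$(find "$dir" -maxdepth 1 -type f -mmin +"$max_age_min" | wc -l)

    # $(( )) strips any padding that some wc implementations add.
    echo "total=$((total)) stale=$((stale))"
}
```

Running spool_backlog /usr/local/nagios/var/spool/xidpe 15 a few times, some minutes apart, should show whether the stale count keeps growing. A healthy spool would be expected to stay near zero stale files.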

For one reason or another, the process of taking these files from /usr/local/nagios/var/spool/xidpe/ has stopped. Because of this, the files have piled up. There are now too many files for the usual cleanup processes to work, so you need to follow abrist's steps:
abrist wrote: You can remove the files with the following command, but you should be aware that you will have gaps in your graphs representing the time associated with each of these performance data files in xidpe:

Code:

    cd /usr/local/nagios/var/spool/xidpe/
    find . -type f -delete

We need to delete these files to fix your problem. I would not bother tarring them because, once we get it working again, you will not be able to process the old performance data any more. NOTE: any data that already appears in the Performance Graphs will be kept; you're just going to have a big gap in the graphs, from when it broke to when it started working again.
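
One thing to watch with the cd-then-find form: if the cd fails for any reason, find would delete files in whatever directory you happen to be in. A slightly more defensive sketch is below. It passes the directory to find directly, refuses to run if the path is not a directory, and reports how many files it found before deleting. The function name purge_spool is hypothetical, and it relies on the same -delete action (a GNU/BSD find extension) as the command above.

```shell
#!/bin/sh
# Hypothetical wrapper around abrist's command: delete every regular file
# under a spool directory, but only if that directory really exists.
purge_spool() {
    dir="$1"

    # Refuse to run on an empty or missing path, so a typo cannot make
    # find operate on the current working directory.
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi

    # Count first so there is a record of how much was removed.
    before=$(find "$dir" -type f | wc -l)

    # Same action as 'find . -type f -delete', with an explicit path.
    find "$dir" -type f -delete || return 1

    echo "deleted $((before)) files from $dir"
}
```

You would call it as purge_spool /usr/local/nagios/var/spool/xidpe. New files may arrive while it runs, so the reported count is only approximate on a live system.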

Once you have deleted the files, I would increase the logging verbosity and then take a deeper look into the logs. Follow the FAQ entry below to increase the log level of process_perfdata and npcd:

http://support.nagios.com/wiki/index.ph ... leshooting

Wait 15 - 20 minutes and then get a tail of the logs:

Code:

    tail -250 /usr/local/nagios/var/perfdata.log > /tmp/perfdata.txt
    tail -250 /usr/local/nagios/var/npcd.log > /tmp/npcd.txt

Send us a copy of /tmp/perfdata.txt and /tmp/npcd.txt

Don't forget to turn down the log level as per the FAQ when you are done!

Re: The LUN mapped for Nagios is utilizing huge IO

Posted: Fri Mar 20, 2015 12:14 am
by Box293
wiproltdwiv wrote: Are you sure that after deleting these files, we'll face issue in graphs in front-end.
Yes, you will have a gap in every graph from when this problem started until you fix it. In fact, if I am correct, the gap already exists now. If it does not exist in your current graphs, then there is an underlying problem with these files not being removed after the performance data is taken from them.

In any case, the steps outlined above are the first step in fixing your problems.
wiproltdwiv wrote: This is such a huge file that even tar or zip is not happening.
It is not one file; it's a LOT of files, 921,356 and counting. I'm not sure why you are trying to tar or zip the folder.

Is there something in particular that you don't understand?