The file count is:
ls /usr/local/nagios/var/spool/xidpe/ | wc -l
911506
I am trying to tar it so that it at least uses less space, but because of its huge size both tar and zip get stuck at some point.
One more thing I am not able to understand: when I zipped the xidpe files, the comparison came out as below:
No. of files in /usr/local/nagios/var/spool/xidpe : 911506
No. of lines reported for the zip file : wc -l file.zip : 2993397 file.zip
Size of /usr/local/nagios/var/spool/xidpe : 14G
Size of file.zip : du -mh file.zip : 844M file.zip
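A side note on the zip count above: running `wc -l` directly on an archive counts newline bytes inside the compressed binary, not the number of files stored in it, which is why that number looks nonsensical. A small sketch using tar (since both tar and zip were tried) shows the difference; the temp directory and file names here are illustrative, not the real spool:

```shell
# wc -l on an archive counts newline bytes in the binary, not its members.
# To count what an archive really contains, list it and count the listing.
dir=$(mktemp -d)
touch "$dir/xidpe.1" "$dir/xidpe.2" "$dir/xidpe.3"

tar -cf "$dir/spool.tar" -C "$dir" xidpe.1 xidpe.2 xidpe.3

tar -tf "$dir/spool.tar" | wc -l   # 3: the actual member count
wc -l < "$dir/spool.tar"           # arbitrary number: newline bytes in the binary
```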
The LUN mapped for Nagios is utilizing huge IO
- wiproltdwiv
- Posts: 281
- Joined: Sat Sep 08, 2012 12:52 am
Re: The LUN mapped for Nagios is utilizing huge IO
As abrist mentioned in his last post, you will need to remove those files using the "find" command he posted. The sheer number of them is too many for most utilities to parse, which causes a lot of the problems you are seeing.
Former Nagios employee
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
- Contact:
Re: The LUN mapped for Nagios is utilizing huge IO
I just wanted to take a step back and explain the purpose of xidpe.
When Nagios receives a host or service check result that contains performance data, the data initially gets dumped into the folder /usr/local/nagios/var/spool/xidpe/
Nagios can then continue its job; it really doesn't care what happens to the performance data after that.
Another process will come along and, for each file in /usr/local/nagios/var/spool/xidpe/, take the data and insert it into the round robin database files in /usr/local/nagios/share/perfdata/HOST/SERVICE_NAME. It will then delete the file in /usr/local/nagios/var/spool/xidpe/.
So the /usr/local/nagios/var/spool/xidpe/ folder usually has a low number of files in it.
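The cycle described above can be sketched roughly as the loop below. `process_file` is a hypothetical stand-in for the real perfdata processor (the actual daemon is more involved); the point is simply that each spool file is deleted only after its data has been handled:

```shell
# Rough sketch of the spool-consumer cycle, using a stand-in processor.
spool=$(mktemp -d)                  # stands in for /usr/local/nagios/var/spool/xidpe
touch "$spool/perfdata.1" "$spool/perfdata.2"

process_file() { :; }               # hypothetical: would insert into the RRD files

for f in "$spool"/*; do
  [ -f "$f" ] || continue
  process_file "$f" && rm -f "$f"   # delete only after a successful insert
done

remaining=$(ls "$spool" | wc -l)
echo "$remaining"                   # 0: the spool is drained again
```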
For one reason or another, the process that takes these files from /usr/local/nagios/var/spool/xidpe/ has stopped. Because of this, the number of files has spooled up. There are now too many files for the usual cleanup processes to work, so you need to follow abrist's steps:
abrist wrote:You can remove the files with the following command, but you should be aware that you will have gaps in your graphs representing the time associated with each of these performance data files in xidpe:
Code: Select all
cd /usr/local/nagios/var/spool/xidpe/
find . -type f -delete
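For context on why `find ... -delete` succeeds where `rm *` or archiving tools choke: the shell expands `*` into one enormous argument list, which can blow past the kernel's ARG_MAX limit at file counts like these, while find walks the directory and deletes entries one by one. A small demonstration on a throwaway directory (the real spool path is untouched here):

```shell
# `rm *` or `tar czf spool.tar *` make the shell build one giant argument
# list (bounded by ARG_MAX); find walks the directory itself, so it scales
# to any file count.
getconf ARG_MAX                     # per-exec limit on argument+environment size

dir=$(mktemp -d)
touch "$dir/xidpe.1" "$dir/xidpe.2" "$dir/xidpe.3"

find "$dir" -type f -delete         # same pattern as the command above
find "$dir" -type f | wc -l         # 0: everything is gone
```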
We need to delete these files to fix your problem. I would not bother tarring them as once we get it working again, you will not be able to process the old performance data any more. NOTE: any data that appears in the Performance Graphs will be kept, you're just going to have a big gap in the graphs, from when it broke to when it started working again.
Once you delete the files I would increase the logging verbosity and then take a deeper look into the logs. Follow the FAQ entry below to increase the log level of process_perfdata and npcd:
http://support.nagios.com/wiki/index.ph ... leshooting
Wait 15 - 20 minutes and then get a tail of the logs:
Code: Select all
tail -250 /usr/local/nagios/var/perfdata.log > /tmp/perfdata.txt
tail -250 /usr/local/nagios/var/npcd.log > /tmp/npcd.txt

Send us a copy of /tmp/perfdata.txt and /tmp/npcd.txt.
Don't forget to turn down the log level as per the FAQ when you are done!
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
- Contact:
Re: The LUN mapped for Nagios is utilizing huge IO
wiproltdwiv wrote: Are you sure that after deleting these files, we'll face issues in the graphs in the front-end?
Yes, you will have a gap in every graph from when this problem started until you fix it. Actually, the gap should already exist now if I am correct. If the gap does not exist in your current graphs, then there is an underlying problem with these files not being removed after the performance data is taken from them.
In any case, the steps outlined are the first step in fixing your problems.
wiproltdwiv wrote: This is such a huge file that even tar or zip is not happening.
There is not one file; it's a LOT of files, 921,356 and counting. I'm not sure why you are trying to tar or zip the folder.
Is there something in particular that you don't understand?