Nagios XI - Performance Data Averaging

Overview

This KB article explains how and why performance data is averaged over time.

Performance Data Explained

In Nagios XI, the data displayed in the performance graphs comes from data files called Round Robin Databases (RRDs).

These files are a fixed size, so in order to cover long periods of time, older data has to be averaged into fewer, coarser entries.

Averaging loses accuracy: the further back in time the range you select, the more heavily the data has been averaged, so the values can differ from what was originally recorded.

Generally you only see this behavior when your performance data has peaks and troughs. Checks like disk usage, which usually report the same value with little variation, are affected far less by averaging because the data samples within the same time period are nearly identical.
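The effect described above can be illustrated with a minimal sketch. The sample values and the consolidate() helper are hypothetical and only mimic the averaging idea; this is not how Nagios XI or RRDtool is implemented internally.

```python
def consolidate(samples, steps):
    """Average each run of `steps` consecutive samples into one value."""
    return [
        sum(samples[i:i + steps]) / steps
        for i in range(0, len(samples) - steps + 1, steps)
    ]

# Hypothetical one-minute samples (percent), made up for illustration.
spiky = [10, 10, 90, 10, 10, 10, 10, 10, 10, 10]   # e.g. CPU load with one peak
flat = [42, 42, 43, 42, 42, 42, 42, 42, 43, 42]    # e.g. disk usage

spiky_avg = consolidate(spiky, 5)  # the peak of 90 is flattened
flat_avg = consolidate(flat, 5)    # values barely change

print("spiky:", spiky_avg)
print("flat: ", flat_avg)
```

The single peak in the spiky series is no longer visible once five samples are averaged into one, while the flat series looks almost the same before and after.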

When an RRD file is first created, it uses the config file /usr/local/nagios/etc/pnp/rra.cfg to determine how many data entries will be stored within it. The size of the file never changes after that.

Here is an explanation of how the config file creates the RRD file:

RRA:AVERAGE:0.5:1:2880

The above entry will store 2880 entries of the average data in the RRD file with a 1 minute step for each entry, which equals 48 hours of data.

These entries will be the most accurate.

After 48 hours, the oldest entry is dropped from this section, and that period is then only available from the next section, where it has been averaged and some accuracy has been lost.

If you display a graph within this timeframe, this is the data you are looking at.
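As a rough sketch of how such an entry can be read, the snippet below splits one line into its fields and works out how much time it covers. The parse_rra() helper is hypothetical, and the 60 second base step is an assumption taken from the "1 minute step" described above. The third field (0.5) is RRDtool's xfiles factor, the fraction of samples in an interval that may be unknown before the averaged entry is itself treated as unknown.

```python
def parse_rra(line, base_step_seconds=60):
    """Split an 'RRA:CF:xff:steps:rows' line and compute its retention.

    base_step_seconds=60 is an assumption based on the 1 minute step
    described in this article.
    """
    kind, cf, xff, steps, rows = line.strip().split(":")
    if kind != "RRA":
        raise ValueError("not an RRA definition: " + line)
    steps, rows = int(steps), int(rows)
    return {
        "cf": cf,              # consolidation function, e.g. AVERAGE
        "xff": float(xff),     # xfiles factor
        "steps": steps,        # base steps averaged into one entry
        "rows": rows,          # number of entries kept
        "retention_hours": steps * rows * base_step_seconds / 3600,
    }

rra = parse_rra("RRA:AVERAGE:0.5:1:2880")
print(rra["cf"], rra["retention_hours"], "hours")
```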

RRA:AVERAGE:0.5:5:2880

The above entry will store 2880 entries of the average data in the RRD file with a 5 minute step for each entry, which equals 10 days of data.

Since each entry is an average of 5 of the 1 minute entries, the results will be slightly less accurate.

If you display a graph of, for example, the last 7 days, this is the data you are looking at, plus the above entries.

RRA:AVERAGE:0.5:30:4320

The above entry will store 4320 entries of the average data in the RRD file with a 30 minute step for each entry, which equals 90 days of data.

Since each entry is an average of 6 of the 5 minute entries, the results will be slightly less accurate.

If you display a graph of, for example, the last month, this is the data you are looking at.

RRA:AVERAGE:0.5:360:5840

The above entry will store 5840 entries of the average data in the RRD file with a 360 minute step for each entry, which equals 4 years of data.

Since each entry is an average of 12 of the 30 minute entries, the results will be slightly less accurate.

If you display a graph of, for example, the last year, this is the data you are looking at.
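The relationship between a graph's time range and the section of the RRD file it draws from can be sketched as follows. This is a simplified model: the archive_for_range() helper is hypothetical and simply picks the most detailed section that still covers the requested range, assuming a 1 minute base step. RRDtool's real selection also takes the requested resolution into account.

```python
# (steps, rows) pairs for the four entries explained above.
ARCHIVES = [(1, 2880), (5, 2880), (30, 4320), (360, 5840)]
BASE_STEP_MINUTES = 1  # assumption: 1 minute base step, as in this article

def archive_for_range(range_minutes):
    """Return the step (in minutes) of the finest archive covering the range."""
    for steps, rows in ARCHIVES:
        if steps * rows * BASE_STEP_MINUTES >= range_minutes:
            return steps * BASE_STEP_MINUTES
    # Ranges longer than the largest archive fall back to the coarsest one.
    return ARCHIVES[-1][0] * BASE_STEP_MINUTES

DAY = 24 * 60
for label, days in [("last 24 hours", 1), ("last 7 days", 7),
                    ("last 30 days", 30), ("last year", 365)]:
    print(label, "->", archive_for_range(days * DAY), "minute entries")
```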

RRD files are used to save space. Generally, people don't look at performance data in detail after a specific duration has passed.

The RRD files are approximately 750K for about 47,760 data points. Keeping full accuracy over a 4 year span with 1 minute intervals would take 4,579,200 data points, which would make a single RRD file about 71MB in size, so the averaging keeps the file at an acceptable size. If you had 1000 hosts, that would use about 71GB of disk space for just one performance graph per host. If those hosts had 10 services each, you would be looking at about 710GB of disk space.
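The arithmetic above can be reproduced with a short sketch. All input figures are the approximate ones quoted in this article; the variable names are illustrative, and decimal units (1 MB = 1000 KB) are assumed.

```python
# Approximate figures quoted in the article.
current_file_kb = 750
current_points = 47_760
full_resolution_points = 4_579_200

# Scale the file size in proportion to the number of data points.
kb_per_point = current_file_kb / current_points
full_file_mb = full_resolution_points * kb_per_point / 1000  # roughly 71-72 MB

# Extrapolate using the article's rounded 71MB per file.
per_file_mb = 71
hosts = 1000
services_per_host = 10

one_graph_per_host_gb = per_file_mb * hosts / 1000
all_services_gb = one_graph_per_host_gb * services_per_host

print(round(full_file_mb, 1), "MB per full-resolution file")
print(one_graph_per_host_gb, "GB for one graph per host")
print(all_services_gb, "GB for 10 services per host")
```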

You can see how quickly disk space can be consumed storing old performance data that in reality is rarely accessed.

Final Thoughts

For any support related questions please visit the Nagios Support Forums at:

http://support.nagios.com/forum/