More decimal strangeness

Post by **WillemDH** » Fri Aug 21, 2015 11:23 am

Hmm, I installed the tool, but it seems to auto-select the average, and the result is not really consistent with the data I'm seeing in the service graph. Weird. Some other services do seem to be correct.

Post by **tmcdonald** » Fri Aug 21, 2015 2:05 pm

WillemDH wrote: Some other services do seem to be correct.

Can you see any pattern between the working and non-working services? It could just be an artifact of their averages/values, but there might be something about the checks themselves.

Post by **Box293** » Sun Aug 23, 2015 8:47 pm

WillemDH wrote: Hmm, I installed the tool, but it seems to auto-select the average, and the result is not really consistent with the data I'm seeing in the service graph.

This screenshot shows the performance data returned by the plugin the last time it ran. It is untouched and is everything after the pipe | symbol.

This data is from the RRD file; it has been averaged.

When Nagios receives performance data, it processes it through a series of commands and via the npcd daemon. This data is inserted into an RRD file, and this is where the numbers get averaged out. The averaged number will only ever match the raw number if the value does not change over that period of time.
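To illustrate why an integer-only check can end up showing decimals, here is a minimal sketch of averaging-style consolidation. It is not the actual RRDtool code; the function name `consolidate`, the sample values and the bucket size are all made up for illustration.

```python
def consolidate(samples, per_bucket):
    """Average each consecutive group of `per_bucket` raw samples.

    Hypothetical stand-in for an AVERAGE-style consolidation step;
    real RRD files also align samples to time steps, which is not modelled here.
    """
    return [
        sum(samples[i:i + per_bucket]) / len(samples[i:i + per_bucket])
        for i in range(0, len(samples), per_bucket)
    ]


# Made-up process counts as returned by a plugin: integers only.
raw_process_counts = [3, 4, 4, 5, 5, 5]

# After averaging pairs of samples, the stored values are floats.
print(consolidate(raw_process_counts, 2))  # [3.5, 4.5, 5.0]
```

Only the last bucket, where the value did not change, still matches the raw number.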

This is why looking at performance graphs from the past day, week and month can be deceiving. For example:
A server has a load of 48% from 9am - 5pm (that's an eight hour period). Realistically this is the data you are interested in.
From 5pm - 9am it has a load of 11% (that's a sixteen hour period).

When you look at the graph from the last week, the data gets averaged out again. Because 2/3 of the data is much lower than the other third, the 9am-5pm data looks smaller than it really is.
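The arithmetic for that example can be sketched as follows, assuming one sample per hour and a single average over the whole day (a simplification of what the RRD file actually does):

```python
# Eight business hours at 48% load, sixteen off hours at 11% load.
hourly_load = [48.0] * 8 + [11.0] * 16

daily_average = sum(hourly_load) / len(hourly_load)

print(round(daily_average, 2))  # 23.33
print(max(hourly_load))         # 48.0
```

A single averaged point for the day shows roughly 23%, less than half of the 48% the server was actually running at during business hours.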

The same thing happens when you look at data from the past month.

Basically, performance data (in almost all monitoring systems) would consume a lot of storage space if you wanted to keep every measurement for the life of the monitored object. This is why the data is averaged out over time: it's about reducing the amount of storage space used. This is how PNP4Nagios (npcd) and RRD files work. The RRD files can be tuned to keep more measurements, but ultimately the data still gets averaged.
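As a rough back-of-the-envelope sketch of that trade-off, the following estimates what keeping every raw measurement would cost. The 5 minute check interval, the 8 bytes per stored value and the figure of 10,000 data sources are assumptions for illustration, not numbers from this thread.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def raw_samples_per_year(check_interval_s):
    """Number of measurements one data source produces in a year."""
    return SECONDS_PER_YEAR // check_interval_s


def raw_bytes_per_year(check_interval_s, bytes_per_value=8):
    """Storage for one data source if nothing is ever averaged away."""
    return raw_samples_per_year(check_interval_s) * bytes_per_value


# Assumed: one check every 5 minutes, 8 bytes per stored value.
print(raw_samples_per_year(300))        # 105120
print(raw_bytes_per_year(300))          # 840960

# Assumed: 10,000 data sources across all hosts and services.
print(raw_bytes_per_year(300) * 10_000)  # 8409600000
```

Under those assumptions the total is in the region of 8.4 GB per year, which grows without bound unless older data is consolidated.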

Personally I've wanted to implement a 1:1 scale performance data system because these days storage is not as costly as it used to be. I've wanted to look back at checks like "active users", comparing last month to this month, but the graphs lie and don't show a true representation of what really happened. I'll probably get to it in n+1 years based on the current ideas I have floating in my head.

Post by **WillemDH** » Mon Aug 24, 2015 3:21 pm

Thanks for this extensive answer, Troy. It's just a pity that some visual presentations (as in the first screenshot) suddenly show a different number of decimals, and that a check like the process count in the second example, which should not show anything other than integers, suddenly becomes a float. Anyway, you can close this thread. I know you guys are aware and I guess I'll have to live with these side effects.

Grtz

Willem

Nagios Support Forum
