More decimal strangeness

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: More decimal strangeness

Post by WillemDH »

Hmm, I installed the tool, but it seems to auto-select the average, and the result is not really consistent with the data I'm seeing in the service graph. Weird. Some other services do seem to be correct.
Nagios XI 5.8.1
https://outsideit.net
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: More decimal strangeness

Post by tmcdonald »

WillemDH wrote: Some other services do seem to be correct.
Can you see any pattern between the working and non-working services? It could just be an artifact of their averages/values, but there might be something about the checks themselves.
Former Nagios employee
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: More decimal strangeness

Post by Box293 »

WillemDH wrote: Hmm, I installed the tool, but it seems to auto-select the average, and the result is not really consistent with the data I'm seeing in the service graph.
This screenshot shows the performance data returned by the plugin the last time it ran. It is untouched and is everything after the pipe | symbol.
Image

This data is from the RRD file; it has been averaged.
Image

When Nagios receives performance data, it processes it through a series of commands and via the npcd daemon. The data is inserted into an RRD file, and this is where the numbers get averaged out. The averaged value will only ever match the raw value if the value does not change over a period of time.
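To make the averaging concrete, here is a minimal Python sketch. It is not how RRDtool is implemented; it only assumes that a fixed number of consecutive samples is averaged into one stored row. The function name `consolidate` and the sample values are hypothetical.

```python
from statistics import fmean

def consolidate(samples, steps_per_row):
    """Average consecutive groups of `steps_per_row` samples.

    Simplified stand-in for an averaging archive: each stored row is
    the mean of the raw samples that fall into it.
    """
    return [
        fmean(samples[i:i + steps_per_row])
        for i in range(0, len(samples), steps_per_row)
    ]

# Hypothetical integer check results, e.g. a process count.
changing_counts = [3, 3, 4, 4, 4, 5]
steady_counts = [4, 4, 4, 4, 4, 4]

changing_rows = consolidate(changing_counts, 3)
steady_rows = consolidate(steady_counts, 3)

# Changing integers turn into fractional values once averaged.
print([round(v, 2) for v in changing_rows])  # [3.33, 4.33]
# A value that never changes survives averaging unchanged.
print(steady_rows)  # [4.0, 4.0]
```

Under this assumption, an integer-only check such as a process count shows decimals in the graph whenever the raw value changed inside an averaging window.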

This is why looking at performance graphs from the past day, week and month can be deceiving. For example:
A server has a load of 48% from 9am - 5pm (that's an eight-hour period). Realistically this is the data you are interested in.
From 5pm - 9am it has a load of 11% (that's a sixteen-hour period).

When you look at the graph from the last week, the data gets averaged out again. Because two thirds of the data is much lower than the other third, the 9am-5pm data appears smaller than it really is.

The same thing happens when you look at data from the past month.
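The load example above can be worked through as a time-weighted average. This is only an illustration of the arithmetic, assuming a whole day ends up consolidated into a single averaged value; the helper name `time_weighted_average` is hypothetical.

```python
def time_weighted_average(segments):
    """Return the average of (value, hours) segments, weighted by hours."""
    total_hours = sum(hours for _, hours in segments)
    return sum(value * hours for value, hours in segments) / total_hours

# 48% load for 8 hours (9am - 5pm), 11% load for 16 hours (5pm - 9am).
day = [(48, 8), (11, 16)]
daily_avg = time_weighted_average(day)

# (48 * 8 + 11 * 16) / 24 = 560 / 24
print(round(daily_avg, 1))  # 23.3
```

A single daily value of roughly 23% hides the 48% working-hours load, which is the effect described for the weekly and monthly graphs.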

Basically, performance data (in almost all monitoring systems) would consume a lot of storage space if you kept every measurement for the life of the monitored object. The data is averaged out over time to reduce the amount of storage space used. This is how PNP4Nagios (npcd) and RRD files work. The RRD files can be tuned to keep more measurements; however, ultimately the data gets averaged.

Personally, I've wanted to implement a 1:1 scale performance data system because these days storage is not as costly as it used to be. I've wanted to look back at checks like "active users", comparing last month to this month, but the graphs lie and don't show a true representation of what really happened. I'll probably get to it in n+1 years based on the current ideas I have floating in my head :lol:
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: More decimal strangeness

Post by WillemDH »

Thanks for this extensive answer, Troy. It's just a pity that some visual presentations (as in the first screenshot) suddenly show a different number of decimals, and that a check like the process count in the second example, which should never show anything other than integers, suddenly becomes a float. Anyway, you can close this thread. I know you guys are aware, and I guess I'll have to live with these side effects.

Grtz

Willem
Nagios XI 5.8.1
https://outsideit.net