aggregating performance data
Posted: Wed May 20, 2015 2:28 pm
by tonyleatwork
Hi -
Once a quarter we provide Ops review metrics, which help highlight the heavy hitters in system utilization. This data, in conjunction with our application metrics, shows which systems are currently starving for resources or close to it.
In our previous monitoring system, we could simply provide a service criterion (CPU, for example) and a threshold (e.g. the amount of time a system sits at 95% utilization). I understand this is tougher to do in Nagios, since it is a framework type of platform and is not aware of the content of the graphs, so this will most likely be a manual process (at least initially).
To help with this process, is there a way I can aggregate performance data from multiple systems and grab similar metrics? For example, who are the top CPU consumers (for a CPU service I defined) in a particular host group?
I've played around with Graph Explorer, but it just allows me to stack graphs, not really aggregate the data.
Thanks in advance!
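For reference, the threshold metric described above (how much of the time a system sits at or above 95%) is simple to compute once the samples are outside Nagios. Here is a minimal Python sketch, assuming the CPU values have already been exported into a list of evenly spaced samples per host. The function names and the input shape are hypothetical, not part of Nagios XI:

```python
from typing import Dict, List, Tuple


def fraction_at_or_above(samples: List[float], threshold: float = 95.0) -> float:
    """Share of samples at or above the threshold.

    With evenly spaced samples this approximates the share of time
    spent at or above the threshold. Returns 0.0 for an empty list.
    """
    if not samples:
        return 0.0
    hits = sum(1 for value in samples if value >= threshold)
    return hits / len(samples)


def rank_by_time_above(
    per_host: Dict[str, List[float]], threshold: float = 95.0
) -> List[Tuple[str, float]]:
    """Rank hosts by the share of samples at or above the threshold.

    Sorted from highest share to lowest; ties are broken by host name.
    """
    ranked = [
        (host, fraction_at_or_above(samples, threshold))
        for host, samples in per_host.items()
    ]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Calling `rank_by_time_above` on a dict of host name to CPU samples returns the hosts ordered from most to least time spent at the threshold. How the samples are exported from the monitoring system is left open here.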
Re: aggregating performance data
Posted: Wed May 20, 2015 3:42 pm
by lmiltchev
The Metrics component would be the best option if one could select different timeperiods. Currently, the component only shows the utilization/graphs for the last 24 hours. There is an internal feature request for adding the ability to select different timeperiods in the Metrics component (TASK ID 5683) but I am not sure when/if this will be implemented.
Re: aggregating performance data
Posted: Thu May 21, 2015 9:05 am
by tonyleatwork
Hi -
Thanks for the response. I did find 'Top alert producers' under 'Reports', which I think I can scrape the data from, but is there a way to filter it out to a specific service, exclude certain alerts (we don't care about UNKNOWNs), and include WARNINGs in that data?
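If the alert records can be pulled out in some tabular form, the filtering described above could be done outside the report. Here is a minimal Python sketch, assuming each alert is a dict with hypothetical `host`, `service` and `state` keys; this is not the report's actual data format:

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Tuple


def top_alert_producers(
    alerts: Iterable[Dict[str, str]],
    service: str,
    excluded_states: FrozenSet[str] = frozenset({"UNKNOWN"}),
    n: int = 10,
) -> List[Tuple[str, int]]:
    """Count alerts per host for one service, skipping excluded states.

    WARNING and CRITICAL alerts are counted by default; only the states
    listed in excluded_states (UNKNOWN by default) are dropped.
    """
    counts = Counter(
        alert["host"]
        for alert in alerts
        if alert["service"] == service and alert["state"] not in excluded_states
    )
    # most_common returns (host, count) pairs, highest count first.
    return counts.most_common(n)
```

For example, `top_alert_producers(alerts, "CPU Usage", n=5)` would give the five hosts with the most WARNING or CRITICAL alerts for that service.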
Re: aggregating performance data
Posted: Thu May 21, 2015 1:21 pm
by lmiltchev
...but is there a way to filter it out to a specific service, exclude certain alerts (we don't care about UNKNOWNs) and include WARNINGS in that data?
Not at the moment. You can filter by hostgroup or servicegroup only. You may be able to exclude "Unknowns" if you modify the "/usr/local/nagiosxi/html/reports/topalertproducers.php" but I am not sure about that. If you decide to modify this file, you will be on your own.
I am a bit confused though - initially, you said:
...is there a way I can aggregate performance data from multiple systems and grab similar metrics?
then you asked:
...is there a way to filter it out to a specific service...
What exactly are you trying to accomplish? Are you trying to aggregate similar metrics, or do you need to view the data from one service?
Re: aggregating performance data
Posted: Fri May 22, 2015 9:41 am
by tonyleatwork
I apologize for the confusion; let me ask this another way. How do I view performance metrics for a particular service from multiple systems?
For example:
Top 5 CPU Usage (CPU Usage is the title of the service) from all of the servers in Hostgroup X last quarter:
Ideally it would just pop up the CPU Usage performance graphs from the top 5 systems within the 'last quarter' date range.
I found the 'Metrics' component, but it doesn't work properly (only one system shows up) and I can't specify a date range.
Just to add some pain on my side, our quarter ended and I'm really trying to get this data ASAP. I'm throwing the graphs together piecemeal, but it doesn't sort and it's a lot of manual analysis for 600+ servers (3 graphs from each!).
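To make the "top 5 for a date range" request concrete, here is a minimal Python sketch of the ranking step. It assumes the performance data for one service has already been exported into per-host lists of `(timestamp, value)` pairs; the function name and input shape are hypothetical and nothing here is a Nagios XI API:

```python
import heapq
from datetime import datetime
from statistics import mean
from typing import Dict, List, Tuple


def top_consumers(
    per_host: Dict[str, List[Tuple[datetime, float]]],
    start: datetime,
    end: datetime,
    n: int = 5,
) -> List[Tuple[str, float]]:
    """Return the n hosts with the highest mean value in [start, end).

    Hosts with no samples inside the date range are left out.
    """
    averages = {}
    for host, samples in per_host.items():
        in_range = [value for ts, value in samples if start <= ts < end]
        if in_range:
            averages[host] = mean(in_range)
    # nlargest returns (host, average) pairs, highest average first.
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])
```

With `start` and `end` set to the quarter boundaries, the result is the short list of hosts whose graphs are worth pulling up, instead of reviewing all 600+ by hand. Ranking by mean is only one choice; peak value or time above a threshold would work the same way.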
Re: aggregating performance data
Posted: Fri May 22, 2015 12:53 pm
by jolson
How do I view performance metrics for a particular service from multiple systems?
The graph explorer accomplishes this nicely, but as you said in your original post - this does not work for you. You stated that "it just allows me to stack graphs not really aggregate the data".
The answer here is that we don't currently have a good way of doing what you're requesting.
When the Metrics component is updated, that could certainly be a resolution. For now I'm afraid there isn't a way to get this accomplished. I have raised the priority of the feature request mentioned by lmiltchev for you.
Re: aggregating performance data
Posted: Fri May 22, 2015 3:15 pm
by scottwilkerson
Or you can run an availability report for a particular hostgroup or servicegroup and, in advanced options, check the performance graphs check box.
Re: aggregating performance data
Posted: Tue May 26, 2015 11:31 am
by tonyleatwork
scottwilkerson wrote:Or you can run an availability report for a particular hostgroup or service group and in advanced options check the performance graphs check box
Thanks for this suggestion, Scott. It does seem like a nice quick way to get all of the graphs on the screen, but unfortunately it doesn't aggregate the data in the way we need (showing which hosts are the top consumers of CPU, for example).
As a workaround, is there a way to display all of the graphs for a particular service for a hostgroup and date range?
Re: aggregating performance data
Posted: Tue May 26, 2015 3:27 pm
by abrist
You could use the graph explorer multistacked graph option, but you would have to add all the services by hand.
Re: aggregating performance data
Posted: Wed May 27, 2015 4:31 am
by WillemDH
Hello,
This is a feature we also need at Digipolis. I made a feature request for this some time ago and Scott has been working on it. Please +1 this feature request and add a comment with what you would like to see in the new metrics component.
http://tracker.nagios.com/view.php?id=471
Grtz
Willem