Hi -
Once a quarter we provide Ops review metrics, which help highlight the heavy hitters in system utilization. This data, in conjunction with our application metrics, shows which systems are currently starved for resources or close to it.
In our previous monitoring system, we could simply provide a service criterion (CPU, for example) and a threshold (e.g. the amount of time a system sits at 95% utilization). I understand this is tougher to do in Nagios since it is a framework-type platform and is not aware of the content of the graphs, so this will most likely be a manual process (at least initially).
To help with this process, is there a way I can aggregate performance data from multiple systems and grab similar metrics? For example, who are the top CPU consumers (for a CPU service I defined) in a particular host group?
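To make the request concrete, here is a minimal sketch of the kind of aggregation being asked for. It is not a Nagios feature: it assumes the CPU samples have already been exported per host by some other means (that step is not shown), and the function names and host names are hypothetical.

```python
from typing import Dict, List, Tuple


def time_at_or_above(samples: List[float], threshold: float) -> float:
    """Return the fraction of samples at or above the threshold (0.0 if empty)."""
    if not samples:
        return 0.0
    return sum(1 for value in samples if value >= threshold) / len(samples)


def top_consumers(
    samples_by_host: Dict[str, List[float]],
    threshold: float = 95.0,
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Rank hosts by the share of samples spent at or above the threshold."""
    ranked = sorted(
        (
            (host, time_at_or_above(values, threshold))
            for host, values in samples_by_host.items()
        ),
        # Highest share first; host name breaks ties deterministically.
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:limit]


# Hypothetical CPU utilization samples (percent), taken at equal intervals.
cpu_samples = {
    "web01": [97.0, 96.5, 99.0, 40.0],
    "db01": [95.0, 95.0, 95.0, 95.0],
    "app01": [10.0, 20.0, 30.0, 96.0],
}

ranking = top_consumers(cpu_samples, threshold=95.0, limit=2)
print(ranking)
```

Because the samples are assumed to be evenly spaced, the share of samples at or above the threshold stands in for "amount of time at 95% utilization".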
I've played around with Graph Explorer, but it just allows me to stack graphs, not really aggregate the data.
Thanks in advance!
aggregating performance data
Re: aggregating performance data
The Metrics component would be the best option if one could select different timeperiods. Currently, the component only shows the utilization/graphs for the last 24 hours. There is an internal feature request for adding the ability to select different timeperiods in the Metrics component (TASK ID 5683) but I am not sure when/if this will be implemented.
Be sure to check out our Knowledgebase for helpful articles and solutions!
-
tonyleatwork
- Posts: 91
- Joined: Mon Jul 07, 2014 8:55 am
Re: aggregating performance data
Hi -
Thanks for the response. I did find 'Top alert producers' under 'Reports', which I think I can scrape the data from, but is there a way to filter it out to a specific service, exclude certain alerts (we don't care about UNKNOWNs) and include WARNINGS in that data?
Re: aggregating performance data
"...but is there a way to filter it out to a specific service, exclude certain alerts (we don't care about UNKNOWNs) and include WARNINGS in that data?"
Not at the moment. You can filter by hostgroup or servicegroup only. You may be able to exclude "Unknowns" if you modify "/usr/local/nagiosxi/html/reports/topalertproducers.php", but I am not sure about that. If you decide to modify this file, you will be on your own.
I am a bit confused though - initially, you said:
"...is there a way I can aggregate performance data from multiple systems and grab similar metrics?"
then you asked:
"...is there a way to filter it out to a specific service..."
What exactly are you trying to accomplish? Are you trying to aggregate similar metrics, or do you need to view the data from one service?
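For illustration only, the filtering described in the question (one service, no UNKNOWNs, WARNINGs included) could be sketched as a post-processing step on scraped alert records. This does not touch topalertproducers.php and does not reflect how that report works internally; the record layout, function name, and sample data below are all assumptions.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical record layout: (host, service, state).
Alert = Tuple[str, str, str]


def top_alert_producers(
    alerts: Iterable[Alert],
    service: str,
    states: FrozenSet[str] = frozenset({"WARNING", "CRITICAL"}),
    limit: int = 5,
) -> List[Tuple[str, int]]:
    """Count alerts per host for one service, keeping only the given states."""
    counts = Counter(
        host
        for host, alert_service, state in alerts
        if alert_service == service and state in states
    )
    return counts.most_common(limit)


# Hypothetical scraped alert records.
alerts = [
    ("web01", "CPU Usage", "WARNING"),
    ("web01", "CPU Usage", "CRITICAL"),
    ("web01", "CPU Usage", "UNKNOWN"),   # dropped: UNKNOWN is not in states
    ("db01", "CPU Usage", "WARNING"),
    ("db01", "Disk Usage", "CRITICAL"),  # dropped: different service
    ("app01", "CPU Usage", "UNKNOWN"),   # dropped: UNKNOWN is not in states
]

producers = top_alert_producers(alerts, service="CPU Usage")
print(producers)
```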
-
tonyleatwork
- Posts: 91
- Joined: Mon Jul 07, 2014 8:55 am
Re: aggregating performance data
lmiltchev wrote:What exactly are you trying to accomplish? Are you trying to aggregate similar metrics, or you need to view the data from one service?
I apologize for the confusion; let me ask this another way. How do I view performance metrics for a particular service from multiple systems?
i.e.
Top 5 CPU Usage (CPU Usage is a title of the service) from all of the servers in Hostgroup X last quarter:
Ideally it would just pop up the CPU Usage performance graphs from the top 5 systems within the 'last quarter' date range.
I found the 'metrics' component, but it doesn't work properly (only one system shows up) and I can't specify a date range.
Just to add some pain on my side, our quarter has ended and I'm really trying to get this data ASAP. I'm throwing the graphs together piecemeal, but the result isn't sorted and it's a lot of manual analysis for 600+ servers (3 graphs from each!).
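The "top 5 for last quarter" example above can be sketched outside the product as follows. This is a rough illustration under stated assumptions: quarters are calendar quarters (a fiscal calendar may differ), the dated samples have already been exported per host, and every name and value here is hypothetical.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict, List, Tuple


def previous_quarter(today: date) -> Tuple[date, date]:
    """Return (first_day, last_day) of the calendar quarter before today's."""
    current_start_month = 3 * ((today.month - 1) // 3) + 1
    current_start = date(today.year, current_start_month, 1)
    last_day = current_start - timedelta(days=1)
    # last_day always falls in month 3, 6, 9 or 12, so month - 2 is valid.
    first_day = date(last_day.year, last_day.month - 2, 1)
    return first_day, last_day


def top_n_by_mean(
    samples_by_host: Dict[str, List[Tuple[date, float]]],
    start: date,
    end: date,
    n: int = 5,
) -> List[Tuple[str, float]]:
    """Rank hosts by mean value over samples dated within [start, end]."""
    averages = {}
    for host, samples in samples_by_host.items():
        values = [value for day, value in samples if start <= day <= end]
        if values:  # hosts with no samples in range are left out
            averages[host] = mean(values)
    ranked = sorted(averages.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Hypothetical dated CPU Usage samples (percent) for hosts in one hostgroup.
samples = {
    "web01": [(date(2014, 7, 15), 80.0), (date(2014, 9, 1), 90.0),
              (date(2014, 10, 2), 10.0)],
    "db01": [(date(2014, 8, 1), 60.0), (date(2014, 8, 2), 70.0)],
    "app01": [(date(2014, 10, 1), 99.0)],
}

start, end = previous_quarter(date(2014, 10, 3))
top = top_n_by_mean(samples, start, end, n=5)
print(start, end, top)
```

Ranking by mean is only one choice; a peak value or time above a threshold may suit a capacity review better.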
Re: aggregating performance data
"How do I view performance metrics for a particular service from multiple systems?"
The graph explorer accomplishes this nicely, but as you said in your original post, this does not work for you. You stated that "it just allows me to stack graphs not really aggregate the data".
The answer here is that we don't currently have a good way of doing what you're requesting.
When the Metrics component is updated, that could certainly be a resolution. For now I'm afraid there isn't a way to get this accomplished. I have added priority on to the feature request mentioned by lmiltchev for you.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: aggregating performance data
Or you can run an availability report for a particular hostgroup or service group and, in the advanced options, check the performance graphs check box.
-
tonyleatwork
- Posts: 91
- Joined: Mon Jul 07, 2014 8:55 am
Re: aggregating performance data
scottwilkerson wrote:Or you can run an availability report for a particular hostgroup or service group and in advanced options check the performance graphs check box
Thanks for this suggestion, Scott. It does seem like a nice quick way to get all of the graphs on the screen, but unfortunately it doesn't aggregate the data in the way we need (for example, showing which systems are the top consumers of CPU).
As a workaround, is there a way to display all of the graphs for a particular service for a hostgroup and date range?
Re: aggregating performance data
You could use the graph explorer multistacked graph option, but you would have to add all the services by hand.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: aggregating performance data
Hello,
This is a feature we also need at Digipolis. I made a feature request for this some time ago and Scott has been working on it. Please +1 this feature request and add a comment with what you would like to see in the new metrics component.
http://tracker.nagios.com/view.php?id=471
Grtz
Willem
Nagios XI 5.8.1
https://outsideit.net