aggregating performance data

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
tonyleatwork
Posts: 91
Joined: Mon Jul 07, 2014 8:55 am

aggregating performance data

Post by tonyleatwork »

Hi -

Once a quarter we provide Ops review metrics, which help highlight the heavy hitters in system utilization. This data, in conjunction with our application metrics, shows which systems are currently starving for resources or close to it.

In our previous monitoring system, we could simply provide a service criterion (CPU, for example) and a threshold (e.g. the amount of time a system sits at 95% utilization). I understand this is tougher to do in Nagios, since it is a framework-type platform that isn't aware of the content of the graphs, so this will most likely be a manual process (at least initially).
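The "time at 95%" criterion can be computed offline once the samples are exported. This is a minimal sketch, assuming the samples are already available as (Unix timestamp, CPU %) pairs sorted by time; how they get exported from Nagios XI is not covered here, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def seconds_at_or_above(samples: Iterable[Tuple[int, float]], threshold: float = 95.0) -> int:
    """Total seconds a metric spent at or above `threshold`.

    `samples` are (unix_timestamp, value) pairs sorted by time. Each value is
    assumed to hold until the next sample, so the final sample adds no time.
    NaN (missing data) compares False against the threshold and is never counted.
    """
    points = list(samples)
    total = 0
    # Pair each sample with the next one to get the interval it covers.
    for (t0, value), (t1, _) in zip(points, points[1:]):
        if value >= threshold:
            total += t1 - t0
    return total
```

For example, four 5-minute samples of 96, 50, 97 and 99 yield 600 seconds: the first and third intervals count, and the last sample has no following interval.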

To help with this process, is there a way I can aggregate performance data from multiple systems and grab similar metrics? For example, who are the top CPU consumers (for a CPU service I defined) in a particular host group?
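Once each host's samples are available, ranking a host group reduces to a per-host average followed by a top-N selection. This sketch assumes a hypothetical dict mapping each host in the group to its list of CPU % samples, with NaN marking missing data. Averaging is only one possible ranking criterion; time above a threshold would work the same way.

```python
import heapq
import math
from statistics import fmean
from typing import Dict, List, Tuple


def top_consumers(cpu_by_host: Dict[str, List[float]], n: int = 5) -> List[Tuple[str, float]]:
    """Return the `n` hosts with the highest mean CPU %, highest first.

    NaN samples (gaps in the data) are ignored; hosts with no usable
    samples are left out of the ranking entirely.
    """
    averages: Dict[str, float] = {}
    for host, values in cpu_by_host.items():
        valid = [v for v in values if not math.isnan(v)]
        if valid:
            averages[host] = fmean(valid)
    # nlargest avoids sorting the whole group when only the top few are needed.
    return heapq.nlargest(n, averages.items(), key=lambda kv: kv[1])
```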

I've played around with Graph Explorer, but it just lets me stack graphs, not actually aggregate the data.

Thanks in advance!
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: aggregating performance data

Post by lmiltchev »

The Metrics component would be the best option if one could select different timeperiods. Currently, the component only shows the utilization/graphs for the last 24 hours. There is an internal feature request for adding the ability to select different timeperiods in the Metrics component (TASK ID 5683) but I am not sure when/if this will be implemented.
Be sure to check out our Knowledgebase for helpful articles and solutions!
tonyleatwork

Re: aggregating performance data

Post by tonyleatwork »

Hi -

Thanks for the response. I found 'Top alert producers' under 'Reports', which I think I can scrape the data from, but is there a way to filter it out to a specific service, exclude certain alerts (we don't care about UNKNOWNs), and include WARNINGs in that data?
lmiltchev

Re: aggregating performance data

Post by lmiltchev »

tonyleatwork wrote:
...but is there a way to filter it out to a specific service, exclude certain alerts (we don't care about UNKNOWNs), and include WARNINGs in that data?
Not at the moment. You can filter by hostgroup or servicegroup only. You may be able to exclude "Unknowns" if you modify the "/usr/local/nagiosxi/html/reports/topalertproducers.php" but I am not sure about that. If you decide to modify this file, you will be on your own.
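Outside of the report itself, the filtering asked about here (one service, no UNKNOWNs, WARNINGs included) is straightforward on exported alert data. This is a sketch, assuming a hypothetical export format of (host, service, state_code) tuples; it does not modify or call anything in Nagios XI. The state codes are the standard Nagios service states.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Standard Nagios service state codes.
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def top_alert_producers(
    alerts: Iterable[Tuple[str, str, int]],
    service: str,
    include: Tuple[str, ...] = ("WARNING", "CRITICAL"),
    n: int = 10,
) -> List[Tuple[str, int]]:
    """Count alerts per host for one service, keeping only states in `include`.

    `alerts` holds (host, service, state_code) tuples (hypothetical export format).
    UNKNOWN is excluded simply by leaving it out of `include`.
    """
    counts = Counter(
        host
        for host, svc, state in alerts
        if svc == service and STATE_NAMES.get(state) in include
    )
    return counts.most_common(n)
```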

I am a bit confused, though. Initially, you said:
...is there a way I can aggregate performance data from multiple systems and grab similar metrics?
Then you asked:
...is there a way to filter it out to a specific service...
What exactly are you trying to accomplish? Are you trying to aggregate similar metrics, or do you need to view the data from one service?
tonyleatwork

Re: aggregating performance data

Post by tonyleatwork »

lmiltchev wrote:
What exactly are you trying to accomplish? Are you trying to aggregate similar metrics, or do you need to view the data from one service?
I apologize for the confusion; let me ask this another way. How do I view performance metrics for a particular service from multiple systems?

For example:

Top 5 CPU Usage (CPU Usage is the title of the service) across all of the servers in Hostgroup X for last quarter:

Ideally it would just pop up the CPU Usage performance graphs from the top 5 systems within the 'last quarter' date range.
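Restricting an analysis to "last quarter" only requires the quarter boundaries as Unix timestamps. This sketch assumes calendar quarters and midnight UTC boundaries. Both are assumptions, so adjust them to your fiscal calendar and time zone. The helper names are hypothetical.

```python
from datetime import date, datetime, timezone
from typing import List, Tuple


def previous_quarter(today: date) -> Tuple[date, date]:
    """Return (start, end) of the calendar quarter before `today`; `end` is exclusive."""
    # First month of the quarter containing `today`: 1, 4, 7 or 10.
    q_start_month = 3 * ((today.month - 1) // 3) + 1
    end = date(today.year, q_start_month, 1)
    if q_start_month == 1:
        start = date(today.year - 1, 10, 1)  # Q4 of the previous year
    else:
        start = date(today.year, q_start_month - 3, 1)
    return start, end


def to_epoch(d: date) -> int:
    """Midnight UTC on `d` as a Unix timestamp (UTC is an assumption)."""
    return int(datetime(d.year, d.month, d.day, tzinfo=timezone.utc).timestamp())


def in_range(samples: List[Tuple[int, float]], start: date, end: date) -> List[Tuple[int, float]]:
    """Keep samples whose timestamp falls in the half-open interval [start, end)."""
    lo, hi = to_epoch(start), to_epoch(end)
    return [(t, v) for t, v in samples if lo <= t < hi]
```

Using a half-open interval means a sample taken exactly at midnight on the first day of the new quarter is not counted twice.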

I found the 'metrics' component, but it doesn't work properly (only one system shows up) and I can't specify a date range.

Just to add some pain on my side, our quarter has ended and I'm really trying to get this data ASAP. I'm throwing the graphs together piecemeal, but they don't sort, and it's a lot of manual analysis for 600+ servers (3 graphs from each!).
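With 600+ servers, scripting the data pull is likely less painful than assembling graphs by hand. Nagios XI performance graphs are typically drawn from RRD files, and `rrdtool fetch` can dump an RRD as text for a chosen time range. This sketch flags several assumptions: the perfdata directory layout and the underscore substitution in file names are assumed (verify them on your XI server), only the first data source in each file is read, and the helper names are hypothetical.

```python
import subprocess
from typing import List, Tuple

# ASSUMPTION: one folder per host, one .rrd per service, spaces replaced by
# underscores. Verify the real layout on your Nagios XI server.
PERFDATA_DIR = "/usr/local/nagios/share/perfdata"


def fetch_command(host: str, service: str, start: int, end: int) -> List[str]:
    """Build an `rrdtool fetch` command for one host/service over [start, end]."""
    path = f"{PERFDATA_DIR}/{host}/{service.replace(' ', '_')}.rrd"
    return ["rrdtool", "fetch", path, "AVERAGE", "--start", str(start), "--end", str(end)]


def parse_fetch_output(text: str) -> List[Tuple[int, float]]:
    """Parse `rrdtool fetch` text into (timestamp, first_data_source) pairs.

    Data rows look like '1420070400: 1.2e+01 3.4e+00'. The header row of
    data-source names and the blank line after it contain no ':' and are skipped.
    """
    rows = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        ts, _, values = line.partition(":")
        fields = values.split()
        if not fields:
            continue
        # float() accepts 'nan' and '-nan', which rrdtool prints for gaps.
        rows.append((int(ts), float(fields[0])))
    return rows


def fetch(host: str, service: str, start: int, end: int) -> List[Tuple[int, float]]:
    """Run rrdtool (must be installed and on PATH) and parse its output."""
    out = subprocess.run(
        fetch_command(host, service, start, end),
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_fetch_output(out)
```

The resulting (timestamp, value) pairs can then be filtered by date and ranked per host in an ordinary script, instead of being compared graph by graph.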
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: aggregating performance data

Post by jolson »

tonyleatwork wrote:
How do I view performance metrics for a particular service from multiple systems?
The Graph Explorer accomplishes this nicely, but, as you said in your original post, this does not work for you. You stated that "it just allows me to stack graphs not really aggregate the data".

The answer here is that we don't currently have a good way of doing what you're requesting.
When the Metrics component is updated, that could certainly be a resolution. For now, I'm afraid there isn't a way to get this accomplished. I have added priority to the feature request mentioned by lmiltchev for you.
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: aggregating performance data

Post by scottwilkerson »

Or you can run an availability report for a particular hostgroup or service group and, in the advanced options, check the performance graphs checkbox.
Former Nagios employee
tonyleatwork

Re: aggregating performance data

Post by tonyleatwork »

scottwilkerson wrote:
Or you can run an availability report for a particular hostgroup or service group and in advanced options check the performance graphs check box
Thanks for this suggestion, Scott. It does seem like a nice, quick way to get all of the graphs on the screen, but unfortunately it doesn't aggregate the data the way we need (for example, which hosts are the top consumers of CPU).

As a workaround, is there a way to display all of the graphs for a particular service for a hostgroup and date range?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: aggregating performance data

Post by abrist »

You could use the graph explorer multistacked graph option, but you would have to add all the services by hand.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: aggregating performance data

Post by WillemDH »

Hello,

This is a feature we also need at Digipolis. I made a feature request for this some time ago and Scott has been working on it. Please +1 this feature request and add a comment with what you would like to see in the new metrics component. :)
http://tracker.nagios.com/view.php?id=471

Grtz

Willem
Nagios XI 5.8.1
https://outsideit.net