Metrics for nix CPU

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Metrics for nix CPU

Post by BanditBBS »

I know it doesn't currently work, but is there a reason CPU Stats can't show under metrics for *nix-based systems? I have performance data being returned, and we'd much rather look at average CPU utilization than Load. Is there an easier way to see this? We really want to see the top X *nix hosts with the highest CPU average.

Thanks!
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Metrics for nix CPU

Post by sreinhardt »

It's entirely possible; it's just a matter of finding a good counter/location to check, depending on the distro you are running. Are these all CentOS/RHEL?

Edit: Ah you mean in the metrics component.... refer to abrist.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Metrics for nix CPU

Post by abrist »

The primary reason it is missing is that *nix server performance is predominantly measured by "Load". As many *nix distributions are quite aggressive in caching, read-ahead, preprocessing, etc., "Load over time" is usually a superior metric for gauging a server's performance health. Once load is over 1.0 per CPU core, you may start to have issues due to wait, and that is the important business metric.
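To make the "1.0 per CPU core" rule of thumb concrete, here is a minimal Perl sketch. The helper names `load_per_core` and `is_overloaded` are made up for illustration, and the load and core count are passed in by hand rather than read from the system.

```perl
use strict;
use warnings;

# Hypothetical helper: normalize a load average by the number of CPU cores.
sub load_per_core {
    my ($load, $cores) = @_;
    die "core count must be positive\n" unless $cores > 0;
    return $load / $cores;
}

# Hypothetical helper: apply the 1.0-per-core rule of thumb from the post.
# Returns 1 when the normalized load exceeds 1.0, otherwise 0.
sub is_overloaded {
    my ($load, $cores) = @_;
    return load_per_core($load, $cores) > 1.0 ? 1 : 0;
}

# Example values only: a load of 6.0 on a 4-core box.
printf "%.2f per core, overloaded=%d\n",
    load_per_core(6.0, 4), is_overloaded(6.0, 4);
```

The same raw load number means very different things on a 2-core and a 32-core host, which is why the threshold is expressed per core.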

I understand the desire for cpu average though (especially for ec2 and other cloud computing). Do you have a custom script returning these metrics?

The metrics component is an odd beast. It deals with the output of many different plugins and tries to normalize them. You may be able to look at its PHP and add your metric, though just a warning: the component is a bit complex due to how many differently formatted sources it pulls from. Obviously, custom development is an option as well . . .
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Metrics for nix CPU

Post by BanditBBS »

abrist,

That reasoning is a pretty big generalization. None of our AIX admins care one bit about load, and they only really care about CPU usage if it is "cooking" for over an hour or so.

The two important parts of the check I use:

Code: Select all

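# Take four 1-second vmstat samples, keeping only the numeric data rows
# (header rows contain letters or dashes).
open(PS, "/usr/bin/vmstat 1 4 | egrep -v '[a-z,A-Z]|-' | egrep '[0-9]' |") || return 1;
while (<PS>) {
	# Field index 16 of the split row is the column this check treats as idle CPU.
	$idle = (split(/[\t \n]+/))[16];
	$tidle = $tidle + $idle;
}
close(PS);
# Average idle over the four samples, then convert it to usage.
$usage = 100 - ($tidle / 4);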
and

Code: Select all

if ($usage >= $crit) {
	# print rather than printf, so the literal "%" is not parsed as a format directive
	print("CRITICAL - CPU usage at $usage%|Percent=$usage\n");
	exit($STATUSCODE{"CRITICAL"});
}
elsif ($usage >= $warn) {
	print("WARNING - CPU usage at $usage%|Percent=$usage\n");
	exit($STATUSCODE{"WARNING"});
}
else {
	print("OK - CPU usage at $usage%|Percent=$usage\n");
	exit($STATUSCODE{"OK"});
}
So it is just returning a simple percentage, and we'd love to be able to see an average for the host group, sorted by average usage. I guess I'll look at the PHP....gulp
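For illustration, this is roughly the "top X hosts" view being asked for, done outside the metrics component as a rough Perl sketch. The host names and output lines are made up; only the `text|Percent=value` shape comes from the check above, and `percent_from_output` and `top_hosts` are hypothetical helpers.

```perl
use strict;
use warnings;

# Made-up plugin output per host, in the "text|Percent=value" form the check prints.
my %output = (
    'aix01' => 'OK - CPU usage at 12%|Percent=12',
    'aix02' => 'CRITICAL - CPU usage at 97%|Percent=97',
    'aix03' => 'WARNING - CPU usage at 85%|Percent=85',
);

# Pull the value of the "Percent" datasource out of the perfdata after the pipe.
# Returns undef when the line carries no such perfdata.
sub percent_from_output {
    my ($line) = @_;
    my (undef, $perfdata) = split /\|/, $line, 2;
    return undef unless defined $perfdata;
    return $perfdata =~ /\bPercent=([0-9.]+)/ ? $1 + 0 : undef;
}

# Return the names of the $n hosts with the highest usage, highest first.
# Ties are broken alphabetically so the order is stable.
sub top_hosts {
    my ($n, %out) = @_;
    my %usage;
    for my $host (keys %out) {
        my $pct = percent_from_output($out{$host});
        $usage{$host} = $pct if defined $pct;
    }
    my @sorted = sort { $usage{$b} <=> $usage{$a} || $a cmp $b } keys %usage;
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}

print join(', ', top_hosts(2, %output)), "\n";
```

This only ranks the most recent value per host; it says nothing about averages over time.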
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Metrics for nix CPU

Post by abrist »

BanditBBS wrote:That reasoning is a pretty big generalization. None of our AIX admins care one bit about load, and they only really care about CPU usage if it is "cooking" for over an hour or so.
Fair enough. I meant no offense; I just wanted to explain the decision behind the component's use of load instead of CPU utilization for *nix boxes.
I am looking into the PHP to see if there is any easy way to include your checkresults' performance data in the component.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Metrics for nix CPU

Post by BanditBBS »

abrist wrote:
BanditBBS wrote:That reasoning is a pretty big generalization. None of our AIX admins care one bit about load, and they only really care about CPU usage if it is "cooking" for over an hour or so.
Fair enough. I meant no offense; I just wanted to explain the decision behind the component's use of load instead of CPU utilization for *nix boxes.
I am looking into the PHP to see if there is any easy way to include your checkresults' performance data in the component.
I wasn't yelling, I was just calling it a generalization :)

I'm looking into it as well.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Metrics for nix CPU

Post by BanditBBS »

What conditions need to be true for the metrics component to treat the information as CPU Usage stats?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Metrics for nix CPU

Post by abrist »

I just tested this. As you are returning a percentage, you will want to use the "CPU Usage" metric (the Windows CPU metric) so that sorts work correctly.

1. Name the service check description "CPU Usage"
2. Make sure the perfdata ds label is named "5 min avg Load" or better yet, change the php file: /usr/local/nagiosxi/html/includes/utils-metrics.inc.php

Lines 166 and 219:
From:

Code: Select all

if(preg_match("/5 min avg Load/",$perfdata)>0)
To:

Code: Select all

if(preg_match("/(5 min avg Load|<your perfdata ds label here>)/",$perfdata)>0)
EDIT: Essentially, the component grabs services with a specific name and then greps the performance datasource names for "5 min avg Load". We need to make the regex an "or" statement and then include your performance datasource label so that you do not have to lose your perfdata or change the checks.
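PHP's preg_match takes Perl-compatible regular expressions, so the alternation can be sanity-checked in a few lines of Perl before touching the PHP file. This is only a sketch: `Percent` is the datasource label from the check posted earlier in the thread, the sample strings are made up rather than taken from the component, and `matches_metric` is a hypothetical helper.

```perl
use strict;
use warnings;

# Same alternation as the edited preg_match call, with "Percent" filled in
# as the extra datasource label (an assumption based on the posted check).
my $pattern = qr/(5 min avg Load|Percent)/;

# Hypothetical helper: 1 if the perfdata string contains either label, else 0.
sub matches_metric {
    my ($perfdata) = @_;
    return $perfdata =~ $pattern ? 1 : 0;
}

# Made-up perfdata strings for illustration.
for my $perfdata ('5 min avg Load=0.42', 'Percent=37', 'rta=0.5ms') {
    printf "%-22s %s\n", $perfdata,
        matches_metric($perfdata) ? 'matched' : 'skipped';
}
```

Note that the pattern is unanchored, so a short label such as `Percent` would also match a longer one like `PercentFree`. If that matters, a more specific label or a word boundary (`\b`) in the regex may be worth considering.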
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Metrics for nix CPU

Post by BanditBBS »

abrist wrote:I just tested this. As you are returning a percentage, you will want to use the "CPU Usage" metric (the Windows CPU metric) so that sorts work correctly.

1. Name the service check description "CPU Usage"
2. Make sure the perfdata ds label is named "5 min avg Load" or better yet, change the php file: /usr/local/nagiosxi/html/includes/utils-metrics.inc.php

Lines 166 and 219:
From:

Code: Select all

if(preg_match("/5 min avg Load/",$perfdata)>0)
To:

Code: Select all

if(preg_match("/(5 min avg Load|<your perfdata ds label here>)/",$perfdata)>0)
EDIT: Essentially, the component grabs services with a specific name and then greps the performance datasource names for "5 min avg Load". We need to make the regex an "or" statement and then include your performance datasource label so that you do not have to lose your perfdata or change the checks.
Works like a champ. I was just about to correct you and say line 219, not 229, but I see you already corrected that :)

FYI - The service is labeled CPU Stats not CPU Usage for the AIX servers and they show up fine. Just modifying that PHP file has given me the desired effect! My last question for you: this is only showing the current numbers, correct? We can't say "show me avg CPU % over past week"? That is not a function of the metrics, right?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Metrics for nix CPU

Post by abrist »

BanditBBS wrote:Works like a champ. I was just about to correct you and say line 219, not 229, but I see you already corrected that :)
Yeah, I think I edited it like 4 times . .
BanditBBS wrote:That is not a function of the metrics, right?
Nope. As the data is just pulled from the most recent checkresult in the RRD, that is all you get. This could be changed, but it would be a deeper edit to the PHP. Unfortunately, I am sure it would require custom development.
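Outside the component, a weekly average could in principle be computed from the RRD itself. The sketch below is an assumption-heavy illustration, not how the metrics component works: it averages the first value column of text shaped like `rrdtool fetch` output (`timestamp: value` rows), skipping NaN rows. The sample text is made up, `average_from_fetch` is a hypothetical helper, and the RRD path and datasource layout on a given XI install would need to be checked.

```perl
use strict;
use warnings;

# Hypothetical helper: average the first value column of rrdtool-fetch-style
# text, ignoring the header, blank lines, and rows whose value is NaN.
# Returns undef when there are no usable rows.
sub average_from_fetch {
    my ($text) = @_;
    my ($sum, $count) = (0, 0);
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*\d+:\s+(\S+)/;   # "timestamp: value ..."
        my $value = $1;
        next if $value =~ /nan/i;                  # unknown samples
        $sum += $value;
        $count++;
    }
    return $count ? $sum / $count : undef;
}

# Made-up sample in the general shape of fetch output.
my $sample = <<'END';
                      Percent

1700000000: 1.0000000000e+01
1700000300: 3.0000000000e+01
1700000600: -nan
1700000900: 2.0000000000e+01
END

printf "%.1f\n", average_from_fetch($sample);
```

In practice the input text would come from something like `rrdtool fetch <file> AVERAGE --start -1w`, with the file path left for you to fill in.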
BanditBBS wrote:FYI - The service is labeled CPU Stats not CPU Usage for the AIX servers and they show up fine
Interesting, I changed my service description to something else and it disappeared from the metrics ui. Maybe I was just impatient?