check_cloudwatch_status: CloudWatch Metric: CPUUtilization:

Posted: Fri Apr 24, 2015 5:28 am
by alp_support
Hi Support,

We are using the Nagios plugin check_cloudwatch_status.rb to monitor the CPU utilization of our AWS-hosted instances. For the past few days, however, we have been getting the following alert for many instances: CloudWatch Metric: CPUUtilization: No AWS CloudWatch Datapoint retrieved

An earlier topic (http://support.nagios.com/forum/viewtop ... =7&t=27414) states that an instance under load will behave erratically, and that this is why the alerts are received. But when I checked the load on the instances manually, it seemed to be fine.

Also, the NRPE check that pulls in load data, i.e. "check_command check_nrpe!check_load", is working fine and shows no spikes.

Kindly assist us with further troubleshooting.

nagios version
[root@ip-10-0-200-5 tmp]# nagios -V

Nagios Core 3.5.1
Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-30-2013
License: GPL

[root@ip-10-0-200-5 tmp]# nrpe -V

NRPE - Nagios Remote Plugin Executor
Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
Version: 2.14
Last Modified: 12-21-2012
License: GPL v2 with exemptions (-l for more info)
SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required
TCP Wrappers Available

Re: check_cloudwatch_status: CloudWatch Metric: CPUUtilizati

Posted: Fri Apr 24, 2015 2:32 pm
by tmcdonald
Not to deflect, but have you contacted the author of that plugin? He or she would likely be better able to assist, as anything we do would be somewhat guesswork (since it is not one of our standard plugins).

Re: check_cloudwatch_status: CloudWatch Metric: CPUUtilizati

Posted: Fri Apr 24, 2015 2:34 pm
by ssax
Please try running the command manually from the command line and use the --verbose option and post the sanitized output.

Re: check_cloudwatch_status: CloudWatch Metric: CPUUtilizati

Posted: Sat Apr 25, 2015 3:53 am
by alp_support
tmcdonald wrote: Not to deflect, but have you contacted the author of that plugin? He or she would likely be better able to assist, as anything we do would be somewhat guesswork (since it is not one of our standard plugins).
I have raised the query with the author as well, but I am still waiting for a reply.

Re: check_cloudwatch_status: CloudWatch Metric: CPUUtilizati

Posted: Sat Apr 25, 2015 3:57 am
by alp_support
ssax wrote: Please try running the command manually from the command line and use the --verbose option and post the sanitized output.
I have removed customer-related information (IPs, customer names, IDs) from the verbose output below and replaced it with xxxx.

[root@ip-10-0-200-5 tmp]# /usr/lib64/nagios/plugins/check_cloudwatch_status.rb -a eu-west-1 -i i-xxxxx -f /etc/facter/facts.d/cloudwatch.txt -C "CPUUtilization" --warning 60 --critical 70 --verbose
** Launching AWS status retrieval on instance ID: i-xxxx
Amazon AWS Endpoint: EC2 ec2.eu-west-1.amazonaws.com, RDS rds.eu-west-1.amazonaws.com, ELB elasticloadbalancing.eu-west-1.amazonaws.com
Amazon CloudWatch Endpoint: monitoring.eu-west-1.amazonaws.com
Warning values: [0, 60.0]
Critical values: [0, 70.0]
AWS EC2 Instance:
{"requestId"=>"84932634-e9a2-4f10-af84-7398b7b7577c",
"xmlns"=>"http://ec2.amazonaws.com/doc/2010-08-31/",
"reservationSet"=>
{"item"=>
[{"instancesSet"=>
{"item"=>
[{"rootDeviceType"=>"ebs",
"architecture"=>"x86_64",
"tagSet"=>
{"item"=>
[{"value"=>"asgapplication",
"key"=>"aws:cloudformation:logical-id"},
{"value"=>
"arn:aws:cloudformation:eu-west-1:xxxxx:stack/xxxxx/d4fb2980-0e50-11e4-b6c7-50fa18c86ab4",
"key"=>"aws:cloudformation:stack-id"},
{"value"=>"app", "key"=>"Name"},
{"value"=>"xxxx", "key"=>"environment_name"},
{"value"=>"xxxx",
"key"=>"aws:cloudformation:stack-name"},
{"value"=>"xx@xx.com", "key"=>"requested_by"},
{"value"=>"ip-10-8-10-240", "key"=>"LaunchedFrom"},
{"value"=>"xx-xx", "key"=>"jenkins_user"},
{"value"=>"xxxx",
"key"=>"aws:autoscaling:groupName"},
{"value"=>"bronze-vpc", "key"=>"environment_type"}]},
"launchTime"=>"2014-07-18T07:57:25.000Z",
"privateDnsName"=>"ip-xx.eu-west-1.compute.internal",
"dnsName"=>nil,
"kernelId"=>"aki-71665e05",
"instanceType"=>"m3.xlarge",
"reason"=>nil,
"virtualizationType"=>"paravirtual",
"rootDeviceName"=>"/dev/sda1",
"vpcId"=>"vpc-xxx",
"placement"=>{"availabilityZone"=>"eu-west-1a", "groupName"=>nil},
"imageId"=>"ami-xxx",
"monitoring"=>{"state"=>"enabled"},
"productCodes"=>nil,
"instanceId"=>"i-xxxx",
"clientToken"=>"c33d61ec-1937-4c3b-837c-c897aea72409_eu-west-1a_2",
"amiLaunchIndex"=>"1",
"privateIpAddress"=>"xx.xx.xx.xx",
"instanceState"=>{"code"=>"16", "name"=>"running"},
"blockDeviceMapping"=>
{"item"=>
[{"deviceName"=>"/dev/sda",
"ebs"=>
{"status"=>"attached",
"volumeId"=>"vol-xxxx",
"deleteOnTermination"=>"true",
"attachTime"=>"2014-07-18T07:57:27.000Z"}}]},
"subnetId"=>"subnet-xxxx",
"keyName"=>"xxxx"}]},
"reservationId"=>"r-xxxx",
"requesterId"=>"xxxx",
"ownerId"=>"xxxxxx",
"groupSet"=>nil}]}}
CloudWatch Detailed Monitoring is enabled for Instance i-xxxx
CloudWatch:
#<AWS::Cloudwatch::Base:0x7f2d3e4eb598
@access_key_id="xxxx",
@http=#<Net::HTTP monitoring.eu-west-1.amazonaws.com:443 open=false>,
@path="/",
@port=443,
@proxy_server=nil,
@secret_access_key="xxxxxx",
@server="monitoring.eu-west-1.amazonaws.com",
@use_ssl=true>
CloudWatch Metrics Statistics:
{"xmlns"=>"http://monitoring.amazonaws.com/doc/2010-08-01/",
"GetMetricStatisticsResult"=>{"Label"=>"CPUUtilization", "Datapoints"=>nil},
"ResponseMetadata"=>{"RequestId"=>"c59a8f57-eb27-11e4-af8c-4bdfc7697f44"}}
CloudWatch Metric: CPUUtilization: No AWS CloudWatch Datapoint retrieved|
[root@ip-10-0-200-5 tmp]#

Re: check_cloudwatch_status: CloudWatch Metric: CPUUtilizati

Posted: Mon Apr 27, 2015 1:45 pm
by jdalrymple
Please reference this other topic on the support forums.

I'm not totally clear on whether the problem was because this guy's Nagios AWS instance was heavily loaded or because the machines he was querying were. I'm almost certain it must have been the Nagios instance, because I don't know why a heavily loaded monitored instance would have any effect on the API's ability to return results to you.

What does the load look like on your Nagios box? Is it also an AWS instance?

Re: check_cloudwatch_status: CloudWatch Metric: CPUUtilizati

Posted: Mon May 04, 2015 4:44 am
by alp_support
jdalrymple wrote: Please reference this other topic on the support forums.

I'm not totally clear on whether the problem was because this guy's Nagios AWS instance was heavily loaded or because the machines he was querying were. I'm almost certain it must have been the Nagios instance, because I don't know why a heavily loaded monitored instance would have any effect on the API's ability to return results to you.

What does the load look like on your Nagios box? Is it also an AWS instance?

Hi

My Nagios box is also an AWS instance. The load on the box is sometimes high, but not always. I have a polling interval of 5 minutes, and I checked the load during the 30-minute timespan that is monitored; it seems to be normal.

Any other ideas?

Re: check_cloudwatch_status: CloudWatch Metric: CPUUtilizati

Posted: Mon May 04, 2015 2:39 pm
by jdalrymple
Would it be possible to get the verbose output of one that's not failing? The two would be interesting to compare. The error seems legit to me: I don't see anything in the verbose output that I'd regard as a CPU statistic. Are the failures tied more to a type of service, or to a host?
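
To illustrate why the error looks legitimate, here is a minimal Python sketch of how a check might turn a GetMetricStatistics result into a Nagios state. It is not the plugin's actual logic: the function name evaluate_cpu, the choice of UNKNOWN for an empty result, and the simple "greater than or equal" threshold comparison are all assumptions made for illustration. The point is only that an empty "Datapoints" value, as in the verbose output above, leaves nothing to compare against the 60/70 thresholds.

```python
from datetime import datetime, timezone

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_cpu(datapoints, warning=60.0, critical=70.0):
    """Map CloudWatch-style datapoints to an (exit_code, message) pair.

    `datapoints` is assumed to be a list of dicts with "Timestamp" and
    "Average" keys, or None/empty when nothing was returned.
    """
    if not datapoints:
        # Corresponds to "Datapoints"=>nil in the verbose output.
        # Returning UNKNOWN here is this sketch's choice, not the plugin's.
        return UNKNOWN, "CPUUtilization: no datapoint retrieved"

    # Use the most recent datapoint only.
    latest = max(datapoints, key=lambda dp: dp["Timestamp"])
    value = latest["Average"]
    message = "CPUUtilization: %.1f%%" % value
    if value >= critical:
        return CRITICAL, message
    if value >= warning:
        return WARNING, message
    return OK, message


if __name__ == "__main__":
    # Made-up sample values, for demonstration only.
    sample = [
        {"Timestamp": datetime(2015, 4, 25, 3, 50, tzinfo=timezone.utc), "Average": 65.0},
        {"Timestamp": datetime(2015, 4, 25, 3, 55, tzinfo=timezone.utc), "Average": 12.5},
    ]
    print(evaluate_cpu(None))
    print(evaluate_cpu(sample))
```

With an empty result the function never reaches the threshold comparison, which matches what the failing check reports.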

Re: check_cloudwatch_status: CloudWatch Metric: CPUUtilizati

Posted: Mon May 18, 2015 5:43 am
by alp_support
Hi jdalrymple,

I couldn't get verbose output from a working one, as all of them are failing on my Nagios. Since we are monitoring only CPU utilization, I cannot comment on other metrics.

Re: check_cloudwatch_status: CloudWatch Metric: CPUUtilizati

Posted: Mon May 18, 2015 11:12 am
by jdalrymple
Can you debug the metrics a bit from the AWS side of things to verify they're actually there?

https://console.aws.amazon.com/cloudwatch/

Click the Metrics button on the left and find the metric you're trying to monitor.
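
If you would rather check from a script than from the console, something along these lines may help. This is a rough Python sketch, not part of the plugin: it assumes the boto3 library is installed and that AWS credentials are configured, neither of which has come up in this thread, and the helper name fetch_cpu_datapoints is made up. It asks CloudWatch for the AWS/EC2 CPUUtilization metric of one instance over the last 30 minutes at a 5-minute period, matching the interval and timespan mentioned earlier.

```python
from datetime import datetime, timedelta, timezone


def fetch_cpu_datapoints(client, instance_id, minutes=30, period=300, now=None):
    """Return CPUUtilization datapoints for one instance, oldest first.

    `client` is expected to behave like a boto3 CloudWatch client,
    i.e. to provide get_metric_statistics(**kwargs).
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    response = client.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=period,
        Statistics=["Average"],
    )
    # An empty or missing list means CloudWatch had nothing for this window.
    datapoints = response.get("Datapoints") or []
    return sorted(datapoints, key=lambda dp: dp["Timestamp"])


# Example usage (assumes boto3 and valid credentials; i-xxxx is a placeholder):
#
#   import boto3
#   cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")
#   for dp in fetch_cpu_datapoints(cloudwatch, "i-xxxx"):
#       print(dp["Timestamp"], dp["Average"])
```

If this also comes back empty for an instance that is running, the metric really is missing on the AWS side for that window, and the plugin is just reporting that.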