EC2 monitoring unknown status

Post by **sgargano** » Thu Oct 03, 2019 10:45 am

Hi,

I'm currently using the EC2 wizard v1.1.2, and it works fine for adding new servers to monitor.

The issue appears later and at random: some services suddenly go to "unknown" status with the error "The check has received a response with no data. This is generally caused by an incorrect region name, invalid metric name, or invalid instance ID." Later they come back to "ok", and later still they go to "unknown" again.

What could cause this issue?
Thanks

Post by **benjaminsmith** » Thu Oct 03, 2019 12:13 pm

Hello @sgargano,

The fact that this is intermittent typically points to the network connection. Double-check the AWS credentials as well; as mentioned in the post below, the most common source of this error is an invalid region name.

Unable to fetch the details from ec2 instance

Next, run the plugin check from the command line with the verbose option -v to display more data for troubleshooting the error, and post the results for us to review. Thanks.

Nagios XI - How To Test Check Commands From The Command-line
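When testing from the command line, it can help to capture the exit code along with the output, since Nagios derives the service state from it (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Below is a minimal sketch of such a wrapper. `run_check` and `fake_plugin` are hypothetical names used only for illustration; the stand-in command simply exits with code 3 so the sketch runs without AWS access. To try it against the real plugin, pass the check_ec2.py command line as a list of arguments instead.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command):
    """Run a plugin command; return (state, last line of its output).

    With verbose output the status line is printed last, so the last
    line is the one Nagios-style tooling cares about.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    # Any exit code outside 0-3 is treated as UNKNOWN here (an assumption
    # of this sketch).
    state = NAGIOS_STATES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.strip().splitlines()
    return state, (lines[-1] if lines else "")


# Stand-in for the real plugin (hypothetical): prints one line, exits 3.
fake_plugin = [
    sys.executable,
    "-c",
    "import sys; print('UNKNOWN: no data'); sys.exit(3)",
]
print(run_check(fake_plugin))
```

Running the same command in a loop and logging each state with a timestamp is one way to see whether the shell also hits "unknown" at the moments Nagios XI does.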

Post by **sgargano** » Fri Oct 04, 2019 2:02 am

The region name is definitely correct: "eu-central-1"

The strange thing is that the check works from the shell, while at the same time it fails under Nagios XI.

Code:

################ AWS Response Data ################

Namespace: AWS/EC2
Instance ID: i-xxxxxxxxx
Metric Name: NetworkOut
Period: 300 seconds
Unit of measure: Bytes
Timestamp: 2019-10-04 06:52:00+00:00

################ Datapoints ################

Statistic: Average
         Value: 862208.4
         Warning Threshold: 1000000000
         Critical Threshold: 2000000000
         Return Code: 0

Statistic: Minimum
         Value: 733503.0
         Warning Threshold: 1000000000
         Critical Threshold: 2000000000
         Return Code: 0

Statistic: Maximum
         Value: 1169452.0
         Warning Threshold: 1000000000
         Critical Threshold: 2000000000
         Return Code: 0

Statistic: Sum
         Value: 4311042.0
         Warning Threshold: 1000000000
         Critical Threshold: 2000000000
         Return Code: 0

OK: Network Out Sum - 4311042.0 (Average: 862208.4B, Minimum: 733503.0B, Maximum: 1169452.0B) | NetworkOut=4311042.0B;1000000000;2000000000;;;

Post by **lmiltchev** » Fri Oct 04, 2019 10:18 am

It seems that these three metrics (CPUCreditBalance, NetworkPacketsIn, and NetworkPacketsOut) get updated on the Amazon side roughly every 5 minutes, so the data won't always be available when you use "-P 5". Try increasing your period to 10.

Example:

Code:

/usr/local/nagios/libexec/check_ec2.py -P 10 --metricname NetworkPacketsIn --instanceid 'xxx' --accesskeyid 'xxx' --secretaccesskey 'xxx' --region 'us-east-1' --warning '1750000' --critical '3500000'
OK: Network Packets In Sum - 64.0 (Average: 12.8, Minimum: 5.0, Maximum: 31.0) | NetworkPacketsIn=64.0;1750000;3500000;;;
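To see why a 5-minute period can come back empty only some of the time, here is a simplified model in Python. It is an illustration, not documented CloudWatch or check_ec2.py behavior: it assumes that "-P" is a lookback window in minutes ending at the check time, that datapoints are stamped on 5-minute boundaries, and that each becomes visible after a hypothetical 2-minute lag. The function name and the lag value are made up for the sketch.

```python
def has_datapoint(now, period, interval=5, lag=2):
    """Return True if the lookback window contains a visible datapoint.

    All values are in minutes. Datapoints are assumed to be stamped at
    multiples of `interval` and to become visible `lag` minutes later
    (both are assumptions of this sketch).
    """
    # Newest datapoint that is already visible at time `now`.
    latest_visible = ((now - lag) // interval) * interval
    # The check only sees it if it falls inside [now - period, now].
    return latest_visible >= now - period


# Count how many check times within an hour would return no data.
for period in (5, 10):
    misses = sum(not has_datapoint(t, period) for t in range(60))
    print(f"-P {period}: {misses} of 60 check times return no data")
```

Under these assumed numbers, the 5-minute window is empty for 12 of the 60 check times and the 10-minute window is never empty, which matches the pattern of a check that flips between "ok" and "unknown" depending on when it happens to run.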

Let us know if this helped.

Nagios Support Forum