EC2 monitoring unknown status

Locked
sgargano
Posts: 51
Joined: Mon May 23, 2016 9:06 am

EC2 monitoring unknown status

Post by sgargano »

Hi,

I'm currently using the EC2 wizard v1.1.2, and it works fine for adding new servers to be monitored.

The issue appears later and occurs randomly: some services suddenly go to unknown status with the error "The check has received a response with no data. This is generally caused by an incorrect region name, invalid metric name, or invalid instance ID." Later they come back to "ok", and later still they return to "unknown" status.

What could cause this issue?
Thanks
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: EC2 monitoring unknown status

Post by benjaminsmith »

Hello @sgargano,

Intermittent failures like this are typically related to the network connection. Double-check the AWS credentials and, as mentioned in the post below, the region name: an invalid region name is the most common source of this error.

Unable to fetch the details from ec2 instance

Next, run the plugin check from the command line with the verbose option -v to display more data for troubleshooting, and post the results for us to review. Thanks.

Nagios XI - How To Test Check Commands From The Command-line
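
As a rough illustration of the step above, here is a minimal Python sketch that assembles a verbose command line for the plugin so it can be pasted into a shell. The helper name `build_check_command` is hypothetical, the plugin path assumes the default Nagios XI libexec location, and the long option names are assumed to match your installed version of check_ec2.py; only -v comes from the advice above.

```python
import shlex

# Assumed default location of the plugin on a Nagios XI server.
PLUGIN = "/usr/local/nagios/libexec/check_ec2.py"


def build_check_command(metric, instance_id, access_key, secret_key,
                        region, warning, critical, verbose=True):
    """Return the argument list for one manual run of the plugin (hypothetical helper)."""
    cmd = [
        PLUGIN,
        "--metricname", metric,
        "--instanceid", instance_id,
        "--accesskeyid", access_key,
        "--secretaccesskey", secret_key,
        "--region", region,
        "--warning", str(warning),
        "--critical", str(critical),
    ]
    if verbose:
        cmd.append("-v")  # verbose option, to display more data
    return cmd


# Placeholder values; substitute your own instance ID and credentials.
cmd = build_check_command("NetworkOut", "i-xxxxxxxxx", "xxx", "xxx",
                          "eu-central-1", 1000000000, 2000000000)
print(shlex.join(cmd))  # shell-quoted command line, ready to paste
```

Running the printed command as the same user that Nagios XI uses for checks keeps the test as close as possible to what the monitoring engine does.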
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
sgargano
Posts: 51
Joined: Mon May 23, 2016 9:06 am

Re: EC2 monitoring unknown status

Post by sgargano »

The region name is definitely correct: "eu-central-1"

The very weird thing is that it works from the shell, but at the same time we have the issue under Nagios XI.

Code:

################ AWS Response Data ################

Namespace: AWS/EC2
Instance ID: i-xxxxxxxxx
Metric Name: NetworkOut
Period: 300 seconds
Unit of measure: Bytes
Timestamp: 2019-10-04 06:52:00+00:00

################ Datapoints ################

Statistic: Average
         Value: 862208.4
         Warning Threshold: 1000000000
         Critical Threshold: 2000000000
         Return Code: 0

Statistic: Minimum
         Value: 733503.0
         Warning Threshold: 1000000000
         Critical Threshold: 2000000000
         Return Code: 0

Statistic: Maximum
         Value: 1169452.0
         Warning Threshold: 1000000000
         Critical Threshold: 2000000000
         Return Code: 0

Statistic: Sum
         Value: 4311042.0
         Warning Threshold: 1000000000
         Critical Threshold: 2000000000
         Return Code: 0

OK: Network Out Sum - 4311042.0 (Average: 862208.4B, Minimum: 733503.0B, Maximum: 1169452.0B) | NetworkOut=4311042.0B;1000000000;2000000000;;;
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: EC2 monitoring unknown status

Post by lmiltchev »

It seems like these three metrics (CPUCreditBalance, NetworkPacketsIn, and NetworkPacketsOut) get updated on the Amazon side roughly every 5 minutes, so the data won't always be available when you use "-P 5". Try increasing your period to 10.

Example:

Code:

/usr/local/nagios/libexec/check_ec2.py -P 10 --metricname NetworkPacketsIn --instanceid 'xxx' --accesskeyid 'xxx' --secretaccesskey 'xxx' --region 'us-east-1' --warning '1750000' --critical '3500000'
OK: Network Packets In Sum - 64.0 (Average: 12.8, Minimum: 5.0, Maximum: 31.0) | NetworkPacketsIn=64.0;1750000;3500000;;;
Let us know if this helped.
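
To show why a longer period can help, here is a small, simplified Python model of the timing. It does not reproduce the plugin's actual query logic. It assumes that -P sets a look-back window in minutes ending at the time of the check, and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def datapoints_in_window(available, now, period_minutes):
    """Return the datapoint timestamps that fall inside [now - period, now]."""
    start = now - timedelta(minutes=period_minutes)
    return [ts for ts in available if start <= ts <= now]


utc = timezone.utc
# Hypothetical timeline: the metric is published about every 5 minutes,
# and the 06:55 datapoint has not arrived yet when the check runs at 06:57.
available = [
    datetime(2019, 10, 4, 6, 45, tzinfo=utc),
    datetime(2019, 10, 4, 6, 50, tzinfo=utc),
]
now = datetime(2019, 10, 4, 6, 57, tzinfo=utc)

for period in (5, 10):
    found = datapoints_in_window(available, now, period)
    state = "OK" if found else "UNKNOWN (response with no data)"
    print(f"-P {period}: {len(found)} datapoint(s) -> {state}")
```

In this model the 5-minute window (06:52 to 06:57) contains no datapoint, while the 10-minute window (06:47 to 06:57) still catches the 06:50 one. Whether a given check hits the gap depends on when it runs relative to the last update, which would match the intermittent behaviour described.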