Hi Team,
We monitor the AWS StatusCheckFailed metric in Nagios.
It was working fine until last week.
Now the service check fails with the following kind of error:
"UNKNOWN - Name=InstanceId,Value=i-018c183920bf5a804 StatusCheckFailed (5 min Average): null null - No metric value known."
For some of the service checks, a forced check runs fine, but this does not work for all hosts.
Please advise.
Thanks!
Monitor Status Check failed Status AWS
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Monitor Status Check failed Status AWS
To get any idea of what type of check this is, I would need to see the plugin you are using, as well as the command as you would run it from the command line, since this check is not built into XI.
Re: Monitor Status Check failed Status AWS
Hi Team,
Attached is the plugin that we use for AWS.
Also, I sometimes see the message below on the service check.
"Notifications for this service are being suppressed because it was detected as having been flapping between different states (23.2% change >= 20.0% threshold). When the service state stabilizes and the flapping stops, notifications will be re-enabled."
Note: we currently use only the AWS Status Check Failed service check.
Thanks!
Re: Monitor Status Check failed Status AWS
scottwilkerson wrote: In order to have any sort of idea what type of check this is running I would need to see the plugin you are using, as well as the command if you were to run it from the command line as it is not something builtin to XI.
I will also need the command line your Nagios server uses to run this plugin.
Re: Monitor Status Check failed Status AWS
Hi Team,
Below is the command used to run the plugin, followed by the argument values:
$USER1$/check_cloudwatch.sh --region=us-east-1 --namespace="$ARG1$" --metric="$ARG2$" --statistics="Average" --mins=$ARG7$ --dimensions="Name=$ARG3$,Value=$HOSTALIAS$" --warning=$ARG5$ --critical=$ARG6$ --profile=$ARG8$
$ARG1$ = EC2
$ARG2$ = StatusCheckFailed
$ARG3$ = InstanceId
$ARG4$ = 0
$ARG5$ = 1
$ARG6$ = 1
$ARG7$ = 5
$ARG8$ = cisProd
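For anyone reproducing this from a terminal, the command with the macros substituted by hand can be sketched as below. This is only an illustration: the plugin directory is an assumption (`$USER1$` is commonly /usr/local/nagios/libexec, but check resource.cfg), the helper name `expand_check` is made up for this example, and `$HOSTALIAS$` is taken to be the instance ID because that is what appears in the error message above.

```shell
#!/bin/sh
# Sketch: print the plugin command with the Nagios macros filled in by hand.
# PLUGIN_DIR is an assumption; $USER1$ is defined in resource.cfg on the Nagios server.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# expand_check is a hypothetical helper name used only for this illustration.
expand_check() {
    # $1 = value of $HOSTALIAS$ (the EC2 instance ID in this setup)
    printf '%s/check_cloudwatch.sh --region=us-east-1 --namespace="%s" --metric="%s" --statistics="Average" --mins=%s --dimensions="Name=%s,Value=%s" --warning=%s --critical=%s --profile=%s\n' \
        "$PLUGIN_DIR" "EC2" "StatusCheckFailed" "5" "InstanceId" "$1" "1" "1" "cisProd"
}

# Instance ID taken from the error message in the first post.
expand_check "i-018c183920bf5a804"
```

Running the printed command as the nagios user, several times in a row, should show whether the "No metric value known" result is intermittent outside of Nagios as well.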
Re: Monitor Status Check failed Status AWS
Based on my cursory knowledge of AWS metrics, this appears correct.
You said it works when you force the check, so you may want to verify with Amazon or the plugin author why the API sometimes returns no data.
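One way to check whether CloudWatch itself is returning no datapoints, independent of the plugin, is to query the same metric with the AWS CLI. The sketch below only prints the command unless RUN=1 is set. It assumes GNU date, the AWS CLI installed for the user running it, and the cisProd profile from the check command; the function name `cw_query` and the 15 minute default window are choices made for this example, not something taken from the plugin.

```shell
#!/bin/sh
# Sketch: print (or, with RUN=1, execute) an AWS CLI query for the same metric.
# Assumes GNU date (for -d "@epoch") and an AWS CLI profile named cisProd.
cw_query() {
    instance_id="$1"
    window_mins="${2:-15}"   # wider than the plugin's 5 minutes, to spot late datapoints
    now=$(date -u +%s)
    start=$(date -u -d "@$((now - window_mins * 60))" +%Y-%m-%dT%H:%M:%SZ)
    end=$(date -u -d "@$now" +%Y-%m-%dT%H:%M:%SZ)
    # Build the command as positional parameters so it can be printed or run as-is.
    set -- aws cloudwatch get-metric-statistics \
        --region us-east-1 --profile cisProd \
        --namespace AWS/EC2 --metric-name StatusCheckFailed \
        --dimensions "Name=InstanceId,Value=$instance_id" \
        --statistics Average --period 300 \
        --start-time "$start" --end-time "$end"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Instance ID taken from the error message in the first post.
cw_query "i-018c183920bf5a804"
```

If the response contains an empty "Datapoints" list for the affected instances, the gap is on the CloudWatch side for that time window. If datapoints are present, the plugin's time window or its parsing of the response would be the next thing to look at.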
Re: Monitor Status Check failed Status AWS
Hi,
This was working fine until last week.
It suddenly started giving the error.
We see it specifically when we apply the configuration.
Re: Monitor Status Check failed Status AWS
ss6407 wrote: Hi,
This was working fine till last week.
Suddenly it started giving the error.
This is specifically seen when we do apply config.
With a sudden change like this, I would suspect that something changed in the AWS API that the plugin connects to.
That is the only variable here.