Strange Metric Graphing Issue

Post by **ejlorson** » Thu Sep 28, 2017 2:58 pm

I am sorry, but this is a little long and confusing, which is why I need help.

I am running Nagios XI on an AWS instance (64-bit CentOS). I am using the check_cloudwatch.py command to check CPUUtilization on other AWS instances.

On the "Host Status" page I get:
"CLOUDWATCHMETRIC OK - CloudWatch Metric AWS/EC2:CPUUtilization with dimensions {'InstanceId': 'REDACTED'}"
for ALL my cloudwatch checks.

When I do the Run Check Command I get this:
CLOUDWATCHMETRIC OK - CloudWatch Metric AWS/EC2:CPUUtilization with dimensions {'InstanceId': 'REDACTED'} | cloudwatchmetric=0.066Percent;75;90
for ALL my cloudwatch checks, and all of them have a cloudwatchmetric value that was not 0 and changed every time I ran the test check.
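For anyone comparing the two outputs above: Nagios treats everything after the `|` in plugin output as performance data, and the text before it is what a status page shows. Below is a minimal Python sketch of that split, using the sample line from this post. The function names and the regular expression are my own illustration, not code from check_cloudwatch.py or Nagios, and the pattern only covers the simple `label=value[UOM];warn;crit` shape seen here.

```python
import re

# Hypothetical pattern for one perfdata item: label=value[UOM][;warn][;crit]
# It covers only the simple shape in the sample line, not every valid form.
PERF_RE = re.compile(
    r"(?P<label>[^=\s]+)=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[^;\s]*)"
    r"(?:;(?P<warn>[^;\s]*))?(?:;(?P<crit>[^;\s]*))?"
)


def split_plugin_output(line):
    """Split one line of plugin output into (status text, perfdata string)."""
    text, _sep, perf = line.partition("|")
    return text.strip(), perf.strip()


def parse_perfdata(perf):
    """Return a list of dicts, one per perfdata item found in the string."""
    return [m.groupdict() for m in PERF_RE.finditer(perf)]


if __name__ == "__main__":
    sample = (
        "CLOUDWATCHMETRIC OK - CloudWatch Metric AWS/EC2:CPUUtilization "
        "with dimensions {'InstanceId': 'REDACTED'} "
        "| cloudwatchmetric=0.066Percent;75;90"
    )
    text, perf = split_plugin_output(sample)
    print(text)
    print(parse_perfdata(perf))
```

Running this against the output of each of the 13 checks is one way to confirm that the label, unit string, and thresholds really are identical across hosts.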

This indicates to me that the CloudWatch check is working. We use the same command (check_cloudwatch.py) with a unique instance ID for each of 13 hosts in 2 VPCs.

In Host Status all 13 checks indicate "CLOUDWATCHMETRIC OK" as stated above. They also indicate that they are graphing the metric, and each has created a graph in Status Detail for the CPUUtilization metric.

PROBLEM
* 7 of the checks stopped reporting CPUUtilization on their respective host graphs, all on Sept 14 between 3:22 and 3:27 PM, and we have not been able to get the CPUUtilization data to graph since.
* One check stopped reporting CPUUtilization on its host graph on Sept 27 at 3:06 PM, and we have not been able to get the CPUUtilization data to graph since.
* The remaining 5 checks have been reporting CPUUtilization to their host graphs continuously from Sept 14 until now.
* There are checks failing to post metrics in BOTH VPCs.
* There are checks that ARE posting metrics in BOTH VPCs.

This is very confusing. I have confirmed that the return from the command is EXACTLY the same for all hosts. Why would Nagios ignore the metrics for only some of these checks when they are all functionally identical?

Can anyone give me an idea what to look at? I don't think it is AWS because it responds and I can see the values return consistently when checked in Nagios.

Thanks,
Eric

Post by **cdienger** » Thu Sep 28, 2017 4:21 pm

Hi Eric,

If you edit the service(s) and go to the Check Settings tab, make sure that "Process perf data" is enabled. It could be disabled here, or the services may be inheriting a template with this option disabled.

Also, the perfdata is stored in /usr/local/nagios/share/perfdata/<hostname>/<servicedescp>.rrd. You should see this file growing and its timestamp updating if the perfdata is getting processed.
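To check those timestamps for all hosts at once, something like the following Python sketch could help. It walks the perfdata directory named above and lists .rrd files that have not been modified within a given window. The function name `stale_rrds` and the one-hour threshold are assumptions for illustration only; adjust the threshold to your check interval, and run it as a user that can read the perfdata directory.

```python
import os
import time

# Directory taken from the post above; change it if your install differs.
PERFDATA_DIR = "/usr/local/nagios/share/perfdata"


def stale_rrds(root, max_age_seconds, now=None):
    """Return (path, age_in_seconds) for .rrd files older than the limit.

    Results are sorted with the oldest file first.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".rrd"):
                continue
            path = os.path.join(dirpath, name)
            age = now - os.path.getmtime(path)
            if age > max_age_seconds:
                stale.append((path, age))
    return sorted(stale, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical threshold: flag files untouched for more than one hour.
    for path, age in stale_rrds(PERFDATA_DIR, 3600):
        print(f"{age / 3600:8.1f} h  {path}")
```

If the files for the affected services show a last-modified time near when their graphs stopped, that would suggest the perfdata is not reaching the RRD files at all, rather than a display problem.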

Nagios Support Forum
