AWS boto3 cannot see credentials or $USER macros

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
iopp
Posts: 4
Joined: Wed Jan 02, 2013 10:25 am

AWS boto3 cannot see credentials or $USER macros

Post by iopp »

I have written a Python boto3 script to get some metric statistics for the AWS hosts in our production account.

The script uses AWS API calls to see which hosts are up and then asks CloudWatch for each one's "StatusCheckFailed" stats. These represent health checks that AWS carries out on the VMs; if one of them has failed there is usually a problem, sometimes with hardware.

These stats are only accessible through the AWS API. In order to use the AWS API for monitoring we have established an IAM user with read-only access to most of the resources in our production account. Obviously the credentials for this account are sensitive because the permissions are quite strong.

The script normally picks up the AWS credentials from the ~/.aws directory, but the Nagios XI process seems unable to find them. To attempt to fix this I have tried:
  • Setting HOME in the script with os.environ['HOME']='/nagioshome'. This works with Python / boto3 from a shell: run from an account with no credentials of its own, the script finds the ones under the nagios home directory. It does not make the script work from the Nagios XI process, however.
  • Following the advice in https://exchange.nagios.org/directory/D ... os/details for adding the AWS API keys as $USERn$ macros. The $USERn$ values did not reach the script: I modified the error messages to display them and they did not seem to be set. I used /etc/init.d/nagios reload to SIGHUP the nagios process.

    Here is the output from the script version that takes $USERn$ variables (this is not the version attached below)

    Code:

    COMMAND: /usr/local/nagios/libexec/check_ec2_statuscheck.py -u \$USER9\$ -a \$USER10\$
    OUTPUT: system error type:  value: An error occurred (AuthFailure) when calling the DescribeInstances operation: AWS was not able to validate the provided access credentials : -u $USER9$ -a $USER10$ Unknown
    
Finally I have given up and set the AWS API access keys in the script as constants. This worked, but it is most unsatisfactory.
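A minimal sketch for narrowing down this kind of problem: print what the process running the plugin can actually see. It assumes the usual boto3 lookup order (the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables first, then the shared credentials file, by default ~/.aws/credentials, whose location can be overridden with AWS_SHARED_CREDENTIALS_FILE). The helper name credential_report is invented for illustration; it never contacts AWS and never prints secret values.

```python
import os


def credential_report(environ=None):
    """Describe which boto3 credential sources this process can see.

    Hypothetical helper: it only inspects the environment and the
    file system.
    """
    env = os.environ if environ is None else environ
    # HOME decides where ~ points; fall back to the account's home directory.
    home = env.get('HOME') or os.path.expanduser('~')
    default_path = os.path.join(home, '.aws', 'credentials')
    path = env.get('AWS_SHARED_CREDENTIALS_FILE', default_path)
    return {
        'home': home,
        'env_keys_set': ('AWS_ACCESS_KEY_ID' in env
                         and 'AWS_SECRET_ACCESS_KEY' in env),
        'credentials_file': path,
        # False when the file is missing or this user may not read it.
        'file_readable': os.access(path, os.R_OK),
    }


if __name__ == '__main__':
    print("euid=%d" % os.geteuid())
    report = credential_report()
    for key in sorted(report):
        print("%s=%s" % (key, report[key]))
```

Running this once from a shell and once from the monitoring process, then comparing the euid, home and file_readable lines, shows whether the two runs use the same user and the same home directory.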

My specific questions are
  • What configuration of the Nagios XI system should I check to ensure that the Nagios XI process does have access to the nagios user .aws config directory?
  • What configuration of the Nagios XI system should I check to ensure that the $USERn$ macros are set correctly?
Script and nagios admin -> system profile below. The script version is the simple case that picks up the credentials from the ~/.aws directory

Thanks

Code:

Nagios XI Installation Profile
System:
Nagios XI Version : 2012R2.7
nagios.ioppublishing.com 2.6.32-358.23.2.el6.x86_64 x86_64
Red Hat Enterprise Linux Server release 6.4 (Santiago)
Gnome Installed
Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
Server Name: nagios.ioppublishing.com
Server Address: 10.20.1.83
Server Port: 80
Date/Time
PHP Timezone: Europe/London
PHP Time: Thu, 15 Oct 2015 08:47:55 +0100
System Time: Thu, 15 Oct 2015 08:47:55 +0100
Nagios XI Data
License ends in: OVQTPV

nagios (pid 28940) is running...
NPCD running (pid 2338).
ndo2db (pid 2973) is running...
CPU Load 15: 2.11
Total Hosts: 381
Total Services: 2689
Function 'get_base_uri' returns: http://nagios.ioppublishing.com/nagiosxi/
Function 'get_base_url' returns: http://nagios.ioppublishing.com/nagiosxi/
Function 'get_backend_url(internal_call=false)' returns: http://nagios.ioppublishing.com/nagiosxi/includes/components/profile/profile.php
Function 'get_backend_url(internal_call=true)' returns: http://localhost/nagiosxi/backend/
Ping Test localhost
Running:

/bin/ping -c 3 localhost 2>&1 

PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.028 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.026 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.025 ms

--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.025/0.026/0.028/0.004 ms
Test wget To locahost
WGET From URL: http://localhost/nagiosql/index.php
Running:

/usr/bin/wget http://localhost/nagiosql/index.php 

--2015-10-15 08:47:57-- http://localhost/nagiosql/index.php
Resolving localhost... ::1, 127.0.0.1
Connecting to localhost|::1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5259 (5.1K) [text/html]
Saving to: "/tmp/nagiosql_index.tmp"

0K ..... 100% 537M=0s

2015-10-15 08:47:58 (537 MB/s) - "/tmp/nagiosql_index.tmp" saved [5259/5259]

Code:

import sys
from datetime import datetime, timedelta

import boto3

# Time period to get stats for. CloudWatch works in UTC, so build the
# window from utcnow() rather than from local time.
lookback = timedelta(hours=1)
finish = datetime.utcnow()
start = finish - lookback

myinstances = []
myfail = []

try:
    # get details of instances in account
    e = boto3.client('ec2').describe_instances()

    # keep only the running instances; a reservation can hold several
    for r in e['Reservations']:
        for i in r['Instances']:
            if i['State']['Name'] == 'running':
                myinstances.append({'Name': 'InstanceId', 'Value': i['InstanceId']})

    # attach to cloudwatch
    c = boto3.client('cloudwatch')

    # get the status checks
    for instance in myinstances:
        x = c.get_metric_statistics(
            Namespace='AWS/EC2',
            MetricName='StatusCheckFailed_System',
            Dimensions=[instance],
            StartTime=start,
            EndTime=finish,
            Period=300,
            Statistics=['Sum'],
            Unit='Count'
        )
        # record the instance once if any returned datapoint shows a failure
        if any(result['Sum'] > 0 for result in x['Datapoints']):
            myfail.append(instance)
except Exception:
    # something went wrong
    print("system error %s %s: Unknown" % (sys.exc_info()[0], sys.exc_info()[1]))
    sys.exit(3)

if len(myfail) == 0:
    print("%d instances checked :OK " % len(myinstances))
    sys.exit(0)
else:
    print("%d instances checked %s :Critical" % (len(myinstances), repr(myfail)))
    sys.exit(2)
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: AWS boto3 cannot see credentials or $USER macros

Post by tmcdonald »

A few things to note:

1.) Plugins are run by the nagios user during normal execution. Any command-line testing should be done as the nagios user to ensure accuracy of results.

2.) Judging by the output, you are using the "Test Check Command" button - this is good for simple tests but it does have some flaws. In particular, it runs as the apache user and not nagios so permissions-based rules might be wrong. It also does some escaping of characters that are harmful in PHP such as $. Finally, since it is not running through the nagios binary but rather directly calling the plugin, it has no knowledge of the $USERn$ macros. For these reasons I would suggest that all testing be done via CLI.
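Point 2 explains why the literal text -u $USER9$ -a $USER10$ reached the plugin: $USERn$ macros are substituted by the Nagios core from its resource file before the command is run, so anything that calls the plugin directly never sees the values. The sketch below imitates that substitution so the effect can be seen outside Nagios. It is a simplified illustration, not the core's actual parser; the function names and sample values are hypothetical, and undefined macros are deliberately left untouched so that they stand out (the real core may treat them differently).

```python
import re

# Matches resource-file lines such as: $USER9$=EXAMPLEKEY
_DEFINITION = re.compile(r'^\s*(\$USER\d+\$)\s*=\s*(.*?)\s*$')


def parse_user_macros(text):
    """Return {'$USERn$': value} for every definition in resource-file text."""
    macros = {}
    for line in text.splitlines():
        if line.lstrip().startswith('#'):
            continue  # comment line
        match = _DEFINITION.match(line)
        if match:
            macros[match.group(1)] = match.group(2)
    return macros


def expand_user_macros(command_line, macros):
    """Replace each defined $USERn$; leave undefined ones as written."""
    return re.sub(r'\$USER\d+\$',
                  lambda m: macros.get(m.group(0), m.group(0)),
                  command_line)


if __name__ == '__main__':
    # Sample resource-file content with made-up values.
    sample = "$USER1$=/usr/local/nagios/libexec\n$USER9$=EXAMPLEKEY\n"
    macros = parse_user_macros(sample)
    print(expand_user_macros(
        "$USER1$/check_ec2_statuscheck.py -u $USER9$ -a $USER10$", macros))
```

In the sample, $USER10$ has no definition, so it survives in the printed command line, which is the same symptom as in the error output posted earlier.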
Former Nagios employee
iopp
Posts: 4
Joined: Wed Jan 02, 2013 10:25 am

Re: AWS boto3 cannot see credentials or $USER macros

Post by iopp »

I will avoid using the "Test Check Command" button in future. Is there some configuration we can do to disable this misleading GUI feature?
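For anyone who wants to script the command-line test instead, here is a minimal sketch. It assumes sudo is installed and that the account running it is allowed to run commands as the nagios user; the helper names are invented. The exit-code mapping is the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The shell does not expand $USERn$ macros, so real values have to be passed by hand.

```python
import subprocess

# Standard Nagios plugin return codes.
STATES = {0: 'OK', 1: 'WARNING', 2: 'CRITICAL', 3: 'UNKNOWN'}


def as_nagios(plugin_argv, user='nagios'):
    """Build an argv list that runs the plugin as the given user via sudo."""
    return ['sudo', '-u', user] + list(plugin_argv)


def run_check(argv):
    """Run argv and return (state name, first line of plugin output)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output = proc.communicate()[0].decode('utf-8', 'replace')
    lines = output.splitlines()
    first_line = lines[0] if lines else ''
    # Any code outside 0-3 is reported as UNKNOWN.
    return STATES.get(proc.returncode, 'UNKNOWN'), first_line


if __name__ == '__main__':
    argv = as_nagios(['/usr/local/nagios/libexec/check_ec2_statuscheck.py'])
    print(' '.join(argv))
    # Uncomment on the Nagios server to run the plugin for real:
    # print(run_check(argv))
```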

Thanks
iopp
Posts: 4
Joined: Wed Jan 02, 2013 10:25 am

Re: AWS boto3 cannot see credentials or $USER macros

Post by iopp »

Here is the working script as in use now. To call it, define the keys as $USERn$ macros as outlined above (for example $USER9$ and $USER10$), set the command up as $USER1$/check_ec2_statuscheck.py $ARG1$ and set the service's $ARG1$ to -u $USER9$ -a $USER10$

Code:

#!/usr/bin/python
import getopt
import os
import sys
from datetime import datetime, timedelta

import boto3

try:
    opt, args = getopt.getopt(sys.argv[1:], 'hvu:a:')
except getopt.GetoptError as err:
    print("configuration error %s" % err)
    sys.exit(3)
d = dict(opt)

if '-h' in d:
    print("aws api cloudwatch program %s" % sys.argv[0])
    print("-h   help this page")
    print("-v   verbose mode")
    print("-u ACCESSKEY  aws access key")
    print("-a SECRETKEY  aws secret key")
    print("region is set to eu-west-1")
    print("nagios cannot read .aws credentials")
    sys.exit(0)

# -v is accepted but nothing extra is printed yet
verbose = '-v' in d

if not ('-u' in d and '-a' in d):
    print("configuration error must give -u ACCESSKEY AND -a SECRETKEY")
    sys.exit(3)

access = d['-u']
secret = d['-a']

#os.environ['HOME']='/nagioshome'
# Time period to get stats for. CloudWatch works in UTC, so build the
# window from utcnow() rather than from local time.
lookback = timedelta(hours=1)
finish = datetime.utcnow()
start = finish - lookback

myinstances = []
myfail = []

try:
    # get details of instances in account
    e = boto3.client('ec2',
                     aws_access_key_id=access,
                     aws_secret_access_key=secret,
                     region_name='eu-west-1').describe_instances()

    # keep only the running instances; a reservation can hold several
    for r in e['Reservations']:
        for i in r['Instances']:
            if i['State']['Name'] == 'running':
                myinstances.append({'Name': 'InstanceId', 'Value': i['InstanceId']})

    # attach to cloudwatch
    c = boto3.client('cloudwatch',
                     aws_access_key_id=access,
                     aws_secret_access_key=secret,
                     region_name='eu-west-1')

    # get the status checks
    for instance in myinstances:
        x = c.get_metric_statistics(
            Namespace='AWS/EC2',
            MetricName='StatusCheckFailed',
            Dimensions=[instance],
            StartTime=start,
            EndTime=finish,
            Period=300,
            Statistics=['Sum'],
            Unit='Count'
        )
        # record the instance once if any returned datapoint shows a failure
        if any(result['Sum'] > 0 for result in x['Datapoints']):
            myfail.append(instance)
except Exception:
    # something went wrong
    print("system error type: %s value: %s : Unknown" % (sys.exc_info()[0], sys.exc_info()[1]))
    sys.exit(3)

if len(myfail) == 0:
    print("%d instances checked :OK " % len(myinstances))
    sys.exit(0)
else:
    print("%d instances checked %s :Critical" % (len(myinstances), repr(myfail)))
    sys.exit(2)
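The decision at the end of the script can also be kept separate from the AWS calls, which makes it testable without credentials. A minimal sketch, assuming the datapoints have already been collected into a dict keyed by instance ID; the evaluate function, that dict layout and the sample instance IDs are illustrative and not part of the script above.

```python
def evaluate(datapoints_by_instance):
    """Map {instance_id: [CloudWatch datapoints]} to (exit_code, message).

    An instance counts as failed when any datapoint has Sum > 0.
    """
    failed = sorted(
        instance_id
        for instance_id, datapoints in datapoints_by_instance.items()
        if any(point.get('Sum', 0) > 0 for point in datapoints)
    )
    checked = len(datapoints_by_instance)
    if failed:
        # 2 is the Nagios CRITICAL return code.
        return 2, "%d instances checked %s :Critical" % (checked, ', '.join(failed))
    # 0 is the Nagios OK return code.
    return 0, "%d instances checked :OK" % checked


if __name__ == '__main__':
    # Made-up sample data in the shape of get_metric_statistics datapoints.
    sample = {
        'i-aaaa': [{'Sum': 0.0}, {'Sum': 0.0}],
        'i-bbbb': [{'Sum': 1.0}],
    }
    code, message = evaluate(sample)
    print(message)
```

Each failed instance is listed once, however many of its five-minute datapoints report a failure.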
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: AWS boto3 cannot see credentials or $USER macros

Post by tmcdonald »

There is not currently an option to disable the Test Check Command button, but I have mentioned a few times in the past that we should add some wording that the results are not always 100% accurate.

That being said, is everything sorted out now? If so I would like to close the thread.
Former Nagios employee
iopp
Posts: 4
Joined: Wed Jan 02, 2013 10:25 am

Re: AWS boto3 cannot see credentials or $USER macros

Post by iopp »

yep, all works for me