Trouble adding check-aacraid.py to Nagios

Posted: Sun Mar 03, 2013 9:28 am
by Grizzly
Hi all, I've been trying to add a plugin to Nagios that lets me check my RAID configuration remotely and, ultimately, have Nagios send me an e-mail when a drive is failing. I've tried a couple of different plugins and configurations that should allow this, but I haven't had much luck getting them to work.
For the sake of this thread, the plugin I would like to add is Adaptec RAID Check by Anchor Systems: http://exchange.nagios.org/directory/Pl ... ms/details. Following the instructions on that page, I get as far as defining the new services. I had no cfg file called servicedefs.cfg, so I assumed this was a placeholder name and tried defining the services in localhost.cfg instead. But when I try to reload or restart Nagios, I get a config error.
I'm almost certain I'm simply missing something, but I have no idea what.
I have Nagios Core 3.4.4, Nagios Plugins 1.4.16 and NRPE 2.13, all running on Ubuntu 12.04.

Re: Trouble adding check-aacraid.py to Nagios

Posted: Mon Mar 04, 2013 11:14 am
by slansing
Can you please run the following and let us know the output:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
This should point us in the right direction. It verifies the configuration and prints any errors it finds.

Re: Trouble adding check-aacraid.py to Nagios

Posted: Mon Mar 04, 2013 12:33 pm
by Grizzly
Hi slansing, output from that command is as follows:

Nagios Core 3.4.4
Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 01-12-2013
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/contacts.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/timeperiods.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/templates.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/localhost.cfg'...
Error: Template 'low-service-level' specified in service definition could not be not found (config file '/usr/local/nagios/etc/objects/localhost.cfg', starting on line 158)
Error: Invalid max_attempts, check_interval, retry_interval, or notification_interval value for service 'aacraid' on host 'localhost'
Error: Could not register service (config file '/usr/local/nagios/etc/objects/localhost.cfg', starting on line 167)
   Error processing object config files!


***> One or more problems was encountered while processing the config files...

     Check your configuration file(s) to ensure that they contain valid
     directives and data defintions.  If you are upgrading from a previous
     version of Nagios, you should be aware that some variables/definitions
     may have been removed or modified in this version.  Make sure to read
     the HTML documentation regarding the config files, as well as the
     'Whats New' section to find out what has changed.

Re: Trouble adding check-aacraid.py to Nagios

Posted: Mon Mar 04, 2013 1:35 pm
by abrist
The following two errors will need to be resolved before you can write out the config files:

Error: Template 'low-service-level' specified in service definition could not be not found (config file '/usr/local/nagios/etc/objects/localhost.cfg', starting on line 158)
Error: Invalid max_attempts, check_interval, retry_interval, or notification_interval value for service 'aacraid' on host 'localhost'
The first means that a service definition in localhost.cfg references the template "low-service-level" (through its "use" directive), but no template with that name has been defined. You will need to either remove the reference to "low-service-level" from localhost.cfg or create the template itself.

The second means that the aacraid service on localhost has no valid value for one or more of max_attempts, check_interval, retry_interval, or notification_interval. You can usually pick up the default settings by naming the "generic-service" template in the service's "use" directive.
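
As a rough sketch of the two fixes described above (not the poster's actual file): either define the missing template, or have the service inherit from generic-service directly. The interval values are illustrative assumptions only, and generic-service is assumed to be the template from the sample templates.cfg.

```cfg
# Option 1: define the missing template so that 'use low-service-level' resolves.
# All values here are illustrative assumptions, not taken from the thread.
define service {
    name                    low-service-level   ; template name referenced by 'use'
    use                     generic-service     ; inherit the stock defaults
    max_check_attempts      3
    check_interval          10                  ; in time units (minutes by default)
    retry_interval          2
    notification_interval   60
    register                0                   ; template only, not a real service
}

# Option 2: skip the custom template and inherit from generic-service directly.
define service {
    use                     generic-service
    host_name               localhost
    service_description     aacraid
    check_command           check_aacraid
}
```

Either option should clear both errors, since the interval values the second error complains about are then inherited through the template chain.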

Re: Trouble adding check-aacraid.py to Nagios

Posted: Mon Mar 04, 2013 2:10 pm
by Grizzly
Thanks for the help, abrist! I changed aacraid-service to use generic-service instead of low-service-level (since I had no definition for it), then used the command slansing gave me to fix some other errors in my configuration of the new services. Hey presto, Nagios restarted fine and I now have aacraid under services.
Now for the new problem: when I perform a check on the service, it returns Current Status: UNKNOWN and Status Information: NRPE: Unable to read output. When I run the script that NRPE calls from the command line, I get a normal-looking string as the return value, so I'm not sure why this is happening.

Re: Trouble adding check-aacraid.py to Nagios

Posted: Mon Mar 04, 2013 2:20 pm
by abrist
Try running the command again from the CLI as the user "nagios". If you get a permission denied error, you may need to set up sudoers for the nagios user and the aacraid binary. You may also have to set the setuid bit on the aacraid binary. But the first step is to run the check as nagios instead of root (most admins setting up Nagios work as root or as a user in the wheel group).
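
If the check does fail with a permission error when run as nagios, a sudoers entry along the following lines is one common approach. This is only a sketch under assumptions: the arcconf path is the one that appears later in this thread, the file name is hypothetical, and the entry only helps if the plugin actually invokes arcconf through sudo.

```cfg
# /etc/sudoers.d/nagios-aacraid  (hypothetical file name; edit with visudo -f)
# Let the nagios user run arcconf as root without a password prompt.
nagios ALL=(root) NOPASSWD: /usr/StorMan/arcconf
# NRPE runs without a terminal, so sudo must not demand one for this user.
Defaults:nagios !requiretty
```

Restricting the rule to the single arcconf binary keeps the extra privilege as narrow as possible.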

Re: Trouble adding check-aacraid.py to Nagios

Posted: Mon Mar 04, 2013 3:16 pm
by Grizzly
Ran the command as nagios with no problem, then did

chmod u+s /usr/local/sbin/check-aacraid.py
chmod u+s /usr/StorMan/arcconf
as root. I think one of those is what you meant by "suid the aacraid bin". Sorry if it's not; I've only been using Linux for a couple of months, so I don't know all of the terminology yet. I'm still getting the same status information in Nagios after these steps.

Re: Trouble adding check-aacraid.py to Nagios

Posted: Mon Mar 04, 2013 3:21 pm
by abrist
What is your service definition for the aacraid check? What is your command definition for the same check?

Re: Trouble adding check-aacraid.py to Nagios

Posted: Mon Mar 04, 2013 3:29 pm
by Grizzly
Command definition is:

define command {
    command_name    check_aacraid
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_aacraid
}
Service definition for that command is:

define service {
    use                     generic-service
    name                    aacraid-service
    service_description     aacraid
    check_command           check_aacraid
    register                0
    notification_interval   3600
}
And a second service that needed to be defined is:

define service {
    use             aacraid-service
    host_name       localhost
    contact_groups  admins
}

Re: Trouble adding check-aacraid.py to Nagios

Posted: Mon Mar 04, 2013 3:36 pm
by abrist
This is a local check on the Nagios core box itself, correct?
If so, you do not need NRPE for the check; you can just set up a command that runs the Python file directly.

If you do wish to use NRPE, you will also have to configure the command in your nrpe.cfg file.
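
To illustrate the two routes abrist describes, here is a minimal sketch. The plugin path is the one used earlier in the thread. The nrpe.cfg location, the command name check_aacraid_local, and the assumption that the script needs no arguments are mine and are not confirmed by the source.

```cfg
# Route 1: keep NRPE. In nrpe.cfg on the monitored host (location varies by
# install), map the name passed to 'check_nrpe -c' onto the actual script:
command[check_aacraid]=/usr/local/sbin/check-aacraid.py

# Route 2: local check without NRPE. In commands.cfg on the Nagios server:
define command {
    command_name    check_aacraid_local     ; hypothetical name
    command_line    /usr/local/sbin/check-aacraid.py
}
```

With Route 1, NRPE needs a restart if it runs as a daemon so that the new entry is picked up. With Route 2, the service's check_command would name check_aacraid_local instead of check_aacraid.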