Omar, I like the person who assigned you this project (I'm hoping they are going to encourage you to share your how-to and plugin when you are done).
Extra info for those not familiar with the device: The WeatherGoose is a standalone appliance used for environmental monitoring. It's got a standard set of monitors built into it for things like noise, light, temperature and humidity at the device. It can have added probes to monitor things that aren't at the device, like a remote IDF's temperature, airflow up at a vent, moisture down under the raised floor, power coming in at the wall, cameras and more. It's pretty neat and has a straightforward web interface, but you don't get a shell, so no NRPE. (Full disclosure: I liked the product enough to become a reseller to get the things at a good price for my own clients.) It's got its own monitoring and notifications, but the device's probes can also be monitored via SNMP, and the device also has an xml file at
http://weathergooseip/data.xml that you can pull with a read-only authenticated account.
In any case, I've mostly been depending on their built-in notification mechanisms and monitoring a few things through snmp but I like the idea of polling the data.xml for status.
Before commenting on the perl script to do the check - these things jump out at me about the nagios config above:
The 'host_name' looks odd in the host definition, though you can use slashes in names - generally this would be used as a short name for the host for referencing elsewhere in the config. In most cases, you put the actual server name here. Mostly this stands out because that isn't the host_name you use in your service definition.
You probably already have 'localhost' defined since you mentioned having localhost checks. It probably looks like this:
Code: Select all
define host{
use linux-server
host_name localhost
alias localhost
address 127.0.0.1
}
You can use that existing host definition for your test. When you set this up to actually poll your real weathergoose, you can set up another host definition for it using that goose's IP:
Code: Select all
define host{
use linux-server
host_name WG52_1_1
address 192.168.1.whateveryourgooseipis
}
(the 'use' line may not end up being appropriate - you'll want to dig into that host template when you are ready - but it should work fine with the sample config for your tests)
Which brings us to the service definition. It is checking the host_name you list with the check_command you are supplying. In your config above, you have '127.0.0.1.data.xml' as the host_name in the service definition. If you *really* want to use that host_name of 127.0.0.1/data.xml in your host definition, you'll need to make that match in the service definition. However, in this case, I would just make it check 'localhost':
Code: Select all
define service{
use generic-service
host_name localhost
description Check Calculus Temperature ;<--heh, derivatives and integrals are cool.
check_command WeatherGooseII.pl
}
That gets to the command definition - this may be a typo as well, but the command_line has $User$ in it - is that right? I suppose it is possible that in your specific nagios config $User$ exists and is set to '/usr/local/nagios/libexec', but normally the location of your plugins will be in $USER1$.
Check out the macro list - the bit that applies here is the $USERn$ sections.
$USERn$ is often defined in a resource.cfg or private/resource.cfg in your nagios config dir (but can technically be elsewhere in the nagios config). The $USER#$ variables are used for things that are specific to your nagios setup - some people put users and passwords in here and then keep that file locked down. Example, $USER5$ could be set to the 'view' user on your weathergoose, $USER6$ could be set to the 'view' user's password on the weathergoose. Then once your check command gets fancy and accepts a user and password as arguments, you could keep the config clean and have a service definition and command definition that looks like this:
Code: Select all
define service{
use generic-service
host_name localhost
description Check Calculus Temperature ;<--heh, derivatives and integrals are cool.
check_command WeatherGooseII.pl!$USER5$!$USER6$
}
define command{
command_name WeatherGooseII.pl
command_line $USER1$/WeatherGooseII.pl -H $HOSTADDRESS$ --user=$ARG1$ --password=$ARG2$
}
(don't try that yet, but you'll almost definitely want to try something like the above as you develop your plugin)
So, the other bit that jumps out at me about your command definition - you have arguments being passed to the perl script that don't appear to be in your perl script. This is what you are currently making nagios run:
Code: Select all
WeatherGooseII.pl -H $HOSTADDRESS$ -p $ARG1$
Your perl script doesn't do anything with arguments currently. That probably *is* the way you will be going in the long run (with even more arguments defining which field you want to check and what the warning and error levels are). $HOSTADDRESS$ will get filled in with the value from the 'address' part of your host definition. $ARG1$ is the first argument you pass in the service definition.
What your check from nagios is running is (assuming you correct the $USER1$ if that wasn't a typo, and the host_name in the service def):
Code: Select all
/usr/local/nagios/libexec/WeatherGooseII.pl -H 127.0.0.1 -p
I don't see anything in your script that'll crater due to passing it the extra arguments currently, but eventually that will matter.
This bit of the documentation is going to help you out here.
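Once the script does start reading its arguments, a minimal sketch of that parsing could look like the following. It uses the core Getopt::Long module; the sub name parse_goose_args and the option names are only illustrative, picked to line up with the -H, --user and --password style used in the command_line above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: turn the plugin's argument list into a hash ref.
# Returns undef if the options are bad or no host was given.
sub parse_goose_args {
    my @argv = @_;
    my %opt = ( host => undef, user => undef, password => undef );
    GetOptionsFromArray(
        \@argv,
        'H|host=s'   => \$opt{host},        # filled from $HOSTADDRESS$
        'user=s'     => \$opt{user},        # filled from $ARG1$
        'password=s' => \$opt{password},    # filled from $ARG2$
    ) or return undef;
    return undef unless defined $opt{host};
    return \%opt;
}

# In the plugin itself you would call it with the real arguments:
#   my $opt = parse_goose_args(@ARGV);
#   if ( !defined $opt ) { print "UNKNOWN: bad arguments\n"; exit 3; }
```

Returning undef instead of dying inside the helper leaves the caller in charge of printing an UNKNOWN line and picking the exit status.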
I'm assuming once you get the command_line fixed, and get the service definition to have the right host_name, your check will probably start doing something. If you aren't already doing it, you want to be checking your nagios config when you make a change with
Code: Select all
nagios -v /locationofyour/nagios.cfg
- it should catch things like that service definition having a bad host_name.
Ok, other bits worth noting - right now your perl script is reading directly from the file system for that xml file. When you switch it to monitoring the weathergoose, you are going to be doing it through an http request. The fact that you are placing the file in /var/www/html suggests you already intend to work your way up to checking it via
http://127.0.0.1/data.xml in your tests. If you wanted to get even closer to the weathergoose's setup, you'll want to password protect that data.xml file with the 'view' user and password you'll be using for the real check. For the http request, the package you probably want to use to do this in perl is LWP. You can't pass a URL straight to XMLin, but you *can* pass it a scalar that has the content of the xml in it. LWP will let you populate that scalar with a few lines. At the risk of doing your homework, here's how!
Code: Select all
#!/usr/bin/perl
#use LWP - if you dont have this, install with cpan, or
# if using a fedora/redhat/centos setup, yum install perl-libwww-perl
use LWP;
#set some base variables - you'll probably pass some of these as arguments eventually
my $gooseaddress='127.0.0.1';
my $gooseviewuser='view';
my $gooseviewpass='view';
#just to keep the request looking pretty below, prebuild the URL with the xml file
my $goosedataxmllocation='http://'.$gooseaddress.'/data.xml';
#setup the web request
my $ua = LWP::UserAgent->new;
#make the request
my $req = HTTP::Request->new( GET => $goosedataxmllocation );
# authenticate - for your test setup,
# if you aren't requiring a user and password, just comment this next line
$req->authorization_basic($gooseviewuser, $gooseviewpass);
#response
my $res = $ua->request( $req );
# get the content and store it in the $goosedataxml variable
my $goosedataxml = $res->content();
It's unlikely you'll need more than that out of LWP, but if you want, read more about LWP. It'd probably be good form to exit with status 3 (unknown) if anything goes wrong with the web request while trying to poll the data.xml. That exercise is up to you.
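If you want a starting point for that exercise, here is one hedged sketch. It assumes $res is the HTTP::Response object from the request above and only uses its is_success, status_line and content methods; the sub name content_or_unknown and the "looks like XML" test are my own inventions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the response object, return a list of
# ( exit status, text ). On success the text is the xml content; on
# failure it is a message suitable for printing before exiting unknown.
sub content_or_unknown {
    my ($res) = @_;
    if ( !$res->is_success ) {
        return ( 3, 'UNKNOWN: request for data.xml failed: ' . $res->status_line );
    }
    my $body = $res->content;
    # crude sanity check (an assumption): xml content should contain a '<'
    if ( !defined $body || $body !~ /</ ) {
        return ( 3, 'UNKNOWN: data.xml response did not look like xml' );
    }
    return ( 0, $body );
}

# In the plugin, right after  my $res = $ua->request( $req );
#   my ( $status, $text ) = content_or_unknown($res);
#   if ( $status != 0 ) { print "$text\n"; exit $status; }
#   my $goosedataxml = $text;
```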
So, now that $goosedataxml has all the xml goodness in it, you can pump it straight into your read_xml subroutine:
Code: Select all
my $xml = read_xml( $goosedataxml );
Other thoughts - you have the Data::Dumper module listed in your script. If you aren't using it to look at the xml contents, do so!
Assuming you have your xml already pumped into the $goosedataxml scalar, you can dump that out in a nice readable format with:
Code: Select all
print Dumper (XML::Simple->new()->XMLin( $goosedataxml ));
It will give you something that looks like this:
Code: Select all
$VAR1 = {
'support-email' => '[email protected]',
'pversion' => '6.03',
'demo-mode' => 'N',
'datetime' => 'Sat, 01/25/14 23:19:58',
'product-version' => '3.9.5',
'cameras' => {},
'support-phone' => '512.257.xxxx',
'console-id' => 'chh',
'uptime' => '4546313',
'address' => '192.168.123.45',
'company' => 'I.T. Watchdogs, Inc.',
'tempunit' => 'F',
'owl-version' => 'CB_1025',
'mac-address' => '00:19:85:XX:XX:XX',
'company-url' => 'http://www.itwatchdogs.com',
'name' => 'WeatherGoose II',
'devices' => {
'device' => {
'WeatherGoose II' => {
'index' => '0',
'id' => '1234567890ABCDEF',
'type' => 'Clim-THLFSA3',
'available' => '1',
'field' => {
'TempC' => {
'value' => '22.84',
'min' => '-40.0',
'order' => '0',
'max' => '123.8',
'niceName' => 'Temperature (C)',
'type' => '0'
},
'Light' => {
'value' => '1',
'min' => '1.0',
'order' => '3',
'max' => '100.0',
'niceName' => 'Light Level',
'type' => '2'
},
}
}
}
},
'host' => 'WeatherGoose II'
};
Which brings me to this weathergoose specific bit, and hopefully dumping your xml like the above will show it to you, too.
One of the values you are pulling is the Celsius temperature like this:
Code: Select all
my $tempCValue = $xml->{'devices'}->{'device'}->{'field'}->{'TempC'}->{'value'};
If you actually look at your xml output, the xml should be devices -> device -> devicename -> field -> TempC.
In your case,
Code: Select all
my $tempCValue = $xml->{'devices'}->{'device'}->{'WG52_1_1'}->{'field'}->{'TempC'}->{'value'};
This probably doesn't jump out at you since it looks like the sample file is from a weathergoose using just the built-in probes. If you have additional external probes, they would show up as additional names. (I don't know 100% that a goose without any external probes doesn't eliminate that level, but glancing at your sample compared to one of my own weathergooseII's, I believe you should have that level regardless of external probes). Dumping the xml with Data::Dumper should let you visually identify this when you see the indents:
Code: Select all
'devices' => {
'device' => {
'WeatherGoose II' => {
'field' => {
'TempC' => {
'value' => '22.84',
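One caveat on that extra level: as far as I can tell from the XML::Simple documentation, folding on the 'name' attribute only applies to arrays, so unless ForceArray is turned on, a file with a single device element may come back without the device-name level. Here is a hedged sketch that copes with either shape. find_device is a made-up helper name, and it assumes the parsed data looks like the dump above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the hash for one device from whatever
# XMLin() returned, or undef if that device is not there.
sub find_device {
    my ( $data, $name ) = @_;
    my $dev = $data->{devices}{device};
    return undef unless ref $dev eq 'HASH';

    # Unfolded shape: 'field' sits directly under 'device' and the
    # device name is an ordinary 'name' key.
    if ( exists $dev->{field} ) {
        my $found = defined $dev->{name} ? $dev->{name} : '';
        return $found eq $name ? $dev : undef;
    }

    # Folded shape: the device names are the hash keys.
    return $dev->{$name};
}

# Usage in the plugin would be along the lines of:
#   my $device = find_device( $xml, 'WG52_1_1' );
#   my $tempCValue = $device->{'field'}->{'TempC'}->{'value'} if $device;
```

Passing ForceArray for the device element when calling XMLin would be another way to get one consistent shape, but check the XML::Simple documentation for the exact option before relying on it.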
Alright, so this next bit is more of a suggestion - I am a big fan of reusability and making checks flexible. I mentioned earlier that you may want to consider passing the field you want to check as an argument. Consider the following:
Code: Select all
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
use LWP;
#pass these in as $HOSTADDRESS$ as first argument, and the view user/pass as following arguments
# for now, I'll set them statically
my $gooseaddress='127.0.0.1';
my $gooseviewuser='view';
my $gooseviewpass='view';
#these would normally get passed in as arguments,
# for now, I've hard set them
my $goosedevice='WG52_1_1';
my $goosedevicefield='TempC';
my $goosedevicevalue=0;
#build the url
my $goosedataxmllocation='http://'.$gooseaddress.'/data.xml';
#setup the web request
my $ua = LWP::UserAgent->new;
#make the request
my $req = HTTP::Request->new( GET => $goosedataxmllocation );
# authenticate
$req->authorization_basic($gooseviewuser, $gooseviewpass);
#response
my $res = $ua->request( $req );
# get the content
my $goosedataxml = $res->content();
sub read_xml {
# takes either a file name or a scalar holding the xml content
my $input = shift;
my $parser = XML::Simple->new();
my $data = $parser->XMLin( $input );
return $data;
}
#so instead of passing it a file, pass it the $goosedataxml that has the xml in it
my $xml = read_xml( $goosedataxml );
$goosedevicevalue=$xml->{'devices'}->{'device'}->{$goosedevice}->{'field'}->{$goosedevicefield}->{'value'};
print 'Host: '.$xml->{'host'}."\n";
print 'ValueBeingChecked: '.$goosedevicevalue."\n";
Assuming you passed the $goosedevicefield and the $goosedevice in as arguments to the script instead of hard setting them, you now have a perl script that will dynamically check a field from the weathergoose. You can do comparisons on $goosedevicevalue. If you pass additional arguments in as $goosedevicewarning and $goosedevicecritical, you can change your if-elsif structure to be more along the lines of
Code: Select all
if($goosedevicevalue > $goosedevicecritical){
#critical, exit status is 2
}elsif($goosedevicevalue >= $goosedevicewarning && $goosedevicevalue <= $goosedevicecritical){
#warning, exit status is 1
}elsif($goosedevicevalue < $goosedevicewarning){
#ok!, exit status is 0
}else{
#unknown! exit status is 3
}
and you now have something that isn't specific to temperature. You can check airflow, humidity, sound, or whatever else happens to be part of your weathergoose's build, and you don't have to rewrite your plugin when the IT manager springs for the remote airflow monitor! You just need to add a service definition in nagios for a new type of check.
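The same comparison can be wrapped in a small sub so it is easy to reuse for any field. This is only a sketch: threshold_status is a name I made up, and treating a missing or non-numeric value as unknown is my assumption about what you would want.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: map a value and its thresholds to a nagios
# exit status (0 ok, 1 warning, 2 critical, 3 unknown).
sub threshold_status {
    my ( $value, $warning, $critical ) = @_;

    # anything missing or non-numeric cannot be compared, so: unknown
    for my $n ( $value, $warning, $critical ) {
        return 3 unless defined $n && looks_like_number($n);
    }

    return 2 if $value > $critical;    # above critical
    return 1 if $value >= $warning;    # between warning and critical
    return 0;                          # below warning
}

# Example: TempC of 22.84 with warning at 27 and critical at 32
print threshold_status( '22.84', 27, 32 ), "\n";
```

Like the if-elsif structure above, this only handles "higher is worse" checks; something like airflow, where low is the problem, would need the comparisons flipped.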
One last suggestion - put your exit status in a variable ($nagiosexit in the snippet below) and default it to 3, 'unknown', at the beginning of the script. Then anytime you want to change the exit status, like in that if-elsif-else structure, you just set that one variable:
Code: Select all
if($goosedevicevalue > $goosedevicecritical){
#critical, exit status is 2
$nagiosexit=2;
}elsif($goosedevicevalue >= $goosedevicewarning && $goosedevicevalue <= $goosedevicecritical){
#warning, exit status is 1
$nagiosexit=1;
}elsif($goosedevicevalue < $goosedevicewarning){
#ok!, exit status is 0
$nagiosexit=0;
}else{
#unknown! exit status is 3
$nagiosexit=3;
}
And then actually do your exit at the end of the script, using the value in $nagiosexit.
(Some people like using an array/hash so they can set the value to 'ok/critical/etc' in text; there's a Nagios::Plugin module that handles some of this if you want to use it.) An advantage to holding your exit until the end of the script is you can also build the output along the way and then print it to stdout before your exit. If you decide to go the route of a single check that is custom written for all of your weathergoose probes at once, you'll want the script to keep processing the rest of the probe checks, not just exit at the first one to meet a failure state (I still like the route of multiple service checks).
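To make the "build the output along the way, exit once at the end" idea concrete, here is a rough sketch. The summarize sub, the status-name hash and especially the ranking used to pick the overall status (critical beats warning beats unknown beats ok) are my own assumptions, so adjust them to taste.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %STATUS_NAME = ( 0 => 'OK', 1 => 'WARNING', 2 => 'CRITICAL', 3 => 'UNKNOWN' );

# Assumed ranking for combining several results: higher rank wins.
my %RANK = ( 0 => 0, 3 => 1, 1 => 2, 2 => 3 );

# Hypothetical helper: takes a list of [ exit status, text ] pairs, one
# per probe, and returns ( overall exit status, combined output line ).
sub summarize {
    my @results    = @_;
    my $nagiosexit = 3;    # stays unknown if there were no results at all
    my $seen       = 0;
    my @parts;
    for my $result (@results) {
        my ( $code, $text ) = @$result;
        push @parts, "$STATUS_NAME{$code}: $text";
        $nagiosexit = $code if !$seen || $RANK{$code} > $RANK{$nagiosexit};
        $seen = 1;
    }
    return ( $nagiosexit, join( ', ', @parts ) );
}

# At the very end of the plugin you would then do something like:
#   my ( $nagiosexit, $output ) = summarize(@results);
#   print "$output\n";
#   exit $nagiosexit;
```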
Oh - and seriously - good job on the ' else { exit 3;}' at the end. I hate running into plugins that don't put an 'unknown' exit in an 'else' condition . . . and then you end up with a check that doesn't alert when the unexpected condition comes up. For a first stab at a perl plugin in nagios, you've got a pretty good start.