I have three checks that work when run from the command line (the command is taken from the service test command) but fail when run by the Nagios scheduler. The same check works for other services; it fails only against these three hosts.
/usr/local/nagios/libexec/mw_check_mathworks_app.pl -H services-integ1.mathworks.com -a activationws -p 80
OK: All tests passed
The same check run by the scheduler returns CRITICAL with a 404 Not Found.
What is the best debug mode for Nagios to see what's happening when the scheduler runs this check?
Thanks in advance.
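For reference, Nagios Core has debug logging settings in its main configuration file that cover this kind of question. A minimal sketch, assuming the default source-install layout under /usr/local/nagios that the plugin path above suggests; the level shown combines the check, command, and macro categories, which should reveal how the scheduler expands the command line:

```
# nagios.cfg (assumed path: /usr/local/nagios/etc/nagios.cfg)
# 16 = host/service checks, 256 = commands, 2048 = macros; sum = 2320
debug_level=2320
# 0 = brief, 1 = more detailed, 2 = very detailed
debug_verbosity=2
# Assumed location; adjust to the local install
debug_file=/usr/local/nagios/var/nagios.debug
# Size cap in bytes; the debug log grows quickly at high verbosity
max_debug_file_size=1000000
```

Nagios has to be restarted for these settings to take effect, and debug_level is best set back to 0 once the failing check has been captured.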
service checks failing from scheduler, not cli
-
kendallchenoweth
- Posts: 195
- Joined: Fri Sep 13, 2013 10:43 am
Re: service checks failing from scheduler, not cli
Can you show us the service and command definitions for this check? Can you run the same command from the CLI as the nagios user and get the correct output?
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: service checks failing from scheduler, not cli
Can you show us the command definition you are using for the service check, as well as the service's own configuration? Nagios runs exactly the command you give it, so the arguments in the service definition have to line up with the macros in the command definition. That is what we want to take a look at.
-
kendallchenoweth
- Posts: 195
- Joined: Fri Sep 13, 2013 10:43 am
Re: service checks failing from scheduler, not cli
/usr/local/nagios/libexec/mw_check_mathworks_app.pl -H services-integ1.mathworks.com -a activationws -p 80
mw_check_mathworks_app.pl
Code:
define service {
service_description HTTP /activationws 80
use webvip-24x7-service_notify
hostgroup_name services-integ
display_name HTTP /activationws 80
check_command check-mw-app!activationws!-p 80!!!!!!
obsess_over_service 1
register 1
}
Code:
define command {
command_name check-mw-app
command_line $USER1$/mw_check_mathworks_app.pl -H $HOSTNAME$ -a $ARG1$ $ARG2$
}
Code:
#!/usr/bin/perl -w
####################### check_apachestatus.pl #######################
# Version : 1.1
# Date : 27 Jul 2007
# Author : De Bodt Lieven (Lieven.DeBodt at gmail.com)
# Licence : GPL - http://www.fsf.org/licenses/gpl.txt
#############################################################
#
# help : ./check_passenger_app.pl -h
use strict;
use Getopt::Long;
use LWP::UserAgent;
use Time::HiRes qw(gettimeofday tv_interval);
# Nagios specific
#use lib "/usr/lib64/nagios/plugins";
use lib "/usr/local/nagios/libexec";
use mw_utils qw(%ERRORS $TIMEOUT);
my %ERRORS=('OK'=>0,'WARNING'=>1,'CRITICAL'=>2,'UNKNOWN'=>3,'DEPENDENT'=>4);
# Globals
my $Version='1.0';
my $Name=$0;
my $o_host = undef; # hostname
my $o_help= undef; # want some help ?
my $o_port = undef; # port
my $o_version= undef; # print version
my $o_app= undef; # app name
my $o_warn_level= "DEGRADED"; # Status string reported as WARNING
my $o_crit_level= "FAILURE"; # Status string reported as CRITICAL
my $o_timeout= 60; # Default timeout in seconds
my $o_vhost= undef; # Virtual Host
my $o_ua= "MathWorks Nagios App Checker"; # Useragent
# functions
sub show_versioninfo { print "$Name version : $Version\n"; }
sub print_usage {
print "Usage: $Name -H <host> -a <app> [-p <port>] [-t <timeout>] [-v <vhost>] [-u <useragent>] [-V]\n";
}
# Get the alarm signal
$SIG{'ALRM'} = sub {
print ("ERROR: Alarm signal (Nagios time-out)\n");
exit $ERRORS{"CRITICAL"};
};
sub help {
print "Passenger App Monitor for Nagios version ",$Version,"\n";
print_usage();
print <<EOT;
-h, --help
print this help message
-H, --hostname=HOST
name or IP address of host to check
-p, --port=PORT
Http port
-a, --app=APP
Passenger application name
-u, --useragent=USER-AGENT
User agent to send the request with
-t, --timeout=INTEGER
timeout in seconds (Default: $o_timeout)
-v, --vhost=VHOST
Name based virtual host
-V, --version
prints version number
Note :
The script will return
OK if the status check returns all OK,
WARNING if there is one or more DEGRADED checks,
CRITICAL if there is one or more FAILURE checks, or the check doesn't respond,
UNKNOWN
EOT
}
sub check_options {
Getopt::Long::Configure ("bundling");
GetOptions(
'h' => \$o_help, 'help' => \$o_help,
'H:s' => \$o_host, 'hostname:s' => \$o_host,
'p:i' => \$o_port, 'port:i' => \$o_port,
'a:s' => \$o_app, 'app:s' => \$o_app,
'V' => \$o_version, 'version' => \$o_version,
't:i' => \$o_timeout, 'timeout:i' => \$o_timeout,
'v:s' => \$o_vhost, 'vhost:s' => \$o_vhost,
'u:s' => \$o_ua, 'useragent:s' => \$o_ua,
);
if (defined ($o_help)) { help(); exit $ERRORS{"UNKNOWN"}};
if (defined($o_version)) { show_versioninfo(); exit $ERRORS{"UNKNOWN"}};
# Check compulsory attributes
if (!defined($o_host)) { print_usage(); exit $ERRORS{"UNKNOWN"}};
if (!defined($o_app)) { print_usage(); exit $ERRORS{"UNKNOWN"}};
if (!defined($o_vhost)) { $o_vhost=$o_host };
}
########## MAIN ##########
check_options();
my $ua = LWP::UserAgent->new( protocols_allowed => ['http'], timeout => $o_timeout, agent => $o_ua
);
$ua->default_header(Host => $o_vhost);
my $timing0 = [gettimeofday];
my $response = undef;
if (!defined($o_port)) {
$response = $ua->get('http://' . $o_host . '/' . $o_app . '/admin/status');
} else {
$response = $ua->get('http://' . $o_host . ':' . $o_port . '/' . $o_app . '/admin/status');
}
my $timeelapsed = tv_interval ($timing0, [gettimeofday]);
my $status = undef;
my $critical = undef;
my $webcontent = undef;
if ($response->code == 200 || $response->code == 500) {
#####if ($response->code eq 200) {
$status = "OK" }
else {
$status = "RESPONSE_ERROR" }
if ($status eq "OK") {
$webcontent=$response->content;
my @webcontentarr = split("</tr>", $webcontent);
my $i = 0;
# Get overall status
while ($i < @webcontentarr) {
#print $webcontentarr[$i] . "\n\n\n";
for ($webcontentarr[$i]) {
if (/<title>(\D+)<\/title>/i) { $status = $1; }
elsif (/<td.*?>(.*?)<\/td><td.*?>(.*?)<\/td><td.*?>(<strong>)?(.*?)(<\/strong>)?<\/td>/) {
my $test = $1;
$critical = $2;
my $teststat = $4;
if ( $teststat !~ m/ok/i ) {
print $teststat . ": " . $test . " test failed" . "\n"; }
}
}
$i++;
}
}
for ($status) {
if ( /OK/i ) {
print "OK: All tests passed\n";
exit $ERRORS{"OK"} }
elsif ( /DEGRADED/i ) { exit $ERRORS{"WARNING"} }
elsif ( /FAILURE/i ) { exit $ERRORS{"CRITICAL"} }
else {
printf("CRITICAL %s\n", $response->status_line);
exit $ERRORS{"CRITICAL"} }
}
-
sreinhardt
- -fno-stack-protector
- Posts: 4366
- Joined: Mon Nov 19, 2012 12:10 pm
Re: service checks failing from scheduler, not cli
What happens if you run the command as the nagios user? Also what are the permissions on your plugin?
Code:
su - nagios -c '/usr/local/nagios/libexec/mw_check_mathworks_app.pl -H services-integ1.mathworks.com -a activationws -p 80'
ls -lart /usr/local/nagios/libexec/mw_check_mathworks_app.pl
-
kendallchenoweth
- Posts: 195
- Joined: Fri Sep 13, 2013 10:43 am
Re: service checks failing from scheduler, not cli
Code:
[nagios@nagiosxinonprod-00-ls libexec]$ whoami
nagios
[nagios@nagiosxinonprod-00-ls libexec]$ /usr/local/nagios/libexec/mw_check_mathworks_app.pl -H services-integ1.mathworks.com -a activationws -p 80
OK: All tests passed
[nagios@nagiosxinonprod-00-ls libexec]$ ls -lart /usr/local/nagios/libexec/mw_check_mathworks_app.pl
-rwxr-xr-x 1 nagios users 4842 May 20 14:29 /usr/local/nagios/libexec/mw_check_mathworks_app.pl
Code:
nslookup
> services-integ1
Server: X.X.X.X.
Address: X.X.X.X#53
services-integ1.mathworks.com canonical name = lb.mathworks.com.
Name: lb.mathworks.com
Address: Y.Y.Y.Y
> services-integ2
services-integ2.mathworks.com canonical name = lb.mathworks.com.
Name: lb.mathworks.com
Address: Y.Y.Y.Y
> services-integ3
services-integ3.mathworks.com canonical name = lb.mathworks.com.
Name: lb.mathworks.com
Address: Y.Y.Y.Y
-
kendallchenoweth
- Posts: 195
- Joined: Fri Sep 13, 2013 10:43 am
Re: service checks failing from scheduler, not cli
I found a difference in the host resolution and am pursuing the problem there. You can close this ticket. Thanks!
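For anyone landing here with the same symptom: the command definition posted earlier passes $HOSTNAME$ (the host object's host_name) to -H, and the plugin reuses that value as the HTTP Host header when -v is not given. The scheduler's request can therefore differ from a CLI test that types the FQDN by hand. The sketch below expands the command_line template manually to show what the scheduler would build. The short host_name services-integ1 is an assumption (the host definition was never posted), and expand_command_line is a hypothetical helper that only imitates Nagios macro substitution for the four macros in this thread.

```shell
#!/bin/sh
# Hypothetical helper: substitute the macros used by check-mw-app into a
# command_line template. Nagios does this internally; this only imitates
# it for $USER1$, $HOSTNAME$, $ARG1$ and $ARG2$.
expand_command_line() {
    template=$1; user1=$2; host=$3; arg1=$4; arg2=$5
    printf '%s\n' "$template" | sed \
        -e 's|\$USER1\$|'"$user1"'|g' \
        -e 's|\$HOSTNAME\$|'"$host"'|g' \
        -e 's|\$ARG1\$|'"$arg1"'|g' \
        -e 's|\$ARG2\$|'"$arg2"'|g'
}

# command_line from the posted command definition (single quotes keep the
# macros literal).
TEMPLATE='$USER1$/mw_check_mathworks_app.pl -H $HOSTNAME$ -a $ARG1$ $ARG2$'
# Assumed value of $USER1$, based on the plugin path used in this thread.
LIBEXEC=/usr/local/nagios/libexec

# What the CLI test ran: the FQDN typed by hand.
expand_command_line "$TEMPLATE" "$LIBEXEC" services-integ1.mathworks.com activationws '-p 80'

# What the scheduler would run if host_name were the short name (assumption).
expand_command_line "$TEMPLATE" "$LIBEXEC" services-integ1 activationws '-p 80'
```

If the two printed commands differ in the -H value, running the second one by hand as the nagios user is a quick way to check whether it reproduces the 404. Possible remedies to try in that case are using $HOSTADDRESS$ in the command definition or passing an explicit -v vhost.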