service checks failing from scheduler, not cli

kendallchenoweth
Posts: 195
Joined: Fri Sep 13, 2013 10:43 am

Post by kendallchenoweth »

I have three checks that work when run from the command line (the command is taken from the service test command) but fail when run by the Nagios scheduler. The same check command works for other services, but not against these three hosts.

/usr/local/nagios/libexec/mw_check_mathworks_app.pl -H services-integ1.mathworks.com -a activationws -p 80
OK: All tests passed

The same check, when run by the scheduler, returns CRITICAL with a 404 Not Found.

What is the best debug mode for Nagios to see what's happening when the scheduler runs this check?

Thanks in advance.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: service checks failing from scheduler, not cli

Post by lmiltchev »

Can you show us the service and command definitions for this check? Can you run the same command from the CLI as the nagios user and get the correct output?
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: service checks failing from scheduler, not cli

Post by slansing »

Can you show us the command definition you are using for the service check, as well as the service's own configuration so we can take a look at what you've got? Nagios is going to run the command you give it, so it is important you pair the arguments correctly, etc. That is what we want to take a look at.
kendallchenoweth
Posts: 195
Joined: Fri Sep 13, 2013 10:43 am

Re: service checks failing from scheduler, not cli

Post by kendallchenoweth »

/usr/local/nagios/libexec/mw_check_mathworks_app.pl -H services-integ1.mathworks.com -a activationws -p 80

Code: Select all

define service {
	service_description		HTTP /activationws 80
	use				webvip-24x7-service_notify
	hostgroup_name			services-integ
	display_name			HTTP /activationws 80
	check_command			check-mw-app!activationws!-p 80!!!!!!
	obsess_over_service		1
	register			1
	}	

Code: Select all

define command {
       command_name                             check-mw-app
       command_line                             $USER1$/mw_check_mathworks_app.pl -H $HOSTNAME$ -a $ARG1$ $ARG2$
}
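One thing worth checking in the definitions above is what $HOSTNAME$ expands to. The scheduler fills it with the host_name of each host object in the hostgroup, which is not necessarily the FQDN typed on the command line, and the plugin uses the -H value both in the URL and as the Host header. The following is a minimal sketch of that substitution, not Nagios's own code: expand_check is a hypothetical helper, and the host_name "services-integ1" is an assumption made for illustration only.

```shell
#!/bin/sh
# Sketch only: mimic how a command_line is filled in from a check_command.
# expand_check is a hypothetical helper, not part of Nagios.
# Values containing "|", "&" or "\" would need extra escaping for sed.

expand_check() {
  command_line=$1   # command_line from the command definition
  check_command=$2  # check_command from the service definition
  host_name=$3      # host_name of the host object being checked
  user1=$4          # value of $USER1$ (the plugin directory)

  # Fields after the command name are separated by "!" and become $ARGn$.
  arg1=$(printf '%s' "$check_command" | cut -s -d'!' -f2)
  arg2=$(printf '%s' "$check_command" | cut -s -d'!' -f3)

  # Replace each macro literally; "|" is the sed delimiter because paths contain "/".
  printf '%s\n' "$command_line" | sed \
    -e "s|\\\$USER1\\\$|$user1|g" \
    -e "s|\\\$HOSTNAME\\\$|$host_name|g" \
    -e "s|\\\$ARG1\\\$|$arg1|g" \
    -e "s|\\\$ARG2\\\$|$arg2|g"
}

# Example with the definitions from this thread and an assumed host_name.
cmd='$USER1$/mw_check_mathworks_app.pl -H $HOSTNAME$ -a $ARG1$ $ARG2$'
svc='check-mw-app!activationws!-p 80!!!!!!'
expand_check "$cmd" "$svc" 'services-integ1' '/usr/local/nagios/libexec'
```

If the expanded -H value differs from the name used in the manual test, the two runs are not sending the same request, so comparing them is a reasonable first step.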
mw_check_mathworks_app.pl

Code: Select all

#!/usr/bin/perl -w
####################### check_apachestatus.pl #######################
# Version : 1.1
# Date : 27 Jul 2007
# Author  : De Bodt Lieven (Lieven.DeBodt at gmail.com)
# Licence : GPL - http://www.fsf.org/licenses/gpl.txt
#############################################################
#
# help : ./check_passenger_app.pl -h

use strict;
use Getopt::Long;
use LWP::UserAgent;
use Time::HiRes qw(gettimeofday tv_interval);

# Nagios specific

#use lib "/usr/lib64/nagios/plugins";
use lib "/usr/local/nagios/libexec";
use mw_utils qw(%ERRORS $TIMEOUT);
my %ERRORS=('OK'=>0,'WARNING'=>1,'CRITICAL'=>2,'UNKNOWN'=>3,'DEPENDENT'=>4);

# Globals

my $Version='1.0';
my $Name=$0;

my $o_host =		undef; 		# hostname
my $o_help=		undef; 		# want some help ?
my $o_port = 		undef; 		# port
my $o_version= 		undef;  	# print version
my $o_app= 		undef;  	# app name
my $o_warn_level=	"DEGRADED";  	# Status string that maps to WARNING
my $o_crit_level=	"FAILURE";  	# Status string that maps to CRITICAL
my $o_timeout=  	60;            	# Default 60s timeout
my $o_vhost=	  	undef;         	# Virtual Host
my $o_ua=               "MathWorks Nagios App Checker";          # Useragent

# functions

sub show_versioninfo { print "$Name version : $Version\n"; }

sub print_usage {
  print "Usage: $Name -H <host> -a <app> [-p <port>] [-t <timeout>] [-v <vhost>] [-u <useragent>] [-V]\n";
}

# Get the alarm signal
$SIG{'ALRM'} = sub {
  print ("ERROR: Alarm signal (Nagios time-out)\n");
  exit $ERRORS{"CRITICAL"};
};

sub help {
  print "Passenger App Monitor for Nagios version ",$Version,"\n";
  print_usage();
  print <<EOT;
-h, --help
   print this help message
-H, --hostname=HOST
   name or IP address of host to check
-p, --port=PORT
   Http port
-a, --app=APP
   Passenger application name
-u, --useragent=USER-AGENT
   User agent to send the request with
-t, --timeout=INTEGER
   timeout in seconds (Default: $o_timeout)
-v, --vhost=VHOST
   Name based virtual host
-V, --version
   prints version number
Note :
  The script will return
        OK       if the status check returns all OK,
        WARNING  if there is one or more DEGRADED checks,
        CRITICAL if there is one or more FAILURE checks, or the check doesn't respond,
        UNKNOWN

EOT
}

sub check_options {
  Getopt::Long::Configure ("bundling");
  GetOptions(
      'h'     => \$o_help,        'help'          => \$o_help,
      'H:s'   => \$o_host,        'hostname:s'	  => \$o_host,
      'p:i'   => \$o_port,        'port:i'	  => \$o_port,
      'a:s'   => \$o_app,         'app:s'	  => \$o_app,
      'V'     => \$o_version,     'version'       => \$o_version,
      't:i'   => \$o_timeout,     'timeout:i'     => \$o_timeout,
      'v:s'   => \$o_vhost,       'vhost:s'       => \$o_vhost,
      'u:s'   => \$o_ua,          'useragent:s'   => \$o_ua,

  );

  if (defined ($o_help)) { help(); exit $ERRORS{"UNKNOWN"}};
  if (defined($o_version)) { show_versioninfo(); exit $ERRORS{"UNKNOWN"}};
  # Check compulsory attributes
  if (!defined($o_host)) { print_usage(); exit $ERRORS{"UNKNOWN"}};
  if (!defined($o_app)) { print_usage(); exit $ERRORS{"UNKNOWN"}};
  if (!defined($o_vhost)) { $o_vhost=$o_host };
}

########## MAIN ##########

check_options();

my $ua = LWP::UserAgent->new( protocols_allowed => ['http'], timeout => $o_timeout, agent => $o_ua
);
$ua->default_header(Host => $o_vhost);
my $timing0 = [gettimeofday];
my $response = undef;
if (!defined($o_port)) {
  $response = $ua->get('http://' . $o_host . '/' . $o_app . '/admin/status');
} else {
  $response = $ua->get('http://' . $o_host . ':' . $o_port . '/' . $o_app .  '/admin/status');
}
my $timeelapsed = tv_interval ($timing0, [gettimeofday]);
my $status = undef;
my $critical = undef;

my $webcontent = undef;
if ($response->code == 200 || $response->code == 500) {
#####if ($response->code eq 200) {
  $status = "OK" }
else {
  $status = "RESPONSE_ERROR" }

if ($status eq "OK") {
  $webcontent=$response->content;
  my @webcontentarr = split("</tr>", $webcontent);
  my $i = 0;
  # Get overall status
  while ($i < @webcontentarr)  {
  #print $webcontentarr[$i] . "\n\n\n";
    for ($webcontentarr[$i]) {
      if    (/<title>(\D+)<\/title>/i) { $status = $1; }
      elsif (/<td.*?>(.*?)<\/td><td.*?>(.*?)<\/td><td.*?>(<strong>)?(.*?)(<\/strong>)?<\/td>/) {
        my $test = $1;
        $critical = $2;
        my $teststat = $4;
        if ( $teststat !~ m/ok/i ) {
          print $teststat . ": " . $test . " test failed" . "\n"; }
      }
    }
    $i++;
  }
}

for ($status) {
  if    ( /OK/i ) {
    print "OK: All tests passed\n";
    exit $ERRORS{"OK"} }
  elsif ( /DEGRADED/i ) { exit $ERRORS{"WARNING"} }
  elsif ( /FAILURE/i ) { exit $ERRORS{"CRITICAL"} }
  else {
    printf("CRITICAL %s\n", $response->status_line);
    exit $ERRORS{"CRITICAL"} }
}
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: service checks failing from scheduler, not cli

Post by sreinhardt »

What happens if you run the command as the nagios user? Also what are the permissions on your plugin?

Code: Select all

su - nagios -c '/usr/local/nagios/libexec/mw_check_mathworks_app.pl -H services-integ1.mathworks.com -a activationws -p 80'
ls -lart /usr/local/nagios/libexec/mw_check_mathworks_app.pl
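
If the check also passes as the nagios user, it can help to record exactly what the scheduler passes to the plugin. Below is a minimal sketch of a logging helper for a temporary wrapper script. The function name log_invocation and the log path in the comments are hypothetical, and the wrapper would need to be removed once the comparison is done.

```shell
#!/bin/sh
# Sketch of a temporary debugging helper; log_invocation and the log path
# mentioned below are hypothetical, not part of Nagios.

log_invocation() {
  logfile=$1
  shift
  {
    # Who ran the check, from where, and with how many arguments.
    printf 'user=%s cwd=%s argc=%s\n' "$(id -un)" "$PWD" "$#"
    # One line per argument, bracketed so stray whitespace is visible.
    for arg in "$@"; do
      printf 'arg=[%s]\n' "$arg"
    done
  } >> "$logfile"
}

# In a wrapper script placed in front of the real plugin, one would call:
#   log_invocation /tmp/mw_check_debug.log "$@"
#   exec /usr/local/nagios/libexec/mw_check_mathworks_app.pl "$@"
```

Comparing the logged arguments from a scheduled run with those from a manual run shows whether the host name, app name, or port differ between the two.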
kendallchenoweth
Posts: 195
Joined: Fri Sep 13, 2013 10:43 am

Re: service checks failing from scheduler, not cli

Post by kendallchenoweth »

Code: Select all

[nagios@nagiosxinonprod-00-ls libexec]$ whoami
nagios
[nagios@nagiosxinonprod-00-ls libexec]$ /usr/local/nagios/libexec/mw_check_mathworks_app.pl -H services-integ1.mathworks.com -a activationws -p 80
OK: All tests passed
[nagios@nagiosxinonprod-00-ls libexec]$ ls -lart /usr/local/nagios/libexec/mw_check_mathworks_app.pl
-rwxr-xr-x 1 nagios users 4842 May 20 14:29 /usr/local/nagios/libexec/mw_check_mathworks_app.pl
This same check works in Core 3.4.1 from both the CLI and the scheduler. I suspect it has something to do with the DNS redirect through a load balancer.

Code: Select all

nslookup
> services-integ1
Server:		X.X.X.X.
Address:	X.X.X.X#53

services-integ1.mathworks.com	canonical name = lb.mathworks.com.
Name:	lb.mathworks.com
Address: Y.Y.Y.Y
> services-integ2

services-integ2.mathworks.com	canonical name = lb.mathworks.com.
Name:	lb.mathworks.com
Address: Y.Y.Y.Y
> services-integ3

services-integ3.mathworks.com	canonical name = lb.mathworks.com.
Name:	lb.mathworks.com
Address: Y.Y.Y.Y
Because of this wrinkle, I was hoping to find the optimal flags to put into the debug options so I can see how the scheduler is executing the check vs the command line...
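
As a rough starting point for the debug options, Nagios Core's debug_level is a bitmask, and the sample nagios.cfg lists 16 for host/service checks, 256 for commands, and 2048 for macros. The sketch below adds those up and prints the matching nagios.cfg lines. debug_level_for is a hypothetical helper, and the debug_file path is the usual default for a source install, assumed here rather than confirmed for this system.

```shell
#!/bin/sh
# Sketch: combine Nagios Core debug_level flags by name.
# debug_level_for is a hypothetical helper; the numeric values are the ones
# listed in the comments of the sample nagios.cfg.

debug_level_for() {
  total=0
  for name in "$@"; do
    case $name in
      checks)   total=$((total | 16)) ;;    # host/service checks
      commands) total=$((total | 256)) ;;   # commands
      macros)   total=$((total | 2048)) ;;  # macro expansion
      *) echo "unknown debug category: $name" >&2; return 1 ;;
    esac
  done
  echo "$total"
}

# Print the lines to place in nagios.cfg (the path is an assumed default).
level=$(debug_level_for checks commands macros)
printf 'debug_level=%s\n' "$level"
printf 'debug_verbosity=2\n'
printf 'debug_file=/usr/local/nagios/var/nagios.debug\n'
```

With macros and commands enabled, the debug file should show the command line after macro substitution, which can then be compared with the command typed at the CLI. The debug file grows quickly, so the level is best set back to 0 afterwards.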
kendallchenoweth
Posts: 195
Joined: Fri Sep 13, 2013 10:43 am

Re: service checks failing from scheduler, not cli

Post by kendallchenoweth »

I found a difference in the host resolution and am pursuing the problem there. You can close this ticket. Thanks!