SSH Proxy command is not working
Posted: Wed Jun 18, 2014 10:04 am
by isat-iad
Hi,
We have a slightly bizarre issue, where an SSH Proxy command doesn't work from Nagios XI but it works from the command line.
From the command line:
Code: Select all
$ /usr/local/nagios/libexec/check_by_ssh -H hostname -C "/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80"
OK - /export/home space used=61% | '/export/home usage'=61%;80;90;
$ echo $?
0
$
From the Nagios XI interface, we get:
(Return code of 255 is out of bounds)
The permissions on the SSH keys are correct and key verification has been done for both name and IP of the monitored host.
When we try to run "Test Check Command" from CCM, we get a different error:
Code: Select all
OUTPUT: Remote command execution failed: Could not create directory '/var/www/.ssh'.
Or if we pass the '-E' flag to the check_by_ssh command, then we get:
Code: Select all
OUTPUT: UNKNOWN - check_by_ssh: Remote command '/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80' returned status 255
(Which it doesn't - it returns 0, 1 or 2)
We tried single quotes on the command instead of double quotes but no luck.
Any ideas?
Thanks,
Michael
Re: SSH Proxy command is not working
Posted: Wed Jun 18, 2014 12:33 pm
by lmiltchev
What is the "-d" flag? I don't see it as an option in the usage page... It seems that if the thresholds are placed AFTER the path in the command, they are not considered... For example:
This should work correctly:
Code: Select all
[nagios@testbox libexec]$ ./check_by_ssh -H 192.168.x.x -C "/usr/local/nagios/libexec/check_disk -w 20% -c 10% /"
DISK OK - free space: / 6983 MB (41% inode=86%);| /=9796MB;14137;15904;0;17672
[nagios@testbox libexec]$ ./check_by_ssh -H 192.168.x.x -C "/usr/local/nagios/libexec/check_disk -w 45% -c 35% /"
DISK WARNING - free space: / 6983 MB (41% inode=86%);| /=9796MB;9719;11486;0;17672
[nagios@testbox libexec]$ ./check_by_ssh -H 192.168.x.x -C "/usr/local/nagios/libexec/check_disk -w 60% -c 50% /"
DISK CRITICAL - free space: / 6983 MB (41% inode=86%);| /=9796MB;7068;8836;0;17672
but this won't (it will always show "OK"):
Code: Select all
[nagios@testbox libexec]$ ./check_by_ssh -H 192.168.x.x -C "/usr/local/nagios/libexec/check_disk / -w 20% -c 10%"
DISK OK - free space: / 6983 MB (41% inode=86%);| /=9796MB;;;0;17672
You have new mail in /var/spool/mail/root
[nagios@testbox libexec]$ ./check_by_ssh -H 192.168.x.x -C "/usr/local/nagios/libexec/check_disk / -w 45% -c 35%"
DISK OK - free space: / 6982 MB (41% inode=86%);| /=9796MB;;;0;17672
[nagios@testbox libexec]$ ./check_by_ssh -H 192.168.x.x -C "/usr/local/nagios/libexec/check_disk / -w 60% -c 50%"
DISK OK - free space: / 6982 MB (41% inode=86%);| /=9796MB;;;0;17672
Don't worry about this output:
OUTPUT: Remote command execution failed: Could not create directory '/var/www/.ssh'.
Testing from the CCM is not 100% reliable (it doesn't work with all checks) and it's not supposed to be a substitute for testing from the CLI.
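For context on that message: ssh looks for its keys and known_hosts under the home directory of whichever account runs it, so a path of /var/www/.ssh suggests the CCM test ran under the web server account rather than as nagios. A minimal sketch of that lookup follows. The function name ssh_dir_for and the sample passwd line (typical of an apache account on RHEL/CentOS) are illustrative assumptions, not taken from the poster's system.

```shell
# Print the ~/.ssh directory implied by an account's passwd entry.
# $1 is the account name; passwd-format lines are read from stdin.
ssh_dir_for() {
    awk -F: -v user="$1" '$1 == user { print $6 "/.ssh" }'
}

# Sample passwd line (assumed, not from the thread).
printf '%s\n' 'apache:x:48:48:Apache:/var/www:/sbin/nologin' | ssh_dir_for apache
```

On a live box the input would come from `getent passwd` instead of the sample line.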
Go to the CCM->Services, and modify the text in the $ARG1$ field as such:
Code: Select all
-C "/opt/nagios/libexec/check_disk -w 20 -c 10 /export/home"
Save and Apply Configuration. Go to Home->Service Detail, click on the service and schedule an immediate check to see if you are going to get the correct output this time.
Note: You can use whatever thresholds you need to. This is just an example.
Re: SSH Proxy command is not working
Posted: Thu Jun 19, 2014 4:11 am
by isat-iad
Hi,
I should have mentioned that we are using custom scripts on this monitored host: the check_disk script is not the same as the check_disk script supplied by Nagios, which is why it has a '-d' flag. Having said that, we have exactly the same custom script running on other servers and it works without an issue. And we have other custom scripts that don't work for that specific host, which makes me think it may have something to do with the version of SSH (SunSSH for Solaris 9 on Sparc).
But what puzzles me is that the check works every time when I run it from the CLI of the Nagios XI box against that host. Furthermore, when I click "schedule an immediate check" from the Nagios XI service screen, I see it working (Green) the first time and subsequently it changes to CRITICAL with this message "(Return code of 255 is out of bounds)".
Thanks,
Michael
Re: SSH Proxy command is not working
Posted: Thu Jun 19, 2014 11:40 am
by lmiltchev
Can you run the script locally on the remote machine as the nagios user, or only as root? Did you use the following document for setting up check_by_ssh?
http://assets.nagios.com/downloads/nagi ... ng_SSH.pdf
Run the following commands from the CLI on the Nagios XI server and show us the output:
Code: Select all
su nagios
/usr/local/nagios/libexec/check_by_ssh -H <remoteip> -C "/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80"
Re: SSH Proxy command is not working
Posted: Fri Jun 20, 2014 3:58 am
by isat-iad
The scripts on the remote machine run as the nagios user and we did follow the document for setting them up - we have exactly the same scripts working on other machines.
When I run the command manually from the CLI of the Nagios XI box, it always works, and it works both using the hostname and the IP address of the remote machine. It is only from the Nagios XI GUI where it doesn't work....
Code: Select all
[nagios@nagiosxi ~]$ /usr/local/nagios/libexec/check_by_ssh -H hostname -C "/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80"
OK - /export/home space used=61% | '/export/home usage'=61%;80;90;
[nagios@nagiosxi ~]$ /usr/local/nagios/libexec/check_by_ssh -H remoteip -C "/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80"
OK - /export/home space used=61% | '/export/home usage'=61%;80;90;
[nagios@nagiosxi ~]$ /usr/local/nagios/libexec/check_by_ssh -H remoteip -C "/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80"
OK - /export/home space used=61% | '/export/home usage'=61%;80;90;
[nagios@nagiosxi ~]$ /usr/local/nagios/libexec/check_by_ssh -H remoteip -C "/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80"
OK - /export/home space used=61% | '/export/home usage'=61%;80;90;
[nagios@nagiosxi ~]$ /usr/local/nagios/libexec/check_by_ssh -H remoteip -C "/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80"
OK - /export/home space used=61% | '/export/home usage'=61%;80;90;
[nagios@nagiosxi ~]$ /usr/local/nagios/libexec/check_by_ssh -H remoteip -C "/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80"
OK - /export/home space used=61% | '/export/home usage'=61%;80;90;
[nagios@nagiosxi ~]$ /usr/local/nagios/libexec/check_by_ssh -H remoteip -C "/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80"
OK - /export/home space used=61% | '/export/home usage'=61%;80;90;
[nagios@nagiosxi ~]$ /usr/local/nagios/libexec/check_by_ssh -H remoteip -C "/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80"
OK - /export/home space used=61% | '/export/home usage'=61%;80;90;
[nagios@nagiosxi ~]$ /usr/local/nagios/libexec/check_by_ssh -H remoteip -C "/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80"
OK - /export/home space used=61% | '/export/home usage'=61%;80;90;
[nagios@nagiosxi ~]$ id
uid=501(nagios) gid=100(users) groups=100(users),501(nagios),502(nagcmd)
[nagios@nagiosxi ~]$
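Since the failure is intermittent, it may help to script the repeated runs above and count exit codes rather than eyeball them. This is only a sketch: tally_exit_codes is a hypothetical helper, and the commented usage line wraps the check in env -i on the assumption (not confirmed anywhere in this thread) that a stripped environment is closer to how the Nagios daemon launches it.

```shell
# Run a command N times and print "status=count" for each exit status seen.
# $1 is the number of runs; the remaining arguments are the command to run.
tally_exit_codes() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1
        echo "$?"
        i=$((i + 1))
    done | sort -n | uniq -c | awk '{ print $2 "=" $1 }'
}

# Harmless demonstration with a command that always succeeds.
tally_exit_codes 3 true    # prints 0=3

# Possible use against the check from this thread (not executed here):
# tally_exit_codes 20 env -i /usr/local/nagios/libexec/check_by_ssh -H hostname \
#     -C "/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80"
```

Any line other than 0=, 1= or 2= in the output would show the check failing outside the script's own return codes.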
Re: SSH Proxy command is not working
Posted: Fri Jun 20, 2014 1:45 pm
by slansing
Can we take a look at your check_by_ssh command in XI, the one you are using for these services? Navigate to Configure > CCM > Commands > "name of the check_command you are using", copy its command_line, and post it here. Can we also see the service configuration for either this service or the Disk Health one? Are you using the exact same command in Nagios for other servers, or do you have multiple derivatives? Are the other services on that host also using the same check_command (check_by_ssh)?
Re: SSH Proxy command is not working
Posted: Mon Jun 23, 2014 9:22 am
by isat-iad
Hello,
The command line for disk_health.ksh is
The command line for check_by_ssh is
Code: Select all
$USER1$/check_by_ssh -H $HOSTADDRESS$ $ARG1$ $ARG2$
The service configuration file is below:
Code: Select all
[root@nagiosxi services]# cat solaris_disk_health.cfg
###############################################################################
#
# Service configuration file
#
# Created by: Nagios QL Version 3.0.3
# Date: 2014-06-20 11:01:43
# Version: Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios QL will overwite all manual settings during the next update
#
###############################################################################
define service {
host_name hostname
service_description Disk Health
hostgroup_name aims-servers
check_command check_nrpe!check_disk_health!-a 'noiostat'!!!!!!
max_check_attempts 3
check_interval 60
retry_interval 1
check_period xi_timeperiod_24x7
notification_period xi_timeperiod_24x7
contacts operations
register 1
}
###############################################################################
#
# Service configuration file
#
# END OF FILE
#
###############################################################################
[root@nagiosxi services]#
And the script that runs on the monitored host is:
Code: Select all
bash-3.2# cat disk_health.ksh
#!/bin/ksh
#
# #######################################################
# Script that checks for hard disk problems or errors
# #######################################################
#
# Define the meta commands
METASTAT=/usr/sbin/metastat
METADB=/usr/sbin/metadb
ZPOOL="/usr/sbin/zpool status -x"
#
# Work out Solaris release
RELEASE=$(uname -r | cut -d'.' -f2)
#
# Define iostat thresholds
# for errors and warnings
SOFTERR_WARN_THRES=10
HARDERR_WARN_THRES=2
TRNSERR_WARN_THRES=2
SOFTERR_CRIT_THRES=25
HARDERR_CRIT_THRES=10
TRNSERR_CRIT_THRES=10
#
# Return code
RET_CODE=0
if [ -n $1 ]
then
ARG1=$1
else
ARG1=false
fi
# Check for SVM errors
function meta_check {
# Check for metadb trouble - on the basis that
# upper case letter indicate problems.
DBTROUBLE=`$METADB | tail +2 | /usr/bin/awk \
'{ fl = substr($0,1,20); if (fl ~ /[A-Z]/) print $0 }'`
# Check for metadevice trouble - on the basis that
# a faulty disk will have status other than 'Okay'
MDTROUBLE=`$METASTAT | /usr/bin/awk \
'/State:/ { if ( $2 != "Okay" ) print "$0" }'`
if [ ! -z $MDTROUBLE ]
then
echo "CRITICAL - SVM (metastat) is reporting errors"
RET_CODE=2
fi
}
function zfs_check {
if [ -x $ZPOOL ]
then
ZPOOL_STATUS=$ZPOOL
case $ZPOOL_STATUS in
"no pools available")
;;
"all pools are healthy")
;;
"*")
echo "CRITICAL - zpool is reporting errors"
RET_CODE=2
;;
esac
fi
}
if [[ $ARG1 != "noiostat" ]];
then
# Check iostat for device errors
/usr/bin/iostat -En | /usr/bin/egrep \
'c[0-9]+t[0-9]+d[0-9]+' > /tmp/iostat.tmp
if [ -s /tmp/iostat.tmp ]
then
while read line
do
DEV=`echo $line | awk '{print $1}'`
SOFTERR=`echo $line | awk '{print $4}'`
HARDERR=`echo $line | awk '{print $7}'`
TRNSERR=`echo $line | awk '{print $10}'`
if [ $SOFTERR -gt $SOFTERR_WARN_THRES -a $SOFTERR -lt $SOFTERR_CRIT_THRES ]
then
/usr/ucb/echo -n "WARNING - "
echo "Device $DEV has $SOFTERR soft errors."
RET_CODE=1
fi
if [ $SOFTERR -ge $SOFTERR_CRIT_THRES ]
then
/usr/ucb/echo -n "CRITICAL - "
echo "Device $DEV has $SOFTERR soft errors."
RET_CODE=2
fi
if [ $HARDERR -gt $HARDERR_WARN_THRES -a $HARDERR -lt $HARDERR_CRIT_THRES ]
then
/usr/ucb/echo -n "WARNING - "
echo "Device $DEV has $HARDERR hard errors."
RET_CODE=1
fi
if [ $HARDERR -ge $HARDERR_CRIT_THRES ]
then
/usr/ucb/echo -n "CRITICAL - "
echo "Device $DEV has $HARDERR hard errors."
RET_CODE=2
fi
if [ $TRNSERR -gt $TRNSERR_WARN_THRES -a $TRNSERR -lt $TRNSERR_CRIT_THRES ]
then
/usr/ucb/echo -n "WARNING - "
echo "Device $DEV has $TRNSERR transport errors."
RET_CODE=1
fi
if [ $TRNSERR -ge $TRNSERR_CRIT_THRES ]
then
/usr/ucb/echo -n "CRITICAL - "
echo "Device $DEV has $TRNSERR transport errors."
RET_CODE=2
fi
done < /tmp/iostat.tmp
/bin/rm /tmp/iostat.tmp
fi
fi
# Check if system is SVM-enabled and run SVM checks
$METASTAT > /dev/null 2>&1
if [ $? -eq 0 ]
then
meta_check
fi
# If Solaris release is 10 or higher check for ZFS status
if [ $RELEASE -ge 10 ]
then
zfs_check
fi
if [ $RET_CODE -eq 0 ]
then
echo "OK - disk health is good"
fi
exit $RET_CODE
bash-3.2#
We are using the exact same command in Nagios XI for other servers and it works. We don't have any other services configured for that host using check_by_ssh; all the ones we have exhibit the same symptoms. They all work fine from the Nagios command line, but from the XI web page they "flap" between OK and (Return code of 255 is out of bounds).
Thanks
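A side note on the script above, which may be unrelated to the 255: a few of its tests rely on unquoted variables and on a quoted case pattern, and those behave differently from what the comments imply. The sketch below isolates two of them. The function names and the sample status string are made up for illustration; this is not a patched version of disk_health.ksh.

```shell
# With an empty, unquoted $1 the test collapses to `[ -n ]`, a one-argument
# test that is true because the string "-n" itself is non-empty.
unquoted_check() {
    if [ -n $1 ]; then echo "set"; else echo "empty"; fi
}

# Quoting keeps the empty string as an argument, so -n sees it.
quoted_check() {
    if [ -n "$1" ]; then echo "set"; else echo "empty"; fi
}

# A quoted "*" pattern matches only a literal asterisk, not "anything else".
status_quoted() {
    case "$1" in
        "all pools are healthy") echo "healthy" ;;
        "*") echo "errors" ;;
    esac
}

# A bare * is the usual catch-all branch.
status_bare() {
    case "$1" in
        "all pools are healthy") echo "healthy" ;;
        *) echo "errors" ;;
    esac
}

unquoted_check ""                        # prints "set"
quoted_check ""                          # prints "empty"
status_quoted "pool tank is degraded"    # prints nothing
status_bare "pool tank is degraded"      # prints "errors"
```

In the posted script the first issue is harmless in practice, because an empty ARG1 still differs from "noiostat", but the quoted "*" would mean zfs_check never reports an error.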
Re: SSH Proxy command is not working
Posted: Mon Jun 23, 2014 12:35 pm
by lmiltchev
How did you set up this check?
define service {
host_name hostname
service_description Disk Health
hostgroup_name aims-servers
check_command check_nrpe!check_disk_health!-a 'noiostat'!!!!!!
max_check_attempts 3
check_interval 60
retry_interval 1
check_period xi_timeperiod_24x7
notification_period xi_timeperiod_24x7
contacts operations
register 1
}
This doesn't seem at all like the check you've been running (testing) from the CLI...
/usr/local/nagios/libexec/check_by_ssh -H hostname -C "/opt/nagios/libexec/check_disk -d /export/home -c 90 -w 80"
If you ran the "SSH Proxy Wizard", you would have something like this:
Code: Select all
define service {
host_name CentOS-SSH
service_description Root Disk Space
use generic-service
check_command check_xi_by_ssh!-C "/usr/local/nagios/libexec/check_disk -w 80% -c 90% /"!!!!!!!
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
notifications_enabled 1
contacts nagiosadmin
_xiwizard sshproxy
register 1
}
It seems like you are using check_nrpe, not check_by_ssh...
Re: SSH Proxy command is not working
Posted: Tue Jun 24, 2014 8:45 am
by isat-iad
Perhaps it was a bad example, but exactly the same thing happens for the check_by_ssh service (which works on other servers):
Code: Select all
[root@nagiosxi services]# cat hostname.cfg
###############################################################################
#
# Service configuration file
#
# Created by: Nagios QL Version 3.0.3
# Date: 2014-06-24 10:38:18
# Version: Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios QL will overwite all manual settings during the next update
#
###############################################################################
define service {
host_name hostname
service_description /export/home Disk Space
use generic-service
check_command check_xi_by_ssh!-C "/opt/nagios/libexec/check_disk -w 80 -c 90 -d /export/home"!!!!!!!
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts operations
_xiwizard sshproxy
register 1
}
define service {
host_name hostname
service_description Disk Health
use generic-service
check_command check_xi_by_ssh!-C "/opt/nagios/libexec/disk_health.ksh noiostat"
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts operations
_xiwizard sshproxy
register 1
}
define service {
host_name hostname
service_description HTTP
use xiwizard_website_http_service
check_command check_xi_service_http! -f ok -I IP_ADDR -u "/" -p 80
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts operations
_xiwizard website
register 1
}
define service {
host_name hostname
service_description Ping
use xiwizard_linuxserver_ping_service
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts operations
_xiwizard sshproxy
register 1
}
define service {
host_name hostname
service_description Root Disk Space
use generic-service
check_command check_xi_by_ssh!-C "/opt/nagios/libexec/check_disk -d / -c 90 -w 80"
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts operations
_xiwizard sshproxy
register 1
}
###############################################################################
#
# Service configuration file
#
# END OF FILE
#
###############################################################################
[root@nagiosxi services]#
Re: SSH Proxy command is not working
Posted: Tue Jun 24, 2014 4:37 pm
by sreinhardt
Could you post your check_disk script or a link to where we can find it? A return code of 255 means the check is not exiting with one of the exit codes Nagios expects from a plugin (0-3), so it is outside the standard. I find it interesting that it seems to complete sometimes; possibly it is also erroring out some of the time before it reaches a proper return.
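For reference, Nagios maps plugin exit statuses 0 to 3 onto OK, WARNING, CRITICAL and UNKNOWN, and reports anything else as out of bounds. OpenSSH's ssh also exits with 255 when ssh itself fails, so a 255 here may come from the connection rather than from the remote script (whether SunSSH behaves identically is an assumption). A small sketch, with nagios_state as a hypothetical helper name:

```shell
# Translate a plugin exit status into the state Nagios would report.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "out of bounds" ;;
    esac
}

nagios_state 0      # prints "OK"
nagios_state 255    # prints "out of bounds"

# Possible use right after a manual run of the check (not executed here):
# /usr/local/nagios/libexec/check_by_ssh -H hostname -C "..." ; nagios_state $?
```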