check_by_ssh: Output Unknown Return Code 255

Posted: Mon May 04, 2015 3:25 pm
by fsbeaunix
I'm scratching my head on this one. I thought I had read every thread about return code 255, but nothing I've tried works.

Situation: I am using check_by_ssh to check local disk on some remote AIX hosts.

Error (when I run Test Command in Core Config):

COMMAND: /usr/local/nagios/libexec/check_by_ssh -H vnbmedia03 -l nagios -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"
OUTPUT: UNKNOWN - check_by_ssh: Remote command '/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90' returned status 255

SSH is configured correctly: I can run remote commands without a password from the Nagios server to the remote server at the console. When I run the command above on the Nagios server as the nagios user, it seems to work OK:

uid=2018(nagios) gid=2018(nagios) groups=2018(nagios),2019(nagcmd)
-bash-4.1$ /usr/local/nagios/libexec/check_by_ssh -H vnbmedia03 -l nagios -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"
/tmp is at 53% capacity, 5346.30 of 4893.70

File permissions seem OK. -E took care of ignoring the SSH header. I added '-l nagios' for troubleshooting, but it did not change anything. When I run this manually, $? returns the expected value (0, 1, or 2) depending on how low I set the thresholds. I've tried single, double, and no quotes around the command.
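For reading these results: Nagios maps a plugin's exit status to a state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and ssh(1) exits with the remote command's status, or with 255 if ssh itself hit an error. A 255 from check_by_ssh therefore usually means the SSH connection failed, not that check_disk_aix returned 255. A minimal shell sketch of that mapping; the helper name `describe_status` is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: translate the exit status of a check (or of
# check_by_ssh / ssh) into the state Nagios would report for it.
describe_status() {
    case "$1" in
        0)   echo "OK" ;;
        1)   echo "WARNING" ;;
        2)   echo "CRITICAL" ;;
        3)   echo "UNKNOWN" ;;
        # ssh uses 255 for its own failures (auth, host key, connection),
        # so the remote command may never have run at all.
        255) echo "UNKNOWN (ssh itself failed; remote command may not have run)" ;;
        *)   echo "UNKNOWN (unexpected status $1)" ;;
    esac
}
```

For example, run the exact check_by_ssh command line by hand, then `describe_status $?`.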

Here is the command in config form:

define service {
    service_description    AIX check_by_ssh tmp
    check_command          check_xi_by_ssh!-l nagios -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"!!!!!!!
    register               1
}

(check_xi_by_ssh just uses /usr/local/nagios/libexec/check_by_ssh).

Any ideas?

Re: check_by_ssh: Output Unknown Return Code 255

Posted: Mon May 04, 2015 3:45 pm
by jdalrymple
What happens if you just run the check "by hand"?

[jdalrymple@nagiosserver ~]$ ssh nagios@vnbmedia03 /home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90

Re: check_by_ssh: Output Unknown Return Code 255

Posted: Mon May 04, 2015 4:05 pm
by fsbeaunix
From the Nagios server logged in as nagios, I get this when running the command manually:

/usr/local/nagios/libexec/check_by_ssh -H vnbmedia03 -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"
/tmp is at 53% capacity, 5346.30 of 4893.70
bash-4.1$ echo $?
0


So, it seems OK.

To double-check, again on the Nagios server, I'll lower the thresholds to confirm that $? changes, which it does.

bash-4.1$ /usr/local/nagios/libexec/check_by_ssh -H vnbmedia03 -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 25 -w 30"
/tmp is at 53% capacity! 5346.30 of 4893.70
bash-4.1$ echo $?
2


From the remote AIX machine logged on as nagios, when I run the check_disk_aix script, it also returns what I would expect:

bash-4.2$ /home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90
/tmp is at 53% capacity, 5346.30 of 4893.70
bash-4.2$ echo $?
0


On another note, I downloaded a similar SSH plugin called check_by_ssh_master. It behaves exactly the same way: run manually, everything's great, but when run through Config Manager it returns the same 255. The check_disk_aix script is a pretty basic script that runs df, then uses grep/awk to parse the output and compute whether the disk is 0 (OK), 1 (Warning), or 2 (Critical). That script seems to be fine. It actually worked before my little upgrade from Core to XI.
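The original check_disk_aix isn't posted, so the following is only a hypothetical sketch of the approach described above (run df, pull out the used percentage with awk, compare it against the thresholds). The function names are made up, and it assumes a `df -P` that prints POSIX-format output with the capacity in field 5; AIX's native `df` columns differ, so the awk field may need adjusting there. Critical is checked first, which is consistent with the posted runs where `-c` was set below `-w` and the result was 2.

```sh
#!/bin/sh
# Hypothetical sketch of a check_disk_aix-style check (not the original script).

# Print the Nagios status code for a usage percentage:
# 2 (CRITICAL) if pct >= crit, else 1 (WARNING) if pct >= warn, else 0 (OK).
disk_status() {
    pct=$1 warn=$2 crit=$3
    if [ "$pct" -ge "$crit" ]; then
        echo 2
    elif [ "$pct" -ge "$warn" ]; then
        echo 1
    else
        echo 0
    fi
}

# Print the used percentage (without the % sign) for a file system.
# ASSUMPTION: `df -P` output, where field 5 of the data line is e.g. "53%".
used_pct() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_disk VOLUME WARN CRIT: print a one-line status and return its code.
check_disk() {
    pct=$(used_pct "$1")
    if [ -z "$pct" ]; then
        # df failed (e.g. no such mount point): report UNKNOWN, not OK.
        echo "UNKNOWN - could not read usage for $1"
        return 3
    fi
    code=$(disk_status "$pct" "$2" "$3")
    if [ "$code" -eq 0 ]; then
        echo "$1 is at ${pct}% capacity"
    else
        echo "$1 is at ${pct}% capacity!"
    fi
    return "$code"
}
```

Unlike the real script's `-v`/`-w`/`-c` flags, this sketch takes positional arguments, e.g. `check_disk /tmp 90 95; echo $?`.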

Re: check_by_ssh: Output Unknown Return Code 255

Posted: Mon May 04, 2015 4:22 pm
by jdalrymple
How about sestatus on the Nagios box?

[jdalrymple@nagiosserver ~]$ sestatus
SELinux status:                 disabled

If it's not that, I'm going to be at a total loss.

Re: check_by_ssh: Output Unknown Return Code 255

Posted: Tue May 05, 2015 8:14 am
by fsbeaunix
[root@unag01 ~]# sestatus
SELinux status: disabled

Yes, disabled. So is iptables.

Re: check_by_ssh: Output Unknown Return Code 255

Posted: Tue May 05, 2015 9:17 am
by jdalrymple
Can we rule out argument parsing by creating a shell script on the AIX box?

Something like:

#!/bin/ksh
CMD="/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"
# Word-split $CMD and run it directly, so its output and exit status are kept
$CMD
ERROR_CODE=$?
exit $ERROR_CODE

Re: check_by_ssh: Output Unknown Return Code 255

Posted: Tue May 05, 2015 11:06 am
by fsbeaunix
Here is the output of the test script. I set the thresholds low to simulate error code 2. It doesn't appear to be a parsing problem. Going to think about this over some grub.

#!/usr/bin/ksh
CMD="/home/nagios/AIX/scripts/check_disk_aix -v / -c 10 -w 15"
${CMD}
ERROR_CODE=$?
exit $ERROR_CODE

bash-3.2$ ./test.ksh
/ is at 18% capacity! 533M of 3.0G

bash-3.2$ echo $?
2

Re: check_by_ssh: Output Unknown Return Code 255

Posted: Tue May 05, 2015 11:49 am
by jdalrymple
Righty-o, that looks good. Now what happens if you replace

check_command check_xi_by_ssh!-l nagios -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"!!!!!!!
with

check_command check_xi_by_ssh!-l nagios -E -C "/home/nagios/<whatever_path>/test.ksh"!!!!!!!
on the Nagios box?

Re: check_by_ssh: Output Unknown Return Code 255

Posted: Tue May 05, 2015 12:41 pm
by fsbeaunix
When I replaced

check_command check_xi_by_ssh!-l nagios -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"!!!!!!!


with

check_command check_xi_by_ssh!-l nagios -E -C "/home/nagios/<whatever_path>/test.ksh"!!!!!!!
I got the same output as before.

I ran some tests and might be onto something. I suspect Nagios is not using the 'nagios' ID to execute the remote command; perhaps the 'apache' ID is running it instead.

When I remove -E from the service and run it through Core Config, I get the error "OUTPUT: Remote command execution failed: Could not create directory '/var/www/.ssh'." Hmm... For giggles, I set up SSH authentication for apache and seemed to get some good output, though I need -E to trim the SSH header. Something's got to be trippy on my Nagios server if apache is running the remote command, even though '-l nagios' should have nagios run the command.

Re: check_by_ssh: Output Unknown Return Code 255

Posted: Tue May 05, 2015 1:39 pm
by jdalrymple
That's a compile-time option. Create a simple check that runs /usr/bin/whoami as its plugin to see which user actually executes it.
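A minimal sketch of that diagnostic check. Instead of calling /usr/bin/whoami directly, this hypothetical shell plugin also reports whether the executing user's home directory has a `.ssh` directory, since the earlier error was ssh failing to create /var/www/.ssh. It reads the home directory from /etc/passwd, which assumes a local account, and the function name `report_check_user` is made up:

```sh
#!/bin/sh
# Hypothetical diagnostic check: report which user executes it and whether
# that user's home directory (looked up in /etc/passwd) contains .ssh.
report_check_user() {
    user=$(id -un)
    home=$(awk -F: -v u="$user" '$1 == u { print $6; exit }' /etc/passwd)
    if [ -n "$home" ] && [ -d "$home/.ssh" ]; then
        ssh_dir="present"
    else
        ssh_dir="missing"
    fi
    # Return 0 so Nagios shows OK; the output text is the useful part.
    echo "OK - checks run as $user (home ${home:-unknown}, .ssh $ssh_dir)"
    return 0
}
```

Saved as an executable script whose last line calls `report_check_user`, and pointed at by a check_command, it should print something like `OK - checks run as apache (home /var/www, .ssh missing)` if the web server user is the one running the test.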