check_by_ssh: Output Unknown Return Code 255
I'm scratching my head on this. I thought I read every message regarding Return Code 255, but nothing seems to work.
Situation: I am using check_by_ssh to check local disk on some remote AIX hosts.
Error (when I run Test Command in Core Config):
COMMAND: /usr/local/nagios/libexec/check_by_ssh -H vnbmedia03 -l nagios -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"
OUTPUT: UNKNOWN - check_by_ssh: Remote command '/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90' returned status 255
ssh is configured correctly and I can run remote commands passwordless from the Nagios server to the remote server via the console. When I run the above command on the Nagios server using the nagios ID, it seems to work OK:
uid=2018(nagios) gid=2018(nagios) groups=2018(nagios),2019(nagcmd)
-bash-4.1$ /usr/local/nagios/libexec/check_by_ssh -H vnbmedia03 -l nagios -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"
/tmp is at 53% capacity, 5346.30 of 4893.70
File rights seem OK. -E took care of ignoring the ssh header. I added '-l nagios' for troubleshooting. It did not change anything. After I run this manually, $? returns the proper 1-3 value depending on how low I set the thresholds. I've tried single/double/no quotes around the command.
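For reference when reading the exit codes in this thread: Nagios plugins use 0 to 3 for OK, WARNING, CRITICAL and UNKNOWN, and ssh itself exits with 255 when it hits an error of its own (for example a failed connection or authentication). So a 255 may be coming from ssh rather than from the remote script. A minimal sketch of that mapping follows; `describe_status` is a hypothetical helper name, not part of check_by_ssh.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): translate an exit status
# into the state name a Nagios plugin is expected to mean by it.
describe_status() {
    case "$1" in
        0)   echo "OK" ;;
        1)   echo "WARNING" ;;
        2)   echo "CRITICAL" ;;
        3)   echo "UNKNOWN" ;;
        # ssh exits with 255 when ssh itself fails, so this status
        # may never have come from the remote script at all.
        255) echo "UNKNOWN (possible ssh-level failure)" ;;
        *)   echo "UNKNOWN (unexpected status $1)" ;;
    esac
}
```

Running `describe_status $?` right after a manual check_by_ssh call is one way to see which of these cases applies.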
Here is the command in config form:
define service {
service_description AIX check_by_ssh tmp
check_command check_xi_by_ssh!-l nagios -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"!!!!!!!
register 1
}
(check_xi_by_ssh just uses /usr/local/nagios/libexec/check_by_ssh).
Any ideas?
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: check_by_ssh: Output Unknown Return Code 255
What happens if you just try to run the check "by hand?"
Code: Select all
[jdalrymple@nagiosserver ~]$ ssh nagios@vnbmedia03 /home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90
Re: check_by_ssh: Output Unknown Return Code 255
From the Nagios server logged in as nagios, I get this when running the command manually:
/usr/local/nagios/libexec/check_by_ssh -H vnbmedia03 -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"
/tmp is at 53% capacity, 5346.30 of 4893.70
bash-4.1$ echo $?
0
So, it seems OK.
Just to make sure, again on the Nagios server, I'll lower the thresholds to make sure $? is being updated, which it is.
bash-4.1$ /usr/local/nagios/libexec/check_by_ssh -H vnbmedia03 -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 25 -w 30"
/tmp is at 53% capacity! 5346.30 of 4893.70
bash-4.1$ echo $?
2
From the remote AIX machine logged on as nagios, when I run the check_disk_aix script, it also returns what I would expect:
bash-4.2$ /home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90
/tmp is at 53% capacity, 5346.30 of 4893.70
bash-4.2$ echo $?
0
On another note, I downloaded a similar ssh module called check_by_ssh_master. It behaves exactly the same way: manually, everything's great, but when run through Config Manager, it returns the same 255. The check_disk_aix script is a pretty basic script which runs df, then uses grep/awk to grab the output and compute whether the disk is 0 (good), 1 (Warning) or 2 (Critical). That script seems to be fine. It actually worked prior to my little upgrade from Core to XI.
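To make the threshold behaviour described here concrete, a rough sketch of that kind of logic is below. This is not the actual check_disk_aix script; `disk_state` and its argument order are assumptions made only for illustration, and the df/grep/awk parsing is left out.

```shell
#!/bin/sh
# Hypothetical sketch of plugin threshold logic; NOT the real check_disk_aix.
# usage: disk_state <percent_used> <warn_threshold> <crit_threshold>
disk_state() {
    pct=$1
    warn=$2
    crit=$3
    # Test the critical threshold first so it wins when both are exceeded.
    if [ "$pct" -ge "$crit" ]; then
        echo "CRITICAL - ${pct}% used"
        return 2
    elif [ "$pct" -ge "$warn" ]; then
        echo "WARNING - ${pct}% used"
        return 1
    fi
    echo "OK - ${pct}% used"
    return 0
}
```

With the numbers from this thread, `disk_state 53 90 95` returns 0 and `disk_state 53 30 25` returns 2, matching the `$?` values shown in the manual runs.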
-
jdalrymple
Re: check_by_ssh: Output Unknown Return Code 255
How about sestatus on the Nagios box?
If not that I'm going to be at a total loss.
Code: Select all
[jdalrymple@nagiosserver ~]$ sestatus
SELinux status: disabled
Re: check_by_ssh: Output Unknown Return Code 255
[root@unag01 ~]# sestatus
SELinux status: disabled
Yes, disabled. So is IPTables.
-
jdalrymple
Re: check_by_ssh: Output Unknown Return Code 255
Can we rule out argument parsing by creating a shell script on the AIX box?
something like:
Code: Select all
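#!/bin/ksh
# Wrap the check so the remote command line needs no arguments
CMD="/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"
# Run the check directly; $($CMD) would try to execute the check's output
$CMD
ERROR_CODE=$?
exit $ERROR_CODE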
Re: check_by_ssh: Output Unknown Return Code 255
Here is the output of the test script. I set the alarms low to generate a simulated error code '2'. Doesn't appear to be a parsing problem. Going to think about this over some grub.
#!/usr/bin/ksh
CMD="/home/nagios/AIX/scripts/check_disk_aix -v / -c 10 -w 15"
${CMD}
ERROR_CODE=$?
exit $ERROR_CODE
bash-3.2$ ./test.ksh
/ is at 18% capacity! 533M of 3.0G
bash-3.2$ echo $?
2
-
jdalrymple
Re: check_by_ssh: Output Unknown Return Code 255
righty-o - that looks good. Now what happens if you replace
Code: Select all
check_command check_xi_by_ssh!-l nagios -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"!!!!!!!
with
Code: Select all
check_command check_xi_by_ssh!-l nagios -E -C "/home/nagios/<whatever_path>/test.ksh"!!!!!!!
on the nagios box?
Re: check_by_ssh: Output Unknown Return Code 255
When I replaced
Code: Select all
check_command check_xi_by_ssh!-l nagios -E -C "/home/nagios/AIX/scripts/check_disk_aix -v /tmp -c 95 -w 90"!!!!!!!
with
Code: Select all
check_command check_xi_by_ssh!-l nagios -E -C "/home/nagios/<whatever_path>/test.ksh"!!!!!!!
I got the same output as before.
I ran some tests and might be onto something. I suspect Nagios is not using the 'nagios' ID to execute the remote command; the 'apache' ID may be running it instead.
When I remove the -E from the Service and run through Core Config, I get the error, "OUTPUT: Remote command execution failed: Could not create directory '/var/www/.ssh'." Hmm... For giggles, I set up SSH auth for Apache and seemed to be getting some good output, though I need -E to trim the ssh header. Something's got to be trippy on my Nagios server if Apache is running the remote command, even though the '-l nagios' should have nagios run the command.
-
jdalrymple
Re: check_by_ssh: Output Unknown Return Code 255
That's a compile time option. Create a simple check that runs the plugin /usr/bin/whoami
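If calling /usr/bin/whoami directly is awkward to wire up, the same test could be sketched as a tiny local plugin that prints the account it runs under in Nagios plugin format. `report_user` is a hypothetical name, and where the script gets installed is up to you. Comparing its output from the Test Command button with its output from a scheduled check would help confirm or rule out the 'apache' suspicion.

```shell
#!/bin/sh
# Hypothetical diagnostic plugin: report which local account runs the check.
# It always exits 0 (OK); the interesting part is the user name it prints.
report_user() {
    echo "OK - running as $(id -un)"
    return 0
}

report_user
```

Note that the local account only decides which ~/.ssh keys ssh picks up; '-l nagios' sets the login name on the remote host, not the local user that launches ssh.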