return code 127 on test that seems invalid

ucemike
Posts: 56
Joined: Wed Nov 16, 2011 3:13 pm

Post by ucemike

I recently tried to implement a port check using check_jabber, which is a link to check_tcp. The command definition is:


# 'check_jabber' command definition
define command{
        command_name    check-jabber
        command_line    /usr/lib64/nagios/plugins/check_jabber -H $ARG1$ -w $ARG2$ -c $ARG3$ -t $ARG4$
}
The service's check_command entry is "check-jabber!jabber.host.name.here.net!10!20!60".

With debugging forced to the maximum level, the final command line in the debug log was:
/usr/lib64/nagios/plugins/check_jabber -H jabber.host.name.here.net -w 10 -c 20 -t 60

When I run that command manually as the nagios user, it works fine and prints:
JABBER OK - 0.098 second response time on port 5222|time=0.097569s;10.000000;20.000000;0.000000;60.000000
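For reference, 127 is the exit status a POSIX shell uses when it cannot find or execute the command it was given, so it usually means the plugin never started, not that check_jabber itself returned 127. The minimal Python sketch below illustrates this. It assumes the check is launched through `/bin/sh -c` (which is how popen() runs a command string), and `/nonexistent/path/check_jabber` is a deliberately missing, hypothetical path.

```python
import subprocess
from typing import Tuple


def run_via_shell(command: str) -> Tuple[int, str, str]:
    """Run a command string through /bin/sh -c; return (exit status, stdout, stderr)."""
    proc = subprocess.run(["/bin/sh", "-c", command], capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # The shell cannot find this file, so the shell itself exits with 127.
    rc, out, err = run_via_shell("/nonexistent/path/check_jabber -H example")
    print(rc)  # 127
    print(err.strip())  # the shell's "not found" message (wording varies by shell)

    # A command that runs normally reports its own exit status instead.
    rc, out, err = run_via_shell("exit 0")
    print(rc)  # 0
```

Because the manual run succeeds, a 127 seen only under Nagios points at the environment the command is started in rather than at the plugin or its arguments.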

I also use check_imap, which is also a link to check_tcp, and it works fine on our mail hosts. The permissions on check_imap and check_jabber are exactly the same. I only mention it because both use check_tcp.

We are running RHEL 6.2 with Nagios 3.4.1 and nagios-plugins 1.4.16.

Does anyone have a clue what this could be?
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Post by agriffin

That's odd, is there anything in your Nagios logs about it (found in /usr/local/nagios/var or /var/log)?
ucemike
Post by ucemike

agriffin wrote: That's odd, is there anything in your Nagios logs about it (found in /usr/local/nagios/var or /var/log)?
No, nothing other than the 127 error. I was hoping the expanded debug output would give me more detail, but it only made this more confusing because everything looked OK (at least as far as the final command line goes).
agriffin
Post by agriffin

Are you running anything like SELinux that might be preventing the plugin from running in certain contexts?
ucemike
Post by ucemike

agriffin wrote: Are you running anything like SELinux that might be preventing the plugin from running in certain contexts?
No. I also verified that we were not hitting a thread or open-process cap.
ucemike
Post by ucemike

I did some further testing and removed about half of the service checks we run (we have 4500). When I did that, the error stopped. When I re-enabled them, the problem came back. I then removed half of the hosts (we have 200), and the same thing happened: the check started working again, and re-enabling all hosts broke it again.
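The halving approach above is a bisection over the number of enabled checks. A minimal Python sketch, assuming a hypothetical `works(n)` callback you would write yourself (for example: regenerate the config with only the first n service checks, restart Nagios, wait one check interval, and report whether check-jabber came back OK), and assuming there is a single cut-over point:

```python
from typing import Callable


def find_threshold(lo: int, hi: int, works: Callable[[int], bool]) -> int:
    """Return the largest n in [lo, hi] for which works(n) is True.

    Assumes works(lo) is True and that once works(n) turns False it stays
    False for every larger n (one cut-over point).
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if works(mid):
            lo = mid  # mid still works: the threshold is at or above mid
        else:
            hi = mid - 1  # mid fails: the threshold is below mid
    return lo
```

With 4500 checks, `find_threshold(0, 4500, works)` calls `works` at most 13 times, so the exact count can be pinned down in about a dozen restarts instead of by trial and error.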

I monitored open files with all hosts and checks enabled and saw at most about 550 (in total, not just for the nagios user). I also added a limits.conf entry for the nagios user raising those limits to around 10k. It did not help.
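To repeat the open-file measurement for one user and to confirm the limits the running daemon actually has, here is a Linux-only Python sketch that reads /proc. The function names are hypothetical. `/proc/<pid>/limits` shows the limits in effect for a process, which is worth checking because limits.conf is applied by pam_limits to PAM sessions and may not reach a daemon started from an init script.

```python
import os


def open_fds_for_uid(uid: int):
    """Return (process count, open file descriptor count) for processes owned by uid.

    Linux-only: walks /proc. Run as root to be able to read every fd directory.
    """
    procs = fds = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            if os.stat(f"/proc/{entry}").st_uid != uid:
                continue
            count = len(os.listdir(f"/proc/{entry}/fd"))
        except (FileNotFoundError, PermissionError, ProcessLookupError):
            continue  # the process exited meanwhile, or we may not inspect it
        procs += 1
        fds += count
    return procs, fds


def proc_limits(pid: int) -> str:
    """Return the resource limits actually in effect for a running process."""
    with open(f"/proc/{pid}/limits") as f:
        return f.read()
```

Run as root, `open_fds_for_uid(pwd.getpwnam("nagios").pw_uid)` gives the per-user counts, and `print(proc_limits(<nagios daemon pid>))` shows whether the raised "Max open files" and "Max processes" values were actually picked up.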

This looks like a "cap" of some sort, but I cannot narrow down where it is. I am hoping someone else has some tips.

Edit: we had also updated to 3.4.3RC1 to resolve a downtime scheduling bug.
ucemike
Post by ucemike

Is there any way to get the exact output the OS gave Nagios when the command was executed? The debug logs at maximum level do not show it; they only give the full command line, which runs fine when executed manually as the nagios user.

Does Nagios store this information anywhere? I want to see exactly what the command or the OS is telling Nagios that results in error 127.
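One way to see exactly what happens at exec time is a debugging wrapper placed in front of the plugin. This is a minimal sketch, not a Nagios feature: the wrapper itself, its log path `/tmp/check_wrapper.log`, and the idea of pointing `command_line` at it are all assumptions. It runs the real plugin, logs argv, exit status, stdout, stderr and any exec error (with errno), then passes stdout and the status back.

```python
#!/usr/bin/env python3
"""Hypothetical debugging wrapper: run the real plugin, log the outcome, pass it through."""
import datetime
import subprocess
import sys

LOG_PATH = "/tmp/check_wrapper.log"  # assumption: any file writable by the nagios user


def run_and_log(argv, log_path=LOG_PATH) -> int:
    try:
        proc = subprocess.run(argv, capture_output=True, text=True)
        rc, out, err = proc.returncode, proc.stdout, proc.stderr
    except OSError as exc:
        # exec itself failed (missing file, argument list too long, permissions...).
        # Report UNKNOWN (3) to Nagios and keep the errno in the log.
        err = f"{type(exc).__name__}: errno={exc.errno} {exc.strerror}"
        rc, out = 3, f"UNKNOWN - wrapper could not execute plugin: {err}\n"
    with open(log_path, "a") as log:
        log.write(f"{datetime.datetime.now().isoformat()} argv={argv!r} rc={rc}\n")
        log.write(f"  stdout={out!r}\n  stderr={err!r}\n")
    sys.stdout.write(out)  # Nagios reads the plugin output from stdout
    return rc


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_and_log(sys.argv[1:]))
```

To try it, the command definition would prefix the plugin path, e.g. `command_line /usr/local/bin/check_wrapper.py /usr/lib64/nagios/plugins/check_jabber -H $ARG1$ -w $ARG2$ -c $ARG3$ -t $ARG4$` (hypothetical install path). If no log entry appears when the 127 shows up, the failure happened before even the wrapper could start, which narrows things down as well.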
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Post by slansing

Can you post the exact command you run manually from the Nagios terminal to get the result in your first post?
ucemike
Post by ucemike

slansing wrote: Can you post exactly what the command is you run manually from the Nagios terminal to get the result in your first post?
I am not sure what you are asking for. The command listed in the first post is the command I run manually. It was copy/pasted from the nagios.debug output to verify that the command line/values I used were not causing the problem.

I tested the command as user nagios.
I forced max debug on logs and the command line final run command was:
/usr/lib64/nagios/plugins/check_jabber -H jabber.host.name.here.net -w 10 -c 20 -t 60

When I run that command manually (as nagios user) it works fine with the output:
JABBER OK - 0.098 second response time on port 5222|time=0.097569s;10.000000;20.000000;0.000000;60.000000
ucemike
Post by ucemike

I did some additional testing: when I have more than about 2770-2780 service checks, the 127 errors start. When I reduce the number below that, the check starts working again.
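A threshold tied to the number of configured checks suggests something that grows with the config. One candidate the thread does not rule out (a hypothesis, not a confirmed diagnosis): Nagios 3 can export macros as environment variables for each check (the `enable_environment_macros` option in nagios.cfg). On Linux, execve() fails with E2BIG when argv plus the environment exceed ARG_MAX, or when any single string exceeds MAX_ARG_STRLEN (32 pages, i.e. 131072 bytes with 4 KiB pages), and a failed exec under popen() or `/bin/sh` typically surfaces as status 127. The sketch below estimates how close a given argv/environment is to those limits; the 4 KiB page size and 8-byte pointers are stated assumptions.

```python
import os

# Assumption: Linux on a 64-bit system with 4 KiB pages. A single argv/env
# string may not exceed 32 pages (MAX_ARG_STRLEN), whatever ARG_MAX says.
MAX_ARG_STRLEN = 32 * 4096
POINTER_SIZE = 8


def exec_size_report(argv, env):
    """Estimate how close an execve() with this argv/env is to the kernel limits."""
    strings = list(argv) + [f"{k}={v}" for k, v in env.items()]
    # Each string costs its bytes plus a terminating NUL, plus one pointer slot;
    # the argv and envp arrays each end with a NULL pointer (the "+ 2").
    total = sum(len(s.encode()) + 1 for s in strings) + POINTER_SIZE * (len(strings) + 2)
    longest_name, longest_len = max(
        ((s.split("=", 1)[0], len(s.encode()) + 1) for s in strings),
        key=lambda item: item[1],
        default=("", 0),
    )
    return {
        "total_bytes": total,
        "arg_max": os.sysconf("SC_ARG_MAX"),
        "longest_string": longest_name,
        "longest_bytes": longest_len,
        "max_arg_strlen": MAX_ARG_STRLEN,
    }
```

Calling `exec_size_report(sys.argv, dict(os.environ))` from inside a check started by Nagios (for instance a throwaway plugin that prints the report) would show whether any single macro variable or the total environment approaches these limits as checks are added.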