Return code of 127 is out of bounds - plugin may be missing

mon-team
Posts: 171
Joined: Thu Jun 28, 2012 9:22 am

Post by mon-team »

Hi,

I get the following errors after adding a new configuration file to Nagios:
"...Return code of 127 is out of bounds - plugin may be missing"
"Warning: Attempting to execute the command "/bin/echo ..."
"Warning: Attempting to execute the command "/usr/bin/printf ..."
etc ...

The error affects all the installed plugins, so the status of every service check changes to CRITICAL.

The issue is resolved after removing the new config file.
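
For context, 127 is the exit status a POSIX shell reports when it cannot find or start the command it was asked to run, while Nagios plugins are expected to exit with 0 to 3. The following is only a minimal sketch of that behavior, assuming the check command line is handed to /bin/sh; the plugin path and the function names are hypothetical and are not taken from the Nagios source.

```python
import subprocess

# Standard Nagios plugin exit codes; any other value is "out of bounds".
PLUGIN_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_via_shell(command_line):
    """Run a command line through /bin/sh and return its exit status."""
    completed = subprocess.run(
        command_line,
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


def describe(return_code):
    """Map an exit status to a plugin state or an out-of-bounds note."""
    if return_code in PLUGIN_STATES:
        return PLUGIN_STATES[return_code]
    if return_code == 127:
        return "out of bounds (the shell could not find the command)"
    return "out of bounds"


if __name__ == "__main__":
    # Hypothetical path that does not exist, used only to trigger the error.
    rc = run_via_shell("/hypothetical/path/check_missing_plugin -H localhost")
    print(rc, describe(rc))
```

This only shows where the number comes from. It does not explain why a plugin that exists would be reported this way.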


Running "nagios -v /../nagios.cfg", things look okay.

The file contains service definitions like this:

define service{
use [****]
hostgroup_name [*****]
service_description [******]
check_command check_snmp! -C $USER8$ -P 2c -o .1.3.6.1.4.1.8698.1000.1.14.1.1.11.109.112.115.115.49.45.115.108.111.116.53.45.111.99.116.49 -w 50 -c 60
check_interval 15
retry_interval 15
max_check_attempts 5
contacts [*****]
notifications_enabled 0
}

This is the check_command definition:
define command{
command_name check_snmp
command_line $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
}


I use Nagios Core 3.2.3 on CentOS 5.6.

Has anybody experienced this issue?

Thanks in advance for your feedback.

Post by mon-team »

Just to clarify, this is not the usual problem of permissions or plugins not being installed.

-rwxr-xr-x 1 nagios nagios 180017 Nov 26 2010 /usr/lib64/nagios/plugins/check_snmp
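
The same check can be scripted. This is only a rough sketch, assuming it is run as the same user that runs Nagios; the function name plugin_problem is hypothetical.

```python
import os


def plugin_problem(path):
    """Return a short description of what is wrong with a plugin, or None."""
    if not os.path.isfile(path):
        return "missing"
    # os.access tests the permissions of the user running this script.
    if not os.access(path, os.X_OK):
        return "not executable"
    return None


if __name__ == "__main__":
    # Path taken from the listing above.
    print(plugin_problem("/usr/lib64/nagios/plugins/check_snmp"))
```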

Everything works fine until I add a certain number of new services that use check_snmp.

I thought the problem could be related to the maximum number of open files, so I increased it.

Current settings are

cat /proc/sys/fs/file-max
1800974

[nagios@nagios etc]$ ulimit -Hn
10000
[nagios@nagios etc]$ ulimit -Sn
5000

but it did not work.
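
The same limits can also be read from inside a process. This is a minimal sketch, assuming a Linux host and Python's standard resource module; the function names are hypothetical. Limits are per process, so the values an interactive shell reports may differ from those of the running Nagios daemon.

```python
import resource
from pathlib import Path


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def system_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide file handle limit, or None if unavailable."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None
    return int(proc_file.read_text().strip())


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft:", soft, "hard:", hard)
    print("file-max:", system_file_max())
```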

At the moment my Nagios installation actively monitors 883 hosts and 4225 services.
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Post by agriffin »

So everything works fine if you have N hosts, but not N+1? If I understood you correctly, that's very weird. I have no idea what would cause that, but I guess I'll just run through some usual culprits. Are you running SELinux, or have you eliminated it as a possible cause (i.e. disabled it or checked its logs and found nothing relevant there)?

Post by mon-team »

In my case I am trying to add services, not hosts.

There are several checks I need to add to a hostgroup composed of 40 hosts.

If I add the 30 new services on e.g. 5 hosts (150 new services in total) everything still works.

If I add the 30 new services to the whole hostgroup (1200 services in total), Nagios goes crazy and starts returning error code 127.
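
The two totals above are simply checks per host multiplied by the number of hosts. A trivial sketch of that arithmetic, with a hypothetical function name:

```python
def services_added(checks_per_host, host_count):
    """Number of service objects created when every host gets every check."""
    return checks_per_host * host_count


if __name__ == "__main__":
    print(services_added(30, 5))   # the case that still works
    print(services_added(30, 40))  # the case that triggers code 127
```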

SELinux is disabled

Post by mon-team »

I also tried a number of other things, but none of them solved the issue.

- Increased fs file-max parameter

[root@nagios etc]# cat /proc/sys/fs/file-max
2397432

- In the command definition, we replaced the $USER1$ macro with the full path:

from

define command{
command_name check_snmp
command_line $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
}

to

define command{
command_name check_snmp
command_line /usr/lib64/nagios/plugins/check_snmp -H $HOSTADDRESS$ $ARG1$
}


- We created a Perl plugin which basically launches an snmpget, and configured the new service check to use the Perl plugin instead of the check_snmp executable

- We changed the following nagios.cfg parameters, trying the changes one at a time:

from

service_inter_check_delay_method=0.01

to

service_inter_check_delay_method=s



from

max_concurrent_checks=0

to

max_concurrent_checks=60

- We copied check_snmp to shared memory and modified the check command definition to use it

Post by agriffin »

To be clear, have you actually removed the service from a bunch of hosts and confirmed that it starts working again? I'm just thinking it may have been unrelated that it stopped working when you added a bunch of hosts. I haven't seen this behavior before and it's very weird.

Post by mon-team »

Yes, that's very weird indeed, but unfortunately it's exactly how it goes.

I have these 30 services running on 5 hosts right now and they work fine (see attached screenshot), but if I add the 30 checks to the whole hostgroup (40 hosts), Nagios goes crazy.



Today we enabled debug and reproduced the issue.

I attach two excerpts from the debug logs:

debug_nagios_G1BS03_mpss-FpaPool4-slot5-oct2_error.txt contains logs of the test in which we reproduced the error (services configured on the hostgroup)

debug_nagios_G5BS01_mpss-FpaPool3-slot5-oct1_OK.txt contains logs of the test in which everything goes fine (services configured on 5 hosts)

I also attach the services cfg file.

As you can see in the error log, the command is correctly rebuilt from the macros, and if I run it as the nagios user I get the correct result:

[nagios@nagios nagios]$ /usr/lib64/nagios/plugins/check_snmp -H G1BS03.kst.lan -C mgmtpublic -P 2c -o .1.3.6.1.4.1.8698.1000.1.14.1.1.12.109.112.115.115.49.45.115.108.111.116.53.45.111.99.116.50 -w 50 -c 60
SNMP OK - 4 | iso.3.6.1.4.1.8698.1000.1.14.1.1.12.109.112.115.115.49.45.115.108.111.116.53.45.111.99.116.50=4

but Nagios returns code 127, and there is no evident reason for this behavior.

If we do the same in our nagios-test environment (Nagios v.3.0.4), it works with the services configured on the whole hostgroup.

We are stuck at this point... we are going to try a couple of things tomorrow, but with little hope that they will work.

Any suggestions or ideas are more than welcome.

Thanks in advance.
Attachments
debug_nagios_G5BS01_mpss-FpaPool3-slot5-oct1_OK.txt
debug log OK
(11.78 KiB)
kasat-mpss-fpa-pool_hostgropup.cfg.txt
screenshot of services running on one host
(18.36 KiB)
services configuration file

Post by mon-team »

Here is the debug log with ERROR code 127.
Attachments
debug_nagios_G1BS03_mpss-FpaPool4-slot5-oct2_error.txt
debug with ERROR code 127
(15.82 KiB)

Post by mon-team »

ISSUE SOLVED!!!

Or rather, we applied a workaround, as I really think it's a Nagios bug.

The issue is that Nagios currently has a limitation on the maximum number of services that can be configured in a single servicegroup.

I am not aware of any such limit in the documentation.

I found it already tracked in the Nagios bug tracker, with ID 0000111.

http://tracker.nagios.org/view.php?id=111#bugnotes

Unfortunately, the resolution is "won't fix"!

I believe this bug should be fixed, as it's pretty difficult to debug!

It can be easily reproduced.

I cannot tell the exact maximum, but it should be somewhere between 3600 and 3700 services in the same servicegroup.

As a workaround, we simply created a new servicegroup and configured the new 1200 services in it.
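
For anyone who wants to script the same workaround, here is a rough sketch of splitting (host, service) pairs across several servicegroups so that no single group grows past a chosen size. It is only an illustration: the cap of 3000 is an arbitrary assumption chosen to stay below the 3600 to 3700 range observed above, and the host, service and group names are hypothetical. The output uses the host,service pair format of the servicegroup members directive.

```python
# Assumed cap, picked to stay below the limit observed in this thread.
MAX_MEMBERS = 3000


def chunk_members(pairs, max_members):
    """Split (host, service) pairs into lists of at most max_members."""
    if max_members < 1:
        raise ValueError("max_members must be at least 1")
    return [pairs[i:i + max_members] for i in range(0, len(pairs), max_members)]


def render_servicegroup(name, pairs):
    """Render one servicegroup definition listing the given pairs."""
    members = ",".join(f"{host},{service}" for host, service in pairs)
    return (
        "define servicegroup{\n"
        f"servicegroup_name {name}\n"
        f"alias {name}\n"
        f"members {members}\n"
        "}\n"
    )


if __name__ == "__main__":
    # Hypothetical names: 40 hosts with 30 checks each, 1200 pairs in total.
    pairs = [(f"host{h:02d}", f"check{s:02d}")
             for h in range(40) for s in range(30)]
    for index, chunk in enumerate(chunk_members(pairs, MAX_MEMBERS), start=1):
        print(render_servicegroup(f"hypothetical-group-{index}", chunk))
```

The sketch does not handle host or service names that contain commas.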

Now everything is working fine.

Thanks to agriffin for your support.

Alessandro

Post by agriffin »

Glad you were able to figure it out, and thanks for sharing the solution.