Random error 127 out of bounds

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
Locked
darxmurf
Posts: 5
Joined: Mon Nov 18, 2013 4:20 am

Random error 127 out of bounds

Post by darxmurf

Hi all,

I can't find a solution to this issue.
I get errors like this in my logs:
[1383038181] SERVICE ALERT: IMU344;APACHE;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
[1383038181] SERVICE EVENT HANDLER: IMU344;APACHE;CRITICAL;SOFT;1;restart-service
[1383038295] SERVICE ALERT: IMU344;APACHE;OK;SOFT;2;HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.002 second response time
Everything works fine, but occasionally I get this "127 is out of bounds" error, followed by an OK on the next check.
Since this error is usually caused by a wrong path or a permissions problem, I don't understand what's wrong.
All my permissions are OK, and since the check works fine 90% of the time, it must be something else :?:
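For context on the message itself: Nagios expects a plugin to exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN) and reports any other exit code as out of bounds. 127 is the status a POSIX shell such as /bin/sh returns when it cannot find the command it was asked to run, which is why Nagios adds "plugin may be missing". The Python sketch below only illustrates those two facts; the function names and the command name missing_plugin_example are hypothetical.

```python
import subprocess

# Nagios plugin exit codes; any other value is reported as "out of bounds".
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def classify(return_code: int) -> str:
    """Map a plugin exit code to a Nagios state name, flagging anything else."""
    return NAGIOS_STATES.get(return_code, f"out of bounds ({return_code})")


def shell_exit_code(command_line: str) -> int:
    """Run a command line through /bin/sh and return the shell's exit status."""
    result = subprocess.run(["/bin/sh", "-c", command_line], capture_output=True)
    return result.returncode


if __name__ == "__main__":
    # Hypothetical plugin name that should not exist anywhere on PATH.
    code = shell_exit_code("missing_plugin_example")
    print(code, classify(code))  # sh reports "command not found" as exit status 127
```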

If you have any information about this, thanks in advance!

Darx
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Random error 127 out of bounds

Post by abrist

Can you post the check command and service config?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Random error 127 out of bounds

Post by darxmurf

Here is the check command:
$USER1$/check_http -H $HOSTNAME$ -w 10 -t 15 -p 80 -s ALIVE -u /.test-nagios.php -e " 200 OK"
and the service definition:
define service {
    service_description     APACHE
    use                     active
    host_name               IMU344
    check_command           check-http
}
Our installation is quite big, and we mostly use passive checks:
# Active Host / Service Checks: 161 / 6377
# Passive Host / Service Checks: 1122 / 6404
The *bad* part is that we set it up a few years ago with MONARCH... which, in the end, is quite bad and generates huge, completely unoptimized cfg files.
I upgraded our setup from Nagios 2.12 to 3.5.1 a few months ago, but I kept Monarch in place.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Random error 127 out of bounds

Post by slansing

Puzzling. How long does it stay in the critical out-of-bounds state when this happens? Can you correlate it with a specific time of day, or day of the week?
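One way to answer both questions is to scan nagios.log. The Python sketch below assumes the native log format with a Unix timestamp in brackets (as in the first post), counts out-of-bounds alerts per hour of day, and treats the time until the next SERVICE ALERT for the same host/service as the length of the episode. The function name and the script name in the usage comment are hypothetical.

```python
import re
import sys
from collections import Counter
from datetime import datetime, timezone

# Native nagios.log service alert, e.g.
# [1383038181] SERVICE ALERT: IMU344;APACHE;CRITICAL;SOFT;1;(Return code of 127 ...)
# Captured groups: timestamp, host, service, plugin output.
ALERT_RE = re.compile(r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);[^;]*;[^;]*;[^;]*;(.*)$")


def out_of_bounds_stats(lines, tz=None):
    """Count out-of-bounds alerts per hour of day and how long each one lasted.

    A duration is the number of seconds until the next SERVICE ALERT logged
    for the same host/service pair (for example the SOFT recovery to OK).
    tz=None uses the local time zone of the machine running the script.
    """
    per_hour = Counter()
    durations = []
    pending = {}  # (host, service) -> timestamp of the last unresolved out-of-bounds alert
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue  # event handler lines, notifications, other log entries
        ts = int(match.group(1))
        key = (match.group(2), match.group(3))
        if key in pending:
            durations.append(ts - pending.pop(key))
        if "out of bounds" in match.group(4):
            pending[key] = ts
            per_hour[datetime.fromtimestamp(ts, tz=tz).hour] += 1
    return per_hour, durations


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python oob_stats.py /path/to/nagios.log [--utc]
    tz = timezone.utc if "--utc" in sys.argv[2:] else None
    with open(sys.argv[1]) as log:
        per_hour, durations = out_of_bounds_stats(log, tz=tz)
    for hour in sorted(per_hour):
        print(f"{hour:02d}:00-{hour:02d}:59  {per_hour[hour]} out-of-bounds alert(s)")
    if durations:
        print(f"average time until the next alert: {sum(durations) / len(durations):.0f} s")
```

A spike at the same hour every day would point at something scheduled (backups, log rotation, cron jobs) rather than at the plugin itself.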
Re: Random error 127 out of bounds

Post by darxmurf

I think I found something.

All the services returning the 127 error have been configured with both active AND passive checks :geek: (the person who set up this installation wanted a double check).
Both checks hit the same test script; for example, the APACHE service check requests a simple PHP script that just echoes some text.
Is it a problem for Nagios to have both kinds of check for the same service?
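For reference, a passive result reaches Nagios as a PROCESS_SERVICE_CHECK_RESULT line written to the external command file, and it updates the same service object that the active check updates. The Python sketch below formats such a line; refusing return codes outside 0-3 is an illustrative guard added in this sketch, and the command file path must be taken from the command_file setting in nagios.cfg. The function name, script name and sample output text are hypothetical.

```python
import sys
import time

VALID_RETURN_CODES = range(4)  # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN


def passive_result_line(host, service, return_code, output, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line.

    Refusing codes outside 0-3 is an illustrative guard added in this sketch.
    """
    if return_code not in VALID_RETURN_CODES:
        raise ValueError(f"refusing to submit out-of-range return code {return_code}")
    timestamp = int(time.time() if now is None else now)
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python submit_passive.py /path/to/nagios.cmd  (command_file in nagios.cfg)
    with open(sys.argv[1], "w") as pipe:  # the command file is a named pipe read by Nagios
        pipe.write(passive_result_line("IMU344", "APACHE", 0, "HTTP OK"))
```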
Re: Random error 127 out of bounds

Post by slansing

I'm a bit confused; you can't have a service be both active and passive. Do you mean two services, one active and one passive? That doesn't make much sense either.
Re: Random error 127 out of bounds

Post by darxmurf

I agree...

So, in my config (generated with Monarch), my APACHE check runs in ACTIVE mode but also accepts passive checks... which I think is fairly useless.
I will drop one of them... but the question is: which one should I keep, active or passive?

EDIT:
I disabled the passive script on all the machines.

The active check works fine, but a few minutes later this shows up in the log:
[Thu Nov 21 14:55:32 2013] Warning: Return code of 127 for check of service 'APACHE' on host 'DMU117' was out of bounds. Make sure the plugin you're trying to run actually exists.
[Thu Nov 21 14:55:32 2013] SERVICE ALERT: DMU117;APACHE;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
[Thu Nov 21 14:57:40 2013] SERVICE ALERT: DMU117;APACHE;OK;SOFT;2;HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.001 second response time
And the CRITICAL state changes back to OK on the next check.
Re: Random error 127 out of bounds

Post by abrist

What timeout do you have set for the check?
Can you reproduce the intermittent behavior by running the check from the CLI multiple times?
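To make that test systematic, a small harness can run the plugin many times and tally the exit codes. The Python sketch below takes the check_http path on the command line (whatever $USER1$ expands to on your server), substitutes IMU344 for $HOSTNAME$, and uses an arbitrary 20 s per-run timeout; the function and script names are hypothetical.

```python
import subprocess
import sys
import time
from collections import Counter


def run_repeatedly(command, runs=100, interval=1.0, timeout=20):
    """Run a plugin command `runs` times and tally its exit codes.

    A run exceeding `timeout` seconds is killed and counted as "timeout".
    The command is executed without a shell, so a missing binary raises
    FileNotFoundError here instead of producing exit status 127.
    """
    tally = Counter()
    for i in range(runs):
        try:
            result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
            tally[result.returncode] += 1
        except subprocess.TimeoutExpired:
            tally["timeout"] += 1
        if i < runs - 1:
            time.sleep(interval)  # pause between runs, like a check interval
    return tally


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python repeat_check.py /path/to/check_http
    check = [sys.argv[1], "-H", "IMU344", "-w", "10", "-t", "15",
             "-p", "80", "-s", "ALIVE", "-u", "/.test-nagios.php", "-e", " 200 OK"]
    print(dict(run_repeatedly(check, runs=60)))  # codes outside 0-3 are worth a closer look
```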
Re: Random error 127 out of bounds

Post by darxmurf

Here is my configuration:
# SERVICE CHECK TIMEOUT
service_check_timeout=20

# HOST CHECK TIMEOUT
host_check_timeout=5

# EVENT HANDLER TIMEOUT
event_handler_timeout=50

# NOTIFICATION TIMEOUT
notification_timeout=30
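These values can be checked against the plugin's own timeout: if check_http's -t is smaller than service_check_timeout, the plugin gives up and reports the failure itself before Nagios has to kill it. With the values posted here (-t 15 against service_check_timeout=20) that already holds. The Python sketch below automates the comparison; the helper and script names are hypothetical, 10 s is check_http's documented default when -t is omitted, and 60 s is assumed as the fallback when service_check_timeout is not set.

```python
import shlex
import sys

# Check command from the thread, with Nagios macros left unexpanded.
CHECK = '$USER1$/check_http -H $HOSTNAME$ -w 10 -t 15 -p 80 -s ALIVE -u /.test-nagios.php -e " 200 OK"'


def parse_main_config(text):
    """Parse key=value settings from nagios.cfg-style text, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def plugin_timeout(command_line, default=10):
    """Return the value following -t in a check_http command line.

    Falls back to `default` (check_http's documented 10 s) when -t is absent.
    Only the "-t N" form is recognised, not "--timeout=N".
    """
    args = shlex.split(command_line)
    for flag, value in zip(args, args[1:]):
        if flag == "-t":
            return int(value)
    return default


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python timeout_check.py /path/to/nagios.cfg
    with open(sys.argv[1]) as cfg:
        settings = parse_main_config(cfg.read())
    limit = int(settings.get("service_check_timeout", 60))  # 60 s assumed if unset
    plugin = plugin_timeout(CHECK)
    verdict = "plugin times out first" if plugin < limit else "Nagios may kill the plugin first"
    print(f"check_http -t {plugin}s vs service_check_timeout={limit}s: {verdict}")
```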
I have been launching an Apache check every second for a while, but I can't get any error... it's still running, so we'll see.
...
HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.007 second response time |time=0.007248s;10.000000;;0.000000 size=151B;;;0
HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.002 second response time |time=0.001514s;10.000000;;0.000000 size=151B;;;0
HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.008 second response time |time=0.008346s;10.000000;;0.000000 size=151B;;;0
HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.001 second response time |time=0.001185s;10.000000;;0.000000 size=151B;;;0
HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.006 second response time |time=0.005511s;10.000000;;0.000000 size=151B;;;0
HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.001 second response time |time=0.001202s;10.000000;;0.000000 size=151B;;;0
...
The Monarch tool we use created all the config files, but badly, I think.
There is one block of text for each service on each machine... and we monitor more than 1000 machines:
wc -l /opt/nagios/etc/services.cfg
110296 /opt/nagios/etc/services.cfg
Do you think this could cause random issues of some kind?
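To see how the file is actually distributed, a quick count of service definitions per host helps. The Python sketch below is a rough parser written for illustration, not a full Nagios config parser: it only reads host_name directives inside define service blocks, so services attached through hostgroup_name or inheriting host_name from a template are not counted. The function and script names are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches one "define service { ... }" block (non-greedy up to the closing brace).
SERVICE_BLOCK_RE = re.compile(r"define\s+service\s*\{(.*?)\}", re.DOTALL)


def services_per_host(cfg_text):
    """Return (number of service blocks, Counter of services per host_name)."""
    blocks = SERVICE_BLOCK_RE.findall(cfg_text)
    per_host = Counter()
    for body in blocks:
        for raw in body.splitlines():
            line = raw.split(";", 1)[0].strip()  # drop inline ";" comments
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[0] == "host_name":
                for host in parts[1].split(","):  # host_name may list several hosts
                    per_host[host.strip()] += 1
    return len(blocks), per_host


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_services.py /opt/nagios/etc/services.cfg
    with open(sys.argv[1]) as cfg:
        text = cfg.read()
    count, per_host = services_per_host(text)
    line_count = text.count("\n")  # same number that wc -l reports
    print(f"{count} service definitions for {len(per_host)} hosts")
    if per_host:
        print(f"average {sum(per_host.values()) / len(per_host):.1f} services per host")
    if count:
        print(f"average {line_count / count:.1f} file lines per service definition")
```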
Re: Random error 127 out of bounds

Post by abrist

darxmurf wrote: do you think this can bring some random issues or whatever ?
I don't think so: 1k hosts in a 100k-line file is only about 100 lines per host. At 15 or so lines per service definition, that would be about 6-9 checks per host. Maybe the active checks are timing out while the passive ones are working?