I can't find a solution to this issue.
I have some errors like this in my logs:
[1383038181] SERVICE ALERT: IMU344;APACHE;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
[1383038181] SERVICE EVENT HANDLER: IMU344;APACHE;CRITICAL;SOFT;1;restart-service
[1383038295] SERVICE ALERT: IMU344;APACHE;OK;SOFT;2;HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.002 second response time
Everything works fine, but sometimes I get this "127 is out of bounds" error, followed by an OK on the next check.
As this error is usually linked to path or permission issues, I don't understand what's wrong.
All my permissions are OK, and since the check works fine 90% of the time, there must be something else.
If you have any information about this, thanks in advance!
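For anyone reading along: exit status 127 is what a POSIX shell returns when it cannot find the command it was asked to run, and 126 is what it returns when the file exists but cannot be executed. Nagios only accepts plugin return codes 0 to 3, so anything else is reported as "out of bounds". Here is a minimal Python sketch of that mapping, assuming the check command goes through `/bin/sh`. The helper names `run_via_shell` and `describe` are made up for illustration and are not part of Nagios.

```python
import subprocess

# Standard Nagios plugin exit codes; anything else is "out of bounds".
PLUGIN_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_via_shell(command_line):
    """Run a command line through /bin/sh and return its exit status."""
    result = subprocess.run(
        command_line,
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def describe(code):
    """Translate an exit status into the state a Nagios log would suggest."""
    if code in PLUGIN_STATES:
        return PLUGIN_STATES[code]
    if code == 127:
        return "out of bounds: shell could not find the command"
    if code == 126:
        return "out of bounds: command found but not executable"
    return "out of bounds"
```

Calling `describe(run_via_shell("some_plugin_that_does_not_exist"))` on a POSIX system reports the "could not find the command" case. An intermittent 127 therefore suggests that the plugin, or something it calls, is occasionally not resolvable at the moment the check runs.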
Can you post the check command and service config?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
define service {
    service_description    APACHE
    use                    active
    host_name              IMU344
    check_command          check-http
}
Our installation is quite big and we are mostly using passive checks:
# Active Host / Service Checks: 161 / 6377
# Passive Host / Service Checks: 1122 / 6404
The *bad* point is that we set it up a few years ago with MONARCH... which is, in the end, quite bad and generates huge cfg files that are not optimized at all.
I updated our setup a few months ago from Nagios 2.12 to 3.5.1, but I kept Monarch in place.
All the services that return the 127 error have been configured with both active AND passive checks (the guy who did this installation wanted a double check).
Both checks hit the same test script; for example, the APACHE service check targets a simple PHP echo script.
Is it a problem for Nagios to have both checks for the same service?
I guess I am a bit confused: you can't have a service be both active and passive. Do you mean two services, one active and one passive? That does not really make much sense either.
So, in my conf (using Monarch), my APACHE check is in ACTIVE mode but it also accepts passive checks... which is quite useless, I think.
I will drop one of them... but the question is: which one to keep? Active or passive?
EDIT:
I disabled the passive script on all the machines.
The active check is working fine, but a few minutes later this shows up in the log:
[Thu Nov 21 14:55:32 2013] Warning: Return code of 127 for check of service 'APACHE' on host 'DMU117' was out of bounds. Make sure the plugin you're trying to run actually exists.
[Thu Nov 21 14:55:32 2013] SERVICE ALERT: DMU117;APACHE;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
[Thu Nov 21 14:57:40 2013] SERVICE ALERT: DMU117;APACHE;OK;SOFT;2;HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.001 second response time
And the critical state changes to "OK" after the next check.
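To see whether the 127 results cluster on particular hosts or services, the log can be scanned for them. Below is a rough Python sketch that counts "Return code of 127" service alerts per host and service. It assumes the `SERVICE ALERT` line layout shown in the excerpts above (with either an epoch or a human-readable timestamp in the brackets); `count_code_127` is a hypothetical helper name, not a Nagios tool.

```python
import re
from collections import Counter

# Matches lines such as:
# [1383038181] SERVICE ALERT: IMU344;APACHE;CRITICAL;SOFT;1;(Return code of 127 ...)
ALERT_RE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>[^;]+);(?P<attempt>\d+);(?P<output>.*)$"
)


def count_code_127(lines):
    """Count service alerts caused by return code 127, keyed by (host, service)."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match and "Return code of 127" in match.group("output"):
            counts[(match.group("host"), match.group("service"))] += 1
    return counts
```

Feeding it the lines of the Nagios log file (for example `count_code_127(open(path))`, where `path` is wherever your log lives) gives a per-service tally that can be compared against check schedules or cron jobs on the monitoring server.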
What timeout do you have set for the check?
Can you reproduce the intermittent behavior by running the check from the cli multiple times?
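One way to do that repeated run is sketched below: a small Python loop that executes the same command line many times and tallies the exit codes. It is only a sketch under assumptions. `tally_exit_codes` is a made-up helper, and the command goes through the shell so that a missing binary shows up as 127, as it does in the Nagios log.

```python
import subprocess
import time
from collections import Counter


def tally_exit_codes(command_line, runs=100, delay=1.0):
    """Run command_line through the shell `runs` times and count each exit code."""
    codes = Counter()
    for _ in range(runs):
        result = subprocess.run(
            command_line,
            shell=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes[result.returncode] += 1
        time.sleep(delay)  # pause between runs, in seconds
    return codes
```

A call could look like `tally_exit_codes("/usr/local/nagios/libexec/check_http -H localhost", runs=600)`. The plugin path and arguments here are assumptions, so substitute the exact command line from your `check-http` definition. It is worth running it as the same user the Nagios daemon runs as, since a different environment or PATH can hide the problem.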
I tried launching an Apache test every second for a while, but I can't get any error... it's still running and I'll see.
...
HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.007 second response time |time=0.007248s;10.000000;;0.000000 size=151B;;;0
HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.002 second response time |time=0.001514s;10.000000;;0.000000 size=151B;;;0
HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.008 second response time |time=0.008346s;10.000000;;0.000000 size=151B;;;0
HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.001 second response time |time=0.001185s;10.000000;;0.000000 size=151B;;;0
HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.006 second response time |time=0.005511s;10.000000;;0.000000 size=151B;;;0
HTTP OK: Status line output matched " 200 OK" - 151 bytes in 0.001 second response time |time=0.001202s;10.000000;;0.000000 size=151B;;;0
...
The "Monarch" tool we are using created all the config files, but in a bad way, I think.
I have one block of text for each service on each machine... and we have more than 1000 machines monitored...
darxmurf wrote: do you think this can bring some random issues or whatever?
I don't think so, as 1k hosts in a 100k-line file is only 100 lines per host. That would be about 6-9 checks per host at 15 or so lines per service check. Maybe the active checks are timing out while the passive ones are working?
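The back-of-the-envelope estimate above can be written out in a few lines of Python. The 100k-line, 1k-host and 15-line figures are the rough numbers from this reply, and the second estimate assumes the active and passive counts posted earlier in the thread do not overlap. `services_per_host` is just an illustrative helper.

```python
def services_per_host(total_lines, hosts, lines_per_service):
    """Estimate services per host from config size and block length."""
    lines_per_host = total_lines / hosts
    return lines_per_host / lines_per_service


# Rough figures from the reply: 100k lines, 1k hosts, ~15 lines per service block.
estimate = services_per_host(100_000, 1_000, 15)
print(round(estimate, 1))  # 6.7

# Cross-check against the posted check counts (assumes no overlap between them).
total_services = 6377 + 6404
total_hosts = 161 + 1122
print(round(total_services / total_hosts, 1))  # 10.0
```

Either way the result is in the range of a handful to ten services per host, which is not unusual on its own.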