Nagios Bug: ignoring waitpid() return value

Posted: Thu Dec 13, 2012 4:30 pm
by archie
I am seeing this occasional error:

(Return code of 127 is out of bounds - plugin may be missing)

No, Nagios, the plugin is not missing!

But what is the problem? It's impossible to tell, because Nagios is ignoring the return value from the waitpid() system call. Until the waitpid() return value has been verified not to be -1, it's not appropriate to even look at the process exit status.
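To illustrate the point, here is a minimal sketch of checking waitpid() before trusting the status word. The helper names `reap_child` and `run_child_exiting_with` are hypothetical and do not come from the Nagios source; the -1 and -2 return conventions are assumptions made for this example only.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not a Nagios function): reap `pid` and report its
 * exit code only if waitpid() itself succeeded.
 * Returns 0..255 for a normal exit, -1 if waitpid() failed (errno is set),
 * and -2 if the child did not exit normally (e.g. killed by a signal). */
int reap_child(pid_t pid)
{
    int status = 0;
    pid_t rc;

    /* Retry when the wait is interrupted by a signal. */
    do {
        rc = waitpid(pid, &status, 0);
    } while (rc == -1 && errno == EINTR);

    if (rc == -1) {
        /* `status` was never filled in, so it must not be inspected. */
        return -1;
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    return -2;
}

/* Hypothetical demo: fork a child that exits with `code`, then reap it. */
int run_child_exiting_with(int code)
{
    pid_t pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        _exit(code);
    }
    return reap_child(pid);
}
```

With a check like this, a failed waitpid() (for example ECHILD) can be reported as such instead of being misread as a plugin exit code.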

If you look at the Nagios code (I downloaded version 3.4.3), in every occurrence where waitpid() is invoked, the return value is ignored.

Please fix this so we can better understand what is going on in these cases.

Thanks.

Re: Nagios Bug: ignoring waitpid() return value

Posted: Thu Dec 13, 2012 4:37 pm
by sreinhardt
Could you please provide an example of a plugin that this seems to be an issue with?

Re: Nagios Bug: ignoring waitpid() return value

Posted: Thu Dec 13, 2012 4:47 pm
by archie
This is happening with check_by_ssh.

I also just noticed this comment in the pclose() man page (Linux):
Failure to execute the shell is indistinguishable from the shell's failure to execute command, or an immediate exit of the command. The only hint is an exit status of 127.
It may very well be the case that the 127 exit code is originating from pclose(), not waitpid(), in which case fixing my complaint is not going to change anything. It's still good practice to always check return values, of course!
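The ambiguity described in the man page can be reproduced with a small sketch. The helper name `run_via_popen` and its -1 and -2 return conventions are hypothetical, and this is not how Nagios itself is known to run plugins; it only shows that a command the shell cannot execute and a command that exits with 127 on its own look identical to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: run `command` through popen(), discard its output,
 * and decode the status returned by pclose().
 * Returns 0..255 for a normal exit, -1 if popen() or pclose() failed,
 * and -2 if the shell did not exit normally. */
int run_via_popen(const char *command)
{
    char buf[256];
    FILE *fp = popen(command, "r");
    if (fp == NULL) {
        return -1;
    }

    /* Drain the pipe so the child is not blocked on a full buffer. */
    while (fgets(buf, sizeof buf, fp) != NULL) {
        /* output intentionally discarded */
    }

    int status = pclose(fp);
    if (status == -1) {
        return -1; /* pclose() itself failed; status is not a wait status */
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    return -2;
}
```

Calling `run_via_popen("/nonexistent/plugin 2>/dev/null")` and `run_via_popen("exit 127")` should both yield 127, so the exit code alone cannot tell the two cases apart.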

Re: Nagios Bug: ignoring waitpid() return value

Posted: Fri Dec 14, 2012 11:35 am
by slansing
This may be very simple, but it is worth covering: does the Nagios user have execute privileges on that check? Can you post the full command you are using to run it?