Re: [Nagios-devel] Event broker, dlopen(), and segfaults


Roy and Andreas -

Thanks for your insight. I found this article about HP-UX libraries and
it seems to indicate that deleting the original file and replacing it
with a new one will prevent a segfault. Simply overwriting the file
will cause a segfault, as the inode doesn't change:

http://www.sap-basis-abap.com/unix/repl ... -hp-ux.htm

Hardly ideal. The only real workaround would be to stat() the file and
check for mtime changes before each and every call to a function within
the module. However, the overhead of doing so is too great to make it
a feasible option...
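
For illustration, the stat() check described above might look like the
following C sketch. The names module_stamp, module_stamp_record and
module_stamp_changed are hypothetical and not part of Nagios, and the
inode comparison is an addition to the plain mtime check. The sketch
only detects a change; what to do afterwards is left open. Note that
st_mtime has one-second resolution.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical bookkeeping for one loaded module (not a Nagios structure). */
struct module_stamp {
    const char *path;   /* file the module was loaded from */
    time_t mtime;       /* modification time recorded at load */
    ino_t inode;        /* inode number recorded at load */
};

/* Record the file's identity right after loading it.
 * Returns 0 on success, -1 if the file cannot be stat()ed. */
int module_stamp_record(struct module_stamp *ms, const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    ms->path = path;
    ms->mtime = st.st_mtime;
    ms->inode = st.st_ino;
    return 0;
}

/* Returns true if the file on disk no longer matches what was recorded,
 * including the case where it has disappeared. */
bool module_stamp_changed(const struct module_stamp *ms)
{
    struct stat st;

    if (stat(ms->path, &st) != 0)
        return true;
    return st.st_mtime != ms->mtime || st.st_ino != ms->inode;
}
```

A caller would run module_stamp_record() once after dlopen() and
module_stamp_changed() before each call into the module, which is
exactly the per-call cost mentioned above.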

I'll make a note in the docs about this.
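
As a starting point for that note, replacing the module instead of
overwriting it could be sketched in C as follows. It is the programmatic
equivalent of writing FILE.new and then running mv FILE.new FILE.
install_module is a hypothetical helper, not an existing Nagios or
NDOUtils function, and whether a new inode is enough may vary by OS.

```c
#include <stdio.h>

/* Copy `src` to "<dst>.new", then rename() it over `dst`, so that `dst`
 * ends up pointing at a new inode instead of being rewritten in place.
 * Returns 0 on success, -1 on failure. */
int install_module(const char *src, const char *dst)
{
    char tmp[4096];
    char buf[8192];
    size_t n;
    FILE *in, *out;
    int len, ok = 1;

    len = snprintf(tmp, sizeof tmp, "%s.new", dst);
    if (len < 0 || (size_t)len >= sizeof tmp)
        return -1;                      /* path too long */

    if ((in = fopen(src, "rb")) == NULL)
        return -1;
    if ((out = fopen(tmp, "wb")) == NULL) {
        fclose(in);
        return -1;
    }

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)               /* catches buffered write errors */
        ok = 0;

    /* rename() swaps the directory entry; a process that already has the
     * old file open or mapped keeps its old inode. */
    if (!ok || rename(tmp, dst) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

The temporary file lives next to the destination so that rename() stays
within one filesystem.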

Marantz, Roy wrote:
> This is usually caused by updating the contents of the file instead of
> replacing it, i.e. getting a new inode might make this safe.
> You could try writing to FILE.new and then running mv FILE.new FILE to
> force the new file to get a new inode. This might vary by OS or even OS
> version.
> Roy
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Andreas
> Ericsson
> Sent: Friday, October 19, 2007 3:26 AM
> To: [email protected]; Nagios Developers List
> Subject: Re: [Nagios-devel] Event broker, dlopen(), and segfaults
>
> Ethan Galstad wrote:
>> While doing some debugging of NDOUtils, I've noticed something bad.
>> Event broker modules like ndomod.o will cause Nagios to segfault if
>> they are overwritten on the filesystem while they are in use.
>>
>> I assume this is due to the way dlopen() deals with object files. I
>> was under the assumption that a complete copy of the module was kept
>> in memory once it was loaded, but perhaps it's mmap()'d.
>>
>> The segfault is easily reproducible every time I overwrite ndomod.o
>> while in use, even if the "new" version of the file doesn't differ
>> from the old.
>>
>> Anyone know more details of how this works, or better yet, how to
>> avoid/deal with it?
>>
>
> When a program still has a descriptor to the file, the kernel retains
> the disk blocks pointed to until that descriptor is made invalid (i.e.,
> close()'d).
>
> I just tested this with modules though, and it doesn't work.
>
> Tested locking the file too, and that didn't work either.
>
> Hmm... The only way out I see is to copy the file to a different
> directory and load it from there, but I'm not sure it's worth it. What
> should we do when we fail to copy it, for example? Load from the
> original location? Not load the module at all? Either way out is wrong,
> for a certain value of right.
>
> For reference, the only bug I found in glibc/BUGS with any connection to
> dlfcn is this one:
>
> Severity: [ *] to [***]
>
> [ **] Closing shared objects in statically linked binaries most of the
> times leads to crashes during the dlopen(). Hard to fix.
>
> Since nagios isn't compiled statically, this doesn't apply, and it
> doesn't crash in dlopen(), but rather when running functions in the
> file.
>
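
The copy-and-load idea quoted above could be sketched like this in C.
load_private_copy and the /tmp/nebmodXXXXXX template are assumptions
made for illustration, not existing Nagios code. On any failure the
sketch returns NULL and leaves the question of falling back to the
original location to the caller. It unlinks the copy right after
dlopen(), relying on the mapping to keep the file's blocks alive, and
it will fail if /tmp is mounted noexec.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the module at `path` to a private temp file and dlopen() the copy.
 * Returns the dlopen() handle, or NULL if the copy or the load failed. */
void *load_private_copy(const char *path)
{
    char tmp[] = "/tmp/nebmodXXXXXX";   /* assumed location for the copy */
    char buf[8192];
    size_t n;
    FILE *in;
    void *handle = NULL;
    int fd, ok = 1;

    if ((in = fopen(path, "rb")) == NULL)
        return NULL;
    if ((fd = mkstemp(tmp)) < 0) {
        fclose(in);
        return NULL;
    }

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (write(fd, buf, n) != (ssize_t)n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (close(fd) != 0)
        ok = 0;

    if (ok)
        handle = dlopen(tmp, RTLD_NOW | RTLD_LOCAL);

    /* The name is no longer needed: a successful mapping keeps the copy's
     * blocks alive, and a failed load should not leave the file behind. */
    unlink(tmp);
    return handle;
}
```

Because nothing else knows the name of the private copy, overwriting
the original file afterwards does not touch what the process has mapped.
On older glibc versions this needs to be linked with -ldl.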



Ethan Galstad,
Nagios Developer
---
Email: [email protected]
Website: http://www.nagios.org





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]