
Re: [Nagios-devel] Event broker, dlopen(), and segfaults

Posted: Fri Oct 19, 2007 7:19 am
by Guest
Roy and Andreas -

Thanks for your insight. I found this article about HP-UX libraries and
it seems to indicate that deleting the original file and replacing it
with a new one will prevent a segfault. Simply overwriting the file
will cause a segfault, as the inode doesn't change:

http://www.sap-basis-abap.com/unix/repl ... -hp-ux.htm

Hardly ideal. The only real workaround would be to stat() the file to
check for mtime changes before each and every call to a function within
the module. However, the overhead of doing so is too great to make it
a feasible option...
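To make the stat() idea concrete, here is a rough sketch in C. The struct and function names (module_snapshot, module_snapshot_take, module_changed) are hypothetical and not part of Nagios. Comparing the inode and size in addition to mtime is my own addition, since mtime alone can miss a rewrite that lands within the same second. Note that this only detects a change; it does nothing to repair a mapping that has already been clobbered.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical record of what the module file looked like at load time. */
struct module_snapshot {
    ino_t  ino;    /* inode number */
    off_t  size;   /* file size in bytes */
    time_t mtime;  /* last modification time */
};

/* Record the current state of the file. Returns 0 on success, -1 if
 * stat() fails (for example, because the file has been removed). */
int module_snapshot_take(const char *path, struct module_snapshot *snap)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    snap->ino = st.st_ino;
    snap->size = st.st_size;
    snap->mtime = st.st_mtime;
    return 0;
}

/* Compare the file on disk against an earlier snapshot.
 * Returns 1 if it changed, 0 if it looks the same, -1 if stat() fails. */
int module_changed(const char *path, const struct module_snapshot *snap)
{
    struct module_snapshot now;
    if (module_snapshot_take(path, &now) != 0)
        return -1;
    return now.ino != snap->ino ||
           now.size != snap->size ||
           now.mtime != snap->mtime;
}
```

The snapshot would be taken right after dlopen() and module_changed() called before each call into the module, which is exactly the one-stat()-per-call cost that makes this unattractive.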

I'll make a note in the docs about this.

Marantz, Roy wrote:
> This is usually caused by updating the contents of the file instead of
> replacing it, i.e. getting a new inode might make this safe.
> You could try writing to FILE.new and then running mv FILE.new FILE to
> force the new file to get a new inode. This might vary by OS or even OS
> version.
> Roy
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Andreas
> Ericsson
> Sent: Friday, October 19, 2007 3:26 AM
> To: [email protected]; Nagios Developers List
> Subject: Re: [Nagios-devel] Event broker, dlopen(), and segfaults
>
> Ethan Galstad wrote:
>> While doing some debugging of NDOUtils, I've noticed something bad.
>> Event broker modules like ndomod.o will cause Nagios to segfault if
>> they are overwritten on the filesystem while they are in use.
>>
>> I assume this is due to the way dlopen() deals with object files. I
>> was under the assumption that a complete copy of the module was kept
>> in memory once it was loaded, but perhaps it's mmap()'d.
>>
>> The segfault is easily reproducible every time I overwrite ndomod.o
>> while in use. Even if the "new" version of the file doesn't differ
>> from the old.
>>
>> Anyone know more details of how this works, or better yet, how to
>> avoid/deal with it?
>>
>
> When a program still has a descriptor to the file, the kernel retains
> the disk blocks pointed to until that descriptor is made invalid (i.e.,
> close()'d).
>
> I just tested this with modules though, and it doesn't work.
>
> Tested locking the file too, and that didn't work either.
>
> Hmm... The only way out I see is to copy the file to a different
> directory and load it from there, but I'm not sure it's worth it. What
> should we do when we fail to copy it, for example? Load from the
> original location? Not load the module at all? Either way out is wrong,
> for a certain value of right.
>
> For reference, the only bug I found in glibc/BUGS with any connection to
> dlfcn is this one:
>
> Severity: [ *] to [***]
>
> [ **] Closing shared objects in statically linked binaries most of the
> times leads to crashes during the dlopen(). Hard to fix.
>
> Since nagios isn't compiled statically, this doesn't apply, and it
> doesn't crash in dlopen(), but rather when running functions in the
> file.
>



Ethan Galstad,
Nagios Developer
---
Email: [email protected]
Website: http://www.nagios.org





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]