[Nagios-devel] Re: Re: FreeBSD thread issues
Posted: Wed Aug 24, 2005 12:47 am
Hi all,
here is the answer from the FreeBSD-hackers list:
This posting demonstrates a fundamental confusion between thread-safe and
async-safe. That is the root of the problem in the communication.
Thread-safe functions are a dime a dozen and relatively easy to write.
async-safe functions are very rare and much harder to do useful things
with. I've tried to explain the difference below using fgets() as an
example of the difficulties.
> fgets() must also be async-safe, since it's passed its storage-buffer
> from the calling function. It can contain races if several threads (or
> programs for that matter) tries to read FIFO's at the same time or are
> trying to store things to the same piece of memory, but that's neither
> new, strange or in any way non-obvious. Obviously, fgets() relies on
> lower-level IO code which must be thread-safe (read() in this case) on
> account of them being syscalls inside multitasking kernels.
fgets() need not be async-safe, but it does need to be thread-safe.
When one forks after pthread_create(), one may only call async-safe
functions. Thread safety is the weaker requirement, and a thread-safe
function is not necessarily async-safe. If two different threads call fgets(),
mutexes will keep one thread from running if the other is in the
middle of changing the FILE * internal state. However, if that thread
is interrupted by the scheduler with the mutex held, and fork() is
called, then the new copy of the address space will still have that
mutex held. Any attempt by this new process, with its own address
space, to acquire the lock is doomed to failure. Since the parent and
child execute in different address spaces, there is no way for a
thread that does not exist in the child to unlock the locked mutex.
Normally this happens like so:
Thread A                            Thread B
fgets(b1, 10, fp);
  lock fp's mutex
  copy 5 available bytes into b1
                                    fgets(b2, 10, fp);
                                      try lock fp's mutex (blocks)
  unlock fp's mutex
  return
                                      attempt to lock finishes
                                      b2 can be updated
                                      unlock mutex
However, in the fork case:
Thread A                            Thread B
fgets(b1, 10, fp);
  lock fp's mutex
  copy 5 available bytes into b1
                                    fork()
                                    (in the child, as B')
                                    fgets(b2, 10, fp);
                                      try lock fp's mutex
At this point B', the only thread in the child, will never be able to
grab this lock because A exists only in the parent and the
parent/child have independent address spaces.
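
The fork case above can be reproduced with a small sketch. It is only an illustration under stated assumptions: `state_lock` is a hypothetical stand-in for the lock that guards a FILE's internal state (the real one is private to libc), and the names `thread_a`, `gate`, and `child_inherits_held_lock` are made up for this example. The child uses pthread_mutex_trylock() rather than a blocking lock so that the demonstration reports the problem instead of hanging. Strictly speaking, that call is itself not async-safe, so this shows what common implementations do rather than anything the standard guarantees.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the mutex protecting a FILE's internal state. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Handshake so the fork happens only while thread A holds state_lock. */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cv = PTHREAD_COND_INITIALIZER;
static int a_has_lock = 0;
static int a_may_exit = 0;

/* Thread A: take the lock and sit on it, as if preempted inside fgets(). */
static void *thread_a(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&state_lock);

    pthread_mutex_lock(&gate);
    a_has_lock = 1;
    pthread_cond_broadcast(&gate_cv);
    while (!a_may_exit)
        pthread_cond_wait(&gate_cv, &gate);
    pthread_mutex_unlock(&gate);

    pthread_mutex_unlock(&state_lock);
    return NULL;
}

/* Returns 1 if the child found state_lock held, 0 if not, -1 on error. */
int child_inherits_held_lock(void)
{
    pthread_t a;
    pid_t pid;
    int status = 0;

    a_has_lock = 0;
    a_may_exit = 0;
    if (pthread_create(&a, NULL, thread_a, NULL) != 0)
        return -1;

    /* Thread B (this thread): wait until A holds the lock, then fork. */
    pthread_mutex_lock(&gate);
    while (!a_has_lock)
        pthread_cond_wait(&gate_cv, &gate);
    pthread_mutex_unlock(&gate);

    pid = fork();
    if (pid == 0) {
        /* Child: thread A does not exist here, but its lock state was copied. */
        int rc = pthread_mutex_trylock(&state_lock);
        _exit(rc == EBUSY ? 1 : 0);
    }

    /* Parent: let A release the lock and finish, then collect the child. */
    pthread_mutex_lock(&gate);
    a_may_exit = 1;
    pthread_cond_broadcast(&gate_cv);
    pthread_mutex_unlock(&gate);
    pthread_join(a, NULL);

    if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling child_inherits_held_lock() from main() in a program built with thread support should report that the child saw the lock as held, even though the parent's thread A released it normally. A blocking pthread_mutex_lock() in the child would wait forever at that point.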
While the above example is not what nagios is doing, it illustrates
the point. There are some functions that necessarily touch global
state. These functions need to coordinate that touching of state. If
one of them is interrupted with locks held, then all bets are off if a
program forks and the threads holding those locks can never unlock
them.
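
As a contrast, here is a minimal sketch of a child that stays within the async-safe set after fork(): it calls only close(), write(), and _exit(), and never touches stdio or anything else that might need a lock. The function name `fork_and_report` and the message text are assumptions made for the example. This is not what nagios does, only one way to follow the rule described above.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical message the child sends back to the parent. */
static const char child_msg[] = "child ok\n";

/*
 * Forks a child that reports back over a pipe using only write() and
 * _exit(). Returns the number of bytes the parent read, or -1 on error.
 */
ssize_t fork_and_report(char *buf, size_t cap)
{
    int fds[2];
    int status;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: no printf(), no fgets(), no malloc(). */
        close(fds[0]);
        if (write(fds[1], child_msg, sizeof(child_msg) - 1) < 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: read the child's report, then reap it. */
    close(fds[1]);
    n = read(fds[0], buf, cap);
    close(fds[0]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return n;
}
```

The child uses _exit() rather than exit() because exit() would flush stdio buffers and run atexit handlers, either of which may need a lock that was held by a thread that no longer exists.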
> >> The list of async-signal-safe functions
> >> are here: http://www.opengroup.org/onlinepubs/009 ... frame.html
> >> The restriction on fork() is here (20th bullet down):
> >> http://www.opengroup.org/onlinepubs/009 ... frame.html
>
> Both of those links point to the same document, which is just the
> frameset for the navigation-frames.
>
> For async-safe functions, this is the proper url;
> http://www.opengroup.org/onlinepubs/009 ... ap02_09.html#tag_02_09_01
This reference is for thread-safe functions. You are confusing
thread-safe and async-safe. The correct url for async-safe is
http://www.opengroup.org/onlinepubs/009 ... 02_04.html#tag_02_04_03
>> The following table defines a set of functions that shall be either
>> reentrant or non-interruptible by signals and shall be
>> async-signal-safe. Therefore applications may invoke them, without
>> restriction, from signal-catching functions:
>>
Notice that this list is very short, and there are many functions that
one would think should be on here, but in fact aren't.
> For th
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]