Defect report from: Alexander Terekhov, IBM
(Please direct followup comments to yyyyyyyyyyyyyy@xxxxxxxxxxxxx)
@ page 1071 line 33597 section pthread_kill() objection {alt-kill-2002-11-06}
Problem:
Defect code : 1. Error
The specification says:
33597 The pthread_kill() function shall fail if:
33598 [ESRCH] No thread could be found corresponding to that specified by the given thread
33599 ID.
33600 [EINVAL] The value of the sig argument is an invalid or unsupported signal number.
I think that "shall fail" is wrong here too, at least with respect to [ESRCH].
The reasoning is as follows:
http://groups.google.com/groups?selm=33D5F29E.3F54%40zko.dec.com
(Subject: Re: [Q.] How to validate a thread ID)
<Butenhof>
A call to pthread_kill with a signal number of 0 will check
whether the thread ID is valid. If it's not valid, pthread_kill
will return ESRCH, however, not -1. No pthread function, either
in POSIX threads or in the obsolete "draft 7" API currently
provided with AIX, will return -1.
A thread ID is "valid" from the time it's created (sometime
within the scope of the call to pthread_create that starts it)
until it has terminated and been detached (either via the
detachstate attribute having been set to PTHREAD_CREATE_DETACHED
or by a successful call to either pthread_detach or pthread_join).
There's no prohibition against using any function (except
pthread_detach and pthread_join) on a thread that's detached.
The catch is that you must ensure that the detached thread
can't have terminated. Although pthread_kill, unlike most
functions operating on a thread ID, is required to detect an
invalid thread ID, the system is allowed to "recycle" a thread
ID immediately upon termination of a detached thread -- so you
might be checking the wrong thread.
There isn't necessarily any good use for pthread_kill(<id>,0).
It's there mostly just because it mirrors the behavior of
kill(<pid>,0). You can't use it to verify whether a detached
thread has terminated unless you already know it CANNOT have
terminated (because the thread ID may have been recycled), and
you can't really use it for joinable threads, either. The ID is
valid until it's been joined/detached, even if it's terminated,
so the pthread_kill won't fail; and, as Patrick says, you
presumably can know whether you've joined with the thread
already. (And if you can't you're still in trouble since a
successful return from pthread_join detaches the thread and
allows the ID to be recycled.)
So the documented, supported, and fully portable behavior of
pthread_kill(<id>,0) is absolutely useless except as a
curiosity.
</Butenhof>
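The return convention described in the quote above can be illustrated with a minimal C sketch. It is only an illustration: the helper names probe_thread(), probe_bad_signal(), and demo_probe() are hypothetical, and the sketch probes only the calling thread, whose ID is known to be valid, so it does not depend on the [ESRCH] behavior under discussion. It assumes that -1 is never a valid signal number.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>

/* Hypothetical helper: probe a thread ID that the caller knows is valid.
 * Signal 0 delivers nothing; only error checking is performed. */
int probe_thread(pthread_t tid)
{
    return pthread_kill(tid, 0);
}

/* Hypothetical helper: pass an invalid signal number (assumed: -1) to the
 * calling thread. The error is reported as the return value, not via -1. */
int probe_bad_signal(void)
{
    return pthread_kill(pthread_self(), -1);
}

/* Hypothetical demonstration of the return convention. */
void demo_probe(void)
{
    assert(probe_thread(pthread_self()) == 0);   /* valid ID, null signal */
    assert(probe_bad_signal() == EINVAL);        /* error number returned */
}
```

Note that the sketch deliberately avoids probing a terminated or detached thread: as the quote explains, such an ID may already have been recycled, so the result would say nothing reliable about the original thread.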
<XSH>
22057 In secure implementations, a process may be restricted from sending a signal to a process having
22058 a different security label. In order to prevent the existence or nonexistence of a process from
22059 being used as a covert channel, such processes should appear nonexistent to the sender; that is,
22060 [ESRCH] should be returned, rather than [EPERM], if pid refers only to such processes.
22061 Existing implementations vary on the result of a kill() with pid indicating an inactive process (a
22062 terminated process that has not been waited for by its parent). Some indicate success on such a
22063 call (subject to permission checking), while others give an error of [ESRCH]. Since the definition
22064 of process lifetime in this volume of IEEE Std 1003.1-2001 covers inactive processes, the
22065 [ESRCH] error as described is inappropriate in this case. In particular, this means that an
22066 application cannot have a parent process check for termination of a particular child with kill().
22067 (Usually this is done with the null signal; this can be done reliably with waitpid().)
.....
32001 The pthread_cancel() function may fail if:
^^^^^^^^
32002 [ESRCH] No thread could be found corresponding to that specified by the given thread
32003 ID.
.....
33144 The pthread_getcpuclockid( ) function may fail if:
^^^^^^^^
33145 [ESRCH] The value specified by thread_id does not refer to an existing thread.
.....
33206 The pthread_getschedparam() function may fail if:
^^^^^^^^
33207 [ESRCH] The value specified by thread does not refer to an existing thread.
33208 The pthread_setschedparam() function may fail if:
^^^^^^^^
.....
33219 [ESRCH] The value specified by thread does not refer to a existing thread.
.....
35346 The pthread_setschedprio() function may fail if:
^^^^^^^^
.....
35353 [ESRCH] The value specified by thread does not refer to an existing thread.
</XSH>
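For comparison with the process case in the kill() rationale quoted above, here is a minimal C sketch of checking for a child's termination with waitpid() rather than with the null signal. The helper names run_child_and_reap() and demo_reap() are hypothetical, and the sketch assumes the parent is free to block until the child exits.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then wait for
 * it. Returns the child's exit status, or -1 on any failure. */
int run_child_and_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: terminate immediately */

    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);   /* blocks until the child exits */
    } while (r == -1 && errno == EINTR);

    if (r != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Hypothetical demonstration: the parent learns of termination reliably. */
void demo_reap(void)
{
    assert(run_child_and_reap(7) == 7);
}
```

Unlike kill(pid, 0), the waitpid() call reports termination of that particular child directly, so the result does not depend on how an implementation treats inactive processes.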
Action:
Make [ESRCH] *OPTIONAL*, that is, "may fail". I have no strong
opinion on [EINVAL] in the case of pthread_kill(), but please
also consider the following [with respect to ESRCH-vs-EINVAL
and pthread_t values]:
http://groups.google.com/groups?selm=RkQt9.5%24Rr2.256121%40news.cpqcorp.net
(Subject: Re: pthread_join() on detached/exited/garbage thread?)
<Butenhof>
POSIX says it is an error to join a thread that's been joined or detached,
so the program cannot do so.
The implementation, however, is required to detect and report that error,
and it fails to do so. The second pthread_join() in each case must fail
with at least EINVAL. (It cannot succeed.)
The second pthread_join() call MAY also fail with ESRCH, depending on where
the implementation wakes joining threads during termination; timing between
the first join, termination, and the second join; and how the
implementation detects the error conditions.
That is, the target thread shall have always been detached/joined at the
time of the second pthread_join() call, and it cannot succeed. However,
given the ambiguity in definition of thread termination and join, the
target may either have completely terminated at the time of the second join
(in which case an ESRCH is appropriate), or it may still "exist", in which
case EINVAL would be appropriate.
There's nothing wrong with returning EINVAL even when the thread no longer
exists, presumably indicating that the pthread_t value HAD BEEN valid,
where ESRCH would mean the implementation knew it had never been valid, or
at least cannot determine whether it might have been valid. (A thread that
doesn't exist clearly "isn't joinable", after all.) In any case, the second
call to pthread_join() CANNOT succeed.
I presume that you see varying results because the implementation allows a
thread to remain "existing" after the return of pthread_join(), and it
sometimes continues to exist until the second call to pthread_join(). My
guess is that pthread_join() is treating EINVAL as an existence test, and
failing to distinguish that the thread is not joinable when it hasn't yet
terminated completely.
Note to everyone else: Alexander has instigated discussion of several issues
within the Open Group forum, one of which is tightening the definition of
pthread_join() to require that it return only when the target thread has
been fully terminated. If that discussion were to eventually lead to
changes in the standard, there could be no variability in the outcome of
the second call to pthread_join().
Such work need not necessarily tighten the distinction between EINVAL and
ESRCH such that a pthread_t value representing a thread that no longer
exists couldn't legitimately be considered either "not existing" or "not
joinable", though it might.
</Butenhof>