Email List: Xaustin-review-lX

Defect in XSH pthread_kill()

To: yyyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: Defect in XSH pthread_kill()
From: yyyyyyyy@xxxxxxxxxx
Date: Wed, 6 Nov 2002 19:04:03 GMT
        Defect report from: Alexander Terekhov, IBM

(Please direct followup comments to yyyyyyyyyyyyyy@xxxxxxxxxxxxx)

@ page 1071 line 33597 section pthread_kill() objection {alt-kill-2002-11-06}

Problem:

Defect code :  1. Error

The specification says:

33597 The pthread_kill() function shall fail if:
33598 [ESRCH] No thread could be found corresponding to that specified by
      the given thread
33599 ID.
33600 [EINVAL] The value of the sig argument is an invalid or unsupported
      signal number.

I think that "shall fail" is wrong here too -- WRT [ESRCH], at least. 
The reasoning is as follows:

     http://groups.google.com/groups?selm=33D5F29E.3F54%40zko.dec.com
     (Subject: Re: [Q.] How to validate a thread ID)

     <Butenhof>

     A call to pthread_kill with a signal number of 0 will check 
     whether the thread ID is valid. If it's not valid, pthread_kill 
     will return ESRCH, however, not -1. No pthread function, either 
     in POSIX threads or in the obsolete "draft 7" API currently 
     provided with AIX, will return -1.

     A thread ID is "valid" from the time it's created (sometime 
     within the scope of the call to pthread_create that starts it) 
     until it has terminated and been detached (either via the 
     detachstate attribute having been set to PTHREAD_CREATE_DETACHED 
     or by a successful call to either pthread_detach or pthread_join). 
     There's no prohibition against using any function (except 
     pthread_detach and pthread_join) on a thread that's detached. 
     The catch is that you must ensure that the detached thread 
     can't have terminated. Although pthread_kill, unlike most 
     functions operating on a thread ID, is required to detect an 
     invalid thread ID, the system is allowed to "recycle" a thread 
     ID immediately upon termination of a detached thread -- so you 
     might be checking the wrong thread.

     There isn't necessarily any good use for pthread_kill(<id>,0). 
     It's there mostly just because it mirrors the behavior of 
     kill(<pid>,0). You can't use it to verify whether a detached 
     thread has terminated unless you already know it CANNOT have 
     terminated (because the thread ID may have been recycled), and 
     you can't really use it for joinable threads, either. The ID is 
     valid until it's been joined/detached, even if it's terminated, 
     so the pthread_kill won't fail; and, as Patrick says, you
     presumably can know whether you've joined with the thread 
     already. (And if you can't you're still in trouble since a 
     successful return from pthread_join detaches the thread and 
     allows the ID to be recycled.)

     So the documented, supported, and fully portable behavior of
     pthread_kill(<id>,0) is absolutely useless except as a 
     curiosity.

     </Butenhof>

     <XSH>

     22057 In secure implementations, a process may be restricted from 
           sending a signal to a process having
     22058 a different security label. In order to prevent the existence 
           or nonexistence of a process from
     22059 being used as a covert channel, such processes should 
           appear nonexistent to the sender; that is,
     22060 [ESRCH] should be returned, rather than [EPERM], if pid 
           refers only to such processes.
     22061 Existing implementations vary on the result of a kill() 
           with pid indicating an inactive process (a
     22062 terminated process that has not been waited for by its 
           parent). Some indicate success on such a
     22063 call (subject to permission checking), while others give 
           an error of [ESRCH]. Since the definition
     22064 of process lifetime in this volume of IEEE Std 1003.1-2001 
           covers inactive processes, the
     22065 [ESRCH] error as described is inappropriate in this case. 
           In particular, this means that an 
     22066 application cannot have a parent process check for 
           termination of a particular child with kill().
     22067 (Usually this is done with the null signal; this can be 
           done reliably with waitpid().)

     .....

     32001 The pthread_cancel() function may fail if:
                                         ^^^^^^^^

     32002 [ESRCH] No thread could be found corresponding to that 
           specified by the given thread
     32003 ID.

     .....

     33144 The pthread_getcpuclockid( ) function may fail if:
                                                 ^^^^^^^^

     33145 [ESRCH] The value specified by thread_id does not refer 
           to an existing thread.

     .....

     33206 The pthread_getschedparam() function may fail if:
                                                ^^^^^^^^

     33207 [ESRCH] The value specified by thread does not refer 
           to an existing thread.

     33208 The pthread_setschedparam() function may fail if:
                                                ^^^^^^^^
     .....
     33219 [ESRCH] The value specified by thread does not refer 
           to a existing thread.

     .....

     35346 The pthread_setschedprio() function may fail if:
                                               ^^^^^^^^

     .....
     35353 [ESRCH] The value specified by thread does not refer 
           to an existing thread.

     </XSH>


Action:

Make [ESRCH] *OPTIONAL* -- "may fail". Well, I don't really 
care about [EINVAL] in the case of pthread_kill(), but please 
consider also the following [with respect to ESRCH-vs-EINVAL
and pthread_t values]:

http://groups.google.com/groups?selm=RkQt9.5%24Rr2.256121%40news.cpqcorp.net
(Subject: Re: pthread_join() on detached/exited/garbage thread?)

<Butenhof>

POSIX says it is an error to join a thread that's been joined or detached, 
so the program cannot do so.

The implementation, however, is required to detect and report that error, 
and it fails to do so. The second pthread_join() in each case must fail 
with at least EINVAL. (It cannot succeed.)

The second pthread_join() call MAY also fail with ESRCH, depending on where 
the implementation wakes joining threads during termination; timing between 
the first join, termination, and the second join; and how the 
implementation detects the error conditions.

That is, the target thread shall have always been detached/joined at the 
time of the second pthread_join() call, and it cannot succeed. However, 
given the ambiguity in definition of thread termination and join, the 
target may either have completely terminated at the time of the second join 
(in which case an ESRCH is appropriate), or it may still "exist", in which 
case EINVAL would be appropriate.
 
There's nothing wrong with returning EINVAL even when the thread no longer 
exists, presumably indicating that the pthread_t value HAD BEEN valid, 
where ESRCH would mean the implementation knew it had never been valid, or 
at least cannot determine whether it might have been valid. (A thread that 
doesn't exist clearly "isn't joinable", after all.) In any case, the second 
call to pthread_join() CANNOT succeed.

I presume that you see varying results because the implementation allows a 
thread to remain "existing" after the return of pthread_join(), and it 
sometimes continues to exist until the second call to pthread_join(). My 
guess is that pthread_join() is treating EINVAL as an existence test, and 
failing to distinguish that the thread is not joinable when it hasn't yet 
terminated completely.

Note to everyone else: Alexander has instigated discussion of several issues 
within the Open Group forum, one of which is tightening the definition of 
pthread_join() to require that it return only when the target thread has 
been fully terminated. If that discussion were to eventually lead to 
changes in the standard, there could be no variability in the outcome of 
the second call to pthread_join().

Such work need not necessarily tighten the distinction between EINVAL and 
ESRCH such that a pthread_t value representing a thread that no longer 
exists couldn't legitimately be considered either "not existing" or "not 
joinable", though it might.

</Butenhof>
