Email List: austin-group-l

Re: Bug in XSHd3 2.4.3

To: Geoff Clare <gwc@xxxxxxxxxxxxx>
Subject: Re: Bug in XSHd3 2.4.3
From: Mark Harris <mh-austin@xxxxxx>
Date: Tue, 17 Jul 2007 04:46:21 -0700
Cc: austin-group-l@xxxxxxxxxxxxx
References: <20070711145818.GA17840@xxxxxx>

Geoff Clare wrote:
> While merging in the new description of si_code, I noticed a problem
> in that SI_USER and SI_QUEUE are cases where the signal is sent by a
> process.  This overlaps with the (formerly XSI) requirement that
> si_code is <= 0 when the signal is from a process.  Current XSI systems
> handle this by defining SI_USER and SI_QUEUE with values <= 0.

The actual requirement seems to be the reverse of that:
(page 1871 line 59780 section sigaction)

    "If the value of si_code is less than or equal to 0, then the
    signal was generated by a process and si_pid and si_uid,
    respectively, indicate the process ID and the real user ID of
    the sender.  The <signal.h> header description contains
    information about the signal-specific contents of the
    elements of the siginfo_t type."

It does not seem to be required that any value of si_code be <= 0.
In that case the proposed change stating that SI_USER and SI_QUEUE
must be <= 0 on XSI systems is a new XSI requirement, which could
break the ABI.  (On at least AIX 5.3 and Mac OS X 10.4, SI_QUEUE
is > 0.)  Similarly, the proposed change that implementation-defined
values of si_code must be <= 0 if the signal was generated by a
process is a new requirement (for both XSI and non-XSI systems).
Whether existing implementation-defined values of si_code meet this
requirement depends on the definition of "the signal was generated
by a process", which doesn't seem to be defined anywhere.
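
For concreteness, here is a minimal sketch of how an XSI application
might rely on the text quoted above.  The function and variable names
are made up for illustration, and it assumes a single-threaded process
with SIGUSR1 unblocked, so that the signal is delivered before kill()
returns.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Results recorded by the handler (hypothetical names). */
static volatile sig_atomic_t handler_ran;
static volatile sig_atomic_t code_le_zero;
static volatile sig_atomic_t pid_matches;

static void on_usr1(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    handler_ran = 1;
    /* The rule quoted above: si_code <= 0 means "sent by a process". */
    code_le_zero = (info->si_code <= 0);
    /* Only look at si_pid when that rule says it is valid. */
    pid_matches = code_le_zero && (info->si_pid == getpid());
}

/* Sends SIGUSR1 to the calling process; returns 1 if the handler ran. */
int demo_self_signal(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_usr1;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return 0;
    if (kill(getpid(), SIGUSR1) != 0)   /* kill() yields SI_USER */
        return 0;
    return handler_ran;
}
```

On an implementation that defines SI_USER as a positive value,
code_le_zero would come out 0 here even though the signal was sent
with kill(), which is the gap between the quoted text and the
proposed change.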

XRAT B.2.4.2 page 3411-3412 line 116229-116244 mentions existing
implementations, where si_code > 0 implies the validity of
signal-specific members.  It seems to suggest using negative
numbers for the standard non-signal-specific values of si_code on
such systems, implying that this would meet the requirement quoted
above for values of si_code <= 0.  Since the definition of "the
signal was generated by a process" then apparently includes SI_TIMER,
SI_ASYNCIO, and SI_MESGQ, it must be a broad one.  Solaris seems
to follow this suggestion, defining these as negative values and
filling in si_pid/si_uid based on the process requesting the
timer/aio/mesgq operation.  However, with such a broad definition,
it could be argued that many implementation-specific values of
si_code are for signals generated by a process, and therefore
would not meet the proposed new requirement if currently > 0,
without ABI breakage.

Not all implementations fill in the signal-specific content when
si_code > 0, and the existing standard does not seem to mention
under what conditions these members are filled in.  For example,
HP-UX and Tru64 define SI_USER/QUEUE <= 0 and
SI_TIMER/ASYNCIO/MESGQ > 0; presumably neither set includes the
signal-specific content, so whether that content is present does
not depend simply on si_code > 0.
Page 317 line 10990 seems to say that the listed members are always
valid for the respective signal, but on existing implementations
si_pid/si_uid overlap these members so that cannot be true at least
for SI_USER and SI_QUEUE.  This raises the question as to how a
portable application is to know when the signal-specific contents
of siginfo_t, defined at the top of page 317 in <signal.h>, are
valid.

In the case of Linux, generally the signal-specific members are
valid for si_code > 0 (except SI_KERNEL?).  However, it is not
XSI-compliant: it defines SI_TIMER as negative (for user-space),
yet it fills in a separate set of members specifically for
SI_TIMER, and that set does not include si_pid
or si_uid.  This means that it could not comply at all with the
revision, without ABI breakage.  (It supplies the timer id and
timer overrun count in fields which overlap si_pid and si_uid.)
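
The overlap can be checked at compile time with a sketch like the one
below.  It assumes the Linux C library exposes the timer members as
the macros si_timerid and si_overrun (as glibc appears to do); the
function name is hypothetical, and on any other platform it simply
reports that it cannot tell.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>

/*
 * Returns 1 if the timer fields share storage with si_pid/si_uid,
 * 0 if they do not, and -1 if this platform does not expose the
 * Linux-style member names (assumption: they are macros).
 */
int timer_fields_overlap_pid_uid(void)
{
#if defined(__linux__) && defined(si_timerid) && defined(si_overrun)
    return offsetof(siginfo_t, si_timerid) == offsetof(siginfo_t, si_pid)
        && offsetof(siginfo_t, si_overrun) == offsetof(siginfo_t, si_uid);
#else
    return -1;
#endif
}
```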


> If we
> extend this requirement to non-XSI systems it will break the ABI of any
> implementations where SI_USER and/or SI_QUEUE are positive.  However,
> if we allow SI_USER and SI_QUEUE to be positive it will break any XSI
> applications that just check whether si_code is <= 0.  The only
> solution I can see is to have an explicit XSI requirement that SI_USER
> and SI_QUEUE are <= 0.  Non-XSI POSIX applications would check whether
> the signal is from a process by seeing if si_code is either SI_USER or
> SI_QUEUE or a value <= 0.

A different solution, which would not break any existing ABI,
would be to define a macro to indicate whether si_pid/si_uid are
valid.  A second macro could be used to indicate whether the
signal-specific members are valid, which seems to be a related
issue unless I am missing something.  In a case such as SI_TIMER
on Linux, neither would evaluate to true.  On implementations
such as Solaris, the first condition is simply a check for
si_code <= 0, and in fact Solaris has such a macro already:
  #define SI_FROMUSER(sip)        ((sip)->si_code <= 0)
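
A rough sketch of what the first macro might look like in portable
code follows.  The name HYP_SI_PID_VALID and the helper function are
invented for illustration; nothing like them exists in the standard.
The fallback branch is deliberately conservative and only accepts the
two codes that are known to come from a sending process.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <sys/types.h>

/* Hypothetical macro name, for illustration only. */
#if defined(__sun) && defined(SI_FROMUSER)
/* Solaris already provides the check. */
#define HYP_SI_PID_VALID(sip) SI_FROMUSER(sip)
#else
/* Conservative fallback: accept only SI_USER and SI_QUEUE. */
#define HYP_SI_PID_VALID(sip) \
    ((sip)->si_code == SI_USER || (sip)->si_code == SI_QUEUE)
#endif

/* Returns the sender's pid, or (pid_t)-1 when si_pid is not meaningful. */
pid_t sender_pid_or_minus_one(const siginfo_t *sip)
{
    return HYP_SI_PID_VALID(sip) ? sip->si_pid : (pid_t)-1;
}
```

An application would then test the macro instead of comparing si_code
against 0, so the numeric values of SI_USER and SI_QUEUE would no
longer matter to it.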

As an example, on Solaris it appears that the signal-specific
members are valid when si_code > 0 except perhaps when si_code
is SI_NOINFO or SI_DTRACE (when only si_signo and si_code are
non-zero) or SI_RCTL (which has its own set of members).  If
that is accurate then the second macro on Solaris might check
for si_code > 0 && si_code < SI_RCTL.
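
As a sketch only, that range check could be written as below.  The
names are hypothetical, and the macro is only defined where SI_RCTL
exists; it inherits the assumption made above that SI_NOINFO and
SI_DTRACE are not below SI_RCTL.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>

/*
 * True when si_code lies strictly between 0 and the first code that
 * does not carry the signal-specific members.
 */
int code_in_specific_range(int si_code, int first_other_code)
{
    return si_code > 0 && si_code < first_other_code;
}

#if defined(__sun) && defined(SI_RCTL)
/* Hypothetical macro name; SI_RCTL is the bound suggested above. */
#define HYP_SI_SPECIFIC_VALID(sip) \
    code_in_specific_range((sip)->si_code, SI_RCTL)
#endif
```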


>     If the signal was not generated by one of the functions or events
>     listed above, and not for one of the signal-specific reasons
>     described in [xref to <signal.h> page], si_code shall be set to an
>     implementation-defined value that is:
> 
>       a. not equal to any of the values defined for si_code in
>          POSIX.1-200x;
> 
>       b. less than or equal to 0 if the signal was generated by a process.

This seems to prohibit the existing practice on some systems
of reusing the same numeric values of si_code for different
signals.  For example, on Solaris the implementation-defined
TRAP_RWATCH is equal to the value for CLD_DUMPED:

#define TRAP_RWATCH     3       /* read access watchpoint trap */
#define CLD_DUMPED      3       /* child has coredumped */

(It could be argued that a watchpoint trap is "generated by a process"
as well, especially if async I/O is "generated by a process".  In
that case this would break both parts of the new rule.)

Also, I don't know how the [xref] will show up in the text, but
if it is a link that appears as simply "<signal.h>" then this
could be interpreted as meaning the signal.h provided by the
implementation.  That would change the meaning of the requirement.

 - Mark
