Schwarz, Konrad wrote:
...
> > The second takes the address of a data pointer variable and
> > asks that the representation of a function pointer be stored
> > at that address. Since this is undefined behavior too, a
> > compiler could in principle guess what the intent is and do
> > the right thing; but in practice, any compiler will do
> > exactly what it's asked to do.
>
> Actually, the C standard requires compilers to do the "wrong"
> thing. See below.

Actually, that depends on details that are implementation-defined,
possibly unspecified, and not entirely clear in the C standard. The
pointer conversion may violate alignment requirements on some
implementations, making the behaviour completely undefined (which means
that the C standard does *not* require the compiler to do the "wrong"
thing, and in principle a compiler could read the programmer's mind and
do the "right" thing). Also, if function pointers are smaller than void
pointers, the assignment overwrites bytes outside of the object it's
meant to assign to, again causing undefined behaviour. But yes, on
implementations where the pointer conversion works and produces a
pointer to a sufficiently large object, the standard requires the
assignment to store a char-pointer representation in (a portion of) an
object declared as a function pointer. But whether that is the "wrong"
thing or close enough to the "right" thing depends on whether pointers
to void and pointers to functions have identical size and representation
on the particular implementation.
> "Type punning", as in the original POSIX example, is not
> allowed: "The meaning of a value stored in an object or
> returned by a function is determined by the type of the
> expression used to access it.", C89, 6.1.2.5. For example,
> if on a machine as described above, a character pointer value
> were stored into an object that has been typed as a character
> pointer, it is stored using the character pointer
> representation (where else could it put the extra bits?). If
> it is later retrieved from that same object typed as a
> non-character pointer, the same value, *without* conversion
> of its representation, is extracted: "the meaning of the
> value is determined by the type of the expression used to
> access it". I.e., no conversion!

I would object to saying that it's "the same value" -- it really is a
value with the same representation. In your Cray example, a pointer to
any word-aligned type (including, presumably, pointers to functions) is
represented by the machine address of a word, whereas a pointer to a
character type (or to void) is represented by the word address
multiplied by four, added to the index of a byte inside the word.
Converting between a byte pointer and any other data pointer is done by
shifting the representation by two bits in the appropriate direction.
Converting between a data pointer and a function pointer is undefined
behaviour and the compiler is free to implement it in whatever way it
wants (or not implement it at all).

If compiler developers read the original POSIX example, they could
interpret it as saying that for functions, dlsym() is supposed to return
a void* pointer whose *representation* matches the representation of the
correct function pointer: the unshifted word address of the function,
rather than a "proper" pointer to the first byte of the function.
(Interpreted as a byte pointer, that void* would seem to point to a byte
within the word located at one-fourth of the real machine address of the
function.)

If they also read the new example, they would interpret it as saying
that the *conversion* of what dlsym() returns to a function pointer must
work as well. They could choose to satisfy both requirements by
implementing conversions between byte pointers and function pointers
*without* the usual address shifting. This would make both POSIX
examples of dlsym() work, but would make those conversions behave in an
unusual way that could break programs that attempt to process code as
data, such as loaders.