Re: bug 167 needs more rationale changes

To: austin-group-l@xxxxxxxxxxxxx
Subject: Re: bug 167 needs more rationale changes
From: Geoff Clare <gwc@xxxxxxxxxxxxx>
Date: Tue, 3 Nov 2009 17:05:42 +0000
References: <20091016111522.GA22507@squonk.masqnet> <4ad86b9b.0BBie0MEkAodDZzD%Joerg.Schilling@fokus.fraunhofer.de> <alpine.BSF.2.00.0910161114000.11151@thor.farley.org> <20091019095943.GA15186@squonk.masqnet> <4AE6E72B.4050104@byu.net> <20091029093155.GA22277@squonk.masqnet>

I wrote, on 29 Oct 2009:
>
> Eric Blake <ebb9@byu.net> wrote, on 27 Oct 2009:
> >
> > I'm hoping, along with Joerg, that we can modify the proposed wording of
> > 167 to explicitly allow modification of environ itself [...]

> This seems like a workable compromise.  There is no teleconference
> today, but if at least two of the ORs reply to this mail saying they
> would support this modification, then I would be happy to put together
> an updated set of changes for discussion in next week's teleconference.

Two of the ORs indicated that they would support the modification.

Below is the new set of changes I have come up with.  One problem
that occurred to me is that when the implementation detects that
environ has changed, it would have to treat all the strings in
the new environment as "volatile" (as if they had been added
by putenv()), in case the application overwrites any of the
strings later on.  I have tried to circumvent that by explicitly
allowing the implementation to copy the strings to a new array
and assign environ to point to it, and warning applications not
to rely on those strings remaining part of the environment.

Changes to XBD...

At page 173 line 5476 section 8.1, change:

    manipulating the environ variable

to:

    assigning a new value to the environ variable

Add a new paragraph after line 5478:

    If the application modifies the pointers to which environ
    points, the behavior of all interfaces described in the System
    Interfaces volume of POSIX.1-2008 is undefined.

Changes to XSH...

Remove the undefined behavior notes in getenv (line 33856), setenv
(line 59347) and unsetenv (line 68256).  The change history should
note that these are no longer needed because XBD 8.1 now says the
behaviour of all interfaces is undefined if the application modifies
the pointers to which environ points.

Modify the rationale for getenv on page 1009, line 33885 from:

    Conforming applications are required not to modify environ
    directly, but to use only the functions described here to
    manipulate the process environment as an abstract object. Thus,
    the implementation of the environment access functions has
    complete control over the data structure used to represent the
    environment (subject to the requirement that environ be
    maintained as a list of strings with embedded <equals-sign>
    characters for applications that wish to scan the environment).
    This constraint allows the implementation to properly manage
    the memory it allocates, either by using allocated storage for all
    variables (copying them on the first invocation of setenv() or
    unsetenv()), or keeping track of which strings are currently in
    allocated space and which are not, via a separate table or some
    other means. This enables the implementation to free any allocated
    space used by strings (and perhaps the pointers to them) stored in
    environ when unsetenv() is called.  A C runtime start-up procedure
    (that which invokes main() and perhaps initializes environ) can
    also initialize a flag indicating that none of the environment has
    yet been copied to allocated storage, or that the separate table
    has not yet been initialized.

    In fact, for higher performance of getenv(), the implementation
    could also maintain a separate copy of the environment in a data
    structure that could be searched much more quickly (such as an
    indexed hash table, or a binary tree), and update both it and the
    linear list at environ when setenv() or unsetenv() is invoked.

to:

    Conforming applications are required not to directly modify the
    pointers to which environ points, but to use only the setenv(),
    unsetenv() and putenv() functions, or assignment to environ
    itself, to manipulate the process environment.  This constraint
    allows the implementation to properly manage the memory it
    allocates.  This enables the implementation to free any space it
    has allocated to strings (and perhaps the pointers to them)
    stored in environ when unsetenv() is called.  A C runtime start-up
    procedure (that which invokes main() and perhaps initializes
    environ) can also initialize a flag indicating that none of the
    environment has yet been copied to allocated storage, or that the
    separate table has not yet been initialized.  If the application
    switches to a complete new environment by assigning a new value
    to environ, this can be detected by getenv(), setenv(), unsetenv()
    or putenv() and the implementation can at that point reinitialize
    based on the new environment.  (This may include copying the
    environment strings into a new array and assigning environ to
    point to it.)
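
As an illustration only (not part of the proposed wording), the
detection described in the paragraph above could be sketched in C
roughly as follows.  This is not taken from any real implementation:
toy_getenv(), toy_reinit(), seen_environ and reinit_count are
hypothetical names, and a real library would rebuild its memory
bookkeeping where this sketch merely counts reinitializations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

extern char **environ;

/* Hypothetical bookkeeping: the value of environ on which the toy
 * implementation last based its internal state, and how many times
 * it has had to start over. */
static char **seen_environ;
static int reinit_count;

/* Stand-in for "reinitialize based on the new environment".  A real
 * implementation might also copy the strings into a new array and
 * assign environ to point to it; this sketch only records the pointer. */
static void toy_reinit(void)
{
    seen_environ = environ;
    reinit_count++;
}

/* Toy getenv(): notices an assignment to environ before searching. */
static char *toy_getenv(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);

    if (environ != seen_environ)
        toy_reinit();
    if (environ == NULL)
        return NULL;

    for (char **p = environ; *p != NULL; p++)
        if (strncmp(*p, name, len) == 0 && (*p)[len] == '=')
            return *p + len + 1;
    return NULL;
}
```

A pointer comparison like this can only notice assignment to environ
itself.  It cannot notice an application storing into the array that
environ points to, which is consistent with that case being left
undefined.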

    In fact, for higher performance of getenv(), implementations
    that do not provide putenv() could also maintain a separate copy
    of the environment in a data structure that could be searched
    much more quickly (such as an indexed hash table, or a binary
    tree), and update both it and the linear list at environ when
    setenv() or unsetenv() is invoked.  On implementations that do
    provide putenv(), such a copy might still be worthwhile but
    would need to allow for the fact that applications can directly
    modify the content of environment strings added with putenv().
    For example, if an environment string found by searching the
    copy is one that was added using putenv(), the implementation
    would need to check that the string in environ still has the
    same name (and value, if the copy includes values), and whenever
    searching the copy produces no match the implementation would
    then need to search each environment string in environ that
    was added using putenv() in case any of them have changed their
    names and now match.  Thus each use of putenv() to add to the
    environment would reduce the speed advantage of having the copy.
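
As an illustration only (not part of the proposed wording), the extra
check needed for strings added with putenv() might look something like
the sketch below.  The struct env_cache_entry type and the
entry_still_valid() function are hypothetical, and the sketch assumes
the copy holds names only, not values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry in the fast-lookup copy of the environment. */
struct env_cache_entry {
    const char *name;   /* private copy of the variable name */
    size_t name_len;    /* strlen(name) */
    char *env_string;   /* the "name=value" string that environ points to */
    int from_putenv;    /* nonzero if the application owns env_string */
};

/* Returns nonzero if the cached name still matches the string in
 * environ.  Only strings added with putenv() can change behind the
 * implementation's back, so only those are re-checked. */
static int entry_still_valid(const struct env_cache_entry *e)
{
    assert(e != NULL && e->name != NULL && e->env_string != NULL);

    if (!e->from_putenv)
        return 1;
    return strncmp(e->env_string, e->name, e->name_len) == 0
        && e->env_string[e->name_len] == '=';
}
```

When this check fails, or when the copy has no match at all, the
implementation would still have to fall back to scanning every string
that was added with putenv(), which is where the speed advantage is
lost.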

After page 772 line 25712 section exec, add two new paragraphs:

    Applications can change the entire environment in a single
    operation by assigning the environ variable to point to an array
    of character pointers to the new environment strings.
    After assigning a new value to environ, applications should
    not rely on the new environment strings remaining part of the
    environment, as a call to getenv(), [XSI]putenv(),[/XSI]
    setenv(), unsetenv() or any function that is dependent on an
    environment variable may, on noticing that environ has changed,
    copy the environment strings to a new array and assign environ
    to point to it.

    Any application that directly modifies the pointers to which the
    environ variable points has undefined behavior.
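
As an illustration only (not part of the proposed wording), an
application following these two paragraphs might look something like
the sketch below.  The names replacement_env, switch_environment() and
set_language() are hypothetical, and the PATH value is a placeholder
rather than a recommendation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

extern char **environ;

/* Hypothetical replacement environment.  The PATH value is only a
 * placeholder for this sketch. */
static char *replacement_env[] = {
    "PATH=/usr/bin:/bin",
    "LANG=C",
    NULL
};

/* Switch to the replacement environment in a single operation. */
static void switch_environment(void)
{
    environ = replacement_env;
}

/* Later changes go through setenv(), not through stores into
 * replacement_env[], because the implementation may by then have
 * copied the strings to a new array and pointed environ at it. */
static int set_language(const char *lang)
{
    assert(lang != NULL);
    return setenv("LANG", lang, 1);
}
```

After switch_environment(), getenv("LANG") would be expected to find
"C", and after set_language("POSIX") it would find "POSIX".  The
application should not assume that replacement_env[1] changed as a
result.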

At page 779 line 25989 section exec, change:

    The environ array should not be accessed directly by the
    application.

    The new process might be invoked in a non-conforming environment
    if the envp array does not contain implementation-defined
    variables required by the implementation to provide a
    conforming environment. See the _CS_V7_ENV entry in <unistd.h>
    and confstr() for details.

to:

    When assigning a new value to the environ variable, applications
    should ensure that the environment to which it will point contains
    at least the following:

        a. Any implementation-defined variables required by the
           implementation to provide a conforming environment. See the
           _CS_V7_ENV entry in <unistd.h> and confstr() for details.

        b. A value for PATH which finds conforming versions of all
           standard utilities before any other versions.

    The same constraint applies to the envp array passed to execle()
    or execve(), in order to ensure that the new process image is
    invoked in a conforming environment.
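
As an illustration only (not part of the proposed wording), item b
could be satisfied by building the PATH entry from confstr(_CS_PATH),
roughly as sketched below.  The helper name make_conforming_path() is
hypothetical.  The sketch does not cover item a: any variables
reported by confstr(_CS_V7_ENV) would need to be added to the new
environment as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns a malloc()ed "PATH=..." string built from confstr(_CS_PATH),
 * or NULL on failure.  The caller is responsible for freeing it. */
static char *make_conforming_path(void)
{
    static const char prefix[] = "PATH=";
    const size_t plen = sizeof prefix - 1;
    size_t need, got;
    char *s;

    /* With a null buffer and zero length, confstr() reports the buffer
     * size required, including the terminating null byte. */
    need = confstr(_CS_PATH, NULL, 0);
    if (need == 0)
        return NULL;

    s = malloc(plen + need);
    if (s == NULL)
        return NULL;

    memcpy(s, prefix, plen);
    got = confstr(_CS_PATH, s + plen, need);
    if (got == 0 || got > need) {   /* error, or value would be truncated */
        free(s);
        return NULL;
    }
    return s;
}
```

The returned string could then be placed in the array assigned to
environ, or in the envp array passed to execle() or execve().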

At page 1717 line 54880 section putenv, after:

    The setenv() function is preferred over this function.

add (as part of the same paragraph):

    One reason is that putenv() is optional and therefore less
    portable.  Another is that using putenv() can slow down
    environment searches, as explained in the RATIONALE for getenv().
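
As an illustration only (not part of the proposed wording), the
difference that makes putenv() awkward for implementations can be
shown with a short sketch.  The variable names DEMO_PUT and DEMO_SET
and both function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Storage the application keeps control of after putenv(). */
static char demo_string[] = "DEMO_PUT=one";

/* Adds DEMO_PUT with putenv(), then rewrites the value in place
 * without telling the implementation.  Returns 0 on success. */
static int add_and_rewrite_with_putenv(void)
{
    if (putenv(demo_string) != 0)
        return -1;
    memcpy(demo_string + strlen("DEMO_PUT="), "two", 3);
    return 0;
}

/* Adds DEMO_SET with setenv(), then overwrites the caller's buffer.
 * setenv() copied the value, so the environment is unaffected. */
static int add_with_setenv(void)
{
    char value[] = "one";

    if (setenv("DEMO_SET", value, 1) != 0)
        return -1;
    value[0] = 'X';
    return 0;
}
```

After the first function, getenv("DEMO_PUT") sees the rewritten value
because the application's own string is part of the environment.  After
the second, getenv("DEMO_SET") still sees "one".  Any cached copy kept
by the implementation has to allow for the first case.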

At page 1717 line 54882 section putenv, change:

    The standard developers noted that putenv() is the only function
    available to add to the environment without permitting memory
    leaks.

to:

    Refer to the RATIONALE section in [xref to setenv()].

At page 1858 line 59381 section setenv, add to the end of the
rationale:

    See also the RATIONALE section in [xref to getenv()].

-- 
Geoff Clare <g.clare@opengroup.org>
The Open Group, Thames Tower, Station Road, Reading, RG1 1LX, England
