Email List: Xaustin-group-lX

Re: Re: set -e and SIGCHLD

To: David Korn <yyy@xxxxxxxxxxxxxxxx>
Subject: Re: Re: set -e and SIGCHLD
From: Marc Aurele La France <yyy@xxxxxxxxxxx>
Date: Tue, 13 Mar 2001 10:13:20 -0700 (MST)
Cc: yyyyyyyyyyyyyy@xxxxxxxxxxxxx

On Tue, 13 Mar 2001, David Korn wrote:

> Maybe a bit of history will help resolve the SIGCHLD problem.

> System V Release 2 had a signal named SIGCLD and BSD 4.1 had
> a signal named SIGCHLD which were both referred to as
> sig child.

> Both of these were added to earlier versions of UNIX, so both
> were added in a way that would not affect existing
> programs by default.  Thus, their default behavior was
> to behave as if the signal were ignored.  Thus, no blocking
> call would be interrupted by either of these signals.

> The SIGCHLD or SIGCLD were used for implementing job control.
> Since I had implemented job control for both BSD and System V,
> I pointed out to the standards group that except for SIG_DFL,
> these signals had different semantics.

> If a signal handler was set for SIGCLD, then a signal would
> be generated if there were any unreaped child processes.
> When the signal was caught in System V, its handler was reset
> by default to SIG_DFL.  However, if a process did not
> reap a child and instead reestablished the signal handler,
> it would go into an infinite loop since the signal would
> be generated again.  The SIGCLD SIG_IGN behavior was that
> the system reaped the child when it completed so
> that the application didn't have to deal with it.
> However, I believe that a process blocked in wait() would
> be awakened, but I am not certain of this.

> The SIGCHLD signal, on the other hand, was generated when
> a child completed if a signal handler was set at that time.
> No signal would be generated if a signal handler was
> established while there were unreaped children.
> The SIGCHLD signal was also generated when a child process stopped.
> I believe that BSD treated SIGCHLD SIG_IGN the same way
> that it treated SIGCHLD SIG_DFL.

> The standard adopted the BSD SIGCHLD signal semantics with the
> following changes:

> 1.    The SA_NOCLDSTOP flag was added so that programs that
>       did not expect a signal on stop would not be affected.
> 2.    The behavior of SIGCHLD was made unspecified when SIG_IGN
>       is specified.

> The problem that is being presented is the case in which
> a process has SIGCHLD set to SIG_IGN and then execs a new
> process.  A conforming application would not set SIGCHLD to SIG_IGN
> since the standard leaves this behavior unspecified.  An application
> that does set SIGCHLD to SIG_IGN should set it back to SIG_DFL
> before the call to exec.

> The standard clearly states that signals set to SIG_IGN by
> the calling process image shall be set to be ignored by
> the new process image.  However, the fact that the behavior is
> unspecified allows an implementation to treat this
> as if SIG_DFL were set and not automatically reap children,
> even if setting to SIG_IGN by the process itself would reap children.

Right.  So to summarise, an application cannot be strictly
POSIX-conforming unless it does one of the following:

- enlist the cooperation of whatever executable exec()'s it, or
- on entry, always set its SIGCHLD disposition to SIG_DFL or a function
  (and clear SA_NOCLDWAIT, if applicable), or
- not spawn child processes.

The first is unrealistic (an executable has no control over what invokes
it).

The second is impractical for a number of reasons.  For example, all BSD
applications (that assume SIG_DFL and SIG_IGN are semantically equivalent
for SIGCHLD) should not see their POSIX conformance affected by what they 
inherit through SIGCHLD.  Then there's the task of educating application
developers that SIG_DFL and SIG_IGN are >not< semantically equivalent.

The third obviously limits function.
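
For concreteness, the second option might look something like the
sketch below.  The function name is made up for illustration, and the
sketch assumes the application has no handler of its own to install;
one that does would set sa_handler to that function instead of SIG_DFL.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* assumption: asks for sigaction() and SA_NOCLDWAIT */
#endif
#include <signal.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any standard): call it first thing
 * in main(), before any child is spawned.  It discards whatever
 * SIGCHLD disposition was inherited across exec() and installs
 * SIG_DFL with no flags set, so SA_NOCLDWAIT is cleared as well.
 * Returns 0 on success, -1 with errno set on failure.
 */
int reset_sigchld_disposition(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sa.sa_flags = 0;            /* in particular, no SA_NOCLDWAIT */
    sigemptyset(&sa.sa_mask);

    return sigaction(SIGCHLD, &sa, NULL);
}
```

After this call, a later wait() or waitpid() should again be able to
collect the status of a child the application spawns itself.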

The net effect of all this is that, of the plethora of applications that
claim conformance, few actually do conform.

The issue >isn't< that POSIX didn't take a stand and choose between SIGCLD
and SIGCHLD.  In fact, as I see it, POSIX had few degrees of freedom here.
Rather, the issue is that POSIX got "sideswiped" by (or turned a blind
eye towards) the technical consequences that inheriting automatic child
reaping would have on application conformance, consequences POSIX
unavoidably inherited from Spec 1170.

To my way of thinking, the solution is clear:  have conforming exec()
implementations >always< reset SIGC[H]LD dispositions to SIG_DFL, and 
SA_NOCLDWAIT to zero.  This is the only technically sound solution.
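
Until exec() itself behaves this way, a cooperating invoker can only
approximate it from user space by resetting the disposition just before
the exec.  A rough sketch, assuming a wrapper of my own naming
(execvp_default_sigchld is hypothetical, not a standard function):

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* assumption: asks for sigaction() and execvp() */
#endif
#include <signal.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: reset SIGCHLD to SIG_DFL with no flags (which
 * also clears SA_NOCLDWAIT), then exec.  Like execvp(), it returns
 * only on failure, with -1 and errno set.  Note that on failure the
 * caller's SIGCHLD disposition has already been changed.
 */
int execvp_default_sigchld(const char *file, char *const argv[])
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    return execvp(file, argv);
}
```

This only helps when the invoking program chooses to cooperate, which
is exactly what the invoked executable cannot count on.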

However, in the standards business, technical justifications provide only
a (very) small part of the puzzle.  There is an almost universally held 
perception that POSIX, and its lineage, could not possibly have gone wrong
like this.  And yet, in some seven years of dealing with this issue, I
have yet to see this perception justified by pointing to a real-world
application that actually does depend on inheriting automatic child
reaping.  My feeling is that such an application can neither exist nor
be useful.

Marc.

+----------------------------------+-----------------------------------+
|  Marc Aurele La France           |  work:   1-780-492-9310           |
|  Computing and Network Services  |  fax:    1-780-492-1729           |
|  352 General Services Building   |  email:  yyy@xxxxxxxxxxx          |
|  University of Alberta           +-----------------------------------+
|  Edmonton, Alberta               |                                   |
|  T6G 2H1                         |     Standard disclaimers apply    |
|  CANADA                          |                                   |
+----------------------------------+-----------------------------------+
XFree86 Core Team member.  ATI driver and X server internals.
