RE: Defect in XBD 9.3.6

To: "Geoff Clare" <yyy@xxxxxxxxxxxxx>, <yyyyyyyyyyyyyyy@xxxxxxxxxxxxx>
Subject: RE: Defect in XBD 9.3.6
From: "Donn Terry" <yyyyyy@xxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 1 Mar 2004 08:20:04 -0800

(This is intended to trigger other people's memories.)  I vaguely
remember that there were a few places where the historical behavior of a
command and the historical (or desired) behavior of a library that could
implement that command differed, and that in at least a few cases the
difference was retained explicitly (and should have been mentioned in
the rationale).  Is this one of those cases?

Donn

-----Original Message-----
From: Geoff Clare [mailto:yyy@xxxxxxxxxxxxx] 
Sent: Monday, March 01, 2004 7:16 AM
To: yyyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: Defect in XBD 9.3.6

@ page 172 line 6105-6109 section 9.3.6 comment [gwc BRE nested
subpatterns]

Defect code :  3. Clarification required

Problem:

There seems to be a discrepancy between the description of BREs in
XBD6 and the description of regcomp() in XSH6 as regards the treatment
of nested subpatterns with a following * repeater.

The example that prompted this is whether:

    echo aba | sed 's/\(a\(b\)*\)*/<\1|\2>/'

should output <a|b> or <a|>.

According to the description of BREs:

    "If the subexpression referenced by the back-reference matches
    more than one string because of an asterisk ( '*' ) or an interval
    expression (see item (5)), the back-reference shall match the last
    (rightmost) of these strings."

which seems to imply that the correct output is <a|b> since the last
string that the second subpattern matches is the b that it got from
the first iteration of the first subpattern.  (In the second iteration
of the first subpattern the second subpattern doesn't match anything.)
Several sed implementations do output <a|b>.

However, the description of regcomp() says:

"3. If subexpression i is contained within another subexpression
    j, and i is not contained within any other subexpression that
    is contained within j, and a match of subexpression j is
    reported in pmatch[j], then the match or non-match of
    subexpression i reported in pmatch[i] shall be as described in
    1. and 2. above, but within the substring reported in pmatch[j]
    rather than the whole string. The offsets in pmatch[i] are
    still relative to the start of string."

Since the second subpattern in the example does not match anything
within the last string matched by the first subpattern, this implies
that regcomp() would report the second subpattern as a non-match.
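
The containment condition in item 3 can be sketched as a small check.
This is only an illustration of the reading given above:
inner_report_consistent() is a hypothetical helper name, not part of
any standard API, and the offsets in the note below assume the input
"aba" with \1 reported as the final "a".

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>

/* Hypothetical helper (an assumption for illustration, not a standard
 * function): returns true if the report for an inner subexpression i is
 * consistent with item 3, given the report for the enclosing
 * subexpression j. */
static bool inner_report_consistent(regmatch_t outer, regmatch_t inner)
{
    /* Item 3 only applies when a match of j is reported. */
    if (outer.rm_so == -1)
        return true;

    /* Reporting i as a non-match never violates the containment rule. */
    if (inner.rm_so == -1)
        return true;

    /* Sanity check on the caller's offsets. */
    assert(inner.rm_so <= inner.rm_eo);

    /* A reported match of i must lie within the substring reported
     * for j; offsets are relative to the start of the string. */
    return inner.rm_so >= outer.rm_so && inner.rm_eo <= outer.rm_eo;
}
```

For "aba", the last string matched by the first subpattern is the final
"a" at offsets 2 to 3.  Reporting the "b" at offsets 1 to 2 for the
second subpattern fails this check, while reporting -1/-1 passes it.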

Although there is no requirement for sed to be implemented using
regcomp(), presumably this is an unintentional difference between the
descriptions of regcomp() and BREs.
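
One way to see what a particular regcomp() implementation reports for
the example is a small probe along the following lines.  It is a sketch
only: format_nested_groups() is a hypothetical helper name, and the
"<\1|\2>" formatting simply mimics the sed replacement above.  The
result depends on the C library in use.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the standard or the original message):
 * compile `pattern` as a BRE, match it against `text`, and format
 * subexpressions 1 and 2 as "<\1|\2>".
 * Returns 0 on success, -1 if compilation or matching fails. */
static int format_nested_groups(const char *pattern, const char *text,
                                char *out, size_t outlen)
{
    regex_t re;
    regmatch_t pm[3];
    const char *sub[3] = { "", "", "" };
    int len[3] = { 0, 0, 0 };

    assert(out != NULL && outlen > 0);

    /* cflags of 0 (no REG_EXTENDED) selects basic regular expressions. */
    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    if (regexec(&re, text, 3, pm, 0) != 0) {
        regfree(&re);
        return -1;
    }

    for (int i = 1; i < 3; i++) {
        /* rm_so == -1 means the subexpression is reported as a
         * non-match; it is then formatted as an empty string. */
        if (pm[i].rm_so != -1) {
            sub[i] = text + pm[i].rm_so;
            len[i] = (int)(pm[i].rm_eo - pm[i].rm_so);
        }
    }

    snprintf(out, outlen, "<%.*s|%.*s>", len[1], sub[1], len[2], sub[2]);
    regfree(&re);
    return 0;
}
```

Calling format_nested_groups("\\(a\\(b\\)*\\)*", "aba", buf, sizeof buf)
fills buf with either <a|b> or <a|>, according to which of the two
readings the implementation follows.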

The text quoted above on BREs was an addition in POSIX.1-2001 intended
to clarify what happens for repeated subpatterns.  Possibly the added
text is too simplistic and should have treated nested subpatterns in
the way that regcomp() does.  On the other hand, since several current
sed implementations do not behave as the regcomp() description implies,
perhaps it is the regcomp() description that needs alteration.
(So far I am aware of only one sed implementation, a historical one,
that outputs <a|> for the above example; however, I have not tested
many.)

Action:

Issue an official interpretation of the current requirement.

Depending on the outcome of the interpretation, implement a
clarification or correction in the next revision.
