Email List: Xaustin-regexp-lX

starting point

To: yyyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: starting point
From: David Korn <yyy@xxxxxxxxxxxxxxxx>
Date: Thu, 23 May 2002 11:02:16 -0400 (EDT)
I am the chair for the new regular expression subgroup of
the Austin group.  Our charter is to resolve three aardvarks
that have been submitted to correct defects in the current
definition of regular expressions and in the regcmp()/regex()
interface in the standard.

I want to start with ERN 17, which I have included below.  It
deals with issues that were debated and resolved at a meeting of
regular expression experts held in Toronto in 1995, but whose
resolutions were never merged into the current standard.  I don't
expect many controversial issues here, but we need to make sure
that the text is correct and unambiguous.

The issues raised by ERN 18 and ERN 19 are interrelated and
reflect contradictory views, so they should be discussed together
after we have completed ERN 17.

One useful suggestion that I have received from Glenn Fowler
is to have an open source test harness that all conforming
implementations can use to verify every example that we
discuss or that appears in the standard.  I will ask Glenn to
present his proposal to this group.

 _____________________________________________________________________________
 OBJECTION                                       Enhancement Request Number 17
 yyyyyy@xxxxxxxxxxx              Defect in XBD Regular Expressions (rdvk#  43)
 {eggert20020430a}                        Wed, 1 May 2002 00:17:32 +0100 (BST)
 _____________________________________________________________________________
 Accept_____    Accept as marked below_X___     Duplicate_____     Reject_____
 Rationale for rejected or partial changes:

This is being deferred to a new subgroup.

 _____________________________________________________________________________
 Page: 167  Line: 5889  Section: Regular


 Problem:

 Defect code :  3. Clarification required

 The Regular Expressions section
 <http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html#tag_09>
 does not reflect the resolutions of the June 1995 POSIX RE experts
 meeting as reported by David Korn in
 
 <http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=3713>


 Action:

 Adopt the interpretations of the June 1995 POSIX RE experts meeting as
 referenced above, with the exception of the I18N interpretations
 (i.e., those interpretations starting with Interp #41, Part 7 and
 continuing to the end of the meeting notes).  The I18N interpretations
 are now somewhat obsolete, since that part of the standard was changed
 in POSIX 1003.1-2001.  However, the other interpretations are to parts
 of the standard that have not changed, so they are still relevant.

 [Ed recommendation: None]

(From X-Mailing-List: austin-group-l:archive/latest/3971)

In the copy below, I have edited the page, section, and line numbers
so that they refer to the 2001 final text, and (as I mentioned
earlier) I am omitting the I18N-related actions.  Also, for ease of
reference I am placing an "ACTION:" annotation at the start of each
recommended action.  These are the only changes that I made.

          o Interp #43, Part 15.  To which match is a backreference
            to a duplicated subexpression bound?
ACTION:
            To resolve Interp
            #43, Part 15, add on XBD page 172, Section 9.3.6(3)
            after line 6109, "When a referenced
            subexpression does not match any string (not even the
            empty string), the backreference expression fails to
            match.  When subexpressions are nested, the substrings
            matching them are similarly nested.  When a contained
            subexpression fails to participate in the last match of
            its containing subexpression, backreferences to the
            contained subexpression fail to match."

            For example
                    \(a*\(b\)*\)\{2\}\2
            fails to match
                    ba

          o Interp #43, Part 12.  Can a duplicated subexpression
            match the null string?  If so, will the duplication be
            repeated until the expression does match the null
            string?  There was a consensus that applying a
            duplicator to a RE that could match the empty string
            should be unspecified.  However, if specified, the
            specification in P1003.1-2001 is incorrect and should
            be changed.
ACTION:
            On XBD page 172, Section 9.3.6, change line 6127 to, "When a
            subexpression or a backreference is repeated by an
            asterisk(*) or an interval expression, the
            subexpression shall not match a null".
            Also, change XBD page 175 section 9.4.6 lines 6239-6241
            to, "When an ERE enclosed in parentheses is repeated by
            a *, ?, +, or an interval expression, the ERE enclosed
            in parentheses shall not match the empty string unless
            it is necessary to satisfy the exact or minimum number
            of occurrences for the + or interval expression."

          o Interp #43, Part 14.  What is meant by the left-to-right
            order in a match on line 5908?  For example,
            with pattern
                    ((..)*(.....)*)*
            and string xxxxx, what should \1 be?  Lines 5911 and
            5908 would give contradictory answers.
ACTION:
            To resolve Interp #43, Part 14, add on XBD page 167, Section
            9.1 after sentence ending on line 5908,
            "An enclosed subpattern is deemed to be to the right of
            an enclosing pattern."  On XBD page 172, Section 9.3.6,
            change lines 6090-6091 to "The following rules, in
            conjunction with the general requirements of Sections 9.1 and 9.2,
            shall be used to construct BREs matching multiple
            characters".  On XBD page 172, Section 9.3.6(2),
            line 6095 replace "whatever" with "any string".  On XBD
            page 175, Section 9.4.6, change lines 6205-6206
            to "The following rules, in conjunction with the
            general requirements of Sections 9.1 and 9.2, shall be used to
            construct EREs matching multiple characters".  On XBD page
            175, Section 9.4.6(1), line 6209 replace
            "whatever" with "any string".

          o Interp #44.  There was unanimous agreement that the
            error numbers must be unique.

          o Interp #45.  Current interp and change OK.

          o Interp #60.  This needs to be fixed in .1.

          o Interp #73.  Current interp and change OK.

          o Interp #82.  The current interpretation is incorrect.
            In section 9.3.6(3), lines 6098-6099 the standard says
            that a backreference matches a "string of characters".
            Therefore, the standard requires that the expression
            \(^b\)\1 must match the first two characters of bbbb.

          o Interp #85.  Agreement except that dumping core should
            not be allowed for bad expressions.
ACTION:
            Therefore, on XBD Section 9, lines 5927, 5942, 5970, 5982,
            6075, 6125, 6171, 6180, 6185, 6193, 6238, and 6468,
            "undefined" should be changed to "unspecified".

          o Interp #86.  Current interp and change OK.

          o Interp #88.  Wording for interp #45, part 15 should
            take care of this.

          o Interp #125.  What is the meaning of BRE\{0,0\}?  The
            current wording leaves the behavior unspecified.
ACTION:
            On XBD page 172, Section 9.3.6(4), add after end of
            sentence on line 6112, "Zero occurrences of a BRE match
            the empty string".  On XBD page 175, Section 9.4.6(3),
            add after end of sentence on line 6219, "Zero
            occurrences of an ERE match the empty string".

            Note that this added sentence must also apply to parts
            (3), (4) and (5).

          o Doug #6.  Does case folding apply to backreferences?
ACTION:
            On XBD page 172, Section 9.3.6(3), add after line
            6109, "When pattern matching is being performed without
            regard to case, the backreference match will occur
            without regard to case."
ACTION:
            Also, on XBD page 170, Section 9.3.5(3), add after
            end of sentence on line 6022, "Whenever pattern
            matching is being performed without regard to case,
            each character or collating element shall be deemed to
            stand for itself and all its case counterparts."  On
            XBD page 168, Section 9.2, line 5954 change
            "counterpart" to "counterparts".

            It wasn't clear
            whether there is such a thing as upper and lower
            multi-character collating elements.
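
The backreference rules in Interp #43, Part 15, Interp #82 and Doug #6 can be exercised with a toy model. The Python below is a minimal sketch, not any real regex engine: a brute-force backtracking matcher over hand-built pattern trees (there is no parser), with every class and function name invented for illustration. It reports only whether and where the overall leftmost-longest match occurs; it does not implement the rules for choosing subexpression matches. Assumptions: `reset=True` applies the proposed nesting rule and `reset=False` keeps captures from earlier iterations for comparison; `icase=True` approximates matching without regard to case by comparing lowercase forms character by character; the ^ in \(^b\) is treated as an anchor, which POSIX permits but does not require for the first character of a BRE subexpression; and empty iterations beyond the minimum count are skipped, following the wording proposed for Part 12.

```python
# Hypothetical toy model of the proposed BRE rules; not a real regex engine or API.
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Tuple
Caps = Dict[int, Tuple[int, int]]  # group number -> (start, end); a missing key means unset

@dataclass(frozen=True)
class Lit:        # one literal character
    ch: str

@dataclass(frozen=True)
class Caret:      # ^ treated as an anchor
    pass

@dataclass(frozen=True)
class Seq:        # concatenation of sub-patterns
    items: tuple

@dataclass(frozen=True)
class Group:      # \( body \) numbered n; `inner` lists the groups nested inside it
    n: int
    body: object
    inner: tuple = ()

@dataclass(frozen=True)
class Rep:        # body repeated lo..hi times; hi=None means unbounded
    body: object
    lo: int
    hi: Optional[int]

@dataclass(frozen=True)
class Backref:    # \n
    n: int

class Matcher:
    def __init__(self, s: str, reset: bool = True, icase: bool = False):
        self.s, self.reset, self.icase = s, reset, icase

    def same(self, a: str, b: str) -> bool:
        # Simplified case folding: compare lowercase forms character by character.
        if not self.icase:
            return a == b
        return len(a) == len(b) and all(x.lower() == y.lower() for x, y in zip(a, b))

    def match(self, node, pos: int, caps: Caps) -> Iterator[Tuple[int, Caps]]:
        """Yield every (end, caps) at which `node` can match starting at pos."""
        if isinstance(node, Lit) and self.same(self.s[pos:pos + 1], node.ch):
            yield pos + 1, caps
        elif isinstance(node, Caret) and pos == 0:
            yield pos, caps
        elif isinstance(node, Seq) and not node.items:
            yield pos, caps
        elif isinstance(node, Seq):
            for mid, c in self.match(node.items[0], pos, caps):
                yield from self.match(Seq(node.items[1:]), mid, c)
        elif isinstance(node, Group):
            # With reset=True, a new match of the group forgets the captures nested inside it.
            c = {k: v for k, v in caps.items() if not (self.reset and k in node.inner)}
            for end, c2 in self.match(node.body, pos, c):
                yield end, {**c2, node.n: (pos, end)}
        elif isinstance(node, Rep):
            yield from self.repeat(node, pos, caps, 0)
        elif isinstance(node, Backref) and node.n in caps:
            # An unset group never reaches this branch, so its backreference fails.
            start, stop = caps[node.n]
            text = self.s[start:stop]
            if self.same(self.s[pos:pos + len(text)], text):
                yield pos + len(text), caps

    def repeat(self, node: Rep, pos: int, caps: Caps, count: int) -> Iterator[Tuple[int, Caps]]:
        if node.hi is None or count < node.hi:
            for end, c in self.match(node.body, pos, caps):
                if end == pos and count >= node.lo:
                    continue  # no empty iterations beyond the minimum (also ensures termination)
                yield from self.repeat(node, end, c, count + 1)
        if count >= node.lo:
            yield pos, caps

    def search(self, pattern) -> Optional[Tuple[int, int]]:  # leftmost-longest (start, end)
        for start in range(len(self.s) + 1):
            ends = [end for end, _ in self.match(pattern, start, {})]
            if ends:
                return start, max(ends)
        return None

# Interp #43, Part 15:  \(a*\(b\)*\)\{2\}\2
g2 = Group(2, Lit("b"))
g1 = Group(1, Seq((Rep(Lit("a"), 0, None), Rep(g2, 0, None))), inner=(2,))
part15 = Seq((Rep(g1, 2, 2), Backref(2)))
print(Matcher("ba").search(part15))                # None
print(Matcher("bab").search(part15))               # None
print(Matcher("bab", reset=False).search(part15))  # (0, 3)
# Interp #82:  \(^b\)\1 against bbbb
interp82 = Seq((Group(1, Seq((Caret(), Lit("b")))), Backref(1)))
print(Matcher("bbbb").search(interp82))            # (0, 2)
# Doug #6:  \(a\)\1 against aA, with and without regard to case
doug6 = Seq((Group(1, Lit("a")), Backref(1)))
print(Matcher("aA").search(doug6))                 # None
print(Matcher("aA", icase=True).search(doug6))     # (0, 2)
```

The bab case separates the two capture policies: keeping stale captures lets \2 reuse the b from the first iteration and match the final b, while under the proposed rule the second iteration matched only a, so group 2 did not participate in it and the backreference fails. For Interp #82 the backreference compares text only, so the anchor is not re-applied at offset 1.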

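The null-match restriction proposed for Interp #43, Part 12 can be sketched by enumerating the iterations of a repeated subexpression. This Python sketch follows the ERE wording (an iteration may match the empty string only when needed to reach the exact or minimum number of occurrences); `a_star` and `iterations` are hypothetical helpers, and a sub-pattern is modelled, as an assumption, by a function that yields its candidate end offsets.

```python
# Hypothetical sketch of the Part 12 null-iteration rule (ERE wording); not a real regex API.
from typing import Callable, Iterator, List, Optional, Tuple

Span = Tuple[int, int]


def a_star(s: str, pos: int) -> Iterator[int]:
    """Hypothetical sub-pattern a*: yield each end offset, longest first (pos itself last)."""
    end = pos
    while end < len(s) and s[end] == "a":
        end += 1
    yield from range(end, pos - 1, -1)


def iterations(body: Callable[[str, int], Iterator[int]], s: str, pos: int,
               lo: int, hi: Optional[int]) -> Iterator[List[Span]]:
    """Yield each admissible list of iteration spans for `body` repeated lo..hi times.

    An iteration may match the empty string only while fewer than `lo`
    iterations have been made, i.e. when it is needed to reach the minimum.
    """
    def go(p: int, done: List[Span]) -> Iterator[List[Span]]:
        if hi is None or len(done) < hi:
            for end in body(s, p):
                if end == p and len(done) >= lo:
                    continue  # an empty iteration is not needed here, so it is not allowed
                yield from go(end, done + [(p, end)])
        if len(done) >= lo:
            yield done

    yield from go(pos, [])


# (a*)* against "b": nothing forces an iteration, so none is made.
print(list(iterations(a_star, "b", 0, 0, None)))   # [[]]
# (a*)* against "aa": every iteration consumes at least one character.
print(list(iterations(a_star, "aa", 0, 0, None)))  # [[(0, 2)], [(0, 1), (1, 2)], [(0, 1)], []]
# (a*){2} against "": the exact count requires two empty iterations.
print(list(iterations(a_star, "", 0, 2, 2)))       # [[(0, 0), (0, 0)]]
```

Besides fixing what a repeated subexpression such as (a*)* can capture, the rule bounds the search: every iteration past the minimum consumes at least one character, so an unbounded * cannot loop forever on an empty match.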

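Interp #125 gives BRE\{0,0\} a definite meaning: zero occurrences match the empty string. A minimal Python sketch of how an interval count could be applied, assuming one occurrence of the repeated BRE is modelled as a function returning its possible end offsets; `interval_ends` and `one_a` are hypothetical names, not part of any regex API.

```python
# Hypothetical sketch of interval counting for Interp #125; not a real regex API.
from typing import Callable, Iterable, Set


def interval_ends(one: Callable[[int], Iterable[int]], pos: int, m: int, n: int) -> Set[int]:
    """End offsets reachable by matching `one` at least m and at most n times from pos."""
    ends = {pos} if m == 0 else set()  # zero occurrences match the empty string
    frontier = {pos}
    for count in range(1, n + 1):
        frontier = {end for p in frontier for end in one(p)}
        if not frontier:
            break  # no further occurrence is possible
        if count >= m:
            ends |= frontier
    return ends


s = "aaa"


def one_a(p: int) -> Iterable[int]:
    """Hypothetical single occurrence of the BRE `a` at offset p in s."""
    return [p + 1] if s[p:p + 1] == "a" else []


print(sorted(interval_ends(one_a, 0, 0, 0)))  # [0]: a\{0,0\} matches only the empty string
print(sorted(interval_ends(one_a, 0, 1, 2)))  # [1, 2]
print(sorted(interval_ends(one_a, 0, 0, 2)))  # [0, 1, 2]
```

With m equal to 0 the starting offset is included before any occurrence is tried, so a\{0,0\} succeeds without consuming input, which is what the added sentence requires.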
 _____________________________________________________________________________

David Korn
research!dgk
yyy@xxxxxxxxxxxxxxxx
