Email List: Xaustin-regexp-lX

Re: Comments on reg exp standard

To: yyyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: Re: Comments on reg exp standard
From: Paul Eggert <yyyyyy@xxxxxxxxxxx>
Date: Tue, 28 May 2002 19:47:11 -0700 (PDT)
Cc: "Jon Hitchcock" <yyyyyyyyyyyy@xxxxxxxxxxx>
References: <F133757011RycMCuuMj0000f8e1@hotmail.com>

> From: "Jon Hitchcock" <yyyyyyyyyyyy@xxxxxxxxxxx>
> Date: Tue, 28 May 2002 15:51:23 +0100

> Some of my comments relate to ERN 17, and I assume that the proposed
> definition of "subexpression" in ERN 18 is accepted, so that the term
> is defined for both BREs and EREs.

Unfortunately ERN 18 conflicts with ERN 19, so we have to be careful
about the term "subexpression" before we resolve the ERN 18 and 19 issues.

> Line 5908:  Is it necessary to say "For this purpose, a null string
> shall be considered to be longer than no match at all"?  The paragraph
> is about choosing one of a number of matches, so something which does
> not match is irrelevant.  If some subtle point is being made, it would
> help to have an example giving the different results with and without
> this rule.

Such an example is given in line 5911.  If it were not for that rule,
matching the BRE /\(a*\)*/ against "bc" would cause \1 to not match.
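
For what it's worth, here is a rough way to see the difference between
"matched the null string" and "did not match at all".  It is only a
sketch: it uses Python's "re" module, which is a backtracking matcher and
not a POSIX one, and it spells the BRE in ERE-like syntax because Python
has no BRE syntax.  The helper name group1 is made up for illustration.

```python
import re

def group1(pattern, text):
    """Return what group 1 captured at the start of text.

    None means the group did not participate in the match at all;
    "" means it participated and matched the null string.
    """
    m = re.match(pattern, text)
    return m.group(1) if m else None

# (a*)* is the ERE-style spelling of the BRE \(a*\)* discussed above.
# The inner a* can match the null string, so the group is set to "".
print(repr(group1(r"(a*)*", "bc")))

# With (a)* the group body cannot match the null string, so against "bc"
# the group never participates and is reported as None.
print(repr(group1(r"(a)*", "bc")))
```

The first call is the analogue of \1 matching the null string; the
second shows what an unmatched subexpression looks like in this library.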

> Line 5908:  ERN 17 suggests adding "An enclosed subpattern is deemed to
> be to the right of an enclosing pattern."  To me, this seems contrived,
> and it would be more natural to have a recursive description saying
> that, where subexpressions are nested, the rule is applied first to the
> whole expression and then to the subexpressions.  An alternative view
> is that the proposed wording is concise and precise, and that an
> explanation can be put in the rationale.

The latter should suffice.

How about if we add the following text after XRAT line 2370?

        Since enclosing patterns are deemed to be to the left of
        enclosed patterns, it is more important to maximize the
        overall length of /XY/ than it is to maximize the length of
        /X/ within /XY/.  For example, when we match the ERE
        /((week|wee)(night|knights))(s*)/ to the entire string
        "weeknights", the longest consistent match for
        /((week|wee)(night|knights))/ is the entire string
        "weeknights", and therefore in that context /(week|wee)/
        matches "wee", not "week".

        The relationship between enclosed and enclosing patterns also
        holds for duplicated expressions.  For example, when we match
        the pattern /(a.*z|b.*y)*/ to the entire string "azbazby", the
        last match for /(a.*z|b.*y)/ is the longest suffix that is
        consistent with an overall match, namely "bazby".
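
To make the two examples concrete, here is a small sketch that
enumerates the candidate parses by brute force and then applies the
"enclosing pattern first" preference as I read it.  This is not a POSIX
matcher: the helper names (enclosing_first_groups, iteration_splits) are
invented for illustration, the alternatives are hard-coded from the
examples above, and Python's "re" is used only to test small pieces.
Python's own backtracking engine is shown alongside for contrast, since
it prefers the first alternative that works rather than the longest
enclosing match.

```python
import re

def enclosing_first_groups(text):
    """Parse text against ((week|wee)(night|knights))(s*) by enumeration.

    Prefer the longest enclosing group first, then the enclosed groups
    from left to right.  Returns (outer, first, second, tail) or None.
    """
    candidates = []
    for first in ("week", "wee"):
        for second in ("night", "knights"):
            outer = first + second
            if text.startswith(outer):
                tail = text[len(outer):]
                if re.fullmatch(r"s*", tail):
                    candidates.append((outer, first, second, tail))
    if not candidates:
        return None
    return max(candidates, key=lambda c: (len(c[0]), len(c[1]), len(c[2])))

def iteration_splits(text, piece=r"(?:a.*z|b.*y)"):
    """List every way to cut text into pieces that each fully match piece."""
    if not text:
        return [[]]
    splits = []
    for end in range(1, len(text) + 1):
        if re.fullmatch(piece, text[:end]):
            for rest in iteration_splits(text[end:], piece):
                splits.append([text[:end]] + rest)
    return splits

# Enclosing-first reading: (week|wee) is "wee" inside "weeknights".
print(enclosing_first_groups("weeknights"))

# Python's backtracking engine takes "week" first and leaves "s" for (s*).
print(re.fullmatch(r"((week|wee)(night|knights))(s*)", "weeknights").groups())

# Candidate iterations of (a.*z|b.*y)* over "azbazby"; the split whose
# last piece is longest ends in "bazby".
splits = iteration_splits("azbazby")
print(splits)
print(max(splits, key=lambda pieces: len(pieces[-1])))
```

The enumeration is exponential in general and is meant only to show
which parse the proposed rationale text picks, not how an implementation
should find it.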


> Line 6095:  ERN 17 suggests changing "whatever" to "any string".  Does
> this make a difference?....
> Line 6209:  As for line 6095 ("whatever").

I don't see any technical difference, but the change does tighten up
and clarify the text a bit.

> Lines 6105-6109: The two sentences ["When the referenced subexpression
> matched more than one string, the back-referenced expression shall refer
> to the last matched string.  If the subexpression referenced by the
> back-reference matches more than one string because of an asterisk
> (’*’) or an interval expression (see item (5)), the back-reference
> shall match the last (rightmost) of these strings."] seem to say the
> same thing in different words.  If this is so, I suggest removing the
> first sentence to avoid any doubt that its meaning is subtly different.

Yes, it does appear to me that the second sentence was a
rewrite/clarification of the first one, and that somehow the first one
was not deleted as it should have been.  I agree with your suggestion.

> Line 6137:  The same precedence rules apply when subexpressions and
> back-references are duplicated.  I suggest changing
> "Single-character-BRE duplication" to just "Duplication"....
> Line 6254:  The same precedence rules apply when EREs enclosed in
> parentheses are duplicated.  I suggest changing "Single-character-ERE
> duplication" to just "Duplication".

Yes, this would correct labeling errors in those two tables.

> Line 6162:  I am sure I am not the only person who has been misled by
> the word "Extended" in ERE.  I suggest adding:
> 
>      Note:  The specification for EREs is not purely an extension of
>      that for BREs.  Back-references are not available, and the
>      notation for subexpressions and interval expressions is different.

Good point, but I think this is more suitable for the rationale.  It
could be added (without the "Note:") after XRAT line 2555, say.

> Line 6253:  The term "Grouping" is used nowhere else.  I suggest
> changing it to "Subexpressions".

This partly runs into the controversy over the definition of
"subexpression" alluded to above.  Perhaps a less controversial name
would be "Parenthesized expression", by analogy with "Bracket
Expression" in the previous line.
