Email List: austin-group-l

Re: RE_CONCAT: question about RE concatenation and subpattern matching

To: yyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: Re: RE_CONCAT: question about RE concatenation and subpattern matching
From: Paul Eggert <yyyyyy@xxxxxxxxxxx>
Date: Tue, 9 Apr 2002 16:11:34 -0700 (PDT)
Cc: yyyyy@xxxxxxxxxxxxxx, yyyyy@xxxxxxxxxx, yyyy@xxxxxxx
References: <200204092202.SAA08966@raptor.research.att.com>
> Date: Tue, 9 Apr 2002 18:02:03 -0400 (EDT)
> From: David Korn <yyy@xxxxxxxxxxxxxxxx>
> 
> In section 9.3.6 on page 172 of the new standard, item 2 it
> defines a subexpression as the characters between \( and \) for a BRE
> ( for ERE it would be (...) ).

This suggests the interpretation that only parenthesized subpatterns
should count for the purposes of maximizing match length, and that
there need be no attempt to maximize the length of matches for
unparenthesized subpatterns.

Tom Lord nicely objected to this suggestion in
<http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=3715>
His argument, slightly edited, is:

   The "parenthesized subpatterns only" interpretation is, in my
   experience, a bad idea -- not only is it harder to implement, but
   it confuses users when adding apparently innocent parentheses to a
   pattern leads to a different matching behavior.

Another argument against the "parenthesized subpatterns only"
interpretation is that the standard does not use the term
"subexpression" in the crucial sentence that we're trying to
interpret.  That sentence is:

   Consistent with the whole match being the longest of the leftmost
   matches, each subpattern, from left to right, shall match the
   longest possible string.

The term "subpattern" appears nowhere else in the standard, but in the
light of the June 1995 minutes a natural interpretation of
"subpattern" is that it is one of the nodes of the RE's parse tree,
and this includes non-parenthesized subpatterns as well as
parenthesized ones.
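
To make the parse-tree reading concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions, not a
proposed implementation: the helper name posix_split and the example
pattern are hypothetical, the RE is supplied as a flat list of
concatenated pieces rather than parsed, and Python's backtracking "re"
module is used only to test whether a single piece matches a given
substring. The sketch picks the longest whole match anchored at the
start of the text and then, from left to right, makes each piece as
long as possible.

```python
import re


def posix_split(subpatterns, text):
    """Split a prefix of `text` among the concatenated `subpatterns`.

    Hypothetical helper: chooses the longest whole match starting at
    offset 0, then makes each subpattern, from left to right, match the
    longest possible string. Assumes `subpatterns` is non-empty.
    Returns the list of matched pieces, or None if nothing matches.
    """

    def splits(i, pos):
        # Yield tuples of end offsets for subpatterns[i:], starting at pos.
        if i == len(subpatterns):
            yield ()
            return
        for end in range(len(text), pos - 1, -1):
            # Wrap in a non-capturing group so alternation stays local.
            if re.fullmatch("(?:%s)" % subpatterns[i], text[pos:end]):
                for rest in splits(i + 1, end):
                    yield (end,) + rest

    candidates = list(splits(0, 0))
    if not candidates:
        return None
    # Longest whole match first; ties go to the split whose earlier
    # subpatterns end later (tuples compare lexicographically).
    best = max(candidates, key=lambda ends: (ends[-1], ends))
    starts = (0,) + best[:-1]
    return [text[s:e] for s, e in zip(starts, best)]


# The three pieces carry no parentheses of their own here, yet each is
# still treated as a subpattern whose length is maximized.
print(posix_split(["a|ab", "c|bcd", "d*"], "abcd"))

# For comparison, Python's backtracking matcher takes the first
# alternative that leads to a match instead.
print(re.match(r"(a|ab)(c|bcd)(d*)", "abcd").groups())
```

On the text "abcd" the sketch assigns "ab", "c", "d" to the three
pieces, whereas the backtracking matcher reports "a", "bcd", "".  Both
cover the same whole match; they differ only in how the subpatterns
divide it, which is the point in dispute.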
