> Date: Tue, 9 Apr 2002 18:02:03 -0400 (EDT)
> From: David Korn <yyy@xxxxxxxxxxxxxxxx>
>
> In section 9.3.6 on page 172 of the new standard, item 2 it
> defines a subexpression as the characters between \( and \) for a BRE
> ( for ERE it would be (...) ).
This suggests the interpretation that only parenthesized subpatterns
should count for the purposes of maximizing match length, and that
there need be no attempt to maximize the length of matches for
unparenthesized subpatterns.
Tom Lord nicely objected to this suggestion in
<http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=3715>.
His argument, slightly edited, is:
   The "parenthesized subpatterns only" interpretation is, in my
   experience, a bad idea -- not only is it harder to implement, but
   it confuses users when adding apparently innocent parentheses to a
   pattern leads to a different matching behavior.
Another argument against the "parenthesized subpatterns only"
interpretation is that the standard does not use the term
"subexpression" in the crucial sentence that we're trying to
interpret. That sentence is:
   Consistent with the whole match being the longest of the leftmost
   matches, each subpattern, from left to right, shall match the
   longest possible string.
The term "subpattern" appears nowhere else in the standard, but in the
light of the June 1995 minutes a natural interpretation of
"subpattern" is that it is one of the nodes of the RE's parse tree,
and this includes non-parenthesized subpatterns as well as
parenthesized ones.
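
A minimal Python sketch of this reading may help. It is an illustration
under stated assumptions, not anything taken from the standard or the
minutes: the helper leftmost_longest_split is hypothetical, the pattern
(a|ab)(c|bcd)(d*) is just an example, only a top-level concatenation is
handled, and the whole match is fixed to be the entire string. Python's
re.fullmatch is used only to test whether one piece can match a given
substring; Python's own matcher is a backtracking one and does not
claim to follow the POSIX rule.

```python
import re

def leftmost_longest_split(pieces, text, start=0):
    """Split text[start:] among the concatenated pieces so that each piece,
    taken from left to right, matches the longest possible string.
    Returns the list of substrings, or None if no split exists."""
    if not pieces:
        # All pieces are used up: succeed only if the text is used up too.
        return [] if start == len(text) else None
    # Try the longest candidate for the leftmost piece first.
    for end in range(len(text), start - 1, -1):
        if re.fullmatch(pieces[0], text[start:end]):
            rest = leftmost_longest_split(pieces[1:], text, end)
            if rest is not None:
                return [text[start:end]] + rest
    return None

# The pieces are parse-tree nodes; whether they were written with
# parentheses plays no role in how the split is chosen.
pieces = ["a|ab", "c|bcd", "d*"]
posix_style = leftmost_longest_split(pieces, "abcd")

# For comparison: what Python's backtracking matcher assigns to the groups.
backtracking = list(re.fullmatch("(a|ab)(c|bcd)(d*)", "abcd").groups())

print(posix_style)
print(backtracking)
```

Both approaches match the whole string "abcd", but they divide it
differently: the sketch gives ['ab', 'c', 'd'], while the backtracking
matcher gives ['a', 'bcd', '']. Since the sketch never looks at
parentheses, adding or removing them around a piece cannot change the
result, which is the property Tom Lord's argument asks for.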