> Geoff Clare wrote:
> I spotted a couple of minor things in the new text:
> 1. The phrase "back-reference expressions to the contained subexpression"
> seems a bit odd. I'm guessing it was first drafted as "back-references
> to ..." and then "back-references" got changed to "back-reference
> expressions". Maybe "back-reference expressions corresponding to the
> contained subexpression" would work.
Thanks. This was the toughest part to translate (from the regexec()
subexpression[i]/subexpression[j] text).
> 2. The last example is wrong. The expression "\(ab*\)*\1" does
> match 'ababbab' - it matches the first four characters (i.e. the
> subexpression matches 'ab' and then \1 matches 'ab'). The example
> would work okay with an anchored RE.
I took a shortcut (i.e., didn't try it) on the last one and paid for it.
Here is the revised text:
The string matched by a contained subexpression shall be within the
string matched by the containing subexpression. If the containing
subexpression does not match, or if there is no match for the contained
subexpression within the string matched by the containing subexpression,
then back-reference expressions corresponding to the contained
subexpression shall not match. When a subexpression matches more than
one string, a back-reference expression corresponding to the
subexpression shall refer to the last matched string. For example, the
expression "^\(.*\)\1$" matches lines consisting of two adjacent
appearances of the same string, the expression "\(a\)*\1" fails to
match 'a', the expression "\(a\(b\)*\)*\2" fails to match 'abab', and
the expression "^\(ab*\)*\1$" matches 'ababbabb' but fails to match
'ababbab'.
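
A minimal sketch for trying the anchored examples, with one loud caveat:
Python's re module is a backtracking engine with its own syntax (unescaped
parentheses), not a POSIX BRE implementation, so this only covers the cases
where the two would be expected to agree. The nested "\(a\(b\)*\)*\2" case
is left out on purpose, since engines that keep inner captures across
iterations may give a different answer. The helper name check() is
hypothetical.

```python
import re

def check(pattern, text):
    """Return (whole match, group 1), or None when there is no match."""
    m = re.search(pattern, text)
    return None if m is None else (m.group(0), m.group(1))

# "^\(.*\)\1$": two adjacent appearances of the same string.
doubled = check(r"^(.*)\1$", "abcabc")
# "\(a\)*\1" against 'a': the group and \1 cannot both be satisfied.
single_a = check(r"(a)*\1", "a")
# "^\(ab*\)*\1$": \1 refers to the last string matched by the group.
good = check(r"^(ab*)*\1$", "ababbabb")
bad = check(r"^(ab*)*\1$", "ababbab")
# Unanchored, the same RE matches the first four characters of 'ababbab'.
unanchored = check(r"(ab*)*\1", "ababbab")

print(doubled, single_a, good, bad, unanchored)
```

In the 'ababbabb' case group 1 ends up holding 'abb' (the last iteration),
which is what the "last matched string" wording requires.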
Also, I ran ERN-7 by Doug McIlroy and he noted that the sed substitute
command description only specifies what is substituted when a back-reference
expression refers to a subexpression that matches:
The characters "\n", where n is a digit, shall be replaced by the
text matched by the corresponding backreference expression.
This should probably be revised to handle cases where the back-reference
expression does not match:
The characters "\n", where n is a digit, shall be replaced by the
text matched by the corresponding back-reference expression, or by
the empty string if the corresponding back-reference expression
does not match.
This seems to align with sed behavior.
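
As a rough analogue of the proposed rule (not a test of sed itself), the
sketch below uses Python's re.sub(), which from Python 3.5 onward replaces
a reference to an unmatched group with the empty string. The pattern and
the bracketed replacement are made up for the illustration.

```python
import re

# "\(a\)*b" against 'b': the group takes part in zero iterations,
# so \1 in the replacement has nothing to refer to.
unset = re.sub(r"(a)*b", r"[\1]", "b")
# For comparison, the same substitution when the group does match.
matched = re.sub(r"(a)*b", r"[\1]", "ab")

print(unset, matched)
```

The first result has nothing between the brackets, which is the behavior
the revised sentence would require of sed.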
-- Glenn Fowler -- AT&T Labs Research, Florham Park NJ --