Email List: Xaustin-group-lX
[All Lists]

Re: RE_CONCAT: question about RE concatenation and subpattern matching

To: yyy@xxxxxxxxxxxxxxxx
Subject: Re: RE_CONCAT: question about RE concatenation and subpattern matching
From: Paul Eggert <yyyyyy@xxxxxxxxxxx>
Date: Thu, 25 Apr 2002 16:14:42 -0700 (PDT)
Cc: yyyyyyyyyyyyyy@xxxxxxxxxxxxx, yyyyy@xxxxxxxxxx, yyyyy@xxxxxxxxxxxxxx, yyyy@xxxxxxx, yyyy@xxxxxxxxxxx
References: <200204250622.CAA59455@raptor.research.att.com> <200204250901.g3P91ck02627@sic.twinsun.com> <200204252036.QAA24970@raptor.research.att.com>
> Date: Thu, 25 Apr 2002 16:36:28 -0400 (EDT)
> From: Glenn Fowler <yyy@xxxxxxxxxxxxxxxx>
> 
> RE-ASSOC(1d) accepts either left-recursive or right-recursive
> productions for ERE_branch and defines the subpatterns as the sequence
> of the nonterminal ERE_expression produced by a depth-first left-right
> traversal of the input extended_reg_exp parse tree.  This sequence is
> the same for either left-recursive or right-recursive ERE_branch
> productions.

I see two main problems with RE-ASSOC(1d) as an interpretation of the
standard.  First, it's a special case for ERE_branch that does not
apply to the other instances of recursion in the grammar.  This
special treatment isn't mentioned in the standard, so it's unlikely
that it was intended.

Second, RE-ASSOC(1d) seems to contradict the resolution of
Interpretation #43, Part 14, in the June 1995 POSIX RE experts
meeting.  This resolution says "An enclosed subpattern is deemed to be
to the right of an enclosing pattern" when interpreting the standard's
requirement that "Consistent with the whole match being the longest of
the leftmost branches, each subpattern, from left to right, shall
match the longest possible string."  Applying the left-recursive
productions specified by the POSIX ERE grammar, /(A)(B)/ and /(C)/ are
both deemed to be to the right of /(A)(B)(C)/, and /(A)/ and /(B)/ are
both deemed to be to the right of /(A)(B)/.  This analysis disagrees
with a depth-first left-right traversal of /(A)(B)(C)/, since such a
traversal would deem /(A)(B)/ to be to the right of /(A)/ and /(B)/.

> Date: Thu, 25 Apr 2002 16:36:28 -0400 (EDT)
> From: Glenn Fowler <yyy@xxxxxxxxxxxxxxxx>
> 
> I can reply point by point if you really want,

I think that would be helpful.  It's quite a bit to ask, though, so if
you're pressed for time I personally would find it most helpful if you
would focus on which of the two behaviors is more desirable for users.
RE-ASSOC(1b), (1c), and (1d) all have the same behavior from the
user's point of view, so the only real choice is between that behavior
and RE-ASSOC(1a)'s behavior.

Tom Lord has argued for RE-ASSOC(1a) from the user's viewpoint, and as
far as I can tell nobody has presented countervailing arguments.  He
argues that RE-ASSOC(1a) has some extra algebraic properties that are
useful when writing regular pexressions.  You can see some of his
ideas in:

http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=3724
http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=3732
http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=3747

and some followup messages by others in agreement, in:

http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=3749
http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=3751

<Prev in Thread] Current Thread [Next in Thread>