Email List: Xaustin-group-lX
[All Lists]

Re: RE-ASSOC: a question about the associativity of RE concatenation

To: yyyyyy@xxxxxxxxxxx, yyy@xxxxxxxxxxxxxxxx
Subject: Re: RE-ASSOC: a question about the associativity of RE concatenation
From: Glenn Fowler <yyy@xxxxxxxxxxxxxxxx>
Date: Mon, 29 Apr 2002 23:13:30 -0400 (EDT)
Cc: yyyyyyyyyyyyyy@xxxxxxxxxxxxx, yyyyy@xxxxxxxxxxxxxx, yyyy@xxxxxxxxxxx, yyyyy@xxxxxxxxxx
Organization: AT&T Labs Research
References: <200204091857.OAA35259@raptor.research.att.com> <200204092229.g39MTeX00802@shade.twinsun.com> <200204121008.DAA20963@morrowfield.home> <200204290716.DAA01769@raptor.research.att.com> <200204291846.g3TIkB109921@shade.twinsun.com>
On Mon, 29 Apr 2002 11:46:11 -0700 (PDT) Paul Eggert wrote:
> > Date: Mon, 29 Apr 2002 03:16:47 -0400 (EDT)
> > From: Glenn Fowler <yyy@xxxxxxxxxxxxxxxx>

> > let's run through an example
> > the first part is
> >     [a-z]
> > adding the second part [a-z0-9]* yields 
> >     [a-z][a-z0-9]*
> > quick, how many parts are there? 2

> If there are 2, then we must be talking only about the top-level
> subpatterns, since /[a-z0-9]/ is also an ERE and we didn't count it.

the sarcastic discourse was about "parts", not "subpatterns", to make a point
about how to interpret the "sub" prefix in english

> > now add a third part [:]
> >     [a-z][a-z0-9]*[:]
> > now how many parts are there? 3

> The answer is 2, if we only count the top-level parts /[a-z][a-z0-9]*/

what do you mean by top-level?
can a list have levels?

> and /[:]/ indicated by the ERE grammar.  Or it is 4, if we count all
> the subpatterns, including /[a-z0-9]/.  Or it is 3, if we count the
> top-level parts indicated by an infinite grammar that allows n-ary
> concatenation as a single parse tree node.

if /[a-z0-9]/ is a part then /*/ must also be a part
how does /*/ match?

> > now add parenthesis to the expression to illustrate the rule
> > "match the first part, then the second part, then the third part"

> That rule isn't in the standard; the standard says something quite a
> bit more cryptic than that, something that you've interpreted as a
> rule that corresponds to the infinite-grammar interpretation mentioned
> above.

its an exercise to show how a right-associative reading of the standard
is possible -- the standard and/or implementation is so ingrained in some
reader's minds that I paraphrased to illuminate the possibility that other
interpretations might be valid

> > thus the left-associative "subpattern" list
> >     [a-z][a-z0-9]*
> >     [a-z]
> >     [a-z0-9]*
> >     [:]
> > but this requires that subpatterns overlap;

> But subpatterns overlap under all interpretations.  Under the right
> associative interpretation, /[a-z0-9]*/ and /[a-z0-9]*[:]/ overlap.

subpatterns only overlap when subpatterns are allowed to have subpatterns

a subpattern==ERE_expression is `atomic'; ERE_expressions don't overlap
because the grammar doesn't allow it

> Under all interpretations, /[a-z0-9]*/ and /[a-z0-9]/ overlap.  The
> fact that subpatterns overlap is not an argument against any
> particular interpretation.

again /*/ is not a pattern, so under *some* interpretations

> > this results in the strange situation where you can't reconstruct
> > the original pattern given the (left-associative) "subpattern" list

> I don't follow this point.  The first pattern in the list should
> be the entire pattern, so it's easy to reconstruct the entire pattern:
> just look at the first item in the list.

ok, lame point withdrawn

> > with right associativity there is no recursion unless there are
> > parenthesized parts;

> Only if you mentally flatten out the other instances of recursion
> in the POSIX ERE grammar, namely:
>    ERE_branch : ERE_branch ERE_expression ;
>    extended_reg_exp : extended_reg_exp '|' ERE_branch ;
>    ERE_expression : ERE_expression ERE_dupl_symbol ;
> You're not counting these instances as recursive ones, since you've
> come up with an iterative method to interpret them, one that you think
> is equivalent to the recursively defined grammar.

there is no flattening when subpattern==ERE_expression
separate yourself from the parse tree and try to see that there
could be at least one other way to traverse an extended_reg_exp

1: traverse(extended_reg_exp) {
2:      for each ERE_branch in extended_reg_exp
3:              for each ERE_expression in ERE_branch
4:                      if ERE_expression == '(' extended_reg_exp' ')'
5:                              traverse(extended_reg_exp')
6:}

2 invokes the ERE alternation rule
3 invokes the matching rule
5 invokes the "recursively apply" rationale

> But isn't this begging the question?  These grammar rules are
> recursive: to deduce what they mean, we need to reason recursively.
> We can't reliably transform them into iterative rules until we first
> know what they mean as written.

there are two types of recursion in the grammar:
        left-recursion (alternatively right-recursion)
        self-recursion
left-recursion (and right-recursion) are tools to express list constructs
these ERE grammar productions are left-recursive:
        extended_reg_exp : extended_reg_exp '|' ERE_branch ;
        ERE_branch : ERE_branch ERE_expression ;
extended_reg_exp is a *list* of ERE_branch separated by '|'
ERE_branch is a *list* of ERE_expression separated by 'concatenation'
self-recursion is used to recognize recursive input
the only ERE self-recursion is defined in this production:
        ERE_expression : '(' extended_reg_exp ')' ;
this is as recursive as the grammar can get -- it goes right back to
the start symbol

I didn't make these terms up; check some grammar references

to recap:

EREs have iterative components and recursive components
the iterative components are
        extended_reg_exp via '|'
        ERE_branch via 'concatenation'
the recursive components are
        '(' extended_reg_exp ')'
iteration can be used to analyze the iterative components
recursion can be used to analyze the recursive components

if you start off thinking 'parse tree' then you will bring in another
recursive component to traverse the parse tree, but parse tree recursion
is an artifact of applying grammar productions; it is possible to analyze,
crack, and match EREs without a parse tree; that's what the standard rules do

<Prev in Thread] Current Thread [Next in Thread>