> Date: Mon, 29 Apr 2002 03:16:47 -0400 (EDT)
> From: Glenn Fowler <yyy@xxxxxxxxxxxxxxxx>
> let's run through an example
> the first part is
> [a-z]
> adding the second part [a-z0-9]* yields
> [a-z][a-z0-9]*
> quick, how many parts are there? 2
If there are 2, then we must be talking only about the top-level
subpatterns, since /[a-z0-9]/ is also an ERE and we didn't count it.
> now add a third part [:]
> [a-z][a-z0-9]*[:]
> now how many parts are there? 3
The answer is 2, if we only count the top-level parts /[a-z][a-z0-9]*/
and /[:]/ indicated by the ERE grammar. Or it is 4, if we count all
the subpatterns, including /[a-z0-9]/. Or it is 3, if we count the
top-level parts indicated by an infinite grammar that allows n-ary
concatenation as a single parse tree node.
> now add parenthesis to the expression to illustrate the rule
> "match the first part, then the second part, then the third part"
That rule isn't in the standard; the standard says something quite a
bit more cryptic than that, something that you've interpreted as a
rule that corresponds to the infinite-grammar interpretation mentioned
above.
> thus the left-associative "subpattern" list
> [a-z][a-z0-9]*
> [a-z]
> [a-z0-9]*
> [:]
> but this requires that subpatterns overlap;
But subpatterns overlap under all interpretations. Under the right
associative interpretation, /[a-z0-9]*/ and /[a-z0-9]*[:]/ overlap.
Under all interpretations, /[a-z0-9]*/ and /[a-z0-9]/ overlap. The
fact that subpatterns overlap is not an argument against any
particular interpretation.
> this results in the strange situation where you can't reconstruct
> the original pattern given the (left-associative) "subpattern" list
I don't follow this point. The first pattern in the list should
be the entire pattern, so it's easy to reconstruct the entire pattern:
just look at the first item in the list.
And even if we ignored the first pattern in the list, if we know that
the top level is a concatenation we could still deduce it from the
rest of the list, under any of the interpretations: you simply parse
the second pattern in the list, see where its subtree ends, and then
concatenate it to the result of analyzing the rest of the list. This
is true for both left- and right-associative interpretations, and even
for the infinite-grammar interpretation, so I don't see why it's an
argument for one interpretation over the other.
> with right associativity there is no recursion unless there are
> parenthesized parts;
Only if you mentally flatten out the other instances of recursion
in the POSIX ERE grammar, namely:
ERE_branch : ERE_branch ERE_expression ;
extended_reg_exp : extended_reg_exp '|' ERE_branch ;
ERE_expression : ERE_expression ERE_dupl_symbol ;
You're not counting these instances as recursive ones, since you've
come up with an iterative method to interpret them, one that you think
is equivalent to the recursively defined grammar.
But isn't this begging the question? These grammar rules are
recursive: to deduce what they mean, we need to reason recursively.
We can't reliably transform them into iterative rules until we first
know what they mean as written.
|