On Fri, 12 Apr 2002 03:08:30 -0700 (PDT) Tom Lord wrote:
> Left-to-right maximization of subexpressions practically demands left
> associativity.
I found this strange on first reading and then forgot to respond
the post goes on:
> The more I think about it, the less I think this is a purely
> theoretical consideration. When developing a long, complex pattern -
> it's pretty natural for users to test it out in bits in pieces. In
> other words, people will in fact write /A/, then write /(B)/, and then
> expect not to be surprised when forming /A(B)/. Right-associativity
> will surprise them.
ok, so patterns can be built up one part at a time
let's run through an example
the first part is
[a-z]
adding the second part [a-z0-9]* yields
[a-z][a-z0-9]*
quick, how many parts are there? 2
now add a third part [:]
[a-z][a-z0-9]*[:]
now how many parts are there? 3
now add parenthesis to the expression to illustrate the rule
"match the first part, then the second part, then the third part"
[a-z]([a-z0-9]*([:]))
what do we call the associativity of the parenthesized expression? right
how did the rule mention the parts? in order: one, two, three
in what direction? from left to right
what have we learned?
"left-to-right matching of parts is right associative"
what are the parts a part of? a pattern
is there a common English term for this? subpattern
now someone please explain how, in the absense of a definition
for subpattern, a reading of the matching rule
Consistent with the whole match being the longest of the
leftmost matches, each subpattern, from left to right, shall
match the longest possible string.
can result in left associativity for subpattern matching?
sarcasm off now, apologies to the innocent reader
its clear now why proponents of left associativity think this iterative
part of the matching rule is recursive; the forced left-associative
grouping for the example pattern illustrates this:
([a-z]([a-z0-9]*))[:]
the matching rule must matche part one and part two as a group, then must
go inside the group and match part one and part two separately (recursion),
then back out to part three; thus the left-associative "subpattern" list
[a-z][a-z0-9]*
[a-z]
[a-z0-9]*
[:]
but this requires that subpatterns overlap; this results in the strange
situation where you can't reconstruct the original pattern given the
(left-associative) "subpattern" list
with right associativity there is no recursion unless there are
parenthesized parts; these are matched by recursive application
of the matching rule to the stuff inside the parenthesis, as
mentioned in the rationale
-- Glenn Fowler <yyy@xxxxxxxxxxxxxxxx> AT&T Labs Research, Florham Park NJ --
|