I have two messages that, taken together, make the case that left
associativity makes much more sense than right associativity.
One message is an abstract look at regexp algebra. The other is
a concrete and specific example that can plausibly come up in
the real world.
Here is the abstract look:
-t
** Regexp Algebra
Speaking of associativity, let's look at some regexp algebra.
Suppose I have a pattern `<A>' and a pattern `<B>' and a string `S'.
Suppose there is a prefix of `S', call it `Sa', that is the longest
prefix matching `<A>'. The rest of the string, the suffix, we can
call `Sb' and suppose that it matches `<B>'.
We can name the substring positions in these matches. When `<A>'
matches `Sa', we get substring positions `pmatch_A_Sa'. When `<B>'
matches `Sb', we get substring positions `pmatch_B_Sb'.
Suppose that `<A>' has `Na' parenthesized subexpressions, and that `<B>'
has `Nb' parenthesized subexpressions.
Now think of the pattern `(<A>)(<B>)'. In both left and right
associative worlds, we know that that pattern matches `S'. That will give
us `pmatch_AB_S'.
In fact, we know that:
<<<
pmatch_AB_S[0].rm_so == 0
pmatch_AB_S[0].rm_eo == length(S)
for X in 1 .. Na+1
pmatch_AB_S[X] == pmatch_A_Sa[X-1]
for X in Na+2 .. Na+2+Nb
if pmatch_B_Sb[X-(Na+2)] != -1
pmatch_AB_S[X] == pmatch_B_Sb[X-(Na+2)] - length(Sa)
otherwise
pmatch_AB_S[X] == {-1, -1}
>>>
That's very handy and is equally true in left and right associative
worlds.
However, parentheses are an expensive operator and they interfere with
the usefulness of backreferences. Optimizations that can eliminate
parentheses are worth knowing about. Some special cases can be used
to eliminate parentheses!
*** Special Case #1 in the Left Associative World
Suppose that the lowest precendence operator in `<A>' is _not_ `|'.
Then an invariant like the one above (but with slightly different
pmatch index offsets) applies to the pattern `<A>(<B>)'. One
less set of parentheses!
Note that the pattern produced by this special case, `<A>(<B>)' is
itself a pattern in which the lowest precendence operator is not
`|', so we can apply this special case rule to that pattern again,
giving us `<A>(<B>)(<C>)', then `<A>(<B>)(<C>)(<D>)', etc. Very nice.
*** Special Case #1 in the Right Associative World
But, aha! That's nothing special. Suppose that this time the
lowest precendence operator in `<B>' is not `|'. Then in the
right associative world, the similar invariant applies to
`(<A>)<B>'. Again, one less set of parentheses than in the general
rule!
Note that in _this_ special case, we didn't require anything special
of pattern `<A>'. If we have `<C>', `<D>', etc. in which the
lowest precendence operator is not `|', then we can again use this
rule repeatedly to build up `((<A>)<B>)<C>', then
`(((<A>)<B>)<C>)<D>', etc.
Isn't that just as nice? It's a little odd that the reason we can
repeat this rule came out differently -- but let's gloss over that
for the moment.
*** So aren't Left and Right Associativity Simply Symmetric?
So far, it looks like symmetry will kill off any advantage of
left-associativity. But wait -- I'll soon fix that. Let's look at
some special cases of the special cases.
*** Special case #2 in the Left Associative World
As before, the lowest precendence operator in `<A>' is _not_ `|'.
In addition, this time, assume that `<B>' is either a parenthesized
expression, an expression matching a single character, or a
pattern followed by a duplication or option operator (`?, {n,m}, *, +').
Now this time, invariants like the ones given above (again, changing
the `pmatch' offsets slightly) hold for the pattern: `<A><B>'.
Huzzah! No new parens at all. The resulting pattern is ready to be
fed back into this special-special case as the new `<A>', and we
can keep adding simple expressions to build up `<A><B><C>',
`<A><B><C><D>' etc. [I've given a concrete example of how this
is useful in the next message.]
*** Special case #2 in the Right Associative World.
As before, the lowest precendence operator in `<B>' is _not_ `|'.
In addition, this time, assume that `<A>' is either a parenthesized
expression, an expression matching a single character, or a
pattern followed by a duplication or option operator (`?, {n,m}, *, +').
Now this time, invariants like the ones given above (again, changing
the `pmatch' offsets slightly) hold for the pattern: `<A><B>'.
Hrumph! No new parens at all. The resulting pattern is a new
pattern in which the lowest precendence operator is _not_ `|', so
special case #2 applies to the new pattern in the role of `<B>': We
can feed the new pattern back into this special-special case
and keep adding simple expressions to build up `<C><A><B>',
`<D><C><A><B>' etc.
*** Not Quite Symmetric
Opps! The symmetry has clearly broken. Special case #2 gives the
left associative world a simple way to build up parentheses-less
patterns from left to right. In the right associative world, special
case #2 gives us a simple way to build up parentheses-less patterns
from right to left. I think it's obvious that building up regexps
from left to right is the activity we want to support [and the next
message will make this clearer through a specific example].
Really, this same bug was already present in the special case #1 for
the right associative world -- that's what we "glossed over" -- but
special case #2 is what makes the consequences of the bug starkly
apparent.
When we build up complex patterns from simple ones, we are more
likely to be trying to extend a match to the right than to start it
earlier in the string. The left-or-right associativity question
isn't an unimportant question at all: it relates in a signifcant
way to the fact that the left part of a pattern matches the left
part of a string, the right part of a pattern the right part of a
string. While the pattern algebras may be symmetric, the two ends
of the string are "beginning" and "end" which are opposite things,
not equivalent things.
*** Your Left-Leaning Friends are Right
So, left associativity makes it easier for us to build up efficient
regexps by extending existing regexps to the right. In order to
provide a more useful regexp algebra, concatenation is left
associative. If we compare `(wee|week)(night|knights)(s+)' to
`weeknightssss', `\1' is `wee' and `\3' is `sss'.
|