Email List: Xaustin-group-lX
[All Lists]

Re: RE-ASSOC: a question about the associativity of RE concatenation

To: yyyyyy@xxxxxxxxxxx
Subject: Re: RE-ASSOC: a question about the associativity of RE concatenation
From: Tom Lord <yyyy@xxxxxxxxxxx>
Date: Tue, 9 Apr 2002 23:17:53 -0700 (PDT)
Cc: yyyyyyyyyyyyyy@xxxxxxxxxxxxx, yyyyy@xxxxxxxxxxxxxx, yyyyy@xxxxxxxxxx
References: <200204091857.OAA35259@raptor.research.att.com> <200204092229.g39MTeX00802@shade.twinsun.com> <200204092254.PAA02436@morrowfield.home> <200204092340.g39Neua00876@shade.twinsun.com>
        >       3. It has the property that additional subexpressions
        >          to the right, which don't extend the match, have
        >          no effect on what is matched by the subexpressions
        >          to the left.

        Can you please elaborate on this a bit?  It seems to me
        RE-ASSOC(1a) and RE-ASSOC(1b) both have this nice property, so
        it doesn't argue for one interpretation over the other.


Ok, let's look at this abstractly, then specifically.

E.2.8.2 line 2797 in IEEE 1003.2-1992 says:

        each subpattern, from left to right, shall match the longest
        possible string.

Abstractly, suppose I have two patterns: /A/ and /(B)/.  Presume that
/(B)/ can match the null string.  Assume we have a string, S.  The
pattern /A/ matches all of S.  The pattern /A(B)/ therefore also
matches all of S.  I interpret E.2.8.2 as saying that the
subexpression positions within /A/ should be the same for both matches
-- because /A/ is clearly the left subpattern, and because /(B)/ can
match the null string.

My interpretation implies that concatenation is, as the grammar
suggests, left associative.  Here's a specific illustration:

Consider the expression

        (wee|week)(night|knights)

That matches

        weeknightsxyzzy

as

        wee knights

with leftover

        xyzzy

Now what if we add (s?)

        (wee|week)(night|knights)(s?)

Under a left associative interpretation (1a), we still get

        wee knights

(with the same leftover) because the length of what is matched by
subexpression group

        (wee|week)(night|knights)

is maximized, because the expressions are grouped

        (wee|week)(night|knights)       (s?)

The additional subexpression on the right, which doesn't extend the
match, makes no difference.

Under a right associative interpretation (1b), we get

        week night s

because the expressions are grouped

        (wee|week)      (night|knights)(s?)

That's part of why left-associativity seems natural to me in the
context of "maximized, left-to-right"

Abstractly again, we had /A/ and /(B)/ -- suppose that /A/ is actually
/LM/, the concatenation of two subexpressions.  Does that mean that
/A/ is not the left subpattern in /A(B)/?  I don't think so, since I
think concatenation is left-associative.  The flip-side, though, is
that /A/, if it is /LM/, is _not_ the *right* subpattern of /(B)A/.
Nevertheless, given the choice of which "algebraic" property is
better, choosing left associativity is certainly more consistent with
the grammar as written, and, in my opinion, more aesthetically
consistent with the left-to-right bias of a "leftmost-longest" rule.

-t
yyyy@xxxxxxxxxxx, yyyy@xxxxxxx
---

        How valuable is my contribution? Share your feedback at
        Affero:
        http://svcs.affero.net/rm.php?r=tomlord

<Prev in Thread] Current Thread [Next in Thread>