Email List: Xaustin-group-lX
[All Lists]

Re: RE-ASSOC: a question about the associativity of RE concatenation

To: yyyyyy@xxxxxxxxxxx
Subject: Re: RE-ASSOC: a question about the associativity of RE concatenation
From: Tom Lord <yyyy@xxxxxxx>
Date: Fri, 5 Apr 2002 17:22:42 -0800 (PST)
Cc: yyyyyyyyyyyyyy@xxxxxxxxxxxxx, yyyyy@xxxxxxxxxxxxxx, yyyyy@xxxxxxxxxx
References: <200204040900.g34907s13879@sic.twinsun.com> <55CB8F56-48DD-11D6-A96D-0030657603C6@us.ibm.com> <20020405230100.GA6667@rap.rap.dk> <200204052352.PAA17315@morrowfield.home> <200204060028.g360SLq06468@shade.twinsun.com>

   From: Paul Eggert <yyyyyy@xxxxxxxxxxx>
   > From: Tom Lord <yyyy@xxxxxxx>
   > 
   > [(1b)] means that (A).* will produce a different result
   > for some values of A even when the length of the overall match
   > does not change.


   Can you please give an example of what you mean by this?
   The other points in your message are all well taken,
   but unfortunately I don't understand this one.


I was typing too fast: the parentheses are an error.  That should have
been /(A)(B)/ vs. /(A)(B).*/, or /A.*/ vs. /A/.  As in:


        re1: (w|wx)(y|x.*)
        re2: (w|wx)(y|x.*).*

        input: wxyz

Under 1b, because the primary goal, apart from longest-overall, is to
maximize the substring for "(w|wx)" and the substring for
"(w|wx)(y|x.*)" is not considered:

        for re1, \1 == w
        for re2, \2 == wx

Under 1a, because the grammar tells us that "(w|wx)(y|x.*)" is the
leftmost, outermost subexpression, whose length should therefore be
maximized:

        for re1 and re2,        \1 == w

The point being that "intuitive" or "expected result" with the
strength we're using those terms would be (I claim) that appending
".*" to an expression is a noop in the case where the length of the
match is not extended.  After all, if matching is "left-to-right", and
stuff to the left can explain the entire match, then why should extra
stuff to the right ever matter?

-t

<Prev in Thread] Current Thread [Next in Thread>