Email List: Xaustin-group-lX
[All Lists]

Re: RE-ASSOC: a question about the associativity of RE concatenation

To: yyyyyy@xxxxxxxxxxx
Subject: Re: RE-ASSOC: a question about the associativity of RE concatenation
From: Tom Lord <yyyy@xxxxxxxxxxx>
Date: Fri, 12 Apr 2002 03:20:47 -0700 (PDT)
Cc: yyyyyyyyyyyyyy@xxxxxxxxxxxxx, yyyyyyyyyyyyyy@xxxxxxxxxxxxx, yyyyy@xxxxxxxxxxxxxx, yyyyy@xxxxxxxxxx
References: <200204091857.OAA35259@raptor.research.att.com> <200204092229.g39MTeX00802@shade.twinsun.com> <200204092254.PAA02436@morrowfield.home> <200204092340.g39Neua00876@shade.twinsun.com>

       eggert:

       > From: Tom Lord <yyyy@xxxxxxxxxxx>
       >
       > The conclusions the committee reached regarding repeated nullable
       > expressions create some new problems with regard to finding the
       > longest overall match -- but we can get to those in a chat about the
       > RE-ITERATE question.
       
       Can you please elaborate on this a bit?  I don't get the
       connection between RE-ITERATE and the June 1995 minutes.  I
       assume you're talking about Interp #43, Part 12 "Can a
       duplicated subexpression match the null string", but
       unfortunately I don't understand all the consequences of that
       part of the minutes.


The minutes say:

        Can a duplicated subexpression match the null string?  If so,
        will the duplication be repeated until the expression does
        match the null string?  There was a consensus that applying a
        duplicator to a RE that could match the empty string should be
        unspecified.

That's just silly.  A repeated nullable expression should match the
longest possible string that is consistent with the overall match.  In
the case of a subsequent backreference to that nullable expression,
the empty match may be required to find the longest overall match,
even if the empty match is preceeded by 1 or more non-empty matches of
the repeated expression.

Example:

        RE: (x?)*y\1z?
        input: xxxxyz

Look at it as Hume suggested: the rules for building up expressions
are a set construction.  For each expression, you get a possibly
infinite set of strings.  The Kleene closure (* operator) of a set
that includes the empty string is perfectly well defined -- it
includes the empty string.  Now generalize the Kleene closure to
handle backreferences.  Unless you take a very awkward approach,
you'll permit extending a non-null repetition of a nullable expression
with the null string -- and then use that null string in subsequent
backreferences.  So of course a repeated parenthesized nullable
expression can wind up mapping to an empty substring.  It isn't
undefined at all.

-t
---
        How valuable is my contribution? Share your feedback at
        Affero:
        http://svcs.affero.net/rm.php?r=tomlord

<Prev in Thread] Current Thread [Next in Thread>