Email List: Xaustin-group-lX

Re: RE-ASSOC: a question about the associativity of RE concatenation

To: yyy@xxxxxxxxxxxxxxxx
Subject: Re: RE-ASSOC: a question about the associativity of RE concatenation
From: Tom Lord <yyyy@xxxxxxxxxxx>
Date: Tue, 9 Apr 2002 13:28:57 -0700 (PDT)
Cc: yyyyyyyyyyyyyy@xxxxxxxxxxxxx, yyyyyyyyyyyyyy@xxxxxxxxxxxxx, yyyyyy@xxxxxxxxxxx, yyy@xxxxxxxxxxxxxxxx, yyyyy@xxxxxxxxxx
References: <200204091857.OAA35259@raptor.research.att.com>

      I believe that this issue was resolved in the June 1995 POSIX RE
      experts meeting in Toronto which I chaired.  I have enclosed the
      minutes below.

Looking over the minutes, I saw nothing that actually addresses this
issue.  Can you be more specific?  It looks to me like the experts
meeting didn't address the RE-ASSOC issue at all.

You gave two explanations.  The first was:

    Once the longest leftmost match of the complete string is found,
    "weeknights", which could be matched by or

        wee knights
        week night s

    The longest leftmost match of the leftmost parenthesised group is
    matched.  This, the result is

        week night s


That could mean that, for the purpose of determining the meaning of
"subpattern" in E.2.8.2., concatenation is considered to be
right-associative.  Or it could mean that "subpattern" always means
(only) parenthesised subexpressions.  Or it could mean something more
complicated and even less obviously related to the language of the
spec.

For reasons previously mentioned in this thread, I am certain that
"subpattern" is best explained by a _left-associative_ interpretation
of concatenation and, in general, by the grammar.  Thus, the
match should be:

        wee knights
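
To make the difference between the two readings concrete, here is a
minimal sketch.  It assumes the pattern under discussion has the shape
(wee|week)(knights|night)(s*); that pattern is inferred from the two
splits shown above and is not quoted in this message.  The names PARTS,
splits, left_assoc and right_assoc are hypothetical, and the tie-breaking
keys are one reading of "longest subpattern first", not text from the
spec.

```python
import re

# Assumed component patterns (not quoted in the message itself).
PARTS = ["wee|week", "knights|night", "s*"]

def splits(text, parts=PARTS):
    """Yield every (a, b, c) with a + b + c == text where each piece
    is fully matched by the corresponding component pattern."""
    n = len(text)
    for i in range(n + 1):
        for j in range(i, n + 1):
            pieces = (text[:i], text[i:j], text[j:])
            if all(re.fullmatch(p, s) for p, s in zip(parts, pieces)):
                yield pieces

def left_assoc(text):
    # Read the pattern as ((a b) c): the subpattern (a b) takes the
    # longest match first, then a takes the longest match within it.
    return max(splits(text), key=lambda p: (len(p[0]) + len(p[1]), len(p[0])))

def right_assoc(text):
    # Read the pattern as (a (b c)): a takes the longest match first,
    # then b takes the longest match within what remains.
    return max(splits(text), key=lambda p: (len(p[0]), len(p[1])))

print(left_assoc("weeknights"))
print(right_assoc("weeknights"))
```

Under these assumptions, the left-associative key selects the split
"wee knights" (with an empty third piece) and the right-associative key
selects "week night s".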

Interestingly, the grammar-oriented interpretation makes the added
language about nested subexpressions redundant, though perhaps
(indirectly) clarifying.

For other reasons previously mentioned in this thread, I think the
right-associative interpretation is undesirable.

The "parenthesised expressions only" interpretation is, in my
experience, a bad idea -- not only is it harder to implement, but it
confuses users when adding apparently innocent parentheses to an
expression leads to a different matching behavior.

Other more complicated interpretations of "subpattern" are
interesting, but I don't think they have anything to do with the spec.

You also said:

        Notice that nested subexpressions are to be considered to the
        right of the expression that it nests.

I'm not sure why you mention that.

The conclusions the committee reached regarding repeated nullable
expressions create some new problems with regard to finding the
longest overall match -- but we can get to those in a chat about
the RE-ITERATE question.

-t
