Email List: Xaustin-group-lX
[All Lists]

Re: RE-ASSOC: a question about the associativity of RE concatenation

To: yyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: Re: RE-ASSOC: a question about the associativity of RE concatenation
From: Paul Eggert <yyyyyy@xxxxxxxxxxx>
Date: Tue, 9 Apr 2002 15:29:40 -0700 (PDT)
Cc: yyyyy@xxxxxxxxxxxxxx, yyyyy@xxxxxxxxxx, yyyy@xxxxxxx
References: <200204091857.OAA35259@raptor.research.att.com>
> Date: Tue, 9 Apr 2002 14:57:22 -0400 (EDT)
> From: David Korn <yyy@xxxxxxxxxxxxxxxx>
>
>       /(week|wee)(night|knights)(s*)/
> Once the longest leftmost match of the complete string is found,
> "weeknights", which could be matched by or
>       wee knights
>       week night s
> The longest leftmost match of the leftmost parenthesised group is
> matched.  This, the result is
>       week night s

This example is consistent with the following interpretations:

 * RE-CONCAT interpretation (1) plus RE-ASSOC interpretation (1b).
 * RE-CONCAT interpretation (2) plus RE-ASSOC interpretation (2).

Since RE-CONCAT(2) is rejected (as described in my previous message),
I guess you must be intending RE-ASSOC(1b); or perhaps you intend some
other interpretation for RE-ASSOC that is consistent with the examples
in the June 1995 minutes.  But, like Tom Lord, I don't see anything in
the June 1995 minutes that addresses this point.

If we interpret our example ERE /(week|wee)(night|knights)(s*)/ using
the terminology of the June 1995 minutes, RE-ASSOC(1b) requires that
/(night|knights)(s*)/ must be deemed to be "to the right of"
/(week|wee)/.  But this is inconsistent with the POSIX grammar for
REs.  The grammar says that the ERE's parse tree looks like this:

                   1
                  / \
                 /   \
                /     \
               2       3
              / \     (s*)
             /   \
            /     \
           /       \
          /         \
         /           \
        /             \
       4               5
   (week|wee)    (night|knights)

and, given this parse tree, Interp #43 Part 14 says that the
left-to-right order of these subpatterns is:

   1, 2, 4, 5, 3

and this is equivalent to RE-ASSOC(1a), not (1b).

Here's another way of putting it.  The POSIX grammar for EREs says
that /(night|knights)(s*)/ is not a subpattern of
/(week|wee)(night|knights)(s*)/, so it doesn't contribute to the ERE's
semantics.

However, I see that Andrew Hume writes that RE-ASSOC(1b) was the
intended meaning.  If so, the simplest fix is to modify the POSIX
grammar so that RE concatenation is right-associative, as suggested at
the end of RE-ASSOC.

<Prev in Thread] Current Thread [Next in Thread>