Email List: Xaustin-group-lX
[All Lists]

Re: RE-ASSOC: a question about the associativity of RE concatenation

To: yyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: Re: RE-ASSOC: a question about the associativity of RE concatenation
From: Paul Eggert <yyyyyy@xxxxxxxxxxx>
Date: Fri, 5 Apr 2002 16:12:44 -0800 (PST)
Cc: yyyyy@xxxxxxxxxxxxxx, yyyyy@xxxxxxxxxx, yyyy@xxxxxxx
References: <200204040900.g34907s13879@sic.twinsun.com> <55CB8F56-48DD-11D6-A96D-0030657603C6@us.ibm.com> <20020405230100.GA6667@rap.rap.dk>
> Date: Sat, 6 Apr 2002 01:01:00 +0200
> From: <yyyy@xxxxxxxx>
> 
> As far as I can tell, and this is also what the grammar
> indicates, the regexp is done left to right, finding
> for each component the longest match.

Yes, but the question is concerned with what "left to right" means in
the context of the parse trees indicated by the grammar.  The grammar
specifies the following parse tree for the example ERE:

                   1
                  / \
                 /   \
                /     \
               2       3
              / \     (s*)
             /   \
            /     \
           /       \
          /         \
         /           \
        /             \
       4               5
   (week|wee)    (night|knights)

Now clearly the matches for parse tree nodes 4, 5, and 3 must be
maximized in that order; otherwise it would not be a left to right
match.  It is also clear that the match for node 1 must be maximized
before the match for node 4, since POSIX requires that the overall
match must be the longest.  So it's indisputable that we have:

   1 > 4 > 5 > 3

where "X > Y" means "it is more important to maximize the length of
the match for parse tree node X than it is to maximize the length of
the match for parse tree node Y".

The question is: where does 2 belong in the ">" order?

Interpretation (1a) says 1 > 2 > 4 > 5 > 3.

Interpretation (1b) says that the implementation can behave as if the
tree were parsed this way instead:

              1
             / \
            /   \
           /     \
          4       6
     (week|wee)  / \
                /   \
               /     \
              /       \
             /         \
            /           \
           5             3
    (night|knights)     (s*)

and that 1 > 4 > 6 > 5 > 3, which leads to more intuitive behavior.

Another equivalent interpretation, which just came to mind, is that
the implementation can behave as if the tree were parsed this way:

                       1
                      /|\
                     / | \
                    /  |  \
                   /   |   \
                  /    |    \
                 /     |     \
                /      |      \
               /       |       \
              /        |        \
             /         |         \
            /          |          \
           /           |           \
          /            |            \
         4             5             3
    (week|wee)  (night|knights)     (s*)

I suppose we could call this (1c), but it has the same effect as (1b)
so the bottom line would be the same.

These interpretations agree about (1 > 4 > 5 > 3); the only question
is whether it's more important to maximize 2 than it is to maximize 4.

I hope this discussion helps to motivate the suggestion at the end of
RE-ASSOC to make the grammar right-associative instead of
left-associative here.  If the grammar is right-associative, the
question of interpretation does not arise, since the intuitive
behavior agrees with the formal behavior.

<Prev in Thread] Current Thread [Next in Thread>