Email List: Xaustin-group-lX

Re: RE-ASSOC: a question about the associativity of RE concatenation

To: "Donn Terry" <yyyyyy@xxxxxxxxxxxxx>
Subject: Re: RE-ASSOC: a question about the associativity of RE concatenation
From: James Youngman <yyy@xxxxxxx>
Date: 07 Apr 2002 03:02:14 +0100
Cc: <yyyyyyyyyyyyyy@xxxxxxxxxxxxx>, <yyyyy@xxxxxxxxxxxxxx>, <yyyyy@xxxxxxxxxx>, <yyyy@xxxxxxx>
References: <FE465D8F724E3F4F811D067203A214AE045CE0F4@red-msg-08.redmond.corp.microsoft.com>
"Donn Terry" <yyyyyy@xxxxxxxxxxxxx> writes:

> The critical piece of information that's missing here is that for
> the most popular implementations (Spencer, "original UNIX", Linux,
> come to mind but might not be the only choices) what actually
> happens today.  (And historically, if things have changed.)
> 
> Given the "existing practice" decisions that were made during the
> last revision (to the extent of backing out text that was in .2-1992
> that was intentionally and with full knowledge not compatible with
> existing practice when it was written), it would be VERY hard to
> argue that anything but existing practice (if there's a consensus
> among the older implementations, anyway) should be accepted.  (I
> suspect that the subtleties of existing practice match the grammars,
> but that's yet to be proven.)


Existing practice is important:

1. because it reduces the maintenance and testing workload for
   implementations and reduces change to "stable" code. 

2. because any given existing practice bears with it, one assumes, a
   community of users who expect that particular behaviour.  

However, our goal - or at least one of them - is application
portability.  For this reason I think that it would be a bad idea to
decide that some regular expressions can match differently on
different implementations.  

That is to say, I would like to be able to write a regexp and be
confident that it will match a particular input (or not), and that its
subexpressions will fall in a particular way.
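
To make the concern concrete, here is a minimal sketch in Python.
Python's re module is not one of the implementations under discussion,
and the pattern, the input and the helper name splits() are my own
illustrative assumptions.  It lists every way the input can be divided
among three concatenated subexpressions, then shows which division one
particular engine happens to report.

```python
import re

# Hypothetical pattern pieces and input, chosen only to show the ambiguity.
PARTS = ["a|ab", "c|bcd", "d*"]
TEXT = "abcd"

def splits(parts, text):
    """Return every way to divide text so that each part matches its piece exactly."""
    if not parts:
        return [[]] if text == "" else []
    found = []
    for cut in range(len(text) + 1):
        if re.fullmatch(parts[0], text[:cut]):
            for rest in splits(parts[1:], text[cut:]):
                found.append([text[:cut]] + rest)
    return found

# All divisions of TEXT that are consistent with the three pieces.
candidates = splits(PARTS, TEXT)
for pieces in candidates:
    print(pieces)

# The division that Python's backtracking engine reports for the whole pattern.
python_choice = list(re.match("(a|ab)(c|bcd)(d*)", TEXT).groups())
print(python_choice)
```

Both candidates cover the whole input, so the overall match is the same
either way; only the subexpression boundaries differ.  An application
that reads those boundaries depends on the rule the implementation uses
to pick one of them.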

I would rather not have to write the regexp differently depending on
how the implementation interprets the spec.  After all, if I maintain
a small number of regexps in a program, I don't want to have to track
how the various implementations on which my code is supposed to run
interpret the rules.  That isn't just a one-off exercise, because if
the rules allow alternative implementations I would also have to watch
out for implementations which change their interpretations.

On a more personal front, I think there are already too many
interpretations of regexps, and the more we do to align the various
implementations closely, the better.

-- 
James Youngman
Manchester, UK.  +44 161 226 7339
PGP (GPG) key ID for <yyy@xxxxxxx> is 64A95EE5 (F1B83152).
