> Date: Fri, 12 Apr 2002 17:28:47 +0900
> From: Isamu Hasegawa <yyyyy@xxxxxxxxxxxxxx>
>
> You mean the syntax of (1a), (1b), and (1c) are different each other?
Yes, because they generate different sets of parse trees.
(1a) says /(A)(B)(C)/ is parsed as if it were /((A)(B))(C)/, aside
from the issues of parenthesis renumbering and the redundant
parse tree node introduced by the redundant parenthesization.
(1b) says it is parsed as if it were /(A)((B)(C))/.
(1c) says that those three REs each have distinct parse trees.
For POSIX REs, (1b) and (1c) have equivalent effect. Since there is
no way for portable user code to distinguish between (1b) and (1c), an
implementation could substitute (1b) for (1c) or vice versa without a
conformance issue arising. However, we have identified a way for
portable user code to distinguish between (1a) and the the other two
interpretations, so there is a conformance issue in making the
distinction between (1a) and the others.
For classical REs, (1a), (1b) and (1c) are all equivalent. I briefly
looked at the theoretical computer science literature and it appears
to me that (1a) is nearly universal, but that result has limited
significance since there's no difference between between the three
interpretations for classical REs.
I should mention that (1c) cannot be expressed by any context-free
grammar. To express it you would need a more powerful formalism, for
example a van Wijngaarden grammar. So I don't think we should propose
(1c) for any rewording of the standard. I brought it up only because
it seemed to be related to Andrew Hume's intuitive understanding of
how RE matching was supposed to work.
|