Email List: Xaustin-regexp-lX

Comments on reg exp standard

To: yyyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: Comments on reg exp standard
From: "Jon Hitchcock" <yyyyyyyyyyyy@xxxxxxxxxxx>
Date: Tue, 28 May 2002 15:51:23 +0100

First, I will explain that I am not an expert on regular expressions.
While reviewing the proposed changes (ERNs 17, 18, 19), I found some
places where the existing standard was hard to understand. I
think the standard should aim to be clear, not just to regular
expression experts, but also to people like me.

Line numbers below refer to the published Base Definitions volume.
Some of my comments relate to ERN 17, and I assume that the proposed
definition of "subexpression" in ERN 18 is accepted, so that the term
is defined for both BREs and EREs.

------------------------------------------------------------------------

Line 5908: Is it necessary to say "For this purpose, a null string
shall be considered to be longer than no match at all"? The paragraph
is about choosing one of a number of matches, so something which does
not match is irrelevant. If some subtle point is being made, it would
help to have an example giving the different results with and without
this rule.
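
The question turns on the difference between a subexpression that
matches the null string and one that takes no part in the match at all.
A minimal sketch of that distinction, assuming Python's re module as a
stand-in (it is not a POSIX implementation and its matching rules
differ, so it cannot show what the quoted sentence actually changes):

```python
import re

# Assumption: Python's re stands in for a POSIX matcher here.
# Group 1 takes part in the match, but what it matches is the null string.
null_match = re.fullmatch(r"(a*)b", "b")

# Group 1 is optional and takes no part in the match at all.
no_match = re.fullmatch(r"(a)?b", "b")

print(repr(null_match.group(1)))  # ''
print(repr(no_match.group(1)))    # None
```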

Line 5908: ERN 17 suggests adding "An enclosed subpattern is deemed to
be to the right of an enclosing pattern." To me, this seems contrived,
and it would be more natural to have a recursive description saying
that, where subexpressions are nested, the rule is applied first to the
whole expression and then to the subexpressions. An alternative view
is that the proposed wording is concise and precise, and that an
explanation can be put in the rationale.

Line 6095: ERN 17 suggests changing "whatever" to "any string". Does
this make a difference? What else does a subexpression match apart from
a string? If some subtle point is being made, an example would help.

Lines 6105-6109: The two sentences ["When the referenced subexpression
matched more than one string, the back-referenced expression shall refer
to the last matched string. If the subexpression referenced by the
back-reference matches more than one string because of an asterisk
(’*’) or an interval expression (see item (5)), the back-reference
shall match the last (rightmost) of these strings."] seem to say the
same thing in different words. If this is so, I suggest removing the
first sentence to avoid any doubt that its meaning is subtly different.
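
The behaviour both sentences describe can be sketched as follows. This
assumes Python's re module, which is not a POSIX BRE implementation and
writes the group as ( ) rather than \( \), but which follows the same
"last matched string" rule for a repeated group:

```python
import re

# Assumption: Python's re, not a POSIX BRE implementation.
# The group ([ab]) is repeated by '*', so it matches several strings in turn.
m = re.fullmatch(r"([ab])*", "aab")
last_iteration = m.group(1)
print(last_iteration)  # b  (the last, rightmost iteration)

# A back-reference to the group must repeat that last string.
accepts_abb = bool(re.fullmatch(r"([ab])*\1", "abb"))  # iterations a, b; \1 is b
accepts_aba = bool(re.fullmatch(r"([ab])*\1", "aba"))  # no split makes \1 fit
print(accepts_abb, accepts_aba)  # True False
```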

Line 6137: The same precedence rules apply when subexpressions and
back-references are duplicated. I suggest changing
"Single-character-BRE duplication" to just "Duplication".

Line 6162: I am sure I am not the only person who has been misled by
the word "Extended" in ERE. I suggest adding:

Note: The specification for EREs is not purely an extension of
that for BREs. Back-references are not available, and the
notation for subexpressions and interval expressions is different.
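
The notational difference in the suggested note can be sketched as
follows. This assumes Python's re module, which is not a POSIX
implementation (and, unlike EREs, does support back-references) but
which writes subexpressions and intervals the ERE way. The variable
names are only illustrative:

```python
import re

# Assumption: Python's re spells subexpressions and intervals as an ERE
# does: (ab){2}. The BRE spelling of the same expression is \(ab\)\{2\}.
ere_style = r"(ab){2}"
bre_style = r"\(ab\)\{2\}"

ere_on_abab = bool(re.fullmatch(ere_style, "abab"))
# Read with ERE-style rules, each backslash makes the next character literal.
bre_on_abab = bool(re.fullmatch(bre_style, "abab"))
bre_on_literal = bool(re.fullmatch(bre_style, "(ab){2}"))

print(ere_on_abab, bre_on_abab, bre_on_literal)  # True False True
```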

Line 6209: The same comment applies as for line 6095 ("whatever").

Line 6253: The term "Grouping" is used nowhere else. I suggest
changing it to "Subexpressions".

Line 6254: The same precedence rules apply when EREs enclosed in
parentheses are duplicated. I suggest changing "Single-character-ERE
duplication" to just "Duplication".
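
A minimal sketch of why the narrower name is misleading, assuming
Python's re module as a stand-in for ERE-style notation (it is not a
POSIX implementation): duplication binds more tightly than
concatenation whether the duplicated item is a single character or an
ERE enclosed in parentheses.

```python
import re

# Assumption: Python's re as a stand-in for ERE-style notation.
# Single character: 'ab*' is 'a' followed by 'b*', not '(ab)*'.
single_ok = bool(re.fullmatch(r"ab*", "abbb"))
single_not_group = bool(re.fullmatch(r"ab*", "abab"))

# Subexpression: 'x(ab)*' is 'x' followed by '(ab)*', not '(x(ab))*'.
group_ok = bool(re.fullmatch(r"x(ab)*", "xabab"))
group_not_whole = bool(re.fullmatch(r"x(ab)*", "xabxab"))

print(single_ok, single_not_group)  # True False
print(group_ok, group_not_whole)    # True False
```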

