This from Andrew Hume (yyyyyy@xxxxxxxxxx):
----- Original Message -----
From: Andrew Hume <yyyyyy@xxxxxxxxxxxxxxxx>
To: Nick Stoughton <yyyy@xxxxxxxxxx>
Cc: <yyyyyy@xxxxxxxxxx>
Sent: Monday, April 08, 2002 8:39 PM
Subject: Re: Fw: RE_CONCAT: question about RE concatenation and subpattern matching
>----- Original Message -----
>From: "Mark Brown" <yyyyy@xxxxxxxxxx>
>To: "Austin Group" <yyyyyyyyyyyyyy@xxxxxxxxxxxxx>
>Cc: "Isamu Hasegawa" <yyyyy@xxxxxxxxxxxxxx>; <yyyy@xxxxxxx>; "Paul Eggert" <yyyyyy@xxxxxxxxxxx>
>Sent: Friday, April 05, 2002 12:23 PM
>Subject: RE_CONCAT: question about RE concatenation and subpattern matching
>
>
>>RE-CONCAT: a question about RE concatenation and subpattern matching
>>
>>[Some of the people in the GNU projects (Paul Eggert, Tom Lord,
>>Isamu Hasegawa, myself) were considering consolidating the various
>>regex implementations in GNU, and came up with a series of questions
>>concerning the Specification. This is one of those questions.]
>>
>>When the regular expression /XY/ matches an entire string S, must the
>>subpattern /X/ match the longest prefix of S that is consistent with
>>the overall match?
>>
>>This is a question of interpretation about the following sentence in
>>the POSIX standard:
>>
>> Consistent with the whole match being the longest of the leftmost
>> matches, each subpattern, from left to right, shall match the
>> longest possible string.
>>
>>
>>
><http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html#tag_09_01_02>
>
>>For reference, the POSIX rationale goes on to say:
>>
>> It is possible to determine what strings correspond to
>> subexpressions by recursively applying the leftmost longest rule to
>> each subexpression, but only with the proviso that the overall
>> match is leftmost longest.
>>
>>
>>
><http://www.opengroup.org/onlinepubs/007904975/xrat/xbd_chap09.html#tag_01_09_01>
>
>>Here is an example to illustrate the question. Suppose we match the
>>ERE /((week|wee)(night|knights))(s*)/ to the entire string
>>"weeknights", and want to know which subpatterns match which
>>substrings. Here are some possible interpretations:
>>
>> (1) The "subpatterns" are the two immediate subexpressions. The
>> longest consistent match for /((week|wee)(night|knights))/ is
>> "weeknights". Therefore, /(s*)/ matches the empty string.
>>
>> (2) The "subpatterns" are the (not necessarily immediate)
>> subexpressions, regardless of whether they contribute directly to
>> the concatenation. One subexpression is considered to be left of
>> another subexpression if it occurs before the other in the
>> postorder traversal of the ERE's parse tree, when the ERE is
>> parsed according to the POSIX grammar for EREs. In this
>> left-to-right order, the subexpressions are:
>>
>> /week/
>> /wee/
>> /week|wee/
>> /(week|wee)/
>> /night/
>> /knights/
>> /night|knights/
>> /(night|knights)/
>> /(week|wee)(night|knights)/
>> /((week|wee)(night|knights))/
>> /s/
>> /s*/
>> /(s*)/
>> /((week|wee)(night|knights))(s*)/
>>
>> (There are actually more subexpressions -- for example, /night/
>> has eight subexpressions /n/, /i/, /ni/, /g/, /nig/, /h/, /nigh/,
>> /t/ -- but these extra subexpressions do not contribute to the
>> analysis and are omitted for brevity.)
>>
>> We apply the longest-match rule to each of these subexpressions
>> in order. The longest consistent match for /week/ is "week", so
>> /week|wee/ and /(week|wee)/ match "week". The longest
>> consistent match for /night/ against the trailing suffix "nights"
>> is "night", so /night|knights/ and /(night|knights)/
>> both match "night". Hence /(week|wee)(night|knights)/ and
>> /((week|wee)(night|knights))/ both match "weeknight". Finally,
>> /s/, /s*/, and /(s*)/ match "s".
>>
the above analysis is simply wrong. a recursive analysis is:
/((week|wee)(night|knights))(s*)/ matches weeknights.
the left subexpression ((we ... ts)) can match weeknights, so it must (longest match),
and thus (s*) is the empty string.
recursing, the longest match for (week|wee) satisfying overall match is wee,
which yields (night|knights) matching knights.
>>
>>
>> (3) The standard is ambiguous, and both the above interpretations are
>> valid. Perhaps other interpretations are valid as well. A
>> conforming implementation might even use different
>> interpretations at different times.
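To make interpretations (1) and (2) concrete, here is a minimal Python sketch. It does not use any regex library; it simply enumerates every way of splitting "weeknights" among the three pieces of the ERE and then picks a split under each rule. The names X, Y, Z, interp1 and interp2 are illustrative only, and the ranking used for (2) is a simplification that is assumed to agree with the postorder rule for this particular pattern, not in general.

```python
from itertools import product

S = "weeknights"

# Strings each piece of /((week|wee)(night|knights))(s*)/ can generate.
# (s*) is expanded up to the length of S, which is enough for this string.
X = ["week", "wee"]
Y = ["night", "knights"]
Z = ["s" * n for n in range(len(S) + 1)]

# Every decomposition x+y+z of the whole string.
candidates = [(x, y, z) for x, y, z in product(X, Y, Z) if x + y + z == S]

# Interpretation (1): the left immediate subexpression ((X)(Y)) gets the
# longest possible match first; then recurse into it and maximize X.
interp1 = max(candidates, key=lambda t: (len(t[0]) + len(t[1]), len(t[0])))

# Interpretation (2), simplified for this pattern: the leftmost leaf group
# is maximized first, then the next one, and so on.
interp2 = max(candidates, key=lambda t: (len(t[0]), len(t[1]), len(t[2])))

print("(1):", interp1)
print("(2):", interp2)
```

Only two decompositions of the string exist, so the whole question reduces to which of them a conforming matcher must report.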
>>
>>Here are some arguments for and against these interpretations. These
>>lengthy arguments are briefly summarized at the end of this section.
>>
>> pro (1):
>>
>> (1) is simple and straightforward.
>>
>> con (2):
>>
>> (2) is counterintuitive, because it causes subexpressions to not
>> find a longest consistent match, despite the longest-match rule.
>> For example, under (2), when matching /(.|..).*/ against "ab",
>> most people would expect /(.|..)/ to match "ab" because it is a
>> longer match, but (2) requires it to match the shorter substring
>> "a". Conversely, most people would expect /(.|..)/ and /(.{1,2})/
>> to be equivalent, but (2) says that when matching /(.{1,2}).*/
>> against "ab", /(.{1,2})/ matches "ab" instead of the "a" that
>> /(.|..)/ would match in its place.
>>
>> Here is another example. (2) causes the interpretation of a
>> regular expression X to be changed merely because X is followed by
>> /.*/. This contradicts the left-to-right nature of the subpattern
>> rule. For example, under (2), when matching /(.|..)/ against
>> "ab", /(.|..)/ matches the whole string; but when matching
>> /(.|..).*/ against "ab", /(.|..)/ matches only "a" even though it
>> would be consistent for it to match the same "ab" as before.
>>
>> In general, submatching must be context-dependent, but it is
>> counterintuitive for /.*/ to provide context from the right in a
>> matcher that is supposed to be left-to-right. This
>> counterintuitive behavior does not occur with (1), nor does it
>> occur with a purely greedy NFA-style matcher that does not conform
>> to POSIX because it does not always find the longest match; but it
>> does occur with (2).
>>
>> A more general statement of this point is that '|' is not
>> commutative. For example, /(week|wee)(night|knights)(s*)/ differs
>> from /(wee|week)(night|knights)(s*)/ when matching against
>> "weeknights". The former pattern matches /(s*)/ to "s", whereas
>> the latter one matches /(s*)/ to the empty string. This surprises
>> some users and is an annoyance for programs that construct regular
>> expressions.
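The examples in this argument can be reproduced with Python's re module. This is only a stand-in: re is a backtracking matcher with ordered alternation, not a POSIX matcher, and it is assumed here merely to coincide with the behavior attributed to (2) on these inputs. re.fullmatch is used so that the overall match always spans the entire string; the helper name groups is hypothetical.

```python
import re

def groups(pattern, text):
    """Return the captured groups when pattern matches all of text."""
    m = re.fullmatch(pattern, text)
    return m.groups() if m else None

# (.|..) alone versus followed by .*, and the bounded-repeat spelling.
alone = groups(r"(.|..)", "ab")
with_tail = groups(r"(.|..).*", "ab")
bounded = groups(r"(.{1,2}).*", "ab")

# Swapping the order of the alternatives changes what (s*) captures.
week_first = groups(r"(week|wee)(night|knights)(s*)", "weeknights")
wee_first = groups(r"(wee|week)(night|knights)(s*)", "weeknights")

for name, value in [("alone", alone), ("with_tail", with_tail),
                    ("bounded", bounded), ("week_first", week_first),
                    ("wee_first", wee_first)]:
    print(name, value)
```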
>>
>> pro (2):
>>
>> Many people want the left subexpression of '|' to have priority.
>>
>> con (2):
>>
>> POSIX explicitly rejected the expectation that the left
>> subexpression of '|' should have priority, by ruling that it's
>> more important to have the longest match than to have the left
>> subexpression match. The left subexpression does not have
>> priority in /(.|..)/, for example.
>>
>> pro (2):
>>
>> If you want the longest match, then in almost all cases you can
>> reorder the alternatives of '|' so that the leftmost subexpression
>> matches the longest substring. With (2) you have the option of
>> preferring a shorter match for a '|' subexpression, if that is
>> what you want. (1) does not give you that option.
>>
>> con (2):
>>
>> But under (2) you can't give an alternative priority at the top
>> level, unless the alternatives happen to be the same length.
>>
>> It is odd for (2) to have one rule at the top level, but a
>> different rule for subexpressions. It would be more consistent if
>> the same rule applied at all levels. It is particularly odd that
>> the standard doesn't mention this discrepancy; it indicates that
>> the discrepancy was not intended.
>>
>> con (2):
>>
>> It's implausible that (2) was intended, given that the standard
>> does not mention issues like postorder traversal which are
>> inherent to (2). Had the POSIX authors intended the
>> counterintuitive behaviors inherent to (2), they would have
>> mentioned them.
>>
>> pro (2):
>>
>> In effect, (1) specifies a preorder traversal, and (2) specifies a
>> postorder traversal. The standard does not specify either
>> preorder or postorder, so neither is more plausible than the other
>> and postorder should be allowed.
>>
bollocks. the standard talks about a simple rule, not about apparently equivalent
questions about how parsers work.
>>
>>
>> con (2):
>>
>> The POSIX authors were clearly thinking about DFA-based
>> implementations. This is why
>>
>>
><http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html#tag_09_01_02>
>
>> gives the example where /(wee|week)(knights|night)/ matches all of
>> "weeknights". Here, the authors are making the point that a
>> DFA-based matcher must be run while a further match is possible,
>> and that one cannot stop the DFA on the first match.
>>
>> Had the POSIX authors wanted to allow (2), they would have been
>> inclined to use an example that distinguishes between a greedy
>> NFA-style matcher and (2), because a greedy NFA-style matcher is
>> incorrect but is superficially similar to (2). For example, they
>> could have used the example of matching
>> /(week|wee)(night|knights)/ against "weeknights", where a greedy
>> NFA-style matcher will match only "weeknight" instead of the
>> "weeknights" required by (2).
>>
>> The fact that the POSIX authors did not give such an example
>> indicates that they were not considering (2) as a possible
>> interpretation.
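The difference between the standard's example and the distinguishing example proposed here can be seen with any backtracking matcher. The sketch below uses Python's re module as a convenient example of a greedy NFA-style matcher; it is not a POSIX matcher and is not one of the implementations discussed in this thread.

```python
import re

text = "weeknights"

# The example from the standard: a first-match backtracking matcher
# happens to find the whole string, so it cannot expose the problem.
standard_example = re.match(r"(wee|week)(knights|night)", text).group(0)

# The distinguishing example: the matcher commits to "week" and "night"
# and stops at "weeknight", one character short of the longest match.
distinguishing = re.match(r"(week|wee)(night|knights)", text).group(0)

print(standard_example)
print(distinguishing)
```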
>>
i was one of those closely involved with the original standard and this is not true.
in fact, you cannot know where subexpressions span with a pure DFA. we were a
diverse bunch of implementations even at that time (DFA, NFA, hybrid schemes)
and there was a plain sense that we should define meaning independent of what
any implementation technique gave you easily or for free.
>>
>>
>> pro (1):
>>
>> (1) takes less space to implement than (2) does. If P is the
>> pattern length and S is the string length, (1) requires only O(P)
>> space, but (2) requires O(S) space and the constant multiple of S
>> is relatively large in practice. (2) needs this space to record
>> the states of the DFA match, so that it can then run an NFA
>> forward across the same string and use the results of the DFA
>> match as an oracle at each nondeterministic branch. (2) can be
>> done in less than O(S) space, but only with an exponential time
>> overhead that would be unacceptable in practice.
>>
>> pro (2):
>>
>> There is little difference in performance in practice if you
>> optimize (2) carefully enough, because the worst cases occur only
>> in pathological examples.
>>
>> con (2):
>>
>> Optimizing (2) is tricky, and this trickiness is a maintenance
>> burden that it would be better to avoid.
>>
>> pro (2):
>>
>> Many existing POSIX matchers seem to use (2), including Solaris 8
>> and the GNU C library 2.2.
>>
>> con (2):
>>
>> However, these implementations predate POSIX.2-1992, and though
>> they were modified to conform to POSIX, quite possibly this
>> particular issue hasn't been tested in the available test suites,
>> or perhaps the test suites themselves are not correct. Since
>> POSIX.2-1992 explicitly rejected the common existing practice of
>> greedy NFA-style matching, one shouldn't rely too heavily on the
>> perhaps-unmodified continuation of that existing practice.
>>
i think this is true. doug mcilroy (retired, but at dartmouth) wrote a posix RE parser
from scratch from the spec and i think had some nice test cases.
>>
>>
>> con (2):
>>
>> The BSD matcher uses (2), but it is not a POSIX matcher because it
>> does not consistently find the longest overall match. Under BSD,
>> matching the BRE /\(\(abc\)\{0,2\}\(abc\)\{0,2\}\)\3/ against the
>> string "abcabc" incorrectly yields only "abc" in both FreeBSD 3.0
>> and OpenBSD 3.0. This suggests that (2) is correlated with the
>> pre-POSIX practice that POSIX.2-1992 rejected.
>>
>> pro (1):
>>
>> Some POSIX matchers use (1). These include the Rx matcher and the
>> Spencer-derived matchers used by Tcl and others.
>>
>> pro (3):
>>
>> Insisting on either (1) or (2) would invalidate some existing
>> implementations.
>>
>> con (3):
>>
>> Insisting on either (1) or (2) would invalidate few if any
>> existing applications, and applications are more important than
>> implementations.
>>
>> The POSIX authors explicitly chose DFA-style semantics for REs
>> because they are "easier to define and describe" than other
>> semantics. They also wrote:
>>
>> It is thought that dependencies on the choice of rule are rare;
>> carefully contrived examples are needed to demonstrate the
>> difference.
>>
>> (Both these quotes are taken from
>>
>>
><http://www.opengroup.org/onlinepubs/007904975/xrat/xbd_chap09.html#tag_01_09_02>
>
>>.)
>>
>> This suggests that the POSIX authors desired a simple,
>> deterministic semantics for regular expressions, and thought that
>> most users wouldn't care so much which semantics were chosen, so
>> long as the behavior was standardized. It is therefore unlikely
>> that the POSIX authors intended the complicated, ambiguous
>> semantics that would result if multiple interpretations were
>> valid.
>>
we certainly did not intend ambiguous or complicated semantics.
that is why we chose the simple rule.
>>
>>
>>The bottom line of this discussion:
>>
>> (1) is simpler and more intuitive, and it better fits the intent of
>> POSIX's overall longest-match rule. (2) is more popular in
>> practice. The differences between (1) and (2) rarely arise in
>> practice, so it wouldn't affect applications much if either
>> interpretation were chosen.
>>
i concur. but surely better test suites would help.
>>
>>
>>
>>---
>>Mark S. Brown
>>Senior Technical Staff Member, IBM Server Group
>>512.838.3926 fax 512.838.3882
>>yyyyy@xxxxxxxxxx
>>
>
>RE-ASSOC: a question about the associativity of RE concatenation
>
>[Some of the people in the GNU projects (Paul Eggert, Tom Lord,
>Isamu Hasegawa, myself) were considering consolidating the various
>regex implementations in GNU, and came up with a series of questions
>concerning the Specification. This is one of those questions.]
>
>Must pattern matching strictly follow the left-associative POSIX
>grammar for regular expressions, or is pattern matching allowed to
>behave as if the grammar had been right-associative?
>
>This question follows up the earlier question RE-CONCAT; please see
>the discussion of the earlier question for much of the background
>behind this question.
>
>The POSIX grammar for EREs contains this production:
>
> ERE_branch : ERE_expression
> | ERE_branch ERE_expression
>
>
><http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html#tag_09_05_03>
>
>This grammar is left-associative, which means that the ERE /ABC/ is
>parsed as if it were /(AB)C/, aside from parenthesis numbering.
>
>In the REs of classical formal language theory developed by Stephen
>Kleene and others, the concatenation of REs is associative, in the
>sense that the REs /(AB)C/ and /A(BC)/ describe the same regular sets.
>But POSIX REs might have different possible interpretations depending
>on how one associates subpatterns to matched strings.
>
>For example, suppose we match the ERE /(week|wee)(night|knights)(s*)/
>to the string "weeknights", and want to know which subpatterns match
>which substrings. Here are some possible interpretations:
>
> (1a) The "subpatterns" are the two immediate subexpressions,
> interpreted according to the grammar, namely
> /(week|wee)(night|knights)/ and /(s*)/. The longest consistent
> match for /(week|wee)(night|knights)/ is "weeknights".
> Therefore, /(s*)/ matches the empty string.
>
> (1b) Even though the grammar is left-associative, the matching
> semantics are right-associative. The "subpatterns" are the two
> subexpressions /(week|wee)/ and /(night|knights)(s*)/. The
> longest consistent match for /(week|wee)/ is "week". We then
> apply the rule recursively, and find that the longest consistent
> match for /(night|knights)/ against "nights" is "night".
> Therefore, /(s*)/ matches "s".
>
> (2) The "subpatterns" are all the subexpressions, not just the
> concatenated subexpressions (see interpretation (2) of question
> RE-CONCAT), so the matching semantics are inherently
> right-associative. This returns the same match as (1b), but for
> the slightly different ERE /(wee|week)(night|knights)(s*)/, (1b)
> still says /(s*)/ matches "s" but (2) says /(s*)/ matches the
> empty string.
>
> (3) The standard is ambiguous, and some or all the above
> interpretations are valid. Perhaps other interpretations are
> valid as well. A conforming implementation might even use
> different interpretations at different times.
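The difference between (1a) and (1b) comes down to which parse tree drives the "longest first" rule. The sketch below is one assumed way to model that: a tree is a nested pair of leaf indices, and at every concatenation node a longer left operand is preferred before recursing. The names PIECES, LEFT_ASSOC, RIGHT_ASSOC, span, priority and submatches are all hypothetical, and candidates are found by brute-force enumeration rather than by a real matcher.

```python
from itertools import product

S = "weeknights"

# Strings each leaf of /(week|wee)(night|knights)(s*)/ can generate;
# (s*) is expanded up to len(S), which suffices for this string.
PIECES = (["week", "wee"], ["night", "knights"],
          ["s" * n for n in range(len(S) + 1)])

LEFT_ASSOC = ((0, 1), 2)    # /(XY)Z/, the shape the POSIX grammar gives: (1a)
RIGHT_ASSOC = (0, (1, 2))   # /X(YZ)/, the right-associative reading: (1b)

def span(tree, parts):
    """Total number of characters matched by a subtree."""
    if isinstance(tree, int):
        return len(parts[tree])
    return span(tree[0], parts) + span(tree[1], parts)

def priority(tree, parts):
    """Ranking tuple: at each concatenation prefer a longer left operand,
    then apply the same rule inside the left and right operands."""
    if isinstance(tree, int):
        return ()
    left, right = tree
    return (span(left, parts),) + priority(left, parts) + priority(right, parts)

def submatches(tree):
    """Pick the decomposition of S that the given tree shape prefers."""
    cands = [p for p in product(*PIECES) if "".join(p) == S]
    return max(cands, key=lambda p: priority(tree, p))

print("(1a):", submatches(LEFT_ASSOC))
print("(1b):", submatches(RIGHT_ASSOC))
```

Because the ranking depends only on lengths, swapping the alternatives to /(wee|week)/ changes neither result, which is the order-independence that separates (1b) from (2).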
>
>(1a) and (1b) are both consistent with interpretation (1) of question
>RE-CONCAT. Note that a user can override either (1a) or (1b) by using
>explicit parenthesization. (2) is consistent with RE-CONCAT
>interpretation (2); under (2), explicit parenthesization is
>irrelevant.
>
>Here are some arguments for and against these interpretations. These
>lengthy arguments are briefly summarized at the end of this section.
>
> pro (1a):
>
> The most natural interpretation of the standard and rationale is
> that the implementation must recurse through the regular
> expression's parse tree, as indicated by the grammar.
>
> The ERE must be parsed strictly according to the grammar, because
> the standard says that the ERE "grammar takes precedence over the text".
>
><http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html#tag_09_05>
>
> con (1a):
>
> (1a) is counterintuitive, because it says that when matching
> /XYZ/, it is more important to maximize the length of the match
> for /XY/ than it is to maximize the length of the match for /X/.
> This contradicts the left-to-right nature of the subpattern rule.
>
> Had the POSIX authors intended (1a), they would have written the
> grammar to be right-associative, to avoid this counterintuitive
> behavior of (1a).
>
>
>
> pro (1a):
>
> Had the POSIX authors intended that it be more important to
> maximize the length of /X/ than to maximize the length of /XY/
> when matching /XYZ/, they could easily have done so by using a
> right-associative grammar for ERE_branch. However, they used a
> left-associative grammar, indicating that (1a) was intended over
> (1b).
>
> This is reminiscent of the situation in the C programming
> language. Classical addition is associative, but Standard C
> requires that X+Y+Z must be evaluated according to C's
> left-associative grammar, and must therefore be evaluated as
> (X+Y)+Z and not as X+(Y+Z). The distinction matters, for example,
> if floating-point arithmetic is used, due to rounding errors.
> Similarly, classical RE concatenation is associative, but POSIX RE
> concatenation is not, due to the side effects of subpattern
> matching, so it is important to follow the associativity rules
> specified by the grammar.
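The floating-point analogy is easy to check. The sketch below uses Python rather than C, on the assumption that Python floats are IEEE 754 doubles (true on common platforms), so the grouping effect is the same one the argument describes for C.

```python
# Same three operands, two groupings.
left = (0.1 + 0.2) + 0.3    # the left-associative reading of X+Y+Z
right = 0.1 + (0.2 + 0.3)   # the right-associative reading

print(left, right, left == right)
```

The two results differ in the last bit, which is why the grouping chosen by the grammar is observable even though real-number addition is associative.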
>
> con (1a):
>
> The C Standard explicitly says that X+Y+Z must be evaluated as
> (X+Y)+Z, whereas the POSIX Standard does not explicitly say how to
> follow the RE grammar when determining matched subpatterns.
> Admittedly the X+Y+Z statement in the C Standard is a
> non-normative footnote, but the POSIX authors could also have
> added a footnote if they had wanted to make it clear that (1a) was
> intended.
>
> pro (1b) and (2):
>
> The semantics should not be contorted by the particular method used
> to define the syntax. The standard says:
>
> Portions of this volume of IEEE Std 1003.1-2001 are expressed in terms
> of a special grammar notation. It is used to portray the complex
> syntax of certain program input. The grammar is based on the syntax
> used by the yacc utility. However, it does not represent fully
> functional yacc input, suitable for program use; the lexical
> processing and all semantic requirements are described only in textual
> form. The grammar is not based on source used in any traditional
> implementation and has not been tested with the semantic code that
> would normally be required to accompany it.
>
><http://www.opengroup.org/onlinepubs/007904975/utilities/xcu_chap01.html#tag_01_10>
>
> and this indicates that the grammar should not be taken overly
> seriously when interpreting the semantics.
>
> pro (1a):
>
> Some POSIX matchers use (1a). These include the Rx matcher and the
> Spencer-derived matchers used by Tcl and others.
>
> See also the arguments pro and con RE-CONCAT's interpretations
> (1), (2), and (3).
>
>The bottom line of this discussion:
>
> This question is probably moot unless RE-CONCAT interpretation (1) is
> chosen, in which case the realistic choices are (1a) or (1b).
>
> (1a) follows the POSIX standard more strictly, but (1b) typically
> matches user intent better. The differences between (1a) and (1b)
> rarely arise in practice, so it wouldn't affect applications much if
> either interpretation were chosen. A user can override either (1a)
> or (1b) by using explicit parenthesization.
>
>Suggestion:
>
> Modify the POSIX grammar so that RE concatenation is
> right-associative. This would cause (1a) to have the semantics of
> (1b), thus unifying the two interpretations and removing the
> confusion.
>
i concur. when we were discussing this at the time, for the expression (a)(b)(c),
we intended that the submatch rule apply to (a), (b), and (c) in that order.
again, we explicitly considered that some REs might change their
meaning but anyone that close to the edge was asking for trouble anyway.
>
>
>
>---
>Mark S. Brown
>Senior Technical Staff Member, IBM Server Group
>512.838.3926 fax 512.838.3882
>yyyyy@xxxxxxxxxx
>
>
>[Some of the people in the GNU projects (Paul Eggert, Tom Lord,
>Isamu Hasegawa, myself) were considering consolidating the various
>regex implementations in GNU, and came up with a series of questions
>concerning the Specification. This is one of those questions.]
>
i think this one is straightforward.
>
>
>RE-ITERATE: a question about RE iteration and subpattern matching
>
>When the ERE /(R)*/ matches an entire string S, is the subpattern
>/(R)/ required to match the longest possible suffix of S that is
>consistent with an overall match?
>
>This question follows up the earlier questions RE-CONCAT and RE-ASSOC;
>please see the discussion of the earlier question for much of the
>background behind this question.
>
>This question is about the requirements given by the following quotes
>from the regcomp description:
>
> (Subexpression i begins at the ith matched open parenthesis,
> counting from 1.)...
>
> 1. If subexpression i in a regular expression is not contained
> within another subexpression, and it participated in the match
> several times, then the byte offsets in pmatch[i] shall delimit
> the last such match.
>
>
><http://www.opengroup.org/onlinepubs/007904975/functions/regcomp.html>:
>
>Suppose, for example, we match the pattern /(a.*z|b.*y)*.*/ to the
>string "azbazbyc", and want to know which substring was matched by the
>subpattern /(a.*z|b.*y)/. Here are some possible interpretations:
>
> (1) /(a.*z|b.*y)*/ matches the longest possible prefix "azbazby".
> The last match for /(a.*z|b.*y)/ is therefore the longest suffix
> of this prefix that is consistent with an overall match, namely
> "bazby".
>
> (2) First the subpattern /(a.*z|b.*y)/ is applied left to right
> across the string, looking for matches that are consistent with
> an overall match. This procedure matches "azbaz" and "by" in
> turn. The last match for /(a.*z|b.*y)/ is therefore "by".
>
> (3) The standard is ambiguous, and both the above interpretations are
> valid. Perhaps other interpretations are valid as well. A
> conforming implementation might even use different
> interpretations at different times.
>
>Interpretation (2) needs some further explanation, since it relies on
>a depth-first search procedure that may not be immediately obvious.
>According to this procedure, to match the ERE /(A)*/ to the prefix of
>a string S under the constraint of an overall match to S, apply the
>following steps:
>
> (i) Apply A to the prefix of S once, subject to the constraint
> of overall match.
>
> (ii) Suppose that A's match consumes the initial substring S1,
> which must be the quasi-longest such substring.
>
> (iii) If S1 is empty, succeed and report that A matched the empty string.
>
> (iv) Otherwise, repeatedly apply A to the rest of S, each time
> finding the quasi-longest initial prefix matching A under the
> constraint of overall match. Repeat this process until the
> next submatch would either fail or would be empty.
>
> (v) Report the last match, which in this case must be nonempty.
>
>The term "quasi-longest" is used above because (2)'s matching
>procedure is greedy and in some cases doesn't find the longest match
>for a subexpression. Instead, it finds the longest match that a
>greedy matcher can find subject to the constraint of overall match.
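The two readings of the iteration rule can be sketched for the example pattern /(a.*z|b.*y)*.*/ and the string "azbazbyc". This is a brute-force illustration under stated assumptions, not a POSIX matcher: Python's re is used only to test whether a single slice matches one iteration, the trailing /.*/ is assumed to absorb whatever the starred group leaves, and all function names are hypothetical.

```python
import re

ITEM = re.compile(r"a.*z|b.*y")   # one iteration of the starred group
S = "azbazbyc"

def is_item(s, i, j):
    """True if s[i:j] is matched in full by one iteration."""
    return ITEM.fullmatch(s, i, j) is not None

def reachable_from(s, start):
    """Offsets reachable from start by zero or more iterations."""
    seen, todo = {start}, [start]
    while todo:
        i = todo.pop()
        for j in range(i + 1, len(s) + 1):
            if j not in seen and is_item(s, i, j):
                seen.add(j)
                todo.append(j)
    return seen

def last_iteration_interp1(s):
    """(1): longest suffix of the starred prefix that is a valid final iteration."""
    ends = reachable_from(s, 0)
    end = max(ends)                       # longest prefix matched by (R)*
    starts = [i for i in ends if i < end and is_item(s, i, end)]
    return s[min(starts):end] if starts else None

def last_iteration_interp2(s):
    """(2): walk left to right, taking the quasi-longest iteration each time."""
    end = max(reachable_from(s, 0))
    pos, last = 0, None
    while pos < end:
        # Longest iteration from pos that still lets (R)* reach `end`.
        nxt = max(j for j in range(pos + 1, end + 1)
                  if is_item(s, pos, j) and end in reachable_from(s, j))
        last, pos = s[pos:nxt], nxt
    return last

print("(1):", last_iteration_interp1(S))
print("(2):", last_iteration_interp2(S))
```

Both readings agree that the starred group covers "azbazby"; they differ only in how that prefix is cut into iterations, and therefore in what pmatch would report for the last one.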
>
>Interpretations (1)-(3) are related to interpretations (1)-(3) of
>question RE-CONCAT, and share many of their pros and cons. Here are
>some more specific arguments for and against these interpretations:
>
> pro (1):
>
> The standard explicitly says that the subpattern must match the
> "longest possible string"
>
><http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html#tag_09_01_02>
>.
> "bazby" is the longest possible string here, so it must be the match.
>
> pro (2):
>
> The phrase "left to right" implies that the matcher must apply the
> subpattern several times, left to right, greedily but consistently
> with the overall match being the longest.
>
> con (2):
>
> Even more so than with RE-CONCAT, interpretation (2) is quite
> complicated. It is unlikely that the POSIX authors intended (2),
> since (1) is a much more straightforward interpretation.
>
(2) is wrong. it has to be the longest match; end of argument. what "left to right"
means has to do with what REs actually mean, not how they are implemented.
recall that a RE generates a (possibly infinite) set of strings. when we say that
an RE "matches" a string, we mean that that string is a member of the set of strings
generated by that RE. so how does
/(a.*z|b.*y)*.*/ generate azbazbyc ??
consistent with the leftmost longest, and left to right rules, the RE generates
the string as "az.bazby.c". thus, the last, or rightmost, repetition of the parenthesised
subexpression is bazby.
>
>---
>Mark S. Brown
>Senior Technical Staff Member, IBM Server Group
>512.838.3926 fax 512.838.3882
>yyyyy@xxxxxxxxxx
>
>
--
Andrew Hume (best -> Telework) +1 732-886-1886
yyyyyy@xxxxxxxxxxxxxxxx (Work) +1 973-360-8651
AT&T Labs - Research; member of USENIX and SAGE