On Fri, 19 Apr 2002 08:58:28 -0700 (PDT) Tom Lord wrote:
> gsf has implicitly reframed all of the preceding discussion in terms
> of the grammars and changes to the standard language. That's good.
> However, he also "picked sides" and, in my view, picked the less
> plausible interpretation.
The purpose of the posting was to resolve perceived ambiguities,
and since I don't consider `undefined' an acceptable resolution
in this case, yes, I "picked sides".
> gsf, I don't believe your changes to the grammar really match the
> intent of the standard.
> Suppose I have two patterns, /<A>/ and /<B>/, and I form a third
> pattern: /<A><B>/. You're saying that /<A>/ is *not* a subpattern of
> the new pattern?!?
According to the reworded grammar, if you have two EREs, /<A>/ and /<B>/,
and you form a third ERE, /<A><B>/, it *is* possible that <A> is not
a subpattern of /<A><B>/. Consider:
    /<A>/       a|b
    /<B>/       c|d
    /<A><B>/    a|bc|d
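
A small sketch of that example, for anyone who wants to try it. It uses
Python's `re` module, which is not a POSIX ERE matcher, but which also gives
`|` the lowest precedence, so the parse of the concatenated text is the same.
The helper `matches` and the variable names are illustrative only.

```python
import re

A = "a|b"
B = "c|d"
AB = A + B                                 # textual concatenation: "a|bc|d"
GROUPED = "(?:" + A + ")(?:" + B + ")"     # what grouping would have meant


def matches(pattern, text):
    """True if the whole of text is matched by pattern."""
    return re.fullmatch(pattern, text) is not None


# "a|bc|d" parses as three alternatives: a, bc, d.
for text in ("a", "bc", "d", "ac"):
    print(text, matches(AB, text), matches(GROUPED, text))
```

The string "ac" is accepted only by the grouped form, which is one way to see
that <A> does not survive as a unit inside the concatenated text.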
> Evidence for this is that, in general, the length of /<A>/ in /<A><B>/
> will not be maximized, even though it sure looks like a subpattern and
> sure looks like a subpattern that is to the left of <B>. The
> canonical example of this bug is:
> <A> = (wee|week)(night|knights)
> <B> = .*
> input = weeknights
`looking like' a subpattern is a human perception, `being a subpattern' is
a consequence of the (reworded) grammar. And a major point of the posting
is that the grammar does not specify how to do a match; the definition
of `matched' does that.
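
To make the quoted example concrete, here is a rough sketch that enumerates
every way `weeknights` can be split across (wee|week)(night|knights).* and
then applies two selection rules. The rules are only my paraphrase of the two
readings under discussion (maximize <A> as one unit, versus maximize each
concatenated piece from left to right); the names `first`, `second` and
`decompositions` are made up for the illustration, and no regex library is
involved.

```python
from itertools import product

text = "weeknights"
first = ["wee", "week"]         # alternatives of (wee|week)
second = ["night", "knights"]   # alternatives of (night|knights)


def decompositions(s):
    """Yield (p1, p2, rest) for each split of s as (wee|week)(night|knights).*"""
    for p1, p2 in product(first, second):
        if s.startswith(p1 + p2):
            yield p1, p2, s[len(p1) + len(p2):]


candidates = list(decompositions(text))

# Reading 1: treat <A> as a single subpattern and maximize its total length.
whole = max(candidates, key=lambda c: len(c[0]) + len(c[1]))

# Reading 2: maximize each concatenated piece in turn, left to right.
piecewise = max(candidates, key=lambda c: (len(c[0]), len(c[1])))

print(whole)      # ('wee', 'knights', '')
print(piecewise)  # ('week', 'night', 's')
```

Both candidates match the whole input; they differ only in how much of it is
attributed to <A>, which is exactly the point the definition of `matched' has
to settle.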
> However, the idea of making it clear what grammar productions are
> identical to subpatterns is a good one. It should be:
> BRE:
> subpattern == RE_expression
> ERE:
> subpattern == ERE_branch
> No grammar changes required. A single sentence added to the standard
> could clarify the issue if that's really even necessary: "In BRE
> syntax, a ``subpattern'' is any part of a pattern generated by the
> production for `RE_expression'; in ERE syntax, any part generated by
> `ERE_branch'".
> Note that those two productions, in spite of the names, have analogous
> roles in the two grammars.
The purpose of the posting was not to rewrite the standard; it was to provide
consistent terminology between the text and the grammar so that further
discussion can be precise. If the discussions determine that the standard
needs to change, then that will require further discussion ...
And there must be complete and precise definitions and agreement for:
pattern
subpattern
expression
subexpression
ERE
Until that's done all future discussions will just lead to more confusion.
My intention was to provide the precise connection between text and grammar,
and to re-emphasize the exact roles of the text and grammar:
text==semantics
grammar==syntax
> Incidentally, you might wonder about /<A>|<B>/. Are /<A>/ and /<B>/
> both subpatterns of that, as ordinary english would suggest? The
> answer is that you can think of it that way, and you'll always get the
> right answer, but if you want to be pedantic, in a long list of `|'
> operators, it is each separate branch that is a subpattern. Thus, if
> /<A>/ is really /<j>|<k>/ and /<B>/ is really /<m>|<n>/ then neither
> /<A>/ nor /<B>/ is a subpattern of /<A>|<B>/, but /<j>/, /<k>/, /<m>/,
> and /<n>/ are. It's a distinction without a difference, though -- the
> answer comes out the same either way.
If you are talking about the syntactic entities <A> and <B> then don't
consult ordinary English, consult the grammar. According to the (reworded)
grammar:
/<A>|<B>/ is an ERE composed of the two patterns /<A>/ and /<B>/.
/<A>/ and /<B>/ may themselves be composed of one or more
subpatterns.
> Also, don't become confused by this red herring from gsf:
> Also, and probably most important, the `subpattern' order is the
> same for either left or right productions.
> That sounds confusingly close to saying "concatenation is
> associative", which would be nice if it could be true. However, gsf's
> concatenation is not associative -- it's right associative. Under
> gsf's interpretation, /<A><B><C>/ is always equivalent (ignoring
> parenthesis numbering) to /<A>(<B><C>)/ but not to /(<A><B>)<C>/.
> gsf's is just a fancy restating of what I think we've been calling
> (1b).
The confusion is that grammar and definitions are being mixed.
The grammar determines a left-to-right ordered list of concatenated
subpatterns; you can use left or right productions in the grammar and
you will get the same list. That's syntax and that's what the grammar
defines. The grammar doesn't define how to use the subpattern list to
do a match; i.e., it defines neither left nor right associativity.
The match associativity is described in the text ((2.2) in my post),
and yes, that is right associativity -- but that declaration of right
associativity is independent of the grammar.
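
The syntactic half of that claim can be sketched as follows: nest a
concatenation to the left or to the right, flatten the resulting tree, and the
ordered subpattern list is the same either way. This is only a toy model of
the parse tree (nested tuples), not the standard's grammar, and the function
names are hypothetical.

```python
from functools import reduce


def left_nested(items):
    """Build ((A B) C) ... as a left production would."""
    return reduce(lambda acc, x: (acc, x), items)


def right_nested(items):
    """Build A (B (C ...)) as a right production would."""
    return reduce(lambda acc, x: (x, acc), reversed(items))


def flatten(tree):
    """Return the left-to-right list of leaves of a nested-tuple tree."""
    if isinstance(tree, tuple):
        left, right = tree
        return flatten(left) + flatten(right)
    return [tree]


subpatterns = ["<A>", "<B>", "<C>"]
print(left_nested(subpatterns))    # (('<A>', '<B>'), '<C>')
print(right_nested(subpatterns))   # ('<A>', ('<B>', '<C>'))
print(flatten(left_nested(subpatterns)) == flatten(right_nested(subpatterns)))
```

The trees differ but the flattened lists do not; how that list is then used to
pick a match is a separate decision made by the text.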
> (It isn't clear to me, though, whether gsf is saying "I think the
> standard means (1b), here is how to clarify the language" or
> "As a point of historic interest, it is a fact that the authors
> intended (1b) rather than (1a) -- here is how to make that intention
> more explicit.")
I'm saying both :)
I do have a correction to make in the posting.
The order for example RE-ASSOC.3696.a should be:
order: 1 1.1 1.2 1.1.1 1.1.2 1.2.1 1.1.1.1 1.1.2.1
Also, and this will look like hit and run, our building-wide
UPS (ha) is going down for maintenance this weekend, so I won't
be back online until late Sunday.
regards,
-- Glenn Fowler <yyy@xxxxxxxxxxxxxxxx> AT&T Labs Research, Florham Park NJ --