On Wed, Aug 16, 2000 at 10:37:13AM -0400, Sandra O'donnell USG wrote:
>
> > Will collation *order* and collation *sequence* truly be different
> > things? Will there be any localedef syntax changes?
>
> No localedef syntax changes. I'm just emitting another table. And
> yes, the order and the sequence are very different. Look at the
> LC_COLLATE specification in ISO 14651 (which is actually a standard)
>
> First, the JTC1/SC22/WG20 Web site I've looked at says ISO 14651 is still in
> the third FCD, not that it has been finalized. Either way, its status
> is irrelevant. There are hundreds of locales on existing implementations,
> and they won't all go away because 14651 exists. Any proposed APIs must
> deal with the reality of sequences and orders defined in these existing
> locales.
FCD 14651 has passed its third FCD ballot and has now been issued for
FDIS ballot.
> The *sequence* of these characters is such that all versions of A's
> come before all versions of B's which come before all versions of
> C's, and so on. Thus, in the sequence, lowercase and uppercase
> are intermixed.
>
> The *order* is that lowercase (<MIN>) comes before uppercase (<CAP>),
> and other weights (<GRAVE>,<ACUTE>,<WIDECAP>,etc.) apply as specified,
> but an uppercase A still comes before a lowercase b. How does either
> the sequence or the order as defined in 14651 address your perceived
> problem with respect to [a-c] matching "a A b B c"?
In my mind, [a-c] should include a A b B c C (and diacritics).
New and novice users would expect that behaviour.
It is only due to heritage from the POSIX/C locale that
some people expect it not to include the uppercase letters
and the diacritics. Why should we be doomed by an outdated
locale like the POSIX locale for the rest of the lifespan of UNIX-like systems?
I acknowledge the dangerous aspects of REs in commands like "rm".
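
To make the three readings of [a-c] concrete, here is a minimal sketch in
Python. The eight-letter sequence and all function names are made up for
illustration; they are not taken from 14651 or from any real locale, and
diacritics are left out to keep it short.

```python
# Toy collation sequence (an assumption for illustration only, not taken
# from ISO 14651 or any real locale): cases are interleaved per letter.
SEQUENCE = ["a", "A", "b", "B", "c", "C", "d", "D"]
POSITION = {ch: i for i, ch in enumerate(SEQUENCE)}


def in_range_by_codepoint(ch, lo, hi):
    """Range membership as in the POSIX/C locale: compare code points."""
    return ord(lo) <= ord(ch) <= ord(hi)


def in_range_by_sequence(ch, lo, hi):
    """Range membership by position in the toy collation sequence."""
    return POSITION[lo] <= POSITION[ch] <= POSITION[hi]


def in_range_by_letter(ch, lo, hi):
    """Range membership on the base letter only, ignoring case."""
    return lo.lower() <= ch.lower() <= hi.lower()


candidates = "aAbBcCdD"
codepoint_matches = [c for c in candidates if in_range_by_codepoint(c, "a", "c")]
sequence_matches = [c for c in candidates if in_range_by_sequence(c, "a", "c")]
letter_matches = [c for c in candidates if in_range_by_letter(c, "a", "c")]

print("code point:", " ".join(codepoint_matches))
print("sequence:  ", " ".join(sequence_matches))
print("letter:    ", " ".join(letter_matches))
```

Under these assumptions the code-point reading gives only the lowercase
letters, the sequence reading gives "a A b B c" (C sorts after c, so it
falls outside the range), and the base-letter reading gives the
"a A b B c C" I would expect.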
Keld