"Sandra O'donnell USG" <yyyyyyyy@xxxxxxxxxxx> writes:
> So which users have requested this functionality? Although I agree
> there's no easy way to get the collation sequence information today,
> who needs this? To do what?
Have you read David's mails? Everybody who has to implement fnmatch()
or regex() outside the libc needs it. E.g., the GNU shell (bash) has
a separate glob/fnmatch implementation because it must be
interruptible. David needs it for a portable implementation of ksh
where he cannot assume a correct implementation on the target system.
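
To illustrate the gap, here is a minimal sketch of the workaround a
matcher outside the libc is left with today: deciding range membership
by comparing one-character strings with strcoll(). The helper name
in_collation_range is hypothetical, and the sketch handles single-byte
characters only. strcoll() compares collation weights, which is not
necessarily the same thing as position in the locale's collation
sequence, so this is an approximation and not the proposed API.

```c
#include <assert.h>
#include <locale.h>   /* callers select the locale with setlocale() */
#include <string.h>

/* Hypothetical helper: returns nonzero if the single-byte character c
   falls within [lo-hi] according to the current LC_COLLATE ordering,
   as reported by strcoll(). */
int in_collation_range(char lo, char c, char hi)
{
    /* A NUL byte would turn the operands into empty strings. */
    assert(lo != '\0' && c != '\0' && hi != '\0');

    char l[2] = { lo, '\0' };
    char m[2] = { c,  '\0' };
    char h[2] = { hi, '\0' };

    /* lo <= c and c <= hi, both judged by the locale's collation. */
    return strcoll(l, m) <= 0 && strcoll(m, h) <= 0;
}
```

After setlocale(LC_COLLATE, "C") the helper behaves like a plain byte
comparison; under other locales the answer depends entirely on the
installed locale description.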
> How do the APIs address this? Assume the sequence is defined
> in a case-mixed way, the APIs exist, and a program uses them
> to access the sequence. Now what?
Then the range will contain the upper/lowercase mixture. I've made
sure all the locale descriptions I use are corrected.
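
A small sketch of what that means in practice, assuming a program has
already obtained the collation sequence as a flat string (the two
sequences and the function names below are made up for illustration,
they are not taken from any real locale or API):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sequences: one case-mixed, one with the cases in
   separate blocks. */
const char MIXED_SEQUENCE[] =
    "aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ";
const char BLOCKED_SEQUENCE[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Position of c in the sequence, or -1 if it is not listed. */
static int seq_position(const char *sequence, char c)
{
    const char *p = (c != '\0') ? strchr(sequence, c) : NULL;
    return p ? (int)(p - sequence) : -1;
}

/* Nonzero if c lies in the range [lo-hi] of the given sequence. */
int range_matches(const char *sequence, char lo, char hi, char c)
{
    assert(sequence != NULL);

    int l = seq_position(sequence, lo);
    int h = seq_position(sequence, hi);
    int x = seq_position(sequence, c);

    return l >= 0 && h >= 0 && x >= 0 && l <= x && x <= h;
}
```

With the case-mixed sequence, [a-c] matches 'A' and 'B' but not 'C',
because 'C' sorts after 'c'. With the blocked sequence the same range
matches only the lowercase letters.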
> First, the JTC1/SC22/WG20 Web site I've looked at says ISO 14651 is still in
> the third FCD, not that it has been finalized. Either way, its status
> is irrelevant. There are hundreds of locales on existing implementations,
> and they won't all go away because 14651 exists. Any proposed APIs must
> deal with the reality of sequences and orders defined in these existing
> locales.
Not at all. And: the proposed solution does not depend on 14651. It
works with any locale description in the POSIX format.
Anyway, I don't know what your problem is. If you want to handle it
differently, go on. The standards leave it unspecified. What Gary
Miller proposed is just one possibility. It happens to be one I like
and have implemented, but that's all.
> However, if I look at the sequence in 14651 as an example, I see this:
>
> . . .
> <U0061> <S0061>;<BASE>;<MIN>;<U0061> % LATIN SMALL LETTER A
> <UFF41> <S0061>;<BASE>;<WIDE>;<UFF41> % FULLWIDTH LATIN SMALL LETTER A
> <U249C> <S0061>;<BASE>;<COMPAT>;<U249C> % PARENTHESIZED LATIN SMALL LETTER A
> <U24D0> <S0061>;<BASE>;<CIRCLE>;<U24D0> % CIRCLED LATIN SMALL LETTER A
> <U0041> <S0061>;<BASE>;<CAP>;<U0041> % LATIN CAPITAL LETTER A
I know that the official locale description in the draft does not
follow this. But it doesn't matter. You can easily rearrange the
lines without changing the collation order. Just put the entries with
<MIN> and <CAP> in separate blocks in the input. I've done this.
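
A rough sketch of why the rearrangement is harmless, under the
assumption that comparison uses only the explicit weights while the
range sequence follows the order of the lines. The tables, the weight
values (including T_MIN sorting before T_CAP), and the function names
are all hypothetical and much simplified compared to a real
LC_COLLATE section:

```c
#include <assert.h>
#include <stddef.h>

enum { T_MIN = 1, T_CAP = 2 };   /* assumed order of the case weights */

struct entry { char ch; int base; int third; };

/* Same entries, same weights; only the line order differs. */
const struct entry INTERLEAVED[] = {
    { 'a', 1, T_MIN }, { 'A', 1, T_CAP },
    { 'b', 2, T_MIN }, { 'B', 2, T_CAP },
};
const struct entry BLOCKED[] = {
    { 'a', 1, T_MIN }, { 'b', 2, T_MIN },
    { 'A', 1, T_CAP }, { 'B', 2, T_CAP },
};

/* Line index of ch in the table, or -1 if it is not listed. */
int line_position(const struct entry *t, size_t n, char ch)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].ch == ch)
            return (int)i;
    return -1;
}

/* Compare two listed characters by weights only: base first, then the
   case weight. The line order plays no part here. */
int compare_weights(const struct entry *t, size_t n, char x, char y)
{
    int i = line_position(t, n, x);
    int j = line_position(t, n, y);
    assert(i >= 0 && j >= 0);

    if (t[i].base != t[j].base)
        return t[i].base < t[j].base ? -1 : 1;
    if (t[i].third != t[j].third)
        return t[i].third < t[j].third ? -1 : 1;
    return 0;
}
```

Both tables give identical results from compare_weights(), so sorting
is unaffected. Only line_position() changes: in the blocked table 'A'
comes after 'b', which keeps it out of a range such as [a-b].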
--
---------------. ,-. 1325 Chesapeake Terrace
Ulrich Drepper \ ,-------------------' \ Sunnyvale, CA 94089 USA
Red Hat `--' drepper at redhat.com `------------------------