Email List: austin-group-l

Re: AI 2000-05-010: proposed interface

To: yyyyyyy@xxxxxxxxxx (Ulrich Drepper)
Subject: Re: AI 2000-05-010: proposed interface
From: "Sandra O'donnell USG" <yyyyyyyy@xxxxxxxxxxx>
Date: Tue, 15 Aug 2000 15:15:41 -0400
Cc: yyyyyyyyyyyy@xxxxxxxxxxxxx, yyyyyyyy@xxxxxxxxxxx
Ulrich --

   One AI on me from the last meeting was to come up with a proposal for
   a new interface to make the collation sequence information accessible
   to the user (just a reminder: the collation sequence is what is used
   to resolve range expressions like [a-c] in regular expressions and in
   fnmatch).

I have several questions about this.

Are you one of those who might use this interface? If so,
would you expect to produce significantly different output for
your regular expressions than is typical for users of POSIX
internationalized OSes?

I'm asking because I remember you were concerned about the existing
behavior of internationalized regular expressions: you didn't want
(among other things) case-folding. That is, you didn't want a
range like [a-c] to match

a A b B c

(I've omitted characters with diacritics to simplify this example.)
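
A minimal sketch of the difference in question, assuming a made-up
collation sequence (COLLATION_SEQUENCE is hypothetical and not taken from
any real locale, and both helper functions are illustrative names only).
It resolves the range [a-c] once by position in that sequence and once by
character code:

```python
# Hypothetical collation sequence: the order in which collating elements
# might be listed in an LC_COLLATE section. Not taken from any real locale.
COLLATION_SEQUENCE = ["a", "A", "b", "B", "c", "C", "d", "D"]


def range_by_sequence(start, end, sequence=COLLATION_SEQUENCE):
    """Return the elements covered by [start-end] when the range is
    resolved by position in the collation sequence."""
    lo = sequence.index(start)
    hi = sequence.index(end)
    if lo > hi:
        raise ValueError("range start comes after range end in the sequence")
    return sequence[lo:hi + 1]


def range_by_codepoint(start, end):
    """Return the elements covered by [start-end] when the range is
    resolved purely by character code value."""
    return [chr(cp) for cp in range(ord(start), ord(end) + 1)]


if __name__ == "__main__":
    print(range_by_sequence("a", "c"))   # ['a', 'A', 'b', 'B', 'c']
    print(range_by_codepoint("a", "c"))  # ['a', 'b', 'c']
```

With this assumed sequence, the first call reproduces the "a A b B c"
match above, while the second gives the lowercase-only result.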

. . .
RATIONALE

    The designers of the POSIX locale model explicitly chose to use, for
    range expressions in pattern matching, not the collation order but
    the collation sequence.  This term is not further specified, but
    the original intent was for the collation sequence to be
    defined by the order of the lines in the LC_COLLATE specification.

So the APIs will use the order of the lines in the LC_COLLATE section.
How, if at all, does the end result differ from what users get with
existing POSIX locales and existing regular expression implementations?
Will collation *order* and collation *sequence* truly be different
things? Will there be any localedef syntax changes?

BTW, you also mention that 

APPLICATION USAGE

    This interface is mainly meant to provide the user with a direct
    interface to retrieve the information used by the regular expression
    matching functions and the fnmatch() function to handle range expressions
    of the form "[a-c]". . .

There are no interfaces for directly retrieving information in other
parts of the locale. For example, there are no APIs for getting the
info in the LC_CTYPE section. The fact that an API does not exist
doesn't mean it's required.

I'm also wondering why it might be appropriate to add APIs for
handling some specific Latin-script-based examples, when we're
mostly way beyond Latin-only support.

                -- Sandra
-----------------------
Sandra Martin O'Donnell
Compaq Computer Corporation
yyyyyyyyyyyyyyy@xxxxxxxxxx
yyyyyyyy@xxxxxxxxxxx
