Ulrich --
One AI on me from the last meeting was to come up with a proposal for
a new interface to make the collation sequence information accessible
to the user (just a reminder: the collation sequence is what is used
to resolve range expressions like [a-c] in regular expressions and in
fnmatch).
I have several questions about this.
Are you one of those who might use this interface? If so,
would you expect to produce significantly different output for
your regular expressions than is typical for users of POSIX
internationalized OSes?
I'm asking because I remember you were concerned about the existing
behavior of internationalized reg ex's, because you didn't want
(among other things) case-folding. That is, you didn't want a
range like [a-c] to match
a A b B c
(I've omitted characters with diacritics to simplify this example.)
. . .
RATIONALE
The designers of the POSIX locale model explicitly chose to use for
range expressions in pattern matching not the collation order but
instead the collation sequence. This term is not further specified
but the original intent was for the collation sequence to be
defined by the order of the lines in the LC_COLLATE specification.
So the APIs will use the order of the lines in the LC_COLLATE section.
How, if at all, does the end result differ from what users get with
existing POSIX locales and existing regular expression implementations?
Will collation *order* and collation *sequence* truly be different
things? Will there be any localedef syntax changes?
BTW, you also mention that
APPLICATION USAGE
This interface is mainly meant to provide the user with a direct
interface to retrieve the information used by the regular expression
matching functions and the fnmatch() function to handle range expressions
of the form "[a-c]". . .
There are no interfaces for directly retrieving information in other
parts of the locale. For example, there are no APIs for getting the
info in the LC_CTYPE section. The fact that an API does not exist
doesn't mean it's required.
I'm also wondering why it might be appropriate to add APIs for
handling some specific Latin-script-based examples, when we're
mostly way beyond Latin-only support.
-- Sandra
-----------------------
Sandra Martin O'Donnell
Compaq Computer Corporation
yyyyyyyyyyyyyyy@xxxxxxxxxx
yyyyyyyy@xxxxxxxxxxx