One AI on me from the last meeting was to come up with a proposal for
a new interface to make the collation sequence information accessible
to the user (just a reminder: the collation sequence is what is used
to resolve range expressions like [a-c] in regular expressions and in
fnmatch).
I exchanged some mail with Gary Miller and Mark Brown and we came up
with the appended proposal. It should be discussed in the next plenary
meeting.
--
---------------. ,-. 1325 Chesapeake Terrace
Ulrich Drepper \ ,-------------------' \ Sunnyvale, CA 94089 USA
Red Hat `--' drepper at redhat.com `------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The POSIX.2 standard mentions, in its discussion of the handling of
range expressions (for regular expressions and globbing purposes),
"collation sequences". This is the only place where this term appears,
and discussions with the originators of the standard confirmed that
this wording was chosen deliberately. Collation sequence is not the
same as collation order.
Therefore the implementation of fnmatch() and regcomp() must use this
additional sorting information to determine the result of a range
expression. The original intent was that the collation sequence
information is derived from the order in which the lines containing
the collation information appear in the file fed to the localedef
program.
The problem with this is that there is no official interface for the
user to get this information. For the collation order there is, of
course, the strcoll() function. This function can be used by
application developers to learn all about the sorting behavior which
is necessary for interoperability.
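To illustrate what is already possible for the collation order, a program can probe the current locale with strcoll(). The following is only a small sketch; the helper name collation_order_sign is made up for this example and is not part of any standard.

```c
#include <string.h>

/* Hypothetical helper (not part of any standard): reduce the result of
   strcoll() to -1, 0, or 1 so that results are easy to compare. */
int collation_order_sign(const char *s1, const char *s2)
{
    int r = strcoll(s1, s2);
    return (r > 0) - (r < 0);
}
```

A program that has called setlocale(LC_COLLATE, "") can use such a helper to learn how the user's locale orders any two strings. No comparable call exists for the collation sequence.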
To resolve this problem the following new interface is proposed:
int strseq (const char *s1, const char *s2);
or
int posix_strseq (const char *s1, const char *s2);
The first name is probably too drastic an intrusion into the
application namespace. Alternatively (if POSIX does not agree to add
this function) the name could be
int xpg_strseq (const char *s1, const char *s2);
The semantics of the function are to return a value <0, ==0, or >0
depending on how the collation sequence values of the first characters
following the prefix common to both strings s1 and s2 compare (the
prefix can be the empty string).
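The semantics described above can be sketched with a toy model. Everything in the block below is an assumption made for illustration: the table toy_sequence, the names toy_seq_value and toy_strseq, the choice to let the terminating null byte come first, and the restriction to single-byte characters. A real implementation would take the sequence from the LC_COLLATE data of the current locale.

```c
#include <errno.h>
#include <stddef.h>

/* Made-up collation sequence for illustration only: the position of a
   character in this string stands in for the position of its line in
   the localedef input. */
static const char toy_sequence[] = "aAbBcCdD";

/* Sequence value: 0 for the terminating null byte (assumed to come
   first), 1..8 for characters in the toy table, -1 for anything else. */
static int toy_seq_value(char c)
{
    if (c == '\0')
        return 0;
    for (size_t i = 0; toy_sequence[i] != '\0'; i++)
        if (toy_sequence[i] == c)
            return (int)i + 1;
    return -1;
}

/* Sketch of the proposed semantics: skip the common prefix, then compare
   the sequence values of the first characters that differ. Characters
   inside the common prefix are not validated in this sketch. */
int toy_strseq(const char *s1, const char *s2)
{
    while (*s1 != '\0' && *s1 == *s2) {
        s1++;
        s2++;
    }
    int v1 = toy_seq_value(*s1);
    int v2 = toy_seq_value(*s2);
    if (v1 < 0 || v2 < 0) {
        errno = EINVAL;   /* character outside the toy sequence */
        return 0;
    }
    return v1 - v2;
}
```

With this toy table, toy_strseq("a", "B") is negative even though strcmp("a", "B") is positive in ASCII, which shows that the sequence is a separate piece of information from the byte values.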
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NAME
strseq -- string comparison using collation sequence information
SYNOPSIS
#include <string.h>
int strseq (const char *s1, const char *s2);
DESCRIPTION
The strseq() function shall compare the string pointed to by s1 to
the string pointed to by s2, both interpreted as appropriate to the
collation sequence information in the LC_COLLATE category of the
current locale.
The strseq() function shall not change the setting of errno if
successful.
Because no return value is reserved to indicate an error, an application
wishing to check for error situations should set errno to 0, then call
strseq(), then check errno.
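The errno protocol described above could look roughly like the following in application code. Since strseq() does not exist yet, the sketch takes the comparison function as a parameter; strcoll(), which follows the same convention, can stand in for it. The wrapper name checked_compare is hypothetical.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical wrapper: cmp is any comparison function that follows the
   "no reserved error return" convention, for example strcoll() today or
   the proposed strseq(). Stores the comparison result in *result and
   returns 0 on success, or the errno value the call left behind. */
int checked_compare(int (*cmp)(const char *, const char *),
                    const char *s1, const char *s2, int *result)
{
    int saved = errno;      /* remember the caller's errno */
    errno = 0;              /* step 1: clear errno */
    *result = cmp(s1, s2);  /* step 2: call the function */
    int err = errno;        /* step 3: inspect errno */
    if (err == 0)
        errno = saved;      /* leave errno untouched on success */
    return err;
}
```

A caller would write, for instance, checked_compare(strcoll, a, b, &r) and only use r when the return value is 0.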
RETURN VALUE
Upon successful completion, strseq() shall return an integer greater
than, equal to, or less than 0, according to whether the collation
sequence value of the string pointed to by s1 is greater than, equal
to, or less than that of the string pointed to by s2, when both are
interpreted as appropriate to the current locale. On error, strseq() may
set errno,
but no return value is reserved to indicate an error.
ERRORS
The strseq() function may fail if:
EINVAL The s1 or s2 arguments contain characters outside the domain
of the collating sequence.
APPLICATION USAGE
This interface is mainly meant to provide the user with a direct
interface to retrieve the information used by the regular expression
matching functions and the fnmatch() function to handle range expressions
of the form "[a-c]". How expressions like these are interpreted is
implementation defined and cannot be deduced without such an interface.
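As an illustration of the situation without such an interface, the only way for an application to learn how a range is interpreted is to ask fnmatch() about one character at a time. The helper in_range below is a made-up name for this sketch and assumes single-byte characters.

```c
#include <fnmatch.h>

/* Hypothetical helper: returns 1 if the single character c falls in the
   range expression "[lo-hi]" as fnmatch() interprets it in the current
   locale, 0 otherwise. lo and hi are assumed to be ordinary characters,
   not ']', '-', '^', '!' or '\\'. */
int in_range(char lo, char hi, char c)
{
    char pattern[] = { '[', lo, '-', hi, ']', '\0' };
    char subject[] = { c, '\0' };
    return fnmatch(pattern, subject, 0) == 0;
}
```

In the "C" locale in_range('a', 'c', 'b') is expected to return 1. In other locales the answer for a character such as 'B' depends on the implementation, and probing like this only reveals membership in a range, not the sequence values themselves.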
RATIONALE
The designers of the POSIX locale model explicitly chose to use for
range expressions in pattern matching not the collation order but
instead the collation sequence. This term is not further specified,
but the original intent was for the collation sequence to be
defined by the order of the lines in the LC_COLLATE specification.
SEE ALSO
strcmp(), strcoll()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NAME
wcsseq -- wide-character string comparison using collation sequence
information
SYNOPSIS
#include <wchar.h>
int wcsseq (const wchar_t *s1, const wchar_t *s2);
DESCRIPTION
The wcsseq() function shall compare the wide-character string
pointed to by s1 to the string pointed to by s2, both interpreted
as appropriate to the collation sequence information in the
LC_COLLATE category of the current locale.
The wcsseq() function shall not change the setting of errno if
successful.
Because no return value is reserved to indicate an error, an application
wishing to check for error situations should set errno to 0, then call
wcsseq(), then check errno.
RETURN VALUE
Upon successful completion, wcsseq() shall return an integer greater
than, equal to, or less than 0, according to whether the collation
sequence value of the wide-character string pointed to by s1 is greater
than, equal to, or less than that of the wide-character string pointed
to by s2,
when both are interpreted as appropriate to the current locale. On error,
wcsseq() may set errno, but no return value is reserved to indicate an
error.
ERRORS
The wcsseq() function may fail if:
EINVAL The s1 or s2 arguments contain wide-character codes outside
the domain of the collating sequence.
APPLICATION USAGE
This interface is mainly meant to provide the user with a direct
interface to retrieve the information used by the regular expression
matching functions and the fnmatch() function to handle range expressions
of the form "[a-c]". How expressions like these are interpreted is
implementation defined and cannot be deduced without such an interface.
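The same kind of probing can be done through the regular expression functions. The sketch below uses regcomp() and regexec(), which operate on multibyte strings rather than wide-character strings; the helper name regex_matches is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text matches the POSIX basic regular
   expression pattern in the current locale, 0 if it does not, and -1 if
   the pattern fails to compile. */
int regex_matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

A call such as regex_matches("^[a-c]$", "b") shows whether "b" lies in the range in the current locale, but as with fnmatch() the application has to test candidates one by one.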
RATIONALE
The designers of the POSIX locale model explicitly chose to use for
range expressions in pattern matching not the collation order but
instead the collation sequence. This term is not further specified,
but the original intent was for the collation sequence to be
defined by the order of the lines in the LC_COLLATE specification.
SEE ALSO
wcscmp(), wcscoll()