Email List: Xaustin-group-lX

AI 2000-05-010: proposed interface

To: yyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: AI 2000-05-010: proposed interface
From: Ulrich Drepper <yyyyyyy@xxxxxxxxxx>
Date: 14 Aug 2000 12:46:54 -0700
Cc: Andrew Josey <yyyyyy@xxxxxxxxxxxxxxxxx>
One AI on me from the last meeting was to come up with a proposal for
a new interface to make the collation sequence information accessible
to the user (just a reminder: the collation sequence is what is used
to resolve range expressions like [a-c] in regular expressions and in
fnmatch()).

I exchanged some mail with Gary Miller and Mark Brown and we came up
with the appended proposal.  It should be discussed at the next plenary
meeting.

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The POSIX.2 standard mentions "collation sequences" in its discussion
of the handling of range expressions (for regular expressions and
globbing purposes).  This is the only place the term appears, and
discussions with the originators of the standard confirmed that this
wording was chosen deliberately.  Collation sequence is not the same as
collation order.

Therefore the implementations of fnmatch() and regcomp() must use this
additional sorting information to determine the result of a range
expression.  The original intent was that the collation sequence
information be derived from the order in which the lines containing
the collation information appear in the file fed to the localedef
program.

The problem with this is that there is no official interface for the
user to get this information.  For the collation order there is, of
course, the strcoll() function.  This function can be used by
application developers to learn all about the sorting behavior which
is necessary for interoperability.
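
For comparison, this is how the collation order is already observable
today.  The following is only a sketch: by_collation_order() and
sort_by_collation_order() are hypothetical helper names, while strcoll()
and qsort() are the standard functions.  No equivalent exists for the
collation sequence.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: orders two strings by the collation order of the
   current LC_COLLATE locale, as reported by strcoll(). */
static int by_collation_order(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Hypothetical helper: sort an array of n strings in place by
   collation order. */
static void sort_by_collation_order(const char **words, size_t n)
{
    assert(words != NULL || n == 0);
    qsort(words, n, sizeof *words, by_collation_order);
}
```

The result depends on the locale in effect; a program that never calls
setlocale() runs in the "C" locale.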


To resolve this problem the following new interface is proposed:

  int strseq (const char *s1, const char *s2);
or
  int posix_strseq (const char *s1, const char *s2);

The first name is probably too drastic an intrusion into the
application namespace.  Alternatively (if POSIX does not agree to add
this function) the name could be

  int xpg_strseq (const char *s1, const char *s2);

The semantics of the function are to return a value <0, ==0, or >0
depending on how the collation sequence values of the first characters
following the prefix common to both strings s1 and s2 compare (the
prefix can be the empty string).
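
The intended semantics can be modelled roughly as follows.  This is a
sketch under a loud assumption, not a proposed implementation:
sketch_seq_value() and sketch_strseq() are hypothetical names, and the
sequence value of a character is simply taken to be its unsigned byte
value, which is at best plausible for the "C" locale.  A real
implementation would look the value up in the locale's LC_COLLATE data
and would have to deal with multi-character collating elements, which
this model ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the locale's collation sequence table.
   ASSUMPTION: the sequence value of a character is its unsigned byte
   value. */
static int sketch_seq_value(unsigned char c)
{
    return (int)c;
}

/* Illustrative model only: skip the prefix common to both strings, then
   compare the sequence values of the first differing characters.  The
   terminating null byte takes part in the comparison, so a proper
   prefix compares lower than the longer string. */
static int sketch_strseq(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    while (*s1 != '\0' && *s1 == *s2) {
        s1++;
        s2++;
    }
    return sketch_seq_value((unsigned char)*s1)
         - sketch_seq_value((unsigned char)*s2);
}
```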


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NAME
    strseq -- string comparison using collation sequence information

SYNOPSIS
    #include <string.h>

    int strseq (const char *s1, const char *s2);

DESCRIPTION

    The strseq() function shall compare the string pointed to by s1 to
    the string pointed to by s2, both interpreted as appropriate to the
    collation sequence information in the LC_COLLATE category of the
    current locale.

    The strseq() function shall not change the setting of errno if
    successful.

    Because no return value is reserved to indicate an error, an application
    wishing to check for error situations should set errno to 0, then call
    strseq(), then check errno.
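
    The errno protocol described above is the same one strcoll() uses,
    so it can be sketched with strcoll() standing in for the proposed
    function.  checked_compare() and compare_fn are hypothetical names
    introduced only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Type of a two-string comparison function such as strcoll() or the
   proposed strseq(). */
typedef int (*compare_fn)(const char *, const char *);

/* Hypothetical helper: call cmp(s1, s2) following the errno protocol.
   Stores the comparison result in *result.  Returns 0 on success, or
   the errno value left behind by cmp on failure.  On success the
   caller's previous errno value is restored. */
static int checked_compare(compare_fn cmp, const char *s1, const char *s2,
                           int *result)
{
    assert(cmp != NULL && result != NULL);
    int saved = errno;
    errno = 0;                 /* step 1: clear errno */
    *result = cmp(s1, s2);     /* step 2: call the function */
    int err = errno;           /* step 3: inspect errno */
    if (err == 0)
        errno = saved;
    return err;
}
```

    A caller would pass strcoll today and could pass strseq instead if
    the interface were adopted, then test the return value against 0
    before trusting *result.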

RETURN VALUE

    Upon successful completion, strseq() shall return an integer greater
    than, equal to, or less than 0, according to whether the collation
    sequence value of the string pointed to by s1 is greater than, equal
    to, or less than that of the string pointed to by s2, when both are
    interpreted as appropriate to the current locale.  On error, strseq()
    may set errno, but no return value is reserved to indicate an error.

ERRORS

    The strseq() function may fail if:

    EINVAL    The s1 or s2 arguments contain characters outside the domain
              of the collating sequence.

APPLICATION USAGE

    This interface is mainly meant to provide the user with a direct
    interface to retrieve the information used by the regular expression
    matching functions and the fnmatch() function to handle range expressions
    of the form "[a-c]".  How expressions like these are interpreted is
    implementation defined and cannot be deduced without such an interface.

RATIONALE

    The designers of the POSIX locale model explicitly chose to use for
    range expressions in pattern matching not the collation order but
    instead the collation sequence.  This term is not further specified,
    but the original intent was for the collation sequence to be
    defined by the order of the lines in the LC_COLLATE specification.

SEE ALSO

    strcmp(), strcoll()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NAME
    wcsseq -- wide-character string comparison using collation sequence
              information

SYNOPSIS
    #include <wchar.h>

    int wcsseq (const wchar_t *s1, const wchar_t *s2);

DESCRIPTION

    The wcsseq() function shall compare the wide-character string
    pointed to by s1 to the wide-character string pointed to by s2, both
    interpreted as appropriate to the collation sequence information in
    the LC_COLLATE category of the current locale.
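
    A rough model of this comparison for wide-character strings is
    sketched below.  sketch_wcsseq() is a hypothetical name, and the
    sketch assumes that the sequence value of a wide character is its
    wide-character code, which a real locale need not honour.

```c
#include <assert.h>
#include <wchar.h>

/* Illustrative model only: skip the prefix common to both strings, then
   compare the first differing wide characters.
   ASSUMPTION: the collation sequence value of a wide character equals
   its wide-character code. */
static int sketch_wcsseq(const wchar_t *s1, const wchar_t *s2)
{
    assert(s1 != NULL && s2 != NULL);
    while (*s1 != L'\0' && *s1 == *s2) {
        s1++;
        s2++;
    }
    if (*s1 == *s2)
        return 0;
    return (*s1 < *s2) ? -1 : 1;
}
```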

    The wcsseq() function shall not change the setting of errno if
    successful.

    Because no return value is reserved to indicate an error, an application
    wishing to check for error situations should set errno to 0, then call
    wcsseq(), then check errno.

RETURN VALUE

    Upon successful completion, wcsseq() shall return an integer greater
    than, equal to, or less than 0, according to whether the collation
    sequence value of the wide-character string pointed to by s1 is greater
    than, equal to, or less than that of the wide-character string pointed
    to by s2, when both are interpreted as appropriate to the current
    locale.  On error, wcsseq() may set errno, but no return value is
    reserved to indicate an error.

ERRORS

    The wcsseq() function may fail if:

    EINVAL    The s1 or s2 arguments contain wide-character codes outside
              the domain of the collating sequence.

APPLICATION USAGE

    This interface is mainly meant to provide the user with a direct
    interface to retrieve the information used by the regular expression
    matching functions and the fnmatch() function to handle range expressions
    of the form "[a-c]".  How expressions like these are interpreted is
    implementation defined and cannot be deduced without such an interface.

RATIONALE

    The designers of the POSIX locale model explicitly chose to use for
    range expressions in pattern matching not the collation order but
    instead the collation sequence.  This term is not further specified,
    but the original intent was for the collation sequence to be
    defined by the order of the lines in the LC_COLLATE specification.

SEE ALSO

    wcscmp(), wcscoll()
