Email List: austin-group-l

Re: multibyte C locale

To: austin-group-l@xxxxxxxxxxxxx
Subject: Re: multibyte C locale
From: Geoff Clare <gwc@xxxxxxxxxxxxx>
Date: Mon, 2 Nov 2009 10:48:19 +0000
References: <8CC273626A167C9-708C-A347@webmail-m008.sysops.aol.com> <4AEAA953.7070605@jacaranda.org> <20091030093610.GA31100@squonk.masqnet> <20091030131925.GH28296@prunille.vinc17.org> <20091030140706.GA12871@squonk.masqnet> <20091030181139.GI28296@prunille.vinc17.org> <8CC27A9B7A337AB-1BD0-9799@webmail-d053.sysops.aol.com> <4AEB5378.5070402@byu.net> <19179.21604.450124.747123@khavrinen.csail.mit.edu> <4AEB6133.9050803@byu.net>

Eric Blake <ebb9@byu.net> wrote, on 30 Oct 2009:
>
> According to wollman+austin-group@lcs.mit.edu on 10/30/2009 3:02 PM:
> > <<On Fri, 30 Oct 2009 14:58:32 -0600, Eric Blake <ebb9@byu.net> said:
> > 
> >> That's a restriction that I think we should be very reluctant to make.  In
> >> other words, there has to be a good reason why we are willing to require
> >> the POSIX locale to require a unibyte charset.  And I don't think we have
> >> come up with one yet.
> > 
> > Turn it around: of what value is the POSIX locale without such a
> > requirement?
> 
> The POSIX locale is already specified to be portable in character contexts
> only when you use just the portable characters.

Not true.  The POSIX locale is required to contain some control
characters that are not in the portable character set.  (See
XBD section 7.3.1 under "LC_CTYPE Category in the POSIX Locale".)

> Both unibyte and UTF-8
> meet the following criteria: 1. all portable characters are single bytes,
> 2. no characters outside the portable characters can be confused in whole
> or in part (well, the in part only applies to multi-byte charsets) with
> portable characters.  So for all intents and purposes, you should not need
> to care whether the POSIX locale is single- or multi-byte encoded - a
> portable app can't use characters outside the portable set in the first
> place.

Yes it can.  For example it can use tcgetattr() to obtain the
interrupt character for the slave side of a pty, and then write that
character to the master side.

It seems to me that there is an underlying assumption, in the way
the terminal interface is specified, that control characters are
always single bytes.  However, as far as I can see there is no
requirement stated anywhere that all control characters are
represented in a single byte.  We should probably change the
text in 6.1 that requires it for the portable character set so
that it requires it for the control character set as well.
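
The tcgetattr() case mentioned above can be sketched roughly as follows.
This is only an illustration, not text from the standard: the helper name
forward_intr_char() and its return convention are made up for the example,
and it assumes the caller already holds open descriptors for both sides of
a pty. Note how the c_cc[] element is written as exactly one byte, which is
the single-byte assumption discussed above.

```c
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper (the name and return values are illustrative):
 * read the interrupt character configured on the slave side of a pty
 * and write that character to the master side.
 *
 * Returns 1 if one byte was written, 0 if the interrupt character is
 * disabled on the slave, and -1 on error.
 */
int forward_intr_char(int slave_fd, int master_fd)
{
    struct termios tio;
    unsigned char byte;

    /* Fetch the slave's terminal attributes, including c_cc[]. */
    if (tcgetattr(slave_fd, &tio) != 0)
        return -1;

#ifdef _POSIX_VDISABLE
    /* A disabled special character has no value worth sending. */
    if (tio.c_cc[VINTR] == _POSIX_VDISABLE)
        return 0;
#endif

    /*
     * Assumption: the control character occupies a single byte.
     * Each c_cc[] element is one cc_t, so the interface offers no
     * way to describe a multi-byte interrupt character.
     */
    byte = (unsigned char)tio.c_cc[VINTR];

    return write(master_fd, &byte, 1) == 1 ? 1 : -1;
}
```

One way to obtain the two descriptors would be posix_openpt(), grantpt(),
unlockpt() and ptsname() for the master, followed by open() on the returned
slave name. Some systems may need further setup on the slave before
tcgetattr() succeeds there.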

> In fact, an EBCDIC charset should be just as compliant as the
> underlying charset of the POSIX locale as the more traditional ASCII or UTF-8.

s/should be/is/

There are some EBCDIC-based systems that are certified UNIX95 conforming.

-- 
Geoff Clare <g.clare@opengroup.org>
The Open Group, Thames Tower, Station Road, Reading, RG1 1LX, England
