Re: multibyte C locale

To: wollman+austin-group@xxxxxxxxxxx, albert@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: multibyte C locale
From: Joerg.Schilling@xxxxxxxxxxxxxxxxxxx (Joerg Schilling)
Date: Mon, 02 Nov 2009 17:15:01 +0100
Cc: Glen.Seeds@xxxxxxxxxx, austin-group-l@xxxxxxxxxxxxx
References: <4AE83F49.1090805@byu.net><20091030093610.GA31100@squonk.masqnet><20091030131925.GH28296@prunille.vinc17.org><20091030140706.GA12871@squonk.masqnet><20091030181139.GI28296@prunille.vinc17.org><8CC27A9B7A337AB-1BD0-9799@webmail-d053.sysops.aol.com><4AEB5378.5070402@byu.net><19179.21604.450124.747123@khavrinen.csail.mit.edu><OF7977394B.18C83951-ON8525765F.0077969C-8525765F.0077A6A5@ca.ibm.com><19179.25274.524418.169871@khavrinen.csail.mit.edu><787b0d920910310310x71642ee3i19b7e233e6c64696@mail.gmail.com>
Albert Cahalan <albert@users.sourceforge.net> wrote:

> C.UTF-8 would be damn nice. Dealing with UTF-8 text is rather
> important these days. Unfortunately, locales like en_US.UTF-8
> get all stupid with collating order. 'a' comes after 'Z' damn it!
> (that is, U+0061 is a bigger number than U+005A) Possibly
> there is some hack with multiple locale variables that will make
> things sane, but that's excessively painful for such a common case.
>
> I expect the plain C locale to cover U+0000 through U+00FF,
> but it does not. At least with glibc, stuff like U+00E0 fails the
> isalpha() test. Ouch, this is broken too.

It would make sense to assume ISO 8859-1 for the extended character set in a
single-byte, 8-bit C locale, as Unicode has become the standard of the
future and ISO 8859-1 is identical to Unicode code points 0 .. 255
(U+0000 through U+00FF).
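
As a rough sketch of what that assumption would mean in practice: if byte
value N is taken to be code point U+00NN, then classification and conversion
to UTF-8 need no lookup table. The function names latin1_isalpha() and
latin1_to_utf8() below are hypothetical, not part of any standard or libc,
and the letter ranges are one plausible classification, not what any existing
C locale does. The ordinal indicators (0xAA, 0xBA) and the micro sign (0xB5)
are left out as a simplification.

```c
#include <stddef.h>

/* Hypothetical helper: treat the byte as an ISO 8859-1 character and
 * report whether it is a letter. Assumes an ASCII-based execution
 * character set. 0xD7 (multiplication sign) and 0xF7 (division sign)
 * are the only non-letters in the range 0xC0..0xFF. */
int latin1_isalpha(unsigned char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return 1;
    return c >= 0xC0 && c != 0xD7 && c != 0xF7;
}

/* Hypothetical helper: encode one ISO 8859-1 byte as UTF-8.
 * The byte value is used directly as the Unicode code point.
 * Writes 1 or 2 bytes to out and returns the number written. */
size_t latin1_to_utf8(unsigned char c, unsigned char out[2])
{
    if (c < 0x80) {
        out[0] = c;                               /* ASCII: unchanged */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte: top 2 bits */
    out[1] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation: low 6 bits */
    return 2;
}
```

With these definitions, the byte 0xE0 from Albert's example counts as a
letter and encodes as the two bytes 0xC3 0xA0, which is the UTF-8 form of
U+00E0.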

Jörg

-- 
 EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js@cs.tu-berlin.de                (uni)  
       joerg.schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
