| To: | wollman+austin-group@xxxxxxxxxxx |
|---|---|
| Subject: | Re: multibyte C locale |
| From: | Albert Cahalan <albert@xxxxxxxxxxxxxxxxxxxxx> |
| Date: | Sat, 31 Oct 2009 06:10:37 -0400 |
| Cc: | Glen Seeds <Glen.Seeds@xxxxxxxxxx>, austin-group-l@xxxxxxxxxxxxx |
On Fri, Oct 30, 2009 at 6:03 PM, <wollman+austin-group@lcs.mit.edu> wrote:
> <<On Fri, 30 Oct 2009 17:46:54 -0400, Glen Seeds <Glen.Seeds@ca.ibm.com> said:
>>> [I wrote:]
>>> Turn it around: of what value is the POSIX locale without such a
>>> requirement?
>
>> The value is considerable, if we can find a way to accommodate UTF-8.
>
> I don't see it. What's the use case? I can see a value to
> applications in being able to configure a locale that acts like
> pre-locale Unix and C did; I don't see a value in configuring a locale
> that doesn't behave like traditional Unix, but isn't usefully
> localized either. ("Traditional Unix" behavior is normally what I
> want pretty much all the time.)
C.UTF-8 would be damn nice. Dealing with UTF-8 text is rather
important these days. Unfortunately, locales like en_US.UTF-8
get all stupid with collating order. 'a' should come after 'Z', damn it!
(That is, U+0061 is a bigger number than U+005A.) Possibly
there is some hack with multiple locale variables that will make
things sane, but that's excessively painful for such a common case.
I expect the plain C locale to cover U+0000 through U+00FF,
but it does not. At least with glibc, stuff like U+00E0 fails the
isalpha() test. Ouch, this is broken too.
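
Both complaints above can be checked with a few lines of C. This is only a rough sketch: the helper names collate_sign and byte_is_alpha are made up for illustration, and it assumes U+00E0 is tested as the single byte 0xE0 (its Latin-1 encoding), since isalpha() looks at one byte at a time. Each helper resets the category to "C" afterwards rather than restoring whatever was set before.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns -1, 0 or 1 according to how `a` sorts
 * relative to `b` under LC_COLLATE of the named locale, or 2 if that
 * locale cannot be selected (for example, it is not installed). */
int collate_sign(const char *locale_name, const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return 2;
    int r = strcoll(a, b);
    setlocale(LC_COLLATE, "C");      /* reset to the C locale */
    return (r > 0) - (r < 0);
}

/* Hypothetical helper: returns 1 if the single byte `c` is alphabetic
 * under LC_CTYPE of the named locale, 0 if it is not, or -1 if that
 * locale cannot be selected. */
int byte_is_alpha(const char *locale_name, unsigned char c)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    int r = isalpha(c) != 0;
    setlocale(LC_CTYPE, "C");        /* reset to the C locale */
    return r;
}
```

In the plain C locale, collate_sign("C", "Z", "a") gives -1, so 'Z' sorts before 'a' in code point order, and byte_is_alpha("C", 0xE0) gives 0. Calling collate_sign("en_US.UTF-8", "Z", "a") on a system where that locale is installed should show whether the ordering flips; on glibc it typically does, but that depends on the installed locale data.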