Don Cragun wrote:
> On Thu, July 2, 2009 11:30, Chet Ramey wrote:
>>> Actually, there is a known issue about the possibility of
>>> characters in the portable character set having different encodings
>>> in different locales. It was raised back in March in relation
>>> to the slash in pathnames, but it applies to other cases as well,
>>> including this one.
>> Sure, but that will eventually get fixed. The intent as I understand
>> it is that the portable character set be locale-independent (or at
>> least constant across "conforming locales").
>>
>> Chet
>
> Chet,
> In practice, most systems do not have a problem here.
> When the standards were being written, the authors wanted
> to allow for the possibility of having some locales with an
> EBCDIC codeset base and others with an ISO 8859-* or 646
> (ASCII) codeset base. So, the intent was not to require that
> the characters in the portable character set have the same
> encoding in all locales even though that could simplify a lot
> of code.
I've looked at the glibc sources (as a big source of locales and
charsets): '/' has the same binary representation in all charsets
that could be used in a locale.
Then I looked at the letter 'A', discarding charsets that could not
be used in a locale (single byte, \0 is not a character).
There are only two variants: ASCII-based and EBCDIC-based values.
EBCDIC-based charsets have other problems anyway: they can be
used only if char has the same range as unsigned char
(a C constraint). BTW, it seems no locale is defined with an EBCDIC
charset.
So requiring ASCII codes for digits, letters and a few extra
characters would probably resolve more portability problems than
it would cause.
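
The two-variant observation for 'A' can be reproduced roughly with
the sketch below. It is only an illustration under assumptions: it
uses the codecs bundled with Python, not the glibc charmaps I looked
at, and the codec list and the helper name encodings_of are my own
choices for this example.

```python
# Illustrative codec list (an assumption, not taken from glibc):
# three ASCII-based codecs and two EBCDIC-based ones.
CODECS = ("ascii", "latin-1", "utf-8", "cp037", "cp500")


def encodings_of(ch, codecs=CODECS):
    """Map each codec name to the bytes it uses to encode ch."""
    return {name: ch.encode(name) for name in codecs}


def group_by_encoding(ch, codecs=CODECS):
    """Group codec names by the byte value they assign to ch."""
    groups = {}
    for name, raw in encodings_of(ch, codecs).items():
        groups.setdefault(raw.hex(), []).append(name)
    return groups


if __name__ == "__main__":
    for ch in "A/":
        print(repr(ch), group_by_encoding(ch))
    # In the EBCDIC-based codecs 'A' lies above 0x7F, which is why a
    # C implementation needs char to hold such values as non-negative.
    print(encodings_of("A")["cp037"][0] > 0x7F)
```

Note that the EBCDIC-based codecs in this list also encode '/'
differently from ASCII, so the '/' statement above holds only for
charsets that are actually usable in a locale.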
ciao
cate