Email List: austin-group-l

Re: multibyte C locale

To: shwaresyst@xxxxxxx
Subject: Re: multibyte C locale
From: Glen Seeds <Glen.Seeds@xxxxxxxxxx>
Date: Fri, 30 Oct 2009 17:17:46 -0400
Cc: austin-group-l@xxxxxxxxxxxxx
References: <4AE83F49.1090805@byu.net><8CC273626A167C9-708C-A347@webmail-m008.sysops.aol.com><4AEAA953.7070605@jacaranda.org><20091030093610.GA31100@squonk.masqnet><20091030131925.GH28296@prunille.vinc17.org><20091030140706.GA12871@squonk.masqnet> <20091030181139.GI28296@prunille.vinc17.org> <8CC27A9B7A337AB-1BD0-9799@webmail-d053.sysops.aol.com> <OF061B1E55.0E7681F0-ON8525765F.006B7217-8525765F.006BDF48@ca.ibm.com> <8CC27B486F71509-1BD0-AA5A@webmail-d053.sysops.aol.com> <8CC27BC549BC800-55E0-15499@webmail-d051.sysops.aol.com>

The whole point of this discussion is that UTF-8 was in fact carefully crafted that way, and we want conforming programs to be able to use it for the POSIX locale.
  /glen


From: shwaresyst@aol.com
To: austin-group-l@opengroup.org
Date: 2009-10-30 05:10 PM
Subject: Re: multibyte C locale

Clarifying addendum: a code set matching those criteria could be defined
so that a non-PCS character could refer to different code points
depending on whether it was prefaced by a state-changing character or not,
i.e. be a non-state-changing code if encountered first, or a
state-continuation code if encountered after a state-changing code. A
test for whether it is a PCS char or not is still simple, but a routine
trying to do GetNextChar might still fail if passed a pointer into the
middle of a multi-byte sequence. UTF-8 may have been designed to avoid
this, but the proposed text as written still allows for code sets that
aren't as carefully crafted.

Mark
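
The GetNextChar concern above can be illustrated for the UTF-8 case. In
UTF-8 every continuation byte has the bit pattern 10xxxxxx, so a routine
handed a pointer into the middle of a sequence can back up to the lead
byte without any state. The following is a minimal sketch, assuming
well-formed UTF-8 input; the helper names are hypothetical and do not
come from the thread or from any standard interface.

```c
#include <assert.h>
#include <stddef.h>

/* Returns nonzero if b is a UTF-8 continuation byte (10xxxxxx). */
static int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: given an index pos into buf that may fall in the
 * middle of a multibyte sequence, return the index of the byte that
 * starts the character containing pos. Assumes well-formed UTF-8. */
static size_t utf8_sync_back(const unsigned char *buf, size_t pos)
{
    assert(buf != NULL);
    while (pos > 0 && is_utf8_continuation(buf[pos]))
        pos--;
    return pos;
}
```

A code set in which a trailing byte is indistinguishable from a leading
byte offers no equivalent of is_utf8_continuation, which is the failure
case described above.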


-----Original Message-----
From: shwaresyst@aol.com
To: Glen.Seeds@ca.ibm.com
Cc: austin-group-l@opengroup.org
Sent: Fri, Oct 30, 2009 4:13 pm
Subject: Re: multibyte C locale

I don't believe so. This would still force all applications doing
lexical analysis to use routines that need to include extra logic to
test whether a given byte is or isn't a state-changing code that it
might need to account for, even if just to throw that code and the next
byte or bytes away from lexical consideration before continuing
processing, and not simply a non-state-changing code that isn't part
of the PCS which can be disregarded. For applications like the C
compiler, when doing a rebuild of a million or more lines of code, this
could noticeably add to the processing time required to complete the
task, I'd think.


Cheers,

Mark
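
For comparison, here is roughly what the extra per-byte test looks like
when the code set is UTF-8. This is only a sketch under that assumption:
in UTF-8 every byte of a multibyte sequence has its high bit set and
there are no shift states, so a lexer interested only in portable
characters can step over such bytes with one comparison each. The
function names are hypothetical, and nothing here measures the actual
cost in a compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the input is UTF-8, so a byte >= 0x80 is always part of
 * a multibyte sequence and never encodes a portable character. */
static int is_non_pcs_byte(unsigned char b)
{
    return b >= 0x80;
}

/* Hypothetical helper: return the index of the first byte at or after
 * pos that could encode a portable character, or len if there is none. */
static size_t skip_non_pcs(const unsigned char *buf, size_t len, size_t pos)
{
    assert(buf != NULL || len == 0);
    while (pos < len && is_non_pcs_byte(buf[pos]))
        pos++;
    return pos;
}
```

With a stateful code set the loop would also have to track which shift
state is active, which is the extra logic the message above objects to.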



-----Original Message-----
From: Glen Seeds <Glen.Seeds@ca.ibm.com>
To: shwaresyst@aol.com
Cc: austin-group-l@opengroup.org
Sent: Fri, Oct 30, 2009 3:38 pm
Subject: Re: multibyte C locale

I believe that would make a lot of working applications non-conformant.
Could we say:

  In the POSIX locale, a character from the portable character set
  must not have a state-dependent encoding. For characters that have
  state-dependent encoding, the encoding of each part must be distinct
  from the coding of all portable characters.

/glen
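
The second sentence of the proposed wording can be checked against
concrete bytes. The sketch below is an illustration, not part of the
proposal: it searches bytewise for a portable character in two encodings
of KATAKANA LETTER SO (U+30BD). In Shift_JIS the second byte is 0x5C,
which is also the ASCII backslash, so the search reports a false hit. In
UTF-8 no byte of the sequence collides with a portable character. The
function name find_byte is hypothetical, and the code assumes an
ASCII-based execution character set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytewise search that ignores multibyte boundaries. Returns the index
 * of the first byte equal to c, or len if there is no such byte. */
static size_t find_byte(const unsigned char *buf, size_t len, unsigned char c)
{
    assert(buf != NULL || len == 0);
    const unsigned char *hit = len ? memchr(buf, c, len) : NULL;
    return hit ? (size_t)(hit - buf) : len;
}

/* KATAKANA LETTER SO (U+30BD) in two encodings. */
static const unsigned char so_sjis[] = { 0x83, 0x5C };       /* Shift_JIS */
static const unsigned char so_utf8[] = { 0xE3, 0x82, 0xBD }; /* UTF-8 */
```

Calling find_byte(so_sjis, 2, '\\') finds a match inside the character,
while find_byte(so_utf8, 3, '\\') finds none, which is the distinction
the proposed text is trying to capture.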

From: shwaresyst@aol.com
To: austin-group-l@opengroup.org
Date: 2009-10-30 02:57 PM
Subject: Re: multibyte C locale

Yes, that's more the type of fix I was intending to say, but it was 1AM
when I was composing it. I do think the further qualification of 'and
all code points shall fit in a variable of C type char.' is needed to
explicitly pin it down that wide char types are also excluded.

Mark


-----Original Message-----
From: Vincent Lefevre <vincent-opgr@vinc17.org>
To: austin-group-l@opengroup.org
Sent: Fri, Oct 30, 2009 2:11 pm
Subject: Re: multibyte C locale

8< --------------------------------------------

I wonder whether the text should be changed to be more rigorous and
say exactly what it intends to say. Something like:

  In the POSIX locale, a character must not have a state-dependent
  encoding.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)