| To: | shwaresyst@xxxxxxx |
|---|---|
| Subject: | Re: multibyte C locale |
| From: | Glen Seeds <Glen.Seeds@xxxxxxxxxx> |
| Date: | Fri, 30 Oct 2009 17:17:46 -0400 |
| Cc: | austin-group-l@xxxxxxxxxxxxx |
| References: | <4AE83F49.1090805@byu.net><8CC273626A167C9-708C-A347@webmail-m008.sysops.aol.com><4AEAA953.7070605@jacaranda.org><20091030093610.GA31100@squonk.masqnet><20091030131925.GH28296@prunille.vinc17.org><20091030140706.GA12871@squonk.masqnet> <20091030181139.GI28296@prunille.vinc17.org> <8CC27A9B7A337AB-1BD0-9799@webmail-d053.sysops.aol.com> <OF061B1E55.0E7681F0-ON8525765F.006B7217-8525765F.006BDF48@ca.ibm.com> <8CC27B486F71509-1BD0-AA5A@webmail-d053.sysops.aol.com> <8CC27BC549BC800-55E0-15499@webmail-d051.sysops.aol.com> |
|
The whole point of this discussion is that UTF-8 was in fact carefully crafted that way, and we want conforming programs to be able to use it for the POSIX locale.

/glen
Clarifying addendum: a code set matching those criteria could be defined so that a non-PCS character could refer to different code points depending on whether it was prefaced by a state-changing character or not, i.e. be a non-state-changing code if encountered first, or a state-continuation code if encountered after a state-changing code. A test for whether it is a PCS char or not is still simple, but a routine trying to do GetNextChar might still fail if passed a pointer into the middle of a multi-byte sequence. UTF-8 may have been designed to avoid this, but as written it still allows for sets that aren't as carefully crafted.

Mark

-----Original Message-----
From: shwaresyst@aol.com
To: Glen.Seeds@ca.ibm.com
Cc: austin-group-l@opengroup.org
Sent: Fri, Oct 30, 2009 4:13 pm
Subject: Re: multibyte C locale

I don't believe so. This would still force all applications doing lexical analysis to use routines that include extra logic to test whether a given byte is or isn't a state-changing code that they might need to account for, even if just to throw that code and the next byte or bytes away from lexical consideration before continuing processing, and not simply a non-state-changing code that isn't part of the PCS, which can be disregarded. For applications like the C compiler, when doing a rebuild of a million or more lines of code, this could noticeably add to the processing time required to complete the task, I'd think.

Cheers,
Mark

-----Original Message-----
From: Glen Seeds <Glen.Seeds@ca.ibm.com>
To: shwaresyst@aol.com
Cc: austin-group-l@opengroup.org
Sent: Fri, Oct 30, 2009 3:38 pm
Subject: Re: multibyte C locale

I believe that would make a lot of working applications non-conformant. Could we say:

In the POSIX locale, a character from the portable character set must not have a state-dependent encoding. For characters that have state-dependent encoding, the encoding of each part must be distinct from the coding of all portable characters.

/glen

From: shwaresyst@aol.com
To: austin-group-l@opengroup.org
Date: 2009-10-30 02:57 PM
Subject: Re: multibyte C locale

Yes, that's more the type of fix I was intending to say, but it was 1AM when I was composing it. I do think the further qualification of 'and all code points shall fit in a variable of C type char.' is needed to explicitly pin it down that wide char types are also excluded.

Mark

-----Original Message-----
From: Vincent Lefevre <vincent-opgr@vinc17.org>
To: austin-group-l@opengroup.org
Sent: Fri, Oct 30, 2009 2:11 pm
Subject: Re: multibyte C locale

8< --------------------------------------------

I wonder whether the text should be changed to be more rigorous and say exactly what it intends to say. Something like:

In the POSIX locale, a character must not have a state-dependent encoding.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon) |
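
The two properties argued over in this thread can be sketched in C for the specific case of UTF-8. This is only an illustration under stated assumptions, not text from any of the messages: the function names `count_byte`, `is_continuation` and `utf8_sync` are hypothetical, and the input is assumed to be well-formed UTF-8. The first function shows why a lexer can scan for a portable character byte by byte without tracking any shift state (in UTF-8, no byte of a multibyte sequence is below 0x80). The third shows how a GetNextChar-style routine could recover when handed a pointer into the middle of a multibyte sequence, because continuation bytes always have the form 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>

/* Count occurrences of a single-byte (ASCII) character in a UTF-8 string.
   No decoding or state tracking is needed: in UTF-8, bytes below 0x80
   never appear inside a multibyte sequence. */
static size_t count_byte(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: given a pointer p anywhere inside the buffer that
   begins at start, back up to the first byte of the character containing p.
   Assumes well-formed UTF-8. */
static const char *utf8_sync(const char *start, const char *p)
{
    assert(start <= p);
    while (p > start && is_continuation((unsigned char)*p))
        p--;
    return p;
}
```

For example, with the bytes `'a'`, `0xC3`, `0xA9`, `'/'`, `'b'`, calling `utf8_sync` with a pointer to the `0xA9` byte returns a pointer to the `0xC3` byte. A stateful encoding of the kind Mark describes would not allow this, since the meaning of a byte could depend on a state-changing code seen earlier.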