
Re: multibyte C locale

To: austin-group-l@xxxxxxxxxxxxx
Subject: Re: multibyte C locale
From: David-Sarah Hopwood <david-sarah@xxxxxxxxxxxxx>
Date: Sat, 31 Oct 2009 00:54:30 +0000
shwaresyst@aol.com wrote:
> I don't believe so. This would still force all applications doing
> lexical analysis to use routines that need to include extra logic to
> test whether a given byte is or isn't a state-changing code that it
> might need to account for, even if just to throw that code and the next
> byte or bytes away from lexical consideration before continuing
> processing, and not simply a non-state-changing code that isn't part of
> the PCS which can be disregarded. For applications like the C compiler,
> when doing a rebuild of a million or more lines of code, this could
> noticeably add to the processing time required to complete the task, I'd
> think.

That argument makes no sense:

If an application uses single-byte character APIs, then it incurs no
additional overhead when the locale uses a multi-byte encoding.

If an application uses multi-byte character APIs, then that is
presumably because it was intended to work correctly on input in
multi-byte encodings. So complaining that it might incur some
additional overhead is beside the point, since that overhead is
necessary for the app to work as intended.

Note that the app's behaviour will fall into one of these classes
regardless of whether it uses the libc character APIs, or rolls its
own, as most compilers do.
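
To make the two classes concrete, here is a minimal sketch in C of a
lexer helper that measures a leading identifier, written once per class.
The function names ident_len_bytes and ident_len_mb are hypothetical,
chosen only for illustration; the library calls (isalnum, iswalnum,
mbrtowc) are the standard ones.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, class 1: single-byte scanning.
   Each byte is classified on its own, so the work done per byte does
   not change when the locale's encoding is multi-byte. */
size_t ident_len_bytes(const char *s)
{
    size_t n = 0;
    while (s[n] == '_' || isalnum((unsigned char)s[n]))
        n++;
    return n;   /* length in bytes of the leading identifier */
}

/* Hypothetical helper, class 2: multi-byte scanning.
   mbrtowc() decodes one character at a time in the current locale's
   encoding and keeps any shift state in ps. */
size_t ident_len_mb(const char *s)
{
    mbstate_t ps;
    size_t n = 0, left = strlen(s);

    memset(&ps, 0, sizeof ps);          /* start in the initial state */
    while (left > 0) {
        wchar_t wc;
        size_t k = mbrtowc(&wc, s + n, left, &ps);

        if (k == (size_t)-1 || k == (size_t)-2 || k == 0)
            break;                      /* invalid, incomplete, or NUL */
        if (wc != L'_' && !iswalnum((wint_t)wc))
            break;                      /* first non-identifier character */
        n += k;
        left -= k;
    }
    return n;   /* length in bytes of the leading identifier */
}
```

The first function never looks at shift state, so a multi-byte locale
costs it nothing extra. The second pays for decoding, but that decoding
is exactly what lets it treat a multi-byte letter as one character.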

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com
