To: austin-group-l@xxxxxxxxxxxxx
Subject: Re: Document [On Support for TR-19769 and New Character Types 0.9] created
From: Geoff Clare <gwc@xxxxxxxxxxxxx>
Date: Wed, 12 Mar 2008 11:27:18 +0000
References: <200803112203.WAA04003@xxxxxx>

Nick Stoughton <nick@xxxxxx> wrote, on 11 Mar 2008:
>
> A document has just been created by Nick Stoughton.
> 
> Web:       Austin Group Collaborative Work Area
> Category:  Reference Document
> Title:     On Support for TR-19769 and New Character Types  0.9
> URL:       http://www.opengroup.org/austin/plato/protected/doc.tpl?gdid=16043

I have a few comments ...

| This paper argues for the third of those options; that the TR should
| be confirmed for continuation as a TR, and should not be considered
| for inclusion in the upcoming revision of IS 9899.

Presumably we would be equally happy with the fifth option (withdraw
the TR).

| Combining "duplicate" functions for each ABI in a single ABI causes
| problems for application developers that provide their application
| in library form. Since they do not know which form the customer will
| prefer in the end user developed application, the ISV must provide
| all forms of their library. This applies to all third party
| libraries, since the end user is free to mix these libraries in a
| single program.

I don't follow this part.  Having multiple ABIs for a single API is
what makes ISVs have to provide multiple binary forms of their
library.  Duplicating functions within one ABI just means ISVs
also need to have duplicate functions in their libraries (assuming
the library has any interfaces that involve character types).

| By limiting the ABI to a single of API,

Something awry here.

| No matter which side of the fence an implementer is on, this forces
| all implementers to support both the UTF-16 and ISO-10646 forms.

s/ISO-10646/UTF-32/

| This duplication will require, by transitivity, a duplicate set of
| all libraries provided by those application vendors that distribute
| their applications in the form of libraries.

Again, what is required is not a duplicate set of the libraries
themselves, but duplication of the character-related interfaces
within each library.
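
To illustrate the distinction, here is a rough sketch of one library
carrying both forms of a character-related interface. Everything in it
is hypothetical: the function names and the unit16/unit32 typedefs are
my own stand-ins (for char16_t and char32_t), not anything taken from
the TR or the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for char16_t and char32_t. */
typedef uint_least16_t unit16;
typedef uint_least32_t unit32;

/* UTF-16 form: count the code points in a zero-terminated string.
 * Low surrogates (0xDC00..0xDFFF) are skipped so that a surrogate
 * pair is counted once. Assumes well-formed input. */
size_t lib_label_length16(const unit16 *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != 0; s++) {
        if (*s < 0xDC00 || *s > 0xDFFF)
            n++;
    }
    return n;
}

/* UTF-32 form of the same interface: one unit per code point. */
size_t lib_label_length32(const unit32 *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}
```

Both entry points live in the same binary, so the ISV ships one
library with two interfaces rather than two libraries.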

| A char16_t storage unit cannot be used to store all codes defined in
| ISO-10646 because the number of codes defined in the Standard exceeds
| 65535. Therefor, the only way to represent all of ISO-10646 is to
| make the char16_t a multi-byte character representation. Hence, it is
| wrong to refer to it as a character class or to have a string literal
| definition for the mapping of UTF-16 "characters" into a string. The
| paper conveniently omits the fact that a 16 bit character cannot
| hold all of the ISO-10646 "characters". Put another way, the 16-bit
| character must be treated as a multi-byte character if it is used to
| store all characters defined in ISO-10646.

The uses of "multi-byte" in this paragraph are problematic.
Really it needs to be "multi-character" but using that term means
some of the other uses of "character" need to change.  Here's an
attempt to fix it:

  A char16_t storage unit cannot be used to store all codes defined in
  ISO-10646 because the number of codes defined in the Standard exceeds
  65535. Therefore, the only way to represent all of ISO-10646 is to
  make the char16_t a multi-character representation akin to the
  multi-byte representation used for char. Hence, it is wrong to refer
  to it as a character class or to have a string literal definition
  for the mapping of UTF-16 characters into a string. The paper
  conveniently omits the fact that a 16-bit character cannot hold all
  of the ISO-10646 codes.

The part I'm least sure about is the middle bit about a "string literal
definition".

I left off the "Put another way ..." part on the end because after
changing to use "multi-character" it seemed to be repeating the point
in the same way, rather than another way.
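
For anyone who wants the "multi-character" point made concrete, here
is a minimal sketch of the UTF-16 encoding of a single ISO-10646 code.
The function name utf16_encode is hypothetical, and I have used
uint_least16_t rather than char16_t so that it does not depend on the
TR's header being available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one ISO-10646 code as UTF-16.
 * Writes one or two 16-bit units to out and returns how many were
 * written, or 0 if cp is a surrogate value or beyond 0x10FFFF. */
size_t utf16_encode(uint_least32_t cp, uint_least16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        /* Fits in a single 16-bit unit. */
        out[0] = (uint_least16_t)cp;
        return 1;
    }
    /* Codes above 0xFFFF need a surrogate pair: two 16-bit units. */
    cp -= 0x10000;
    out[0] = (uint_least16_t)(0xD800 + (cp >> 10));
    out[1] = (uint_least16_t)(0xDC00 + (cp & 0x3FF));
    return 2;
}
```

A code such as U+20AC occupies one unit, whereas U+1D11E comes out as
the two units 0xD834 0xDD1E, which is the same situation as a
multi-byte character in a char string.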

-- 
Geoff Clare <g.clare@xxxxxx>
The Open Group, Thames Tower, Station Road, Reading, RG1 1LX, England
