All,
Enclosed is the proposed response to XBD ERN 17.
Special thanks to Kenjiro and others for analyzing this.
Please note that all line numbers refer to the 2001 edition of XBD6 and
no text changed in that part of the standard in the 2003 edition.
Opinion:
Disagree with the submitter except for one editorial error
in the example.
Submitter's interpretation:
- Example at L3847-L3852 says a null byte is allowed in
the second or subsequent byte of a multibyte character
definition.
- L3719-L3723 says a null byte is NOT allowed in a single-shift
encoding character definition.
- There is no way, in the character set description file, to
distinguish whether a given range declaration is intended to
define single-shift encoding characters or normal multibyte
characters. Therefore, a range declaration that defines
single-shift encoding characters and carries over a byte
boundary cannot be specified.
Submitter's proposal:
- Add a new encoding format to explicitly specify single-shift
encoding characters.
Our interpretation:
- L3719-L3723 clearly states that a null byte cannot be in
the second or subsequent bytes of a character, whether it is
a single-shift encoding character or a normal multibyte character.
So, there is no need to distinguish between a single-shift encoding
character and a normal multibyte character in the character set
description file.
- Example at L3847-L3852 may not be appropriate, but I don't
think it's totally wrong. I think the example just shows how
a range declaration would be interpreted.
So, the expansion of a range may not always
yield valid characters. If the user specifies a range
declaration that carries over a byte boundary, the result is
the user's responsibility. The definition of <j0103> in the
example is invalid, because a null byte is in the second
byte. However, the user can also directly specify such
an invalid declaration in the character set description file,
although it may result in an error or unexpected behavior of
the localedef utility. I don't see a difference between
specifying a range declaration carrying over a byte boundary
that causes a null byte in the second or subsequent byte of
a multibyte character, and directly specifying an invalid
character declaration.
That is, specifying:
<j0101>...<j0104> \d129\d254
will cause the same result as specifying:
<j0101> \d129\d254
<j0102> \d129\d255
<j0103> \d130\d00
<j0104> \d130\d01
The result may contain an invalid declaration of a character,
but that is the user's responsibility.
- Example at L3851-L3852 has an editorial error. Decimal constants
shall be represented by two or three decimal digits, preceded by
the escape character and 'd'. However, the constants for the second
byte on those lines have only one digit after "\d".
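
The range expansion discussed in the second item above can be sketched
in a few lines of Python. This is only an illustration, not localedef's
actual implementation: the function names are made up for this note, and
it assumes that the encoding is incremented as one unsigned big-endian
value with carry between bytes, and that symbol names end in a decimal
suffix (as in <j0101>).

```python
def expand_range(first, last, start_bytes):
    """Expand <first>...<last> into (symbol, bytes) pairs.

    Assumption: the encoding is incremented as a single unsigned
    big-endian integer, so a carry propagates into the preceding byte.
    """
    prefix = first.rstrip("0123456789")      # e.g. "j"
    width = len(first) - len(prefix)         # e.g. 4 digits in "0101"
    lo = int(first[len(prefix):])
    hi = int(last[len(prefix):])
    value = int.from_bytes(bytes(start_bytes), "big")
    nbytes = len(start_bytes)
    result = []
    for i, n in enumerate(range(lo, hi + 1)):
        encoded = (value + i).to_bytes(nbytes, "big")
        result.append((f"{prefix}{n:0{width}d}", tuple(encoded)))
    return result


def has_embedded_null(encoded):
    """True if a null byte appears in the second or a subsequent byte."""
    return any(b == 0 for b in encoded[1:])


def format_decl(symbol, encoded):
    """Render a declaration using two- or three-digit decimal constants."""
    return f"<{symbol}> " + "".join(f"\\d{b:02d}" for b in encoded)


if __name__ == "__main__":
    for symbol, encoded in expand_range("j0101", "j0104", (129, 254)):
        note = "  (invalid: null byte)" if has_embedded_null(encoded) else ""
        print(format_decl(symbol, encoded) + note)
```

Under these assumptions the range yields the same four declarations as
the explicit list above, and only <j0103> is flagged as containing a
null byte in its second byte.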
Our proposal:
(1) Fix the editorial error in the example. Change L3851-L3852
from:
<j0103> \d130\d0
<j0104> \d130\d1
to:
<j0103> \d130\d00
<j0104> \d130\d01
(2) Apply the following changes:
- Insert the following sentence after "the range." and before
"For example, ..." at L3846:
However, because this causes a null byte in the second or
subsequent bytes of a character, such a declaration should
not be specified.
- Insert the following sentence between L3852 and L3853:
The expanded declaration of the symbol <j0103> in the above
example is an invalid specification, because it contains a null
byte in the second byte of a character.
-----
Andrew Josey The Open Group
Austin Group Chair Apex Plaza,Forbury Road,
Email: yyyyyyy@xxxxxxxxxxxxx Reading,Berks.RG1 1AX,England
Tel: +44 118 9508311 ext 2250 Fax: +44 118 9500110