
Response to Aardvark XBD ERN 17

To: yyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: Response to Aardvark XBD ERN 17
From: Andrew Josey <yyyyyy@xxxxxxxxxxxxxxxxx>
Date: Tue, 12 Aug 2003 09:09:23 +0100
All,
Enclosed is the proposed response to XBD ERN 17.
Special thanks to Kenjiro and others for analyzing this.

Please note that all line numbers refer to the 2001 edition of XBD6 and
no text changed in that part of the standard in the 2003 edition.


Opinion:
        Disagree with the submitter except for one editorial error
        in the example.
        
Submitter's interpretation:
        - Example at L3847-L3852 says a null byte is allowed in
        the second or subsequent byte of a multibyte character
        definition.
        - L3719-L3723 says a null byte is NOT allowed in a single-shift
        encoding character definition.
        - In the character set description file, there is no way to
        tell whether a given range declaration is meant to define
        single-shift encoding characters or normal multibyte
        characters.  Therefore, it is not possible to define a range
        declaration of single-shift encoding characters that carries
        over a byte boundary.
        
Submitter's proposal:
        - Add a new encoding format to explicitly specify single-shift
        encoding characters.


Our interpretation:
        - L3719-L3723 clearly states that a null byte cannot be in
        the second or subsequent bytes of a character, whether it is
        a single-shift encoding character or a normal multibyte character.
        So there is no need to distinguish between a single-shift
        encoding character and a normal multibyte character in the
        character set description file.
        - Example at L3847-L3852 may not be appropriate, but I don't
        think it's totally wrong.  I think the example just shows how
        a range declaration would be interpreted.  The expansion of
        a range does not always yield valid characters.  If the user
        specifies a range declaration that carries over a byte
        boundary, the result is the user's responsibility.  The
        definition of <j0103> in the example is invalid, because its
        second byte is a null byte.  However, the user can also specify
        such an invalid declaration directly in the character set
        description file, although it may result in an error or
        unexpected behavior of the localedef utility.  I don't see
        a difference between specifying a range declaration that
        carries over a byte boundary and so puts a null byte in the
        second or subsequent byte of a multibyte character, and
        specifying an invalid character declaration directly.
        That is, specifying:

        <j0101>...<j0104>       \d129\d254
        
        will cause the same result as specifying:
        
        <j0101>                 \d129\d254
        <j0102>                 \d129\d255
        <j0103>                 \d130\d00
        <j0104>                 \d130\d01
        
        The result may contain an invalid declaration of a character,
        but that is the user's responsibility.
        
        - Example at L3851-L3852 has an editorial error.  Decimal constants
        shall be represented by two or three decimal digits, preceded by
        the escape character and 'd'.  However, the last constants in
        the example have only one digit after "\d".
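
        The expansion described above can be sketched in a few lines of
        Python.  This is only an illustration of the interpretation given
        here, not the behavior of any particular localedef implementation;
        the function name expand_range and the decision to treat the
        encoding as one big-endian integer are assumptions made for the
        sketch.

```python
import re

def expand_range(first, last, encoding):
    """Expand a charmap range such as <j0101>...<j0104> \\d129\\d254.

    Hypothetical helper: returns a list of (symbol, encoding) pairs.
    The byte sequence is treated as one big-endian integer and is
    incremented once per symbol, so a carry can cross a byte boundary.
    """
    # Symbol names: non-numeric prefix followed by a decimal integer.
    prefix, lo = re.fullmatch(r"<(\D*)(\d+)>", first).groups()
    _, hi = re.fullmatch(r"<(\D*)(\d+)>", last).groups()
    width = len(lo)

    # Decimal constants: escape character, 'd', two or three digits.
    start = bytes(int(d) for d in re.findall(r"\\d(\d{2,3})", encoding))
    value = int.from_bytes(start, "big")

    result = []
    for offset, n in enumerate(range(int(lo), int(hi) + 1)):
        # to_bytes raises OverflowError if the carry outgrows the
        # original byte count; the sketch does not handle that case.
        encoded = (value + offset).to_bytes(len(start), "big")
        # Emit at least two digits per constant (\d00, not \d0).
        text = "".join(f"\\d{b:02d}" for b in encoded)
        result.append((f"<{prefix}{n:0{width}d}>", text))
    return result

for symbol, enc in expand_range("<j0101>", "<j0104>", r"\d129\d254"):
    print(symbol, enc)
```

        With the range from the example, the third and fourth entries
        come out as \d130\d00 and \d130\d01, matching the corrected
        form of the example.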

Our proposal:
        (1) Fix the editorial error in the example. Change L3851-L3852
            from:
                <j0103>                 \d130\d0
                <j0104>                 \d130\d1
                
            to:
        
                <j0103>                 \d130\d00
                <j0104>                 \d130\d01
        

        (2) Apply the following changes:

            - Insert the following sentence after "the range." and before
              "For example, ..." at L3846:

            However, because this causes a null byte in the second or
            subsequent bytes of a character, such a declaration should
            not be specified.

            - Insert the following sentence between L3852 and L3853:

            The expanded declaration of the symbol <j0103> in the above
            example is an invalid specification, because it contains a null
            byte in the second byte of a character. 
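
            For illustration only, the validity condition that this
            sentence relies on (no null byte in the second or a
            subsequent byte of a character) can be checked as in the
            Python sketch below.  The function name has_embedded_null
            is hypothetical, and the sketch handles only decimal
            (\d) constants.

```python
import re

def has_embedded_null(encoding):
    """Return True if any byte after the first is a null byte.

    Hypothetical helper; only decimal constants (\\dNN or \\dNNN)
    are parsed. A single null byte on its own is not flagged.
    """
    values = [int(d) for d in re.findall(r"\\d(\d{2,3})", encoding)]
    return any(v == 0 for v in values[1:])

print(has_embedded_null(r"\d130\d00"))   # expanded <j0103>
print(has_embedded_null(r"\d129\d255"))  # expanded <j0102>
```

            The first call reports the expanded <j0103> declaration as
            invalid under this rule; the second reports <j0102> as
            acceptable.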

           

-----
Andrew Josey                                The Open Group  
Austin Group Chair                          Apex Plaza,Forbury Road,
Email: yyyyyyy@xxxxxxxxxxxxx                Reading,Berks.RG1 1AX,England
Tel:   +44 118 9508311 ext 2250             Fax: +44 118 9500110
