David Hopwood wrote:
> Geoff Clare wrote:
>>David Hopwood <yyyyyyyyyyyyyyyyyyyy@xxxxxxxxxxxxxxxx> wrote, on 09 Jan 2006:
>>>Eric Blake wrote:
>>>
>>>>This is probably worth an aardvark, if only to make it obvious that %c
>>>>prints only the first character of its argument (in other words, 'printf
>>>>%c 65' prints "6", not "A", in an ASCII-based character encoding). But I
>>>>would like to collect information on what other implementations do with
>>>>%c, and on what others think, before drafting such an aardvark.
>>>
>>>Note that in a locale with a multi-byte character encoding, it should be
>>>the first character, not the first byte.
>>
>>The standard currently requires printf %c to write a single byte.
>>
>>See XBD chapter 5 File Format Notation (which is referenced from the
>>XCU printf page):
>>
>> c The integer argument shall be converted to an unsigned char
>> and the resulting byte shall be written.
>
> This appears to be inconsistent with the following paragraph
> (<http://www.opengroup.org/onlinepubs/009695399/utilities/printf.html>):
>
> # Note that in a locale with multi-byte characters, the value of a character
> # is intended to be the value of the equivalent of the wchar_t representation
> # of the character as described in the System Interfaces volume of IEEE Std
> # 1003.1-2001.
>
> (Granted that this paragraph is informative and not normative.)
>
> AFAICS, there is no *good* reason for %c not to work perfectly well
> with multi-byte characters. For example, in a UTF-8 locale, "printf %c 192"
> should print "À" (A grave), which is represented as 0xC3 0x80.

Sorry, I was misled by the reference to an "integer argument". The paragraph
I quoted applies only to the %d format specifier.

The description of printf %c seems to be simply incorrect, based on what actual
implementations do. It should have an additional point in the list of exceptions
relative to "Chapter 5, File Format Notation":

  8. The c conversion specifier prints the first character of its argument
     interpreted as a string; it does not take an integer argument. Note that
     more than one byte may be output in the case of a locale with multi-byte
     characters.

and delete "when using the %c conversion specification" from the paragraph
on multi-byte characters in the Application Usage section,

*or*

  8. The c conversion specifier prints the first byte of its argument interpreted
     as a string; it does not take an integer argument. Note that this will
     output an invalid partial character encoding in the case where the argument
     starts with a character represented by more than one byte.

*or* allow both possibilities.
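
To make the difference between these readings concrete, here is a rough
sketch in Python. It is only a model, not any implementation's actual code:
the function names are made up for illustration, the argument is assumed to
have already been decoded according to the locale, and UTF-8 is assumed as
the locale's encoding.

```python
def c_per_xbd(arg: str) -> bytes:
    """XBD chapter 5 reading: integer argument converted to unsigned char."""
    # Python's % always yields a non-negative result here, which matches
    # the wrap-around of a C conversion to unsigned char (8-bit assumed).
    return bytes([int(arg) % 256])


def c_first_character(arg: str, encoding: str = "utf-8") -> bytes:
    """First proposed wording: first character, possibly several bytes."""
    return arg[:1].encode(encoding)


def c_first_byte(arg: str, encoding: str = "utf-8") -> bytes:
    """Second proposed wording: first byte only, possibly a partial character."""
    return arg.encode(encoding)[:1]


if __name__ == "__main__":
    # 'printf %c 65': the XBD wording gives "A", both proposals give "6".
    print(c_per_xbd("65"), c_first_character("65"), c_first_byte("65"))

    # An argument starting with a multi-byte character (A grave, U+00C0).
    arg = "\u00c0 la carte"
    print(c_first_character(arg))  # the whole two-byte sequence
    print(c_first_byte(arg))       # only the lead byte, invalid on its own
```

For an argument that starts with a single-byte character, the two proposed
wordings give the same output; they diverge only when the first character
needs more than one byte.
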
--
David Hopwood <yyyyyyyyyyyyyyyyyyyy@xxxxxxxxxxxxxxxx>