Email List: austin-group-l

Re: printf(1) questions

To: yyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: Re: printf(1) questions
From: David Hopwood <yyyyyyyyyyyyyyyyyyyy@xxxxxxxxxxxxxxxx>
Date: Mon, 09 Jan 2006 17:57:06 +0000
References: <43C26BAE.9030501@byu.net> <43C27E03.2040302@blueyonder.co.uk> <20060109155429.GA7618@squonk.masqnet> <43C29CDD.2050101@blueyonder.co.uk>
David Hopwood wrote:
> Geoff Clare wrote:
>>David Hopwood <yyyyyyyyyyyyyyyyyyyy@xxxxxxxxxxxxxxxx> wrote, on 09 Jan 2006:
>>>Eric Blake wrote:
>>>
>>>>This is probably worth an aardvark, if only to make it obvious that %c
>>>>prints only the first character of its argument (in other words, 'printf
>>>>%c 65' prints "6", not "A", in an ASCII-based character encoding).  But I
>>>>would like to collect information on what other implementations do with
>>>>%c, and on what others think, before drafting such an aardvark.
>>>
>>>Note that in a locale with a multi-byte character encoding, it should be
>>>the first character, not the first byte.
>>
>>The standard currently requires printf %c to write a single byte.
>>
>>See XBD chapter 5 File Format Notation (which is referenced from the
>>XCU printf page):
>>
>>    c   The integer argument shall be converted to an unsigned char
>>      and the resulting byte shall be written.
> 
> This appears to be inconsistent with the following paragraph
> (<http://www.opengroup.org/onlinepubs/009695399/utilities/printf.html>):
> 
> # Note that in a locale with multi-byte characters, the value of a character
> # is intended to be the value of the equivalent of the wchar_t representation
> # of the character as described in the System Interfaces volume of IEEE Std
> # 1003.1-2001.
> 
> (Granted that this paragraph is informative and not normative.)
> 
> AFAICS, there is no *good* reason for %c not to work perfectly well
> with multi-byte characters. For example, in a UTF-8 locale, "printf %c 192"
> should print "À" (A grave), which is represented as 0xC3 0x80.

Sorry, I was misled by the reference to an "integer argument". The paragraph
I quoted applies only to the %d format specifier.

The description of printf %c seems to be simply incorrect, based on what actual
implementations do. It should have an additional point in the list of exceptions
relative to "Chapter 5, File Format Notation":

8. The c conversion specifier prints the first character of its argument
   interpreted as a string; it does not take an integer argument. Note that
   more than one byte may be output in the case of a locale with multi-byte
   characters.

   and delete "when using the %c conversion specification" from the paragraph
   on multi-byte characters in the Application Usage section,

*or*

8. The c conversion specifier prints the first byte of its argument interpreted
   as a string; it does not take an integer argument. Note that this will
   output an invalid partial character encoding in the case where the argument
   starts with a character represented by more than one byte.

*or* allow both possibilities.

-- 
David Hopwood <yyyyyyyyyyyyyyyyyyyy@xxxxxxxxxxxxxxxx>

