Email List: Xaustin-group-futures-lX

Re: Proposal for a new printf() format conversion character

To: yyyyyyyyyyyyyyyyyyyyyy@xxxxxxxxxxxxx, yyy@xxxxxxxxxxxxxxxx, yyy@xxxxxxxxxxxxxxxx
Subject: Re: Proposal for a new printf() format conversion character
From: David Korn <yyy@xxxxxxxxxxxxxxxx>
Date: Mon, 11 Aug 2003 10:10:36 -0400 (EDT)
cc: kpv gsf
--------


> I found that it would be a real easement to have the "%r" conversion for 
> printf() in the POSIX standard.
> 
> This is something that exists for a long time:
> 
> -     DECUS C implemented this via 'goto' (so it has been non-return 
>       recursive) - the old format string behind %r was ignored.
> 
> -     UNOS a UNIX clone implemented this in 1981
> 
> -     I implemented this in my own portable printf() in the mid 1980s
>       from the idea found on UNOS. 
> 
> I don't know where this really belongs to, so I just send it to you for a
> private discussion about it's future.
> 

A more general printf() extension is described in a paper that
I wrote with Glenn Fowler and Kiem Phong Vo, presented at
the 2000 annual USENIX conference in San Diego and titled
"Extended Data Formatting Using Sfio".  It uses %! instead of %r
but provides a lot more functionality.  ksh uses this interface to
implement the complete printf(1) specification as a shell built-in,
with extensions that provide all of the printf(3) functionality plus
a number of other extensions.  In addition, this library was used to
provide formatting capabilities for the output of ls and ps.

The source and complete documentation for sfio can be found
at http://www.research.att.com/sw by clicking on sfio or
AST and following the download instructions.

Here is a description of %! (which processes a Sffmt_t* argument)
from the man page:
        %! and Sffmt_t
          The pattern %! manipulates the formatting environment stack
          to (1) change the top environment to a new environment, (2)
          stack a new environment on top of the current top, or (3)
          pop the top environment.  The bottom of the environment
          stack always contains a virtual environment with the origi-
          nal formatting pair and without any extension functions.

          The top environment of a stack, say fe, is automatically
          popped whenever its format string is completely processed.
          In this case, its event-handling function (if any) is called
          as (*eventf)(f,SF_FINAL,NIL(Void_t*),fe).  The top environ-
          ment can also be popped by giving an argument NULL to %!  or
          by returning a negative value in an extension function.  In
          these cases, the event-handling function is called as
          (*eventf)(f,SF_DPOP,form,fe) where form is the remainder of
          the format string. A negative return value from the event
          handling function will prevent the environment from being
          popped.
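
The three stack operations described above can be modelled in a few
lines of plain C.  This is only a toy sketch, not sfio code: ToyEnv,
ToySlot, ToyStack, toy_init() and toy_bang() are hypothetical names,
event handlers are left out, and the handling of a form-less
environment arriving at the virtual bottom is a choice made for the
sketch rather than documented behaviour.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXDEPTH 8

/* Toy stand-in for Sffmt_t: only the field that selects push vs. replace. */
typedef struct ToyEnv {
    const char *form;  /* NULL: change the top; non-NULL: stack a new level */
} ToyEnv;

typedef struct ToySlot {
    ToyEnv     *env;   /* NULL in slot 0: the virtual bottom, no extensions */
    const char *form;  /* format string being processed at this level */
} ToySlot;

typedef struct ToyStack {
    ToySlot slot[TOY_MAXDEPTH];
    int     depth;     /* number of live slots, always >= 1 */
} ToyStack;

void toy_init(ToyStack *s, const char *original_form) {
    s->slot[0].env = NULL;
    s->slot[0].form = original_form;
    s->depth = 1;
}

/* Models what %! does with its argument. Returns 0, or -1 if refused. */
int toy_bang(ToyStack *s, ToyEnv *fe) {
    assert(s->depth >= 1 && s->depth <= TOY_MAXDEPTH);
    if (fe == NULL) {                       /* (3) pop the top environment */
        if (s->depth == 1) return -1;       /* toy choice: bottom is never popped */
        s->depth--;
        return 0;
    }
    if (fe->form == NULL && s->depth > 1) { /* (1) change the top environment */
        s->slot[s->depth - 1].env = fe;     /* current format string continues */
        return 0;
    }
    if (s->depth == TOY_MAXDEPTH) return -1;
    /* (2) stack a new level. Toy choice: a form-less fe arriving at the
       bottom is also pushed, inheriting the bottom's format string. */
    s->slot[s->depth].env = fe;
    s->slot[s->depth].form = fe->form ? fe->form : s->slot[s->depth - 1].form;
    s->depth++;
    return 0;
}
```

Keeping the active format string in the slot rather than in the
environment is what lets a replacement with form == NULL swap the
extension functions without disturbing the string being processed.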

          A formatting environment is a structure of type Sffmt_t
          which contains the following elements:

              Sffmtext_f   extf;   /* extension processor        */
              Sffmtevent_f eventf; /* event handler              */

              char*        form;   /* format string to stack     */
              va_list      args;   /* corresponding arg list     */

              int          fmt;    /* pattern being processed    */
              ssize_t      size;   /* object size                */
              int          flags;  /* formatting control flags   */
              int          width;  /* width of field             */
              int          precis; /* precision required         */
              int          base;   /* conversion base            */

              char*        t_str;  /* extfdata string            */
              int          n_str;  /* length of t_str            */

          The first four elements of Sffmt_t must be defined by the
          application before the structure is passed to a formatting
          function.  The two function fields should not be changed
          during processing.  Other elements of Sffmt_t are set by the
          respective formatting function before it calls the extension
          function Sffmt_t.extf and, subsequently, can be modified by
          this function to redirect formatting or scanning.  For exam-
          ple, consider a call from a sfprintf() function to process
          an unknown pattern %t (which we may take to mean ``time'')
          based on a formatting environment fe.  fe->extf may reset
          fe->fmt to `d' upon returning to cause sfprintf() to process
          the value being formatted as an integer.
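
The %t example can be illustrated without sfio by a small stand-alone
model.  Everything named Toy* or toy_* below is hypothetical, TOY_VALUE
merely plays the role of SFFMT_VALUE with a made-up bit value, and
toy_clock is a fixed stand-in for a real clock so that the result is
reproducible.  The sketch only shows the idea of an extension function
supplying a value and redirecting the pattern to `d'.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_VALUE 0x1   /* plays the role of SFFMT_VALUE; value is invented */

typedef struct ToyFmt {
    int fmt;            /* pattern being processed */
    int flags;          /* control flags */
} ToyFmt;

typedef int (*ToyExt)(void *v, ToyFmt *fe);

long toy_clock = 1060611036L;  /* fixed stand-in for time(NULL) */

/* Extension: treat %t as "time", hand back a long, ask for %d processing. */
int time_ext(void *v, ToyFmt *fe) {
    if (fe->fmt == 't') {
        *(long *)v = toy_clock;   /* value to be formatted */
        fe->flags |= TOY_VALUE;   /* tell the caller that *v is set */
        fe->fmt = 'd';            /* redirect to integer formatting */
    }
    return 0;                     /* nothing written by the extension itself */
}

/* Formats one pattern; returns characters produced, or -1 if unmatched. */
int toy_format_one(char *buf, size_t n, int pattern, long arg, ToyExt extf) {
    ToyFmt fe = { pattern, 0 };
    long slot = 0;                /* storage offered to the extension via v */
    assert(buf != NULL && n > 0);
    if (extf != NULL && extf(&slot, &fe) < 0)
        return -1;
    long value = (fe.flags & TOY_VALUE) ? slot : arg;
    if (fe.fmt == 'd')
        return snprintf(buf, n, "%ld", value);
    return -1;                    /* pattern still unknown */
}
```

Without the extension, `t' stays unmatched; with it, the caller ends up
formatting the supplied value exactly as it would an ordinary %d.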

          Below are the fields of Sffmt_t:

          extf:
               extf is a function to extend scanning and formatting
               patterns.  Its usage is discussed below.

          eventf:
               This is a function to process events as discussed ear-
               lier.

          form and args:
               This is the formatting pair of a specification string
               and corresponding argument list.  When an environment
               fe is being inserted into the stack, if fe->form is
               NULL, the top environment is changed to fe and its
               associated extension functions but processing of the
               current formatting pair continues.  On the other hand,
               if fe->form is not NULL, the new environment is pushed
               onto the stack so that pattern processing will start
               with the new formatting pair as well as any associated
               extension functions.  During processing, whenever extf
               is called, form and args will be set to the current
               values of the formatting pair in use.

          fmt: This is set to the pattern being processed or one of
               '.', 'I', '('.

          size:
               This is the size of the object being processed.

          flags:
               This is a collection of bits defining the formatting
               flags specified for the pattern.  The bits are:

               SFFMT_LEFT: Flag - in sfprintf().

               SFFMT_SIGN: Flag + in sfprintf().

               SFFMT_BLANK: Flag space in sfprintf().

               SFFMT_ZERO: Flag 0 in sfprintf().

               SFFMT_THOUSAND: Flag ' in sfprintf().

               SFFMT_LONG: Flag l in sfprintf() and sfscanf().

               SFFMT_LLONG: Flag ll in sfprintf() and sfscanf().

               SFFMT_SHORT: Flag h in sfprintf() and sfscanf().

               SFFMT_LDOUBLE: Flag L in sfprintf() and sfscanf().

               SFFMT_IFLAG: Flag I in sfprintf() and sfscanf().

               SFFMT_ALTER: Flag # in sfprintf() and sfscanf().

               SFFMT_SKIP: Flag * in sfscanf().

               SFFMT_ARGPOS: This indicates argument processing for
               pos$.

               SFFMT_VALUE: This is set by fe->extf to indicate that
               it is returning a value to be formatted or the address
               of an object to be assigned.
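
As a rough illustration of how the first five flag characters might be
collected into a bit set, here is a small sketch.  The TOY_* constants
and toy_scan_flags() are hypothetical; the real SFFMT_* bit values are
defined in sfio.h and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Bit values are invented for this sketch; the real ones live in sfio.h. */
enum {
    TOY_LEFT     = 1 << 0,  /* flag -     */
    TOY_SIGN     = 1 << 1,  /* flag +     */
    TOY_BLANK    = 1 << 2,  /* flag space */
    TOY_ZERO     = 1 << 3,  /* flag 0     */
    TOY_THOUSAND = 1 << 4   /* flag '     */
};

/* Consumes leading flag characters at *form and returns the bits seen.
   On return, *form points at the first character that is not a flag. */
int toy_scan_flags(const char **form) {
    int flags = 0;
    assert(form != NULL && *form != NULL);
    for (;; ++*form) {
        switch (**form) {
        case '-':  flags |= TOY_LEFT;     break;
        case '+':  flags |= TOY_SIGN;     break;
        case ' ':  flags |= TOY_BLANK;    break;
        case '0':  flags |= TOY_ZERO;     break;
        case '\'': flags |= TOY_THOUSAND; break;
        default:   return flags;
        }
    }
}
```

An extension function would then test individual bits, for example
(flags & TOY_LEFT), much as it would test SFFMT_LEFT in fe->flags.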


          width:
               This is the field width.

          precis:
               This is the precision.

          base:
               This is the conversion base.

          t_str and n_str:
               This is the type string and its size.


          int (*Sffmtext_f)(Sfio_t* f, Void_t* v, Sffmt_t* fe)
          This is the type of the extension function fe->extf to pro-
          cess patterns and arguments.  Arguments are always processed
          in order and fe->extf is called exactly once per argument.
          Note that, when pos$ (below) is not used anywhere in a for-
          mat string, each argument is used exactly once per a corre-
          sponding pattern.  In that case, fe->extf is called as soon
          as the pattern is recognized and before any scanning or for-
          matting.  On the other hand, when pos$ is used in a format
          string, an argument may be used multiple times.  In this
          case, all arguments shall be processed in order by calling
          fe->extf exactly once per argument before any pattern pro-
          cessing.  This case is signified by the flag SFFMT_ARGPOS in
          fe->flags.

          In addition to the predefined formatting patterns and other
          application-defined patterns, fe->extf may be called with
          fe->fmt being one of `(' (left parenthesis), `.' (dot), and
          `I'.

          The left parenthesis requests a string to be used as the
          extfdata string discussed below.  In this case, upon return-
          ing, fe->extf should set the fe->size field to be the length
          of the string or a negative value to indicate a null-
          terminated string.

          The `I' requests an integer to define the object size.

          The dot requests an integer for width, precision, base, or a
          separator.  In this case, the fe->size field will indicate
          how many dots have appeared in the pattern specification.
          Note that, if the actual conversion pattern is 'c' or 's',
          the value *form will be one of these characters.

          f:   This is the input/output stream in the calling format-
               ting function.  During a call to fe->extf, the stream
               shall be unlocked so that fe->extf can read from or
               write to it as appropriate.

          v:   For both sfscanf() and sfprintf() functions, v points
               to a location suitable for storing any scalars or
               pointers.  On return, fe->extf treats v as discussed
               below.

          fe:  This is the current formatting environment.

          The return value rv of fe->extf directs further processing.
          There are two cases.  When pos$ is present, a negative
          return value means to ignore fe in further argument process-
          ing while a non-negative return value is treated as the case
          rv == 0 below.  When pos$ is not present, fe->extf is called
          per argument immediately before pattern processing and its
          return values are treated as below:

          rv < 0:
               The environment stack is immediately popped.

          rv == 0:
               The extension function has not consumed (in a scanning
               case) or output (in a printing case) data out of or
               into the given stream f.  The fields fmt, flags, size,
               width, precis and base of fe shall direct further pro-
               cessing.

               For sfprintf() functions, if fe->flags has the bit
               SFFMT_VALUE, fe->extf should have set *v to the value
               to be processed; otherwise, a value should be obtained
               from the argument list.  Likewise, for sfscanf() func-
               tions, SFFMT_VALUE means that *v should have a suitable
               address; otherwise, an address to assign value should
               be obtained from the argument list.

               When pos$ is present, if fe->extf changes fe->fmt, this
               pattern shall be used regardless of the pattern defined
               in the format string. On the other hand, if fe->fmt is
               unchanged by fe->extf, the pattern in the format string
               is used.  In any case, the effective pattern should be
               one of the standardly defined patterns.  Otherwise, it
               shall be treated as unmatched.

          rv > 0:
               The extension function has accessed the stream f to the
               extent of rv bytes.  Processing of the current pattern
               ceases except that, for scanning functions, if
               fe->flags does not contain the bit SFFMT_SKIP, the
               assignment count shall increase by 1.
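
The three return-value cases for the non-pos$ situation can be
summarised in a toy dispatcher.  This is a sketch under assumptions,
not sfio code: ToyOut stands in for the stream, demo_ext() is a made-up
extension that writes %B itself, rejects %Z and defers everything else,
and toy_dispatch() only classifies the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct ToyOut {
    char   buf[64];   /* stands in for the output stream f */
    size_t len;
} ToyOut;

typedef int (*ToyExt)(ToyOut *f, int fmt);

enum ToyAction { TOY_POPPED, TOY_DEFAULT, TOY_CONSUMED };

void toy_write(ToyOut *f, const char *s) {
    size_t n = strlen(s);
    assert(f->len + n < sizeof f->buf);
    memcpy(f->buf + f->len, s, n + 1);  /* copy including the terminator */
    f->len += n;
}

/* Made-up extension: handles %B directly, aborts on %Z, defers the rest. */
int demo_ext(ToyOut *f, int fmt) {
    if (fmt == 'B') {
        toy_write(f, "<b>");
        return 3;                 /* rv > 0: bytes written to the stream */
    }
    if (fmt == 'Z')
        return -1;                /* rv < 0: ask for the environment to be popped */
    return 0;                     /* rv == 0: caller formats as usual */
}

/* Classifies the extension's return value for one pattern. */
enum ToyAction toy_dispatch(ToyOut *f, int fmt, ToyExt extf, size_t *written) {
    int rv = extf(f, fmt);
    if (rv < 0)
        return TOY_POPPED;
    if (rv > 0) {
        *written += (size_t)rv;   /* the pattern is finished; count the bytes */
        return TOY_CONSUMED;
    }
    return TOY_DEFAULT;           /* fmt, flags, width, etc. direct processing */
}
```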


        void va_copy(va_list to, va_list fr)
          This macro function portably copies the argument list fr to
          the argument list to. It should be used to set the field
          Sffmt_t.args.
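
A short sketch of why a copy macro is needed: on some platforms va_list
is an array type, so plain assignment to a structure field does not
compile or does not behave portably.  The example below uses the C99
va_copy from <stdarg.h> rather than the sfio macro, and ToyPair and
toy_render() are hypothetical names standing in for the form/args pair
of Sffmt_t.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Toy stand-in for the formatting pair held in Sffmt_t. */
typedef struct ToyPair {
    const char *form;
    va_list     args;
} ToyPair;

/* Stores the caller's arguments in a ToyPair, then formats from the pair.
   Returns what vsnprintf() returns. */
int toy_render(char *buf, size_t n, const char *form, ...) {
    ToyPair fe;
    va_list ap;
    int rv;

    assert(buf != NULL && form != NULL);
    va_start(ap, form);
    fe.form = form;
    va_copy(fe.args, ap);         /* not "fe.args = ap": va_list may be an array */
    rv = vsnprintf(buf, n, fe.form, fe.args);
    va_end(fe.args);              /* every va_copy needs a matching va_end */
    va_end(ap);
    return rv;
}
```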


        long sffmtversion(Sffmt_t* fe, int type)
          This macro function initializes the formatting environment
          fe with a version number if type is non-zero. Otherwise, it
          returns the current value of the version number of fe.  This
          is useful for applications to find out when the format of
          the structure Sffmt_t changes.  Note that the version number
          corresponds to the Sfio version number which is defined in
          the macro value SFIO_VERSION.



David Korn
yyy@xxxxxxxxxxxxxxxx
