cc: kpv gsf
Subject: Re: Proposal for a new printf() format conversion character
--------
> I found that it would be a real easement to have the "%r" conversion for
> printf() in the POSIX standard.
>
> This is something that exists for a long time:
>
> - DECUS C implemented this via 'goto' (so it has been non-return
> recursive) - the old format string behind %r was ignored.
>
> - UNOS a UNIX clone implemented this in 1981
>
> - I implemented this in my own portable printf() in the mid 1980s
> from the idea found on UNOS.
>
> I don't know where this really belongs to, so I just send it to you for a
> private discussion about it's future.
>
A more general printf() extension is described in a paper that
I wrote with Glenn Fowler and Kiem Phong Vo that was presented at
the 2000 annual USENIX conference in San Diego, titled
"Extended Data Formatting Using Sfio". It uses %! instead of %r
but provides a lot more functionality. ksh uses this interface to
implement the complete printf(1) specification as a shell built-in
with extensions that provide all of the printf(3) functionality plus
a number of other extensions. In addition, this library was used to
provide formatting capabilities for the output of ls and ps.
The source and complete documentation for sfio can be found
at http://www.research.att.com/sw by clicking on sfio or
AST and following the download instructions.
Here is a description of %! (which processes a Sffmt_t* argument)
from the man page:
%! and Sffmt_t
The pattern %! manipulates the formatting environment stack
to (1) change the top environment to a new environment, (2)
stack a new environment on top of the current top, or (3)
pop the top environment. The bottom of the environment
stack always contains a virtual environment with the origi-
nal formatting pair and without any extension functions.
The top environment of a stack, say fe, is automatically
popped whenever its format string is completely processed.
In this case, its event-handling function (if any) is called
as (*eventf)(f,SF_FINAL,NIL(Void_t*),fe). The top environ-
ment can also be popped by giving an argument NULL to %! or
by returning a negative value in an extension function. In
these cases, the event-handling function is called as
(*eventf)(f,SF_DPOP,form,fe) where form is the remainder of
the format string. A negative return value from the event
handling function will prevent the environment from being
popped.
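The three stack operations above can be modelled in a few lines of C.
This is only a sketch, not Sfio code: env_t, envstack_t, stack_init()
and stack_apply() are hypothetical names, the id field merely stands in
for the extension and event functions, and the SF_FINAL/SF_DPOP event
calls are left out.

```c
#include <stddef.h>

#define MAXENV 8                 /* arbitrary depth for this sketch */

typedef struct {
    const char *form;            /* format string to stack, or NULL */
    int         id;              /* stands in for extf/eventf */
} env_t;

typedef struct {
    env_t env[MAXENV];
    int   top;                   /* index of the top; 0 is the virtual bottom */
} envstack_t;

/* Start with the virtual bottom: original format, no extension functions. */
void stack_init(envstack_t *s, const char *form)
{
    s->env[0].form = form;
    s->env[0].id = 0;
    s->top = 0;
}

/* Model of "%!" given argument fe. Returns 0, or -1 if the stack is full. */
int stack_apply(envstack_t *s, const env_t *fe)
{
    if (fe == NULL) {                    /* (3) pop the top environment */
        if (s->top > 0)                  /* the virtual bottom always stays */
            s->top--;
        return 0;
    }
    if (fe->form == NULL) {              /* (1) change the top environment, */
        s->env[s->top].id = fe->id;      /*     keep the current format     */
        return 0;
    }
    if (s->top + 1 >= MAXENV)            /* (2) stack a new environment */
        return -1;
    s->env[++s->top] = *fe;
    return 0;
}
```

Pushing an environment with a non-NULL form raises top by one, applying
one with a NULL form leaves top alone and only swaps the functions, and
a NULL argument drops back towards the virtual bottom.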
A formatting environment is a structure of type Sffmt_t
which contains the following elements:
Sffmtext_f   extf;    /* extension processor      */
Sffmtevent_f eventf;  /* event handler            */
char*        form;    /* format string to stack   */
va_list      args;    /* corresponding arg list   */
int          fmt;     /* pattern being processed  */
ssize_t      size;    /* object size              */
int          flags;   /* formatting control flags */
int          width;   /* width of field           */
int          precis;  /* precision required       */
int          base;    /* conversion base          */
char*        t_str;   /* extfdata string          */
int          n_str;   /* length of t_str          */
The first four elements of Sffmt_t must be defined by the
application before the structure is passed to a formatting
function. The two function fields should not be changed
during processing. Other elements of Sffmt_t are set by the
respective formatting function before it calls the extension
function Sffmt_t.extf and, subsequently, can be modified by
this function to redirect formatting or scanning. For exam-
ple, consider a call from a sfprintf() function to process
an unknown pattern %t (which we may take to mean ``time'')
based on a formatting environment fe. fe->extf may reset
fe->fmt to `d' upon returning to cause sfprintf() to process
the value being formatted as an integer.
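The %t example can be imitated without Sfio to show the control flow.
The sketch below is a stand-in, not the Sfio interface: demo_fmt_t,
demo_ext_f, time_ext() and demo_format_one() are hypothetical, the
has_value field plays the role of the SFFMT_VALUE bit, and the value
86400 is an arbitrary placeholder for a time.

```c
#include <stdio.h>

typedef struct demo_fmt demo_fmt_t;
typedef int (*demo_ext_f)(long *v, demo_fmt_t *fe);

struct demo_fmt {
    demo_ext_f extf;       /* extension processor */
    int        fmt;        /* pattern being processed */
    int        has_value;  /* stands in for SFFMT_VALUE */
};

/* Extension: handle the unknown pattern 't' by redirecting it to 'd'. */
int time_ext(long *v, demo_fmt_t *fe)
{
    if (fe->fmt == 't') {
        *v = 86400;        /* placeholder value to be formatted */
        fe->fmt = 'd';     /* ask the formatter to treat it as an integer */
        fe->has_value = 1;
    }
    return 0;              /* nothing written; formatter continues */
}

/* Format one pattern; returns the length written or -1 if unmatched. */
int demo_format_one(char *buf, size_t n, int pattern, long arg, demo_fmt_t *fe)
{
    long v = arg;

    fe->fmt = pattern;
    fe->has_value = 0;
    if (fe->extf && fe->extf(&v, fe) < 0)
        return -1;
    if (!fe->has_value)    /* extension supplied nothing: use the argument */
        v = arg;
    switch (fe->fmt) {
    case 'd': return snprintf(buf, n, "%ld", v);
    case 'x': return snprintf(buf, n, "%lx", (unsigned long)v);
    default:  return -1;   /* still unknown after the extension ran */
    }
}
```

With time_ext installed, the pattern 't' comes out as the decimal
string "86400", while 'd' and 'x' pass through untouched.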
Below are the fields of Sffmt_t:
extf:
extf is a function to extend scanning and formatting
patterns. Its usage is discussed below.
eventf:
This is a function to process events as discussed ear-
lier.
form and args:
This is the formatting pair of a specification string
and corresponding argument list. When an environment
fe is being inserted into the stack, if fe->form is
NULL, the top environment is changed to fe and its
associated extension functions but processing of the
current formatting pair continues. On the other hand,
if fe->form is not NULL, the new environment is pushed
onto the stack so that pattern processing will start
with the new formatting pair as well as any associated
extension functions. During processing, whenever extf
is called, form and args will be set to the current
values of the formatting pair in use.
fmt: This is set to the pattern being processed or one of
'.', 'I', '('.
size:
This is the size of the object being processed.
flags:
This is a collection of bits defining the formatting
flags specified for the pattern. The bits are:
SFFMT_LEFT: Flag - in sfprintf().
SFFMT_SIGN: Flag + in sfprintf().
SFFMT_BLANK: Flag space in sfprintf().
SFFMT_ZERO: Flag 0 in sfprintf().
SFFMT_THOUSAND: Flag ' in sfprintf().
SFFMT_LONG: Flag l in sfprintf() and sfscanf().
SFFMT_LLONG: Flag ll in sfprintf() and sfscanf().
SFFMT_SHORT: Flag h in sfprintf() and sfscanf().
SFFMT_LDOUBLE: Flag L in sfprintf() and sfscanf().
SFFMT_IFLAG: Flag I in sfprintf() and sfscanf().
SFFMT_ALTER: Flag # in sfprintf() and sfscanf().
SFFMT_SKIP: Flag * in sfscanf().
SFFMT_ARGPOS: This indicates argument processing for
pos$.
SFFMT_VALUE: This is set by fe->extf to indicate that
it is returning a value to be formatted or the address
of an object to be assigned.
width:
This is the field width.
precis:
This is the precision.
base:
This is the conversion base.
t_str and n_str:
This is the type string and its size.
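To illustrate how the flags field collects one bit per flag character,
here is a small sketch. The DEMO_* names and their bit values are
invented for this example; the real SFFMT_* values come from sfio.h
and are not reproduced here.

```c
#include <stddef.h>

enum {                       /* hypothetical bit assignments */
    DEMO_LEFT     = 1 << 0,  /* '-'   */
    DEMO_SIGN     = 1 << 1,  /* '+'   */
    DEMO_BLANK    = 1 << 2,  /* ' '   */
    DEMO_ZERO     = 1 << 3,  /* '0'   */
    DEMO_THOUSAND = 1 << 4,  /* '\''  */
    DEMO_ALTER    = 1 << 5   /* '#'   */
};

/*
 * Collect the flag characters that follow '%'. Returns the bit set and,
 * if end is not NULL, stores a pointer to the first non-flag character.
 */
int demo_parse_flags(const char *s, const char **end)
{
    int flags = 0;

    for (;; s++) {
        switch (*s) {
        case '-':  flags |= DEMO_LEFT;     continue;
        case '+':  flags |= DEMO_SIGN;     continue;
        case ' ':  flags |= DEMO_BLANK;    continue;
        case '0':  flags |= DEMO_ZERO;     continue;
        case '\'': flags |= DEMO_THOUSAND; continue;
        case '#':  flags |= DEMO_ALTER;    continue;
        default:   break;
        }
        break;               /* first character that is not a flag */
    }
    if (end != NULL)
        *end = s;
    return flags;
}
```

An extension function would then test individual bits, for example
(flags & DEMO_LEFT), rather than re-reading the format string.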
int (*Sffmtext_f)(Sfio_t* f, Void_t* v, Sffmt_t* fe)
This is the type of the extension function fe->extf to pro-
cess patterns and arguments. Arguments are always processed
in order and fe->extf is called exactly once per argument.
Note that, when pos$ (below) is not used anywhere in a format
string, each argument is used exactly once by its corresponding
pattern. In that case, fe->extf is called as soon
as the pattern is recognized and before any scanning or formatting.
On the other hand, when pos$ is used in a format
string, an argument may be used multiple times. In this
case, all arguments shall be processed in order by calling
fe->extf exactly once per argument before any pattern pro-
cessing. This case is signified by the flag SFFMT_ARGPOS in
fe->flags.
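The two-pass behaviour for pos$ can be sketched as follows: every
argument is fetched exactly once, in order, and only then are the
patterns processed, so one argument can be printed several times.
demo_posfmt() is a hypothetical helper that understands only "%N$d"
with a single digit N and long arguments; it is not how Sfio itself
is written.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEMO_MAXARGS 9

/* Returns the number of characters stored in buf, or -1 on error. */
int demo_posfmt(char *buf, size_t n, const char *form, int nargs, ...)
{
    long    argv[DEMO_MAXARGS];
    va_list ap;
    size_t  used = 0;
    int     i;

    if (nargs < 0 || nargs > DEMO_MAXARGS)
        return -1;

    /* Pass 1: fetch each argument exactly once, in order. */
    va_start(ap, nargs);
    for (i = 0; i < nargs; i++)
        argv[i] = va_arg(ap, long);
    va_end(ap);

    /* Pass 2: process patterns; an argument may be referenced repeatedly. */
    while (*form != '\0') {
        if (form[0] == '%' && form[1] >= '1' && form[1] <= '9' &&
            form[2] == '$' && form[3] == 'd') {
            int idx = form[1] - '1';
            int w;

            if (idx >= nargs)
                return -1;
            w = snprintf(buf + used, n - used, "%ld", argv[idx]);
            if (w < 0 || (size_t)w >= n - used)
                return -1;               /* output would not fit */
            used += (size_t)w;
            form += 4;
        } else {
            if (used + 1 >= n)
                return -1;               /* no room for char plus NUL */
            buf[used++] = *form++;
        }
    }
    if (used >= n)
        return -1;
    buf[used] = '\0';
    return (int)used;
}
```

For instance, the format "%2$d %1$d %2$d" with the arguments 10L and
20L uses the second argument twice even though it was fetched once.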
In addition to the predefined formatting patterns and other
application-defined patterns, fe->extf may be called with
fe->fmt being one of `(' (left parenthesis), `.' (dot), and
`I'.
The left parenthesis requests a string to be used as the
extfdata string discussed below. In this case, upon return-
ing, fe->extf should set the fe->size field to be the length
of the string or a negative value to indicate a null-
terminated string.
The `I' requests an integer to define the object size.
The dot requests an integer for width, precision, base, or a
separator. In this case, the fe->size field will indicate
how many dots have appeared in the pattern specification.
Note that, if the actual conversion pattern is 'c' or 's',
the value *form will be one of these characters.
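As an illustration of the dot count, the sketch below parses the
numeric part of a pattern and uses the number of dots seen so far to
decide which field a number belongs to. The mapping (no dot: width,
one dot: precision, two dots: base) is my reading of the text above,
the separator case is ignored, and demo_spec_t and demo_parse_dots()
are hypothetical names.

```c
#include <ctype.h>

typedef struct {
    int width;    /* -1 if not given */
    int precis;   /* -1 if not given */
    int base;     /* -1 if not given */
    int dots;     /* how many dots have appeared */
} demo_spec_t;

/* Parse digits and dots; returns a pointer to the first unparsed char. */
const char *demo_parse_dots(const char *s, demo_spec_t *sp)
{
    sp->width = sp->precis = sp->base = -1;
    sp->dots = 0;

    for (;;) {
        if (*s == '.') {
            sp->dots++;
            s++;
        } else if (isdigit((unsigned char)*s)) {
            int n = 0;

            while (isdigit((unsigned char)*s))
                n = n * 10 + (*s++ - '0');
            if (sp->dots == 0)
                sp->width = n;
            else if (sp->dots == 1)
                sp->precis = n;
            else if (sp->dots == 2)
                sp->base = n;
        } else {
            break;    /* conversion character or anything else */
        }
    }
    return s;
}
```

Under these assumptions "8.3.16d" yields width 8, precision 3 and
base 16, while "..2d" leaves width and precision unset and gives
base 2.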
f: This is the input/output stream in the calling format-
ting function. During a call to fe->extf, the stream
shall be unlocked so that fe->extf can read from or
write to it as appropriate.
v: For both sfscanf() and sfprintf() functions, v points
to a location suitable for storing any scalars or
pointers. On return, fe->extf treats v as discussed
below.
fe: This is the current formatting environment.
The return value rv of fe->extf directs further processing.
There are two cases. When pos$ is present, a negative
return value means to ignore fe in further argument process-
ing while a non-negative return value is treated as the case
rv == 0 below. When pos$ is not present, fe->extf is called
per argument immediately before pattern processing and its
return values are treated as below:
rv < 0:
The environment stack is immediately popped.
rv == 0:
The extension function has not consumed (in a scanning
case) or output (in a printing case) data out of or
into the given stream f. The fields fmt, flags, size,
width, precis and base of fe shall direct further pro-
cessing.
For sfprintf() functions, if fe->flags has the bit
SFFMT_VALUE, fe->extf should have set *v to the value
to be processed; otherwise, a value should be obtained
from the argument list. Likewise, for sfscanf() func-
tions, SFFMT_VALUE means that *v should have a suitable
address; otherwise, an address to assign value should
be obtained from the argument list.
When pos$ is present, if fe->extf changes fe->fmt, this
pattern shall be used regardless of the pattern defined
in the format string. On the other hand, if fe->fmt is
unchanged by fe->extf, the pattern in the format string
is used. In any case, the effective pattern should be
one of the standard predefined patterns. Otherwise, it
shall be treated as unmatched.
rv > 0:
The extension function has accessed the stream f to the
extent of rv bytes. Processing of the current pattern
ceases except that, for scanning functions, if fe->flags
does not contain the bit SFFMT_SKIP, the assignment
count shall increase by 1.
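For the case without pos$, the three return-value cases reduce to a
small dispatch. The following is a sketch only: demo_action_t,
DEMO_SKIP (a stand-in for SFFMT_SKIP with an invented bit value) and
demo_after_extf() are hypothetical and do not appear in Sfio.

```c
#define DEMO_SKIP 0x1   /* invented bit standing in for SFFMT_SKIP */

typedef enum {
    DEMO_POP,           /* rv < 0:  pop the environment stack */
    DEMO_FORMAT,        /* rv == 0: continue using the fields of fe */
    DEMO_DONE           /* rv > 0:  extension handled rv bytes itself */
} demo_action_t;

/*
 * Decide what the formatter does after the extension function returns rv.
 * scanning is non-zero for scanf-style functions; *assigned is the
 * running assignment count.
 */
demo_action_t demo_after_extf(int rv, int scanning, int flags, int *assigned)
{
    if (rv < 0)
        return DEMO_POP;
    if (rv == 0)
        return DEMO_FORMAT;
    if (scanning && !(flags & DEMO_SKIP))
        (*assigned)++;  /* the extension performed one assignment */
    return DEMO_DONE;
}
```

Only the rv > 0 branch touches the assignment count, and only for
scanning without the skip flag.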
void va_copy(va_list to, va_list fr)
This macro function portably copies the argument list fr to
the argument list to. It should be used to set the field
Sffmt_t.args.
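The reason an argument list has to be copied rather than assigned is
easiest to see when the same list is walked twice. The sketch below
uses the C99 va_copy from <stdarg.h>, which has the same name and
purpose; demo_vformat() and demo_aformat() are hypothetical helpers,
not part of Sfio.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated string; the caller frees it. */
char *demo_vformat(const char *form, va_list args)
{
    va_list copy;
    int     len;
    char   *buf;

    va_copy(copy, args);                    /* first pass walks the copy */
    len = vsnprintf(NULL, 0, form, copy);   /* measure only */
    va_end(copy);
    if (len < 0)
        return NULL;

    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;
    vsnprintf(buf, (size_t)len + 1, form, args);  /* second pass: original */
    return buf;
}

/* Variadic front end for demo_vformat(). */
char *demo_aformat(const char *form, ...)
{
    va_list args;
    char   *s;

    va_start(args, form);
    s = demo_vformat(form, args);
    va_end(args);
    return s;
}
```

Without the copy, the second vsnprintf() call would start from wherever
the first one left the list, which is undefined behaviour on most
platforms.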
long sffmtversion(Sffmt_t* fe, int type)
This macro function initializes the formatting environment
fe with a version number if type is non-zero. Otherwise, it
returns the current value of the version number of fe. This
is useful for applications to find out when the format of
the structure Sffmt_t changes. Note that the version number
corresponds to the Sfio version number which is defined in
the macro value SFIO_VERSION.
David Korn
yyy@xxxxxxxxxxxxxxxx