Garrett Wollman <yyyyyyy@xxxxxxxxxxxxx> wrote:
> > getline()
>
> The interface seems reasonable, if a bit baroque. Will there be
> wide-character versions of these interfaces as well? And why these
> and not the much simpler fgetln() interface from 4.4BSD?

The fgetln() interface may be simpler, but it is inferior: it
does not let the programmer allocate the buffer himself, and it
effectively prevents him from using the buffer for other purposes.
As a result, it is not possible to use one memory buffer for reading
from multiple files, or to use multiple buffers for reading from
a single file. The first is sometimes useful to save memory (an
important point since these buffers can, and sometimes do, get
very large); the second is useful, e.g., to compare adjacent lines
of a file.
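
To illustrate the second case, here is a minimal sketch, assuming the
glibc signature ssize_t getline(char **, size_t *, FILE *) and that
the platform exposes it under _POSIX_C_SOURCE 200809L. The helper
name count_adjacent_duplicates is hypothetical. It keeps two
caller-owned buffers and swaps them after every line, so the previous
line stays available for comparison:

```c
/* Assumption: getline() is declared when this macro is defined. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: counts lines identical to the line before them. */
long count_adjacent_duplicates(FILE *fp)
{
    char *prev = NULL, *cur = NULL;    /* two caller-owned buffers */
    size_t prev_cap = 0, cur_cap = 0;  /* their allocated sizes */
    ssize_t prev_len = -1, cur_len;
    long dups = 0;

    while ((cur_len = getline(&cur, &cur_cap, fp)) != -1) {
        if (prev_len == cur_len && memcmp(prev, cur, (size_t)cur_len) == 0)
            dups++;

        /* Swap the buffers: the line just read becomes "previous",
         * and the old "previous" buffer is reused for the next read. */
        char *tmp_buf = prev;
        size_t tmp_cap = prev_cap;
        prev = cur;
        prev_cap = cur_cap;
        cur = tmp_buf;
        cur_cap = tmp_cap;
        prev_len = cur_len;
    }

    /* getline() may have allocated a buffer even on a failed call. */
    free(prev);
    free(cur);
    return dups;
}
```

The first case needs no extra code: the same pointer/size pair can
simply be passed to getline() again with a different FILE pointer.
Note that the comparison includes the trailing newline, so a final
line without one never matches its predecessor.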

Moreover, the aim of this proposal is (as far as I understand)
to standardize interfaces that are widely used in existing Open
Source applications, and fgetln() is certainly not such an
interface.

I therefore strongly advocate including getline() and not fgetln().

On wide-character versions: The stdio wide character functionality
is currently of _very_ limited usefulness since its behavior is
essentially undefined when an illegal byte sequence is encountered.
Since about every second file contains such a sequence in the daily
practice of a multibyte locale user, applications that use the wide
character stdio functions are unreliable in practice; they usually
treat WEOF on EILSEQ as an end-of-file condition and stop processing
immediately, and there is little else they could do, since the
position of the file pointer is not defined in that case.
I can only recommend that programmers avoid the stdio wide character
functions except for special, limited purposes.
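
As a rough sketch of the problem, the following loop reads wide
characters with fgetwc() and at least tells a real end-of-file apart
from an encoding error, instead of treating every WEOF alike. The
enum and the function name count_wide_chars are hypothetical. Note
what it cannot do: after EILSEQ there is no defined way to skip the
offending bytes and continue, so the caller can only report the
condition.

```c
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical result codes for this sketch. */
enum wread_status { WREAD_EOF, WREAD_EILSEQ, WREAD_IOERR };

/* Hypothetical helper: counts the wide characters read from fp and
 * reports why reading stopped. */
enum wread_status count_wide_chars(FILE *fp, long *count)
{
    *count = 0;
    for (;;) {
        errno = 0;                 /* so a stale errno is not misread */
        wint_t wc = fgetwc(fp);
        if (wc == WEOF)
            break;
        (*count)++;
    }

    if (feof(fp))
        return WREAD_EOF;          /* genuine end of file */
    if (errno == EILSEQ)
        return WREAD_EILSEQ;       /* illegal byte sequence in input */
    return WREAD_IOERR;            /* some other read error */
}
```

A caller that receives WREAD_EILSEQ knows that the count covers only
the part of the file before the first illegal sequence.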

In this particular case, a function that exists to read lines of
arbitrary length but may fail as soon as it encounters an illegal
byte sequence would be worth a laugh, at best.

Before inventing further wide character stdio functions (rest
assured, few people would miss them in practice), one should consider
making the existing API reliable. Since POSIX imposes many additional
restrictions on multibyte characters in comparison to ISO C, POSIX may
be the right place to address this issue.

This issue also affects the utilities. Utilities that process text
files are currently allowed to abort when they encounter an illegal
byte sequence in their input. Utilities that actually do so are
almost useless in daily practice, although I have seen vendors
ship them. The easiest fix for this would be to state that
a text file may contain illegal byte sequences, and that utilities
processing text files are advised to discard them unless otherwise
stated. A similar policy might make sense for stdio.
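
One possible reading of such a "discard" policy, sketched for a
utility that reads bytes itself and converts them with mbrtowc():
when the conversion reports an illegal sequence, drop one byte, reset
the conversion state, and try again at the next byte. The function
name decode_discarding_invalid and the choice to skip exactly one
byte at a time are my assumptions, not something any standard
prescribes.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts len bytes at src to wide characters
 * in dst (at most dst_cap of them), silently discarding bytes that do
 * not form a valid character in the current locale.
 * Returns the number of wide characters stored. */
size_t decode_discarding_invalid(const char *src, size_t len,
                                 wchar_t *dst, size_t dst_cap)
{
    mbstate_t st;
    size_t in = 0, out = 0;

    memset(&st, 0, sizeof st);          /* initial conversion state */
    while (in < len && out < dst_cap) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, src + in, len - in, &st);

        if (n == (size_t)-1) {          /* illegal byte sequence */
            memset(&st, 0, sizeof st);  /* state is undefined: reset it */
            in++;                       /* assumption: discard one byte */
            continue;
        }
        if (n == (size_t)-2)            /* character truncated at end */
            break;
        if (n == 0)                     /* decoded L'\0'; assumed to be */
            n = 1;                      /* a single byte in the input   */
        dst[out++] = wc;
        in += n;
    }
    return out;
}
```

Which bytes count as illegal depends on the locale in effect, so the
same input may yield different results under different LC_CTYPE
settings.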

Gunnar