Email List: austin-group-l

Re: fgets/strtok and LINE_MAX

To: austin-group-l@xxxxxxxxxxxxx
Subject: Re: fgets/strtok and LINE_MAX
From: shwaresyst@xxxxxxx
Date: Tue, 03 Nov 2009 15:26:10 -0500
References: <20090920230313.GV657@xxxxxx> <20091103025454.GA27699@xxxxxx> <8CC2A78725B6067-87B8-1D3ED@webmail-d013.sysops.aol.com> <20091103095250.GA19619@xxxxxx>
I agree the example could be more robust by use of sysconf() and dynamically allocated arrays. I expect the static allocation was chosen for brevity rather than robustness, with the static LINE_MAX serving as a representative, commonly suitable limit. I've no objection to the alignment with the C standard's wording either. However, I believe there still remain a few unanswered questions that some consensus should be reached on before an aardvark could be adequately attempted.

I'm not trying to make a mountain out of a molehill; these are just corollaries that I think the description should explicitly account for, as a consequence of stream being any arbitrary file and N being any int value. The alternatives I can see would be to explicitly limit the function to use only with files representing cooked streams that can guarantee only well formed lines of less than that static LINE_MAX length will be encountered and the last character before end-of-file is a newline char, or deprecate fgets() in favor of getline().

1) Is the Nth char supposed to be read or is it reserved for the trailing null; i.e. is the application expected to allocate an array with indexes of a) 1..N or 0..N-1, or b) 0..N or 1..N+1 to pass to the function, N bytes or N+1 bytes total size? The static LINE_MAX is used in the example to create a commonly large enough array, but for the degenerate case where N=1 I'd think most would expect fgets to behave like fgetc but with the appended null, so the array would need the N+1 size to accommodate this, and char N is expected to be read. This isn't spelled out, though. Otherwise, it appears the function could read 0 chars without an error indication, always returning just the null character in the buffer. I also believe the function should be required to report an ERANGE type error if the N passed in is <= 0, in addition to the errors fgetc can report as listed, and return a nil pointer instead of a copy of s.

2) Do any implementations presently considered conforming always terminate their reads with a newline before writing the null char, so that a well formed line is presented to the application even if the stream is not formatted this way? If any do, this opens up a further requirement on the passed array, for that degenerate case, of being 3 chars, or N+2, in size.

3) Do any present implementations always read chars until a newline or end-of-file is encountered, only copying what chars initially fit in the buffer passed? I'd hope not, as that could lead to a lot of characters being thrown away, but some implementors could have read that as the intended behavior.

4) Should errno be set if the call is terminated after a read where no newline is encountered to indicate that a scan for a newline will fail using the non-nil pointer returned as the function's result, for the case where the implementation doesn't behave as in 2)? I can see two values being needed: a) an EEOFNEXT type, for the case where the call was terminated by reaching end-of-file and the next call would be an attempt to read past there, and then returning the nil pointer as described on that subsequent call, and b) an EBUFOVF type, to indicate the buffer is completely filled, but there are still chars on that logical line that could be processed and the end-of-file is still to be reached.

5) Are implementations expected to return nil pointers once end-of-file is reached even though a partial successful read of chars has occurred? This could also be allowable behavior as written, I'd think, but the method of 4) would lead to applications being less likely to throw away those chars.

6) The description is also silent on what behavior an application can expect if the control character used to specify a soft end-of-file is encountered during a read. Is this ignored and the function reads up to the filesize reported as allocated for that file, in the case of a disk file, or is it honored and the call is terminated at that point, disallowing further reads from the stream as honoring it would also mean setting the EOF condition on the stream? I'd expect the latter as the general case since the stream could be getting chars from a keyboard and that would be the primary means to indicate that all chars the function should process have been encountered and stream should be closed, and explicitly reopened if needed for any further use.
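As a point of reference for questions 1), 3), 4) and 5), the following is a minimal sketch of how an application can already probe what a single fgets() call did, assuming the C standard's wording (at most n-1 characters are read and a null is written after the last one). The helper read_chunk() and the chunk_status names are hypothetical, not part of any standard, and the sketch assumes the data contains no embedded null bytes. It only illustrates the behavior commonly seen in practice; it does not settle what the description should require.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes for one fgets() call. */
enum chunk_status {
    CHUNK_NONE = -1,           /* fgets() returned a null pointer */
    CHUNK_PARTIAL = 0,         /* data stored, no newline, end-of-file not seen */
    CHUNK_LINE = 1,            /* data stored, ends with a newline */
    CHUNK_LAST_NO_NEWLINE = 2  /* data stored, end-of-file hit before a newline */
};

/* Hypothetical helper: read one chunk and classify it.
 * Assumes size > 1 and that the stream holds no embedded null bytes,
 * since strlen() is used to find the end of the stored data. */
static enum chunk_status read_chunk(FILE *fp, char *buf, int size)
{
    assert(fp != NULL && buf != NULL && size > 1);

    if (fgets(buf, size, fp) == NULL)
        return CHUNK_NONE;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        return CHUNK_LINE;

    /* No newline stored: either the buffer filled up, or the stream
     * ended first and the end-of-file indicator is now set. */
    return feof(fp) ? CHUNK_LAST_NO_NEWLINE : CHUNK_PARTIAL;
}
```

With a 4-byte buffer and a stream containing "abcdef" followed by a newline and then "xy", successive calls would be expected to store "abc", "def", the newline alone, and "xy" before the next call returns a null pointer. No characters are discarded, and the unterminated last line is still delivered. Note that an unterminated last line that exactly fills the buffer is reported as CHUNK_PARTIAL, because end-of-file is only detected on the following call.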

Cheers,
Mark


-----Original Message-----
From: Geoff Clare <gwc@xxxxxx>
To: austin-group-l@xxxxxx
Sent: Tue, Nov 3, 2009 4:52 am
Subject: Re: fgets/strtok and LINE_MAX

shwaresyst@xxxxxx <shwaresyst@xxxxxx> wrote, on 03 Nov 2009:

Could be less a bug in the example than the wording of the function's
behavior. Taking the function description literally, you do have a
case that there's a discrepancy. Taking the example as an expression
of the intent of the description, though, it could be interpreted that
up to n-1 characters will be read that aren't newline chars by an
implementation and if it gets that far the nth char is to be assumed
to be a newline, as the function is intended for use with well formed
text streams that would have honored the LINE_MAX length when created.

The problem with the example is worse than that.  It uses the value of
the LINE_MAX macro from <limits.h>. This is not necessarily equal to
the true {LINE_MAX} limit.  (It is a "runtime increasable" value.)
Therefore, even if the input lines honour {LINE_MAX}, they could have
lengths exceeding the value of the LINE_MAX macro.

As the stream can be any type of file, though, this assumption can't
be guaranteed and the description is erroneously silent on whether the
nth char explicitly should or shouldn't be validated as being that
newline, and copied to the buffer if so, or is it a malformed line and
that nth char should also be copied but with just the trailing null,
or should a newline char be explicitly appended as char n and then the
null, with the actual nth char left as the next char to be read or is
it thrown away.

I think the intention is that the example shows how to read lines
from a text file.  So it doesn't need to deal with "lines" longer
than {LINE_MAX} (except perhaps to report them as an error).  The
code could be fixed by obtaining the true {LINE_MAX} value from
sysconf() and allocating a buffer of {LINE_MAX}+1 bytes.  The
introductory paragraph would also need changing.
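
A minimal sketch of that fix, assuming a POSIX system: the helper name alloc_line_buffer() and the FALLBACK_LINE_MAX constant are hypothetical, and the fallback to 2048 (the value POSIX gives for _POSIX2_LINE_MAX) when sysconf() reports no usable limit is one possible policy rather than something the thread specifies.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed fallback: the POSIX minimum for {LINE_MAX} (_POSIX2_LINE_MAX). */
#define FALLBACK_LINE_MAX 2048L

/* Hypothetical helper: allocate a buffer of {LINE_MAX}+1 bytes using the
 * runtime limit from sysconf() instead of the LINE_MAX macro.
 * Returns NULL if allocation fails; on success stores the size in *size_out. */
static char *alloc_line_buffer(size_t *size_out)
{
    assert(size_out != NULL);

    long line_max = sysconf(_SC_LINE_MAX);
    if (line_max < FALLBACK_LINE_MAX)
        line_max = FALLBACK_LINE_MAX;   /* -1 or an implausibly small value */

    size_t size = (size_t)line_max + 1; /* +1 for the null byte fgets() adds */
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;

    *size_out = size;
    return buf;
}
```

The returned size can then be passed to fgets() (cast to int), and the caller releases the buffer with free() when done.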

The description also does not specify the behavior when nulls are read
from the file. Copying these could lead an application to believe that
end-of-file was reached also if the preceding character is anything
other than a newline.

The description says what fgets() does with bytes read from the
stream.  This description applies to null bytes just the same as
other bytes, since there is no exception stated for null bytes.
However, there does appear to be a wording problem with the
statement:

   "The string is then terminated with a null byte."

If a null byte was read from the stream, then the null added by
fgets() will not terminate the string.

The C Standard words it differently:

   "A null character is written immediately after the last character
   read into the array."

It is careful not to say that this added null character (byte in POSIX
terminology) terminates the string.  In POSIX we should change:

   "The string is then terminated with a null byte."
to

   "A null byte shall be written immediately after the last byte
   read into the array."
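
The effect can be seen in a small sketch, assuming an implementation that copies null bytes like any other byte, as the description implies. The helper visible_length_after_fgets() is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: call fgets() once and return the length that
 * string functions will see, or (size_t)-1 if fgets() returned a null
 * pointer. If the stream contained a null byte, this is shorter than
 * the number of bytes fgets() actually stored. */
static size_t visible_length_after_fgets(FILE *fp, char *buf, int size)
{
    assert(fp != NULL && buf != NULL && size > 0);

    if (fgets(buf, size, fp) == NULL)
        return (size_t)-1;

    return strlen(buf);   /* stops at the first null byte, read or added */
}
```

For a line consisting of the six bytes 'a', 'b', null, 'c', 'd', newline, the helper would be expected to return 2 even though six bytes were read and the null added by fgets() sits at index 6. The added null follows the last byte read, but it does not terminate the string.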

As such, the scope of the fgets() description appears to be deficient.
I'm open to suggestion on whether an aardvark report or interpretation
request is most suitable to address these, or should this be reported
to the C working group since the description defers to the C standard.

An aardvark is needed to fix the null byte problem and the example.

--
Geoff Clare <g.clare@xxxxxx>
The Open Group, Thames Tower, Station Road, Reading, RG1 1LX, England





