| To: | austin-group-l@xxxxxxxxxxxxx |
|---|---|
| Subject: | Re: fgets/strtok and LINE_MAX |
| From: | shwaresyst@xxxxxxx |
| Date: | Tue, 03 Nov 2009 15:26:10 -0500 |
| References: | <20090920230313.GV657@xxxxxx> <20091103025454.GA27699@xxxxxx> <8CC2A78725B6067-87B8-1D3ED@webmail-d013.sysops.aol.com> <20091103095250.GA19619@xxxxxx> |
|
I agree the example could be more robust by use of sysconf() and
dynamically allocated arrays. The use of static allocation I expect was
more for brevity of the example, not to make it robust, and uses the
static LINE_MAX as a representative commonly suitable limit. I've no
objection to the alignment with the C standard's wording also. However,
I believe there still remain a few unanswered questions that some
consensus should be reached on before an aardvark could be adequately
attempted.

I'm not trying to make a mountain out of a molehill; these are just
corollaries I can see that the description should explicitly account
for, as a consequence of stream being any arbitrary file and N being
any int value. The alternatives I can see would be either to explicitly
limit the function to use only with files representing cooked streams
that can guarantee that only well formed lines of less than that static
LINE_MAX length will be encountered and that the last character before
end-of-file is a newline char, or to deprecate fgets() in favor of
getline().

1) Is the Nth char supposed to be read, or is it reserved for the
trailing null? I.e., is the application expected to allocate an array
with indexes of a) 1..N or 0..N-1, or b) 0..N or 1..N+1 to pass to the
function: N bytes or N+1 bytes total size? The static LINE_MAX is used
in the example to create a commonly large enough array, but for the
degenerate case where N=1 I'd think most would expect fgets to behave
like fgetc but with the appended null, so the array would need the N+1
size to accommodate this, and char N is expected to be read. This isn't
spelled out, though. Otherwise, it appears the function could read 0
chars without an error indication, always returning just the null
character in the buffer. I also believe the function should be required
to report an ERANGE type error if the N passed in is <= 0, in addition
to the errors fgetc can report as listed, and to return a nil pointer
instead of a copy of s.

2) Do any implementations presently considered conforming always
terminate their reads with a newline before writing the null char, so
that a well formed line is presented to the application even if the
stream is not formatted this way? If any do, this opens up a further
requirement on the passed array, for that degenerate case, of being 3
chars, or N+2, in size.

3) Do any present implementations always read chars until a newline or
end-of-file is encountered, copying only the chars that initially fit
in the buffer passed? I'd hope not, as that could lead to a lot of
characters being thrown away, but some implementors could have read
that as the intended behavior.

4) Should errno be set if the call is terminated after a read where no
newline is encountered, to indicate that a scan for a newline will fail
using the non-nil pointer returned as the function's result, for the
case where the implementation doesn't behave as in 2)? I can see two
values being needed: a) an EEOFNEXT type, for the case where the call
was terminated by reaching end-of-file and the next call would be an
attempt to read past there, with the nil pointer then returned as
described on that subsequent call, and b) an EBUFOVF type, to indicate
that the buffer is completely filled, but there are still chars on that
logical line that could be processed and the end-of-file is still to be
reached.

5) Are implementations expected to return nil pointers once end-of-file
is reached even though a partial successful read of chars has occurred?
This could also be allowable behavior as written, I'd think, but the
method of 4) would make applications less likely to throw away those
chars.

6) The description is also silent on what behavior an application can
expect if the control character used to specify a soft end-of-file is
encountered during a read. Is this ignored, so that the function reads
up to the filesize reported as allocated for that file, in the case of
a disk file, or is it honored and the call terminated at that point,
disallowing further reads from the stream, as honoring it would also
mean setting the EOF condition on the stream?
I'd expect the latter as the general case, since the stream could be
getting chars from a keyboard and that would be the primary means to
indicate that all chars the function should process have been
encountered and that the stream should be closed, and explicitly
reopened if needed for any further use.

Cheers,
Mark

-----Original Message-----
From: Geoff Clare <gwc@xxxxxx>
To: austin-group-l@xxxxxx
Sent: Tue, Nov 3, 2009 4:52 am
Subject: Re: fgets/strtok and LINE_MAX

shwaresyst@xxxxxx <shwaresyst@xxxxxx> wrote, on 03 Nov 2009:
> behavior. Taking the function description literally, you do have a
> case that there's a discrepancy. Taking the example as an expression
> of the intent of the description, though, it could be interpreted
> that up to n-1 characters will be read that aren't newline chars by
> an to be a newline, as the function is intended for use with well
> formed text streams that would have honored the LINE_MAX length when
> created.

The problem with the example is worse than that. It uses the value of
the LINE_MAX macro from <limits.h>. This is not necessarily equal to
the true {LINE_MAX} limit. (It is a "runtime increasable" value.)
Therefore, even if the input lines honour {LINE_MAX}, they could have
lengths exceeding the value of the LINE_MAX macro.

> As the nth char explicitly should or shouldn't be validated as being
> that newline, and copied to the buffer if so, or is it a malformed
> line and that nth char should also be copied but with just the
> trailing null, or should a newline char be explicitly appended as
> char n and then the null, with the actual nth char left as the next
> char to be read or is it thrown away.

I think the intention is that the example shows how to read lines
from a text file. So it doesn't need to deal with "lines" longer
than {LINE_MAX} (except perhaps to report them as an error). The
code could be fixed by obtaining the true {LINE_MAX} value from
sysconf() and allocating a buffer of {LINE_MAX}+1 bytes. The
introductory paragraph would also need changing.
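
A minimal sketch of the fix described above, assuming a POSIX system:
the true {LINE_MAX} is obtained from sysconf() and a buffer of
{LINE_MAX}+1 bytes is allocated, with over-long lines reported as an
error. The helper names runtime_line_max() and count_lines(), the
fallback of 2048 (the value of _POSIX2_LINE_MAX), and the return codes
are illustrative assumptions, not taken from the standard's example.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: the runtime {LINE_MAX}, which may exceed the
 * LINE_MAX macro from <limits.h>. Falls back to 2048 (the value of
 * _POSIX2_LINE_MAX) if sysconf() does not report a positive limit. */
long runtime_line_max(void)
{
    long n = sysconf(_SC_LINE_MAX);

    if (n <= 0)
        n = 2048;
    return n;
}

/* Hypothetical helper: counts the lines in fp, where each line may be
 * at most line_max bytes long including its newline.
 * Returns the line count, -1 on allocation or read failure, or -2 if
 * a line longer than line_max is found.
 * Note: strlen() stops at an embedded null byte, so this sketch
 * assumes text input that contains no null bytes. */
long count_lines(FILE *fp, size_t line_max)
{
    char *buf;
    long lines = 0;

    if (line_max == 0 || line_max >= INT_MAX)
        return -1;

    /* line_max bytes of line data plus one byte for the added null. */
    buf = malloc(line_max + 1);
    if (buf == NULL)
        return -1;

    while (fgets(buf, (int)(line_max + 1), fp) != NULL) {
        size_t len = strlen(buf);

        if (len > 0 && buf[len - 1] == '\n') {
            lines++;            /* complete, newline-terminated line */
        } else if (feof(fp)) {
            lines++;            /* final line with no trailing newline */
        } else {
            free(buf);          /* buffer ended without a newline and */
            return -2;          /* end-of-file not seen: line too long */
        }
    }

    free(buf);
    return ferror(fp) ? -1 : lines;
}
```

A caller reading standard input might use
count_lines(stdin, (size_t)runtime_line_max()) and treat a return of -2
as a malformed text file.
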

> The description also does not specify the behavior when nulls are
> read from the file. Copying these could lead an application to
> believe that end-of-file was reached also if the preceding character
> is anything other than a newline.

The description says what fgets() does with bytes read from the stream.
This description applies to null bytes just the same as other bytes,
since there is no exception stated for null bytes.

However, there does appear to be a wording problem with the statement:

    "The string is then terminated with a null byte."

If a null byte was read from the stream, then the null added by fgets()
will not terminate the string. The C Standard words it differently:

    "A null character is written immediately after the last character
    read into the array."

It is careful not to say that this added null character (byte in POSIX
terminology) terminates the string. In POSIX we should change:

    "The string is then terminated with a null byte."

to

    "A null byte shall be written immediately after the last byte read
    into the array."

> As such, the scope of the fgets() description appears to be
> deficient. I'm open to suggestion on whether an aardvark report or
> interpretation request is most suitable to address these, or should
> this be reported to the C working group since the description defers
> to the C standard.

An aardvark is needed to fix the null byte problem and the example.

--
Geoff Clare <g.clare@xxxxxx>
The Open Group, Thames Tower, Station Road, Reading, RG1 1LX, England |