
Re: fgets/strtok and LINE_MAX

To: shwaresyst@xxxxxxx
Subject: Re: fgets/strtok and LINE_MAX
From: Don Cragun <dcragun@xxxxxxxxx>
Date: Thu, 5 Nov 2009 15:26:08 -0800
Cc: austin-group-l@xxxxxxxxxxxxx
References: <ee3ffd5cb61c11046aea665c63c5adbd@xxxxxx> <8CC2B9DD22E07E8-AF4-14C01@webmail-m088.sysops.aol.com> <20091105102458.GA11253@xxxxxx> <8CC2C4B8D552A03-2DFC-9667@webmail-m095.sysops.aol.com>
The subject line on this used to be: "Re: [1003.1(2008)/Issue 7
0000177]: mismatch between example description and code". I have
changed it back to "Re: fgets/strtok and LINE_MAX" because Bug #177
is about a mismatch between the header of the example in strtok()
(where it says "The following example uses strtok( ) to break a line
into two character strings separated by any combination of <space>,
<tab>, or <newline> characters.") and the code that follows it, which
includes:
key = strtok(line, " \n");
data = strtok(NULL, " \n");
and does not look for <tab> characters.

This discussion and the recent comments added to Bug #177 have nothing to do with strtok().

Please find further comments in-line below...

Cheers,
Don

On Nov 5, 2009, at 8:24 AM, shwaresyst@xxxxxx wrote:

... ... ...
2. {LINE_MAX} is in the class of variables that aren't allowed to return -1 from sysconf(). In the absence of a larger value set at runtime it is required to return the static LINE_MAX from limits.h, which in turn is required to be >= _POSIX2_LINE_MAX, so the test on sysconf()'s return is superfluous.
This is not true.  It is true that LINE_MAX has to be defined in
<limits.h> and may have a value that is less than the maximum line
length supported by the system when an application is running.  But,
as stated on P2063, L65328-65331 in the RETURN VALUE section of the
description of sysconf() ("If the variable corresponding to name is
described in <limits.h> as a maximum or minimum value and the
variable has no limit, sysconf( ) shall return −1 without changing
the value of errno.  Note that indefinite limits do not imply infinite
limits; see <limits.h>."),  sysconf(_SC_LINE_MAX) may indeed return
-1 without setting errno to indicate that the standard utilities
support lines of arbitrary length.
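
To make the distinction above concrete, here is a minimal sketch of how a caller might tell "no definite limit" apart from a real error. The helper name query_line_max() and its return convention are hypothetical and are not taken from the standard or from Bug #177; only sysconf(), _SC_LINE_MAX, and errno are real interfaces.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and return convention are assumptions):
 *   returns  1 and stores the limit in *out if {LINE_MAX} has a value,
 *   returns  0 if sysconf() reports no definite limit (-1, errno unchanged),
 *   returns -1 if sysconf() failed (-1, errno set).
 */
int query_line_max(long *out)
{
    long v;

    errno = 0;                       /* so a -1 return can be disambiguated */
    v = sysconf(_SC_LINE_MAX);
    if (v == -1)
        return (errno == 0) ? 0 : -1;

    *out = v;
    return 1;
}
```

A caller that gets 0 back cannot size a buffer from {LINE_MAX} and has to be prepared for lines of arbitrary length.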


As it's simple enough, I believe the code below is reasonably robust and shows that the terminating null byte is accounted for. Adding partial-line processing would, I believe, obscure the call setup requirements the example was intended to show, so replacing the ellipsis in the while loop with code of that nature is more a subject for a separate tutorial example; I've inserted descriptive comments in its place noting possible caveats. fp is made an argument so the open/close logic can be relegated to the calling code, and the non-void return type is for reporting error codes.
I will work off-line to come up with a better example for fgets() before
the conference call next Thursday.  I don't know yet if I will try to
make it a tutorial on how to process arbitrary length lines or if I will
just add comments noting that lines can be arbitrarily long and state
that the example has limitations by ignoring this possibility.


Mark

<pre>
#include <limits.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int f(FILE *fp)
{
long line_max; /* used to avoid multiple sysconf() calls */
char *line; /* the buffer */
/* declare other variables function needs */

/* get implementation instance's max expectable line length and account for fgets() appending a NULL char */
line_max = sysconf(_SC_LINE_MAX) + 1;
As noted above, this can set line_max to zero.

  line = malloc(line_max);               /* get the buffer space */
And, malloc(0) is allowed to return a pointer to a newly allocated
buffer that is zero bytes long...

if (line=NULL) return ENOMEM; /* error check required by malloc() */
I assume you meant "line == NULL" instead of "line = NULL".  But the
idea of having an example always fail on a system that doesn't limit
line lengths doesn't seem to be setting a good example of how to
write portable code.

/* set up other declared variables needed for the loop's processing */

while (fgets(line, line_max, fp) != NULL) {
/* process returned full or partial line, accounting for there may be embedded NULL chars
along with the appended one; assumes fgets result = line when non-NULL */
};
/* process possible fgets error values returned in errno and handle normal EOF condition */

free(line); /* clean up the dynamic memory usage */
return 0; /* function succeeded */
}
</pre>

-----Original Message-----
From: Geoff Clare <gwc@xxxxxx>
To: austin-group-l@xxxxxx
Sent: Thu, Nov 5, 2009 5:24 am
Subject: Re: [1003.1(2008)/Issue 7 0000177]: mismatch between example description and code

shwaresyst@xxxxxx <shwaresyst@xxxxxx> wrote, on 04 Nov 2009:

----------------------------------------------------------------------
(0000278) nick (manager) - 2009-11-04 18:56
http://austingroupbugs.net/view.php?id=177#c278
----------------------------------------------------------------------
A better example:

<pre>
#include <stdio.h>
#include <limits.h>
#include <unistd.h>

void f()
{
  long line_max;
  FILE *fp;
      ...
  if ((line_max = sysconf(_SC_LINE_MAX)) <= 0) {
      line_max = LINE_MAX;
  }
  char line[line_max];
  while (fgets(line, line_max, fp) != NULL) {
      ...
  }
  return;
}

</pre>

Problem: line_max is declared as long. To use it as the array limit
declarator it would have to be '#define line_max =0;' but then sysconf()
couldn't be used to assign its runtime value. Suggested fix: change the
line[line_max] declaration to a 'char *line;' declaration with a
malloc()/free() pair to assign line a value.
C99 allows variable-length arrays in some circumstances. I think
this one is valid (at least, it is accepted by gcc -std=c99 and
by the Sun Studio compiler when invoked as c99).

However, since line_max could potentially be very large, it would
be better to use malloc() so that if insufficient memory is
available the program will see an error returned by malloc()
instead of just crashing.

The code does have a couple of problems:

1. The size of line[], and the argument to fgets(), should be line_max+1

2. If sysconf() returns -1 to indicate that LINE_MAX is indeterminate
then setting line_max to the LINE_MAX macro from <limits.h> is the
wrong thing to do. It would mean that the fgets() loop needs to
handle partial lines, but the whole point of using a buffer sized
to {LINE_MAX} is so that the program doesn't need to handle partial
lines in the fgets() loop (except perhaps to detect them and report
that the input is not a text file).
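
As a rough illustration of point 1 and of the "detect and report" option in point 2, the sketch below allocates line_max+1 bytes with malloc() and treats any line that does not fit as evidence that the input is not a text file. The function name count_text_lines() and its return codes are hypothetical; it assumes the caller has already obtained a positive line_max and that the input contains no embedded null bytes.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (name and return codes are assumptions):
 *    0  success, *nlines holds the number of lines read
 *   -1  bad line_max or malloc() failure
 *   -2  a line did not fit in line_max bytes (input is not a text file)
 *   -3  read error reported by the stream
 */
int count_text_lines(FILE *fp, long line_max, long *nlines)
{
    size_t bufsize;
    char *line;
    int status = 0;

    if (line_max <= 0 || line_max >= INT_MAX)   /* fgets() takes an int */
        return -1;
    bufsize = (size_t)line_max + 1;             /* +1 for the null byte */
    line = malloc(bufsize);
    if (line == NULL)
        return -1;

    *nlines = 0;
    while (fgets(line, (int)bufsize, fp) != NULL) {
        /* No newline and not at end-of-file: the line was truncated.
           An unterminated last line that exactly fills the buffer is
           also reported as too long. */
        if (strchr(line, '\n') == NULL && !feof(fp)) {
            status = -2;
            break;
        }
        ++*nlines;
    }
    if (status == 0 && ferror(fp))
        status = -3;

    free(line);
    return status;
}
```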

One way to fix it would be to abandon the use of LINE_MAX altogether.
Just have a fixed size buffer (of, say, 4096 bytes) and show how to
handle partial lines in the fgets() loop. Alternatively, have the
code use getline() if sysconf() returns -1 and fgets() otherwise.
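
The first alternative (a fixed-size buffer with explicit partial-line handling) might look roughly like the sketch below. It is not proposed wording for the standard. The name scan_lines(), the CHUNK_SIZE macro, and the choice to count lines and track the longest one are assumptions made only to give the loop something to do, and the code assumes the input contains no embedded null bytes.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 4096   /* arbitrary fixed buffer size */

/*
 * Hypothetical helper: counts the lines in fp and records the length of
 * the longest one (excluding its newline), however long the lines are.
 * Returns 0 on success, -1 on a read error.
 */
int scan_lines(FILE *fp, long *nlines, size_t *longest)
{
    char chunk[CHUNK_SIZE];
    size_t cur = 0;              /* bytes seen so far in the current line */

    *nlines = 0;
    *longest = 0;
    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t len = strlen(chunk);

        if (len > 0 && chunk[len - 1] == '\n') {
            /* Final piece of a line: finish the line. */
            cur += len - 1;
            if (cur > *longest)
                *longest = cur;
            ++*nlines;
            cur = 0;
        } else {
            /* Partial line: keep accumulating until the newline. */
            cur += len;
        }
    }
    if (ferror(fp))
        return -1;

    if (cur > 0) {               /* last line had no trailing newline */
        if (cur > *longest)
            *longest = cur;
        ++*nlines;
    }
    return 0;
}
```

The key point is that a chunk without a trailing newline is treated as a continuation rather than as a complete line, so the buffer size no longer has to match {LINE_MAX}.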

--
Geoff Clare <g.clare@xxxxxx>
The Open Group, Thames Tower, Station Road, Reading, RG1 1LX, England