Email List: Xaustin-review-lX

Defect in XCU awk

To: yyyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: Defect in XCU awk
From: yyyyyy@xxxxxxxxxxx
Date: Tue, 4 May 2004 23:54:01 +0100 (BST)
        Defect report from: Paul Eggert, UCLA

(Please direct followup comments to yyyyyyyyyyyyyy@xxxxxxxxxxxxx)

@ page 177 line 6924 section awk objection {20040504a}

Problem:

Edition of Specification (Year): 2004

Defect code :  1. Error

The C99 standard introduced the notion of hexadecimal floating
constants, and since the POSIX "awk" specification refers to C99,
POSIX "awk" is required to support them.  However, the POSIX
specification was not updated with this C99 change in mind, and as a
result POSIX "awk" is required to support hexadecimal numbers in some
contexts but not others.  This should be fixed, either by requiring
support for hexadecimal numbers everywhere or by disallowing them
everywhere.

Here's the problem.  XCU page 177 lines 6924-6925 contains this
restriction:

  a. An integer constant cannot begin with 0x or include the
     hexadecimal digits 'a', 'b', 'c', 'd', 'e', 'f', 'A', 'B' 'C',
     'D', 'E', or 'F' .

However, restriction (a) contradicts the awk rationale, which says
(XCU page 185 lines 7293-7295):

  The description of numeric string processing is based on the
  behavior of the atof() function in the ISO C standard.  While it is
  not a requirement for an implementation to use this function, many
  historical implementations of awk do.

Restriction (a) evidently was inspired by C89, where atof() did not
parse hexadecimal numbers.  However, in C99 atof() must parse
hexadecimal numbers like "0xa" and "0xap0".  Hence the rationale no
longer matches the text of the standard.

The "awk" specification does not contain any restrictions against
hexadecimal floating constants.  As a result of this
inconsistency, a conforming awk implementation must treat the
hexadecimal floating constant "0xap0" as a number equal to 10,
but "awk" is not allowed to treat the hexadecimal integer constant
"0xa" as a number equal to 10 -- even though atof() does so.

Also, restriction (a) causes an inconsistency with another POSIX
requirement (XCU page 157 lines 6050-6053):

  A string value shall be converted to a numeric value by the
  equivalent of the following calls to functions defined by the ISO C
  standard:

  setlocale(LC_NUMERIC, "");
  numeric_value = atof(string_value);

Hence, for example, the awk expression ("0xa" + 0 == 10) must evaluate
to 1, even though restriction (a) means that the similar expression
(split("0xa", a) && a[1] == 10) must evaluate to 0 because "0xa" is
not considered to be a numeric string.

I see three possible fixes:

  1. The standard is correct as-is.  Conforming "awk" implementations
     must parse hexadecimal floating constants and must reject
     hexadecimal integer constants, and they must parse numeric
     strings differently from strings explicitly converted to numbers.
     (If this alternative is chosen, the rationale should explain this.)

  2. The intent was for "awk" to disallow hexadecimal numbers; add
     more restrictions that disallow hexadecimal floating constants.

  3. The intent was for "awk" to use atof(), so remove the restriction
     disallowing hexadecimal integer constants.

(1) is entirely unsatisfactory, as it's not internally consistent
and disagrees with the rationale.

(2) is internally consistent, but disagrees with the rationale.

(3) is internally consistent and agrees with the rationale.  It can be
accomplished by removing restriction (a).


Action:

Remove the following text from XCU page 177 lines 6924-6925.

  a. An integer constant cannot begin with 0x or include the
     hexadecimal digits 'a', 'b', 'c', 'd', 'e', 'f', 'A', 'B' 'C',
     'D', 'E', or 'F' .

Append the following text to the awk rationale, after XCU page 185
line 7300:

   Historical implementations of awk did not parse hexadecimal integer
   or floating constants like "0xa" and "0xap0".  Because C99 required
   support for these constants in atof(), support for them is now
   required in awk.  This is a silent change to the awk language: for
   example, the expression ("0xap0" + 0) formerly returned 0, but now
   returns 10.  Due to an oversight, the 2001 through 2004 editions of
   this standard required support only for hexadecimal floating
   constants, but this edition has corrected this to require support
   for hexadecimal integer constants as well.
