Email List: Xaustin-review-lX

Defect in XCU diff (text vs binary)

To: yyyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: Defect in XCU diff (text vs binary)
From: yyyyyy@xxxxxxxxxxx
Date: Tue, 13 Jun 2006 06:13:13 +0100 (BST)
        Defect report from: Paul Eggert, UCLA

(Please send follow-up comments directly to yyyyyyyyyyyyyy@xxxxxxxxxxxxx)

@ page 321 line 12380 section diff (text vs binary) objection {20060612a}

Problem:

Edition of Specification (Year): 2004

Defect code :  1. Error

POSIX says that empty files are not text files, but common practice is
to treat empty files as text files.  This causes problems,
particularly with "diff", but also with other commands.

1.  POSIX's exclusion of empty files from the set of text files
requires behavior that contradicts all POSIX implementations that I
know of.  For the following script executed in the POSIX locale:

        : > empty
        diff empty empty

POSIX requires the output of "diff" to contain the word "differ"
(see XCU page 321 line 12382).  I know of no diff implementation
that does this; they all output nothing, which is what users expect.
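
The script in item 1 can be extended into a quick check of what a
given "diff" actually does.  This is only a sketch: the scratch
directory and the use of "mktemp" (which POSIX 2004 does not require)
are assumptions made for illustration.

```shell
# Assumption: mktemp -d is available; it is not part of POSIX 2004.
tmpdir=$(mktemp -d)

# Create a zero-length file, as in the report.
: > "$tmpdir/empty"

# Compare the empty file with itself in the POSIX locale and
# capture both the output and the exit status.
out=$(LC_ALL=C diff "$tmpdir/empty" "$tmpdir/empty")
status=$?

printf 'status=%s output_bytes=%s\n' "$status" "${#out}"
rm -rf "$tmpdir"
```

On the implementations described above, this should report an exit
status of 0 and zero bytes of output, with no "differ" message.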

2.  Typically, diff implementations sample the first few bytes of a
file.  If any sampled byte is zero (NUL), they treat the file as
binary; otherwise they treat it as text.  In contrast, POSIX requires
"diff" to inspect the entire input file and strictly check for any
encoding errors or overlong lines.
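
The sampling heuristic in item 2 might look roughly like the
following.  This is a hedged sketch, not the code of any particular
diff: the function name "looks_binary" and the 4096-byte sample size
are hypothetical choices.

```shell
# looks_binary FILE
# Succeeds (exit status 0) if a NUL byte appears in the first 4096
# bytes of FILE.  The name and the sample size are assumptions.
looks_binary() {
    # dd reads at most one 4096-byte block from the start of FILE.
    # od prints each byte as two hex digits; NUL shows up as "00".
    dd if="$1" bs=4096 count=1 2>/dev/null |
        od -An -v -tx1 |
        grep -qw '00'
}
```

Under this heuristic an empty file is classified as text, because an
empty sample contains no NUL byte.  For example,
`looks_binary FILE && echo binary` prints nothing for a zero-length
FILE.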

3.  POSIX says that portable scripts cannot use standard utilities
like 'grep', 'sed', 'sort', etc. on empty files.  For example, the
command:

   grep PATTERN FILE | sort | sed 's/$/./'

does not conform to POSIX if FILE contains no lines that match
PATTERN.  Yet this sort of shell script programming is very common
practice, and POSIX should not say that the behavior is undefined
here.
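
The pipeline in item 3 can be tried with and without a matching
line.  This is a small sketch; the file contents and the patterns
"gamma" and "a" are made up for illustration, and "mktemp" is again
assumed to be available.

```shell
# Assumption: mktemp -d is available; it is not part of POSIX 2004.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmpdir/FILE"

# No line matches, so sort and sed both read empty input.
none=$(grep gamma "$tmpdir/FILE" | sort | sed 's/$/./')

# Both lines match, so sort and sed read ordinary text input.
some=$(grep a "$tmpdir/FILE" | sort | sed 's/$/./')

printf 'no match: %s bytes\n' "${#none}"
printf 'match:\n%s\n' "$some"
rm -rf "$tmpdir"
```

On common implementations the first pipeline simply produces no
output, which is the behavior scripts rely on in practice.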

If there is a good reason that empty files are not text files
(compatibility with TENEX, perhaps? :-) then this should be clearly
documented in the rationale.  However, I suspect that whatever reason
may have existed long ago is no longer valid.


Action:

In XBD page 89 line 2824 (definition of Text File), change from:

  A file that contains characters organized into one or more lines.

to:

  A file that contains characters organized into zero or more lines.

In XCU page 321 lines 12380-12382 (Diff Binary Output Format), change
from:

  In the POSIX locale, if one or both of the files being compared are
  not text files, an unspecified format shall be used that contains
  the pathnames of two files being compared and the string "differ".

to:

  In the POSIX locale, if one or both of the files being compared are
  not text files, it is implementation-defined whether "diff" uses the
  binary-file output format or the other formats as specified below.
  The binary-file output format shall contain the pathnames of two
  files being compared and the string "differ".

In XRAT page 32 lines 1232-1233, change:

  The definition allows a file with a single <newline>, but not
  a totally empty file, to be called a text file.

to:

  The definition allows a file with a single <newline>, or
  a totally empty file, to be called a text file.
