NOTICE: This is an unapproved draft; it is a work in progress and subject to change.
Updated 14 July 2003, add new rdvk item
Updated 18 July 2003, update after meeting
_____________________________________________________________________________
COMMENT Enhancement Request Number 1
dwc@spartan.eng.sun.com Defect in XRAT A.12.2 Guidelines rationale (rdvk# 1)
{Sun-dwc-USG_rationale} Wed, 2 Oct 2002 17:34:41 -0700 (PDT)
______________________________________________________________________________
Accept_X___ Accept as marked below_____ Duplicate_____ Reject_____
Rationale for rejected or partial changes:
_____________________________________________________________________________
Page: 72 Line: 2948-2949 Section: A.12.2

Problem:
Defect code : 1. Error

The paragraph on XRAT6, P72, L2948-2949 in the rationale concerning Utility Syntax Guidelines 1 and 2 appears to be out of date.

First, some background: this paragraph made perfect sense when the guidelines were merged into XPG4 from SVID3, because the guidelines in SVID3 did not reference the portable filename character set. SVID3 Rule 2 was just:

    "Command names must include lower-case letters and digits only."

implying that any lowercase letters and digits from the current locale could be used.

The XBD4 text for Guideline 2 was:

    "Utility names should include lower-case letters (the lower character classification) and digits only from the portable character set."

but it also included the statements:

    "Guidelines 1 and 2 are offered as guidance for locales using Latin alphabets. No recommendations are made by this document set concerning utility naming in other locales."

    "In the XCU specification, Section 2.9.1, Simple Commands, it is further stated that a command used in the XSI Shell Command Language cannot be named with a trailing colon."

which now appear unchanged (except for the cross-references) in XRAT6, P73, L2948-2952.
POSIX.2-1992 specified Guideline 2 to be:

    "Utility names should include lowercase letters (the lower character classification) from the set described in 2.4 and digits only."

The "2.4" reference in POSIX.2-1992 was to the portable character set. The two paragraphs above from XBD4 appeared in POSIX.2-1992 neither in normative text nor in the rationale.

Second, the meat of the problem: Guidelines 1 and 2 are meant for all locales, not just locales using Latin alphabets. The standard already requires that all character sets used by standard locales provide a superset of the characters specified in the portable character set. So, as long as utility names use only characters from the portable character set, the utilities can be invoked in any locale (for this discussion, I'm going to ignore problems caused by incompatible codesets such as ASCII and EBCDIC). If any other characters are used in a utility name, it may not be possible to invoke the utility in a different locale, because those characters may be invalid in other locales.

Action:
Change:

    Guidelines 1 and 2 are offered as guidance for locales using Latin alphabets. No recommendations are made by IEEE Std 1003.1-2001 concerning utility naming in other locales.

on XRAT6, P73, L2948-2949 to:

    Guidelines 1 and 2 encourage utility writers to use only characters from the portable character set because use of locale-specific characters may make the utility inaccessible from other locales. Use of uppercase letters is discouraged due to problems associated with porting utilities to systems that do not distinguish between uppercase and lowercase characters in file names. Use of non-alphanumeric characters is discouraged due to the number of utilities that treat non-alphanumeric characters in "special" ways depending on context (such as the shell using whitespace characters to delimit arguments, various quote characters for quoting, '$' to introduce variable expansion, ...).
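As a rough illustration only (not part of the proposed wording), a script could check whether a candidate utility name follows Guidelines 1 and 2 by listing the permitted characters explicitly, rather than using a range such as [a-z] whose meaning depends on the current locale's collation. The function name is_portable_utility_name is hypothetical, and the two-to-nine-character length test is taken from Guideline 1 of the Utility Syntax Guidelines, which the comment above does not quote.

```shell
#!/bin/sh
# Hypothetical helper (not defined by any standard): succeeds if $1 follows
# Utility Syntax Guidelines 1 and 2, i.e. it is 2 to 9 characters long and
# uses only lowercase letters and digits from the portable character set.
is_portable_utility_name() {
    # Guideline 1: between two and nine characters, inclusive.
    [ "${#1}" -ge 2 ] && [ "${#1}" -le 9 ] || return 1
    # Guideline 2: the characters are listed explicitly, because a range
    # such as [a-z] is interpreted according to the current locale.
    case $1 in
        *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
    esac
    return 0
}

for n in ls grep2 Make my-tool x toolongname; do
    if is_portable_utility_name "$n"; then
        echo "$n: portable"
    else
        echo "$n: not portable"
    fi
done
```

Only ls and grep2 pass: Make uses an uppercase letter, my-tool uses a non-alphanumeric character, and x and toolongname fall outside the Guideline 1 length range.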
_____________________________________________________________________________
COMMENT Enhancement Request Number 2
wart@tepkom.ru Defect in XRAT 2.4.6 Arithmetic Expansion (rdvk# 2)
{42} Wed, 25 Jun 2003 10:08:04 +0100 (BST)
_____________________________________________________________________________
Accept_____ Accept as marked below_X___ Duplicate_____ Reject_____
Rationale for rejected or partial changes:

Changes to 2.6.4:

Add a new paragraph to the normative text 2.6.4 before "As an extension, the shell may recognize arithmetic expressions beyond those listed...":

    "All changes to variables in an arithmetic expression shall be in effect after the arithmetic expansion, as in the parameter expansion ${x=value}. If the shell variable x contains a value that forms a valid integer constant then the arithmetic expansions $((x)) and $(($x)) shall return the same value."

Changes to Rationale C2.6.4:

Change from:

    "The "(())" form of KornShell arithmetic in early proposals was omitted. The standard developers concluded that there was a strong desire for some kind of arithmetic evaluator to replace expr, and that relating it to '$' makes it work well with the standard shell language, and it provides access to arithmetic evaluation in places where accessing a utility would be inconvenient."

To:

    "The standard developers agreed that there was a strong desire for some kind of arithmetic evaluator to provide functionality similar to expr, that relating it to '$' makes it work well with the standard shell language and provides access to arithmetic evaluation in places where accessing a utility would be inconvenient."

Change from:

    "The syntax and semantics for arithmetic were changed for the ISO/IEC 9945-2:1993 standard. The language is essentially a pure arithmetic evaluator of constants and operators (excluding assignment) and represents a simple subset of the previous arithmetic language (which was derived from the KornShell "(())" construct). ......"
To:

    "The syntax and semantics for arithmetic were revised for the ISO/IEC 9945-2:1993 standard. The language represents a simple subset of the previous arithmetic language (which was derived from the KornShell "(())" construct). ...."

Add after the above paragraph:

    "The standard requires assignment operators to be supported (as listed in Section 1.7.2.1), and since arithmetic expansions are not specified to be evaluated in a subshell environment, changes to variables there have to be in effect after the arithmetic expansion, just as in the parameter expansion ${x=value}. Note, however, that $(( x=5 )) need not be equivalent to $(( $x=5 )). If the value of the environment variable x is the string "y=", the expansion of $(( x=5 )) would set x to 5 and output 5, but $(( $x=5 )) would output 0 if the value of the environment variable y is not 5 and would output 1 if the environment variable y is 5. Similarly, if the value of the environment variable x is 4, the expansion of $(( x=5 )) would still set x to 5 and output 5, but $(( $x=5 )) (which would be equivalent to $(( 4=5 ))) would yield a syntax error."

Add to the end of the paragraph beginning "The portion of the ISO C standard arithmetic operations selected corresponds to the operations historically supported in the KornShell.":

    "In addition to the exceptions listed in Section 2.6.4, the use of the following is explicitly outside the scope of the rules defined in Section 1.7.2.1:

     * The prefix operator & and the [], -> and . operators.
     * Casts."

Add into the rationale before the paragraph commencing "Although the ISO/IEC 9899:1999 Standard now..." (this text from PASC Interp 1003.2-208):

    "The standard is intentionally silent about how a variable's numeric value in an expression is determined from its normal "sequence of bytes" value. It could be done as a text substitution, as a conversion like that performed by strtol(), or even recursive evaluation.
    Therefore, the only cases for which the standard is clear are those for which all of these methods produce the same result. The cases where they give the same result are those where the sequence of bytes forms a valid integer constant. Therefore, if a variable does not contain a valid integer constant, the behavior is unspecified.

    For the commands:

        x=010; echo $((x += 1))

    the output must be 9.

    For the commands:

        x=' 1'; echo $((x += 1))

    the results are unspecified.

    For the commands:

        x=1+1; echo $((x += 1))

    the results are unspecified."
_____________________________________________________________________________
Page: 0 Line: 0 Section: 2.4.6

Problem:
Edition of Specification (Year): 2003
Defect code : 3. Clarification required

The problem concerns the standard behavior of a POSIX-compatible shell during arithmetic evaluation in $(( )). It was first raised on freebsd-standards, with the following bug report against FreeBSD's /bin/sh (ash):

    "/bin/sh implements only a subset of the operators in $(( ... )) arithmetic. It also does not understand variable names in arithmetic expressions. This missing feature makes it impossible to run the OpenGroup's POSIX validation test suite because the configuration process for the test suite expects a POSIX system shell and makes heavy use of $((var += number)).

        $ /bin/sh
        $ a=1
        $ echo $((a += 1))    # should echo 2 and increment a
        arith: syntax error: "a += 1"
    "

The tests in question are part of the TET3.6 Environment, which contains code of that kind:

    chweikh@hal9000:/tmp $ fgrep '$((' src/posix_sh/api/tcm.sh
    : $((tet_l1_iccount += 1))
    while test $((tet_l1_tpnum += 1)) -le $tet_tpcount
    : $((tet_l1_tpcount += 1))
    ...

Let's look at http://www.opengroup.org/onlinepubs/007904975/utilities/xcu_chap02.html#tag_02_06_04, which is "Arithmetic Expansion":

    "Next, the shell shall treat this as an arithmetic expression and substitute the value of the expression.
    The arithmetic expression shall be processed according to the rules given in Arithmetic Precision and Operations, with the following exceptions:

     * Only signed long integer arithmetic is required.
     * Only the decimal-constant, octal-constant, and hexadecimal-constant constants specified in the ISO C standard, Section 6.4.4.1 are required to be recognized as constants.
     * The sizeof() operator and the prefix and postfix "++" and "--" operators are not required.
     * Selection, iteration, and jump statements are not supported."

Now, let's go to Arithmetic Precision and Operations, which is http://www.opengroup.org/onlinepubs/007904975/utilities/xcu_chap01.html#tag_01_07_02_01:

    "Integer variables and constants, including the values of operands and option-arguments, used by the standard utilities listed in this volume of IEEE Std 1003.1-2001 shall be implemented as equivalent to the ISO C standard signed long data type; floating point shall be implemented as equivalent to the ISO C standard double type. Conversions between types shall be as described in the ISO C standard. All variables shall be initialized to zero if they are not otherwise assigned by the input to the application."

So, here I see the requirement for "integer variables" and for assignment operators like "+=", which clearly require support for variables.

I have filed the same bug against Debian's version of the ash shell, which is maintained by one of the Austin Group members, Herbert Xu. He claimed that a POSIX shell should only support arithmetic on constants, and pointed me to the rationale for section 2.6.4, which is here: http://www.opengroup.org/onlinepubs/007904975/xrat/xcu_chap02.html

    "The syntax and semantics for arithmetic were changed for the ISO/IEC 9945-2:1993 standard. The language is essentially a pure arithmetic evaluator of constants and operators (excluding assignment) and represents a simple subset of the previous arithmetic language (which was derived from the KornShell "(())" construct).
    The syntax was changed from that of a command denoted by ((expression)) to an expansion denoted by $((expression)). The new form is a dollar expansion ( '$' ) that evaluates the expression and substitutes the resulting value."

So, the rationale seems to agree with Herbert that a conforming shell is required to evaluate only constants and operators. In this case I should give preference to the standard itself, but I would like some clarification on the subject. Is the rationale for that section outdated, or have I misread the standard? If the latter, why do The Open Group's own tests use this non-standard feature?

Action:
If I am right and the rationale is simply outdated, then the whole rationale section 2.6.4 should be rewritten to conform to the standard itself.

Proposal:
Proposed changes -- revised 16 July 2003

Changes to 2.6.4:

Add a new paragraph to the normative text 2.6.4 before "As an extension, the shell may recognize arithmetic expressions beyond those listed...":

    "All changes to variables in an arithmetic expression shall be in effect after the arithmetic expansion, as in the parameter expansion ${x=value}.

    [and add one of]

    [1] If the shell variable x contains a value that forms a valid integer constant then the arithmetic expansions $((x)) and $(($x)) shall be equivalent, otherwise they need not be equivalent.

    or

    [2] If the shell variable x contains a value that forms a valid integer constant then the arithmetic expansions $((x)) and $(($x)) shall return the same value."

Changes to Rationale C2.6.4:

Change from:

    "The "(())" form of KornShell arithmetic in early proposals was omitted. The standard developers concluded that there was a strong desire for some kind of arithmetic evaluator to replace expr, and that relating it to '$' makes it work well with the standard shell language, and it provides access to arithmetic evaluation in places where accessing a utility would be inconvenient."
To:

    "The standard developers agreed that there was a strong desire for some kind of arithmetic evaluator to provide functionality similar to expr, that relating it to '$' makes it work well with the standard shell language and provides access to arithmetic evaluation in places where accessing a utility would be inconvenient."

Change from:

    "The syntax and semantics for arithmetic were changed for the ISO/IEC 9945-2:1993 standard. The language is essentially a pure arithmetic evaluator of constants and operators (excluding assignment) and represents a simple subset of the previous arithmetic language (which was derived from the KornShell "(())" construct). ......"

To:

    "The syntax and semantics for arithmetic were revised for the ISO/IEC 9945-2:1993 standard. The language represents a simple subset of the previous arithmetic language (which was derived from the KornShell "(())" construct). ...."

Add after the above paragraph:

    "The standard requires assignment operators to be supported (as listed in Section 1.7.2.1), and since arithmetic expansions are not specified to be evaluated in a subshell environment, changes to variables there have to be in effect after the arithmetic expansion, just as in the parameter expansion ${x=value}. Note, however, that $(( x=5 )) need not be equivalent to $(( $x=5 )). If the value of the environment variable x is the string "y=", the expansion of $(( x=5 )) would set x to 5 and output 5, but $(( $x=5 )) would output 0 if the value of the environment variable y is not 5 and would output 1 if the environment variable y is 5. Similarly, if the value of the environment variable x is 4, the expansion of $(( x=5 )) would still set x to 5 and output 5, but $(( $x=5 )) (which would be equivalent to $(( 4=5 ))) would yield a syntax error."

Add to the end of the paragraph beginning "The portion of the ISO C standard arithmetic operations selected corresponds to the operations historically supported in the KornShell.":

    "In addition to the exceptions listed in Section 2.6.4, the use of the following is explicitly outside the scope of the rules defined in Section 1.7.2.1:

     * The prefix operator & and the [], -> and . operators.
     * Casts."

Add into the rationale before the paragraph commencing "Although the ISO/IEC 9899:1999 Standard now..." (this text from PASC Interp 1003.2-208):

    "The standard is intentionally silent about how a variable's numeric value in an expression is determined from its normal "sequence of bytes" value. It could be done as a text substitution, as a conversion like that performed by strtol(), or even recursive evaluation. Therefore, the only cases for which the standard is clear are those for which all of these methods produce the same result. The cases where they give the same result are those where the sequence of bytes forms a valid integer constant. Therefore, if a variable does not contain a valid integer constant, the behavior is unspecified.

    For the commands:

        x=010; echo $((x += 1))

    the output must be 9.

    For the commands:

        x=' 1'; echo $((x += 1))

    the results are unspecified.

    For the commands:

        x=1+1; echo $((x += 1))

    the results are unspecified."