Austin Group Minutes of the 27 May Teleconference    Austin-213 Page 1 of 1
Submitted by Andrew Josey, The Open Group.           May 28, 2004

Attendees
    Andrew Josey, The Open Group
    Don Cragun, Sun, PASC OR
    Ulrich Drepper, Red Hat
    Nick Stoughton, USENIX, WG15 OR
    Glenn Fowler, AT&T

Apologies
    Dave Butenhof, HP
    Mark Brown, IBM, TOG OR
    Joanna Farley, Sun

Draft Status
---------------
No new status to report. The hardcopy run is due back from the
printers; there are two copies remaining.
http://www.opengroup.org/bookstore/catalog/t041y.htm

Defect Report Processing
-------------------------
The group picked up on the latest batch of defect reports, which are
available at the following URL:
http://www.opengroup.org/austin/aardvark/latest/

XBD ERN 7, BRE nested subpatterns (XBD BRE defn vs regcomp())
OPEN

Further investigations during the week led to a number of responses,
with a proposal from Glenn Fowler. A summary follows below:

-start summary

Observations of historical sed implementation behavior need to
determine if the sed implementations use POSIX regcomp()/regexec() and
return '' because the underlying regcomp()/regexec() returns ''.

The correlation between sed and regcomp()/regexec() is important
because regcomp()/regexec() implementations that return '' are not
compliant, and have not been since at least 1997. If the
regcomp()/regexec() implementations for those sed implementations
using regcomp()/regexec() were fixed to comply with the standard then
those sed implementations would return '', and the 'historical sed'
argument for those sed implementations would become the 'historical
regcomp()/regexec()' argument.

Assuming the intent of the standard, stated or not, would be to keep
the BRE and regcomp()/regexec() descriptions consistent, the 2001 BRE
addition should be fixed.
Otherwise there is a chance that this addition, as it stands, could be
used to invalidate (or 'unspecify') portions of the
regcomp()/regexec() description, with effects well beyond the scope of
sed.

As ERN-7 suggests, one solution to the discrepancy would be to fix the
2001 addition to 9.3.6 (item 3.) to include the effects of nested
subexpressions on back-reference expressions. This seems like a
reasonable course of action since it corrects a problem introduced in
2001 and leaves the regcomp()/regexec() description (which predates
the 2001 addition) intact. The proposed change is based on the
regcomp()/regexec() description of the regexec() pmatch array.

[* original 9.3.6 (item 3.) text *]

3. The back-reference expression '\n' shall match the same (possibly
empty) string of characters as was matched by a subexpression enclosed
between "\(" and "\)" preceding the '\n'. The character 'n' shall be a
digit from 1 through 9, specifying the nth subexpression (the one that
begins with the nth "\(" from the beginning of the pattern and ends
with the corresponding paired "\)"). The expression is invalid if less
than n subexpressions precede the '\n'.

[* replace this text *]

For example, the expression "\(.*\)\1$" matches a line consisting of
two adjacent appearances of the same string, and the expression
"\(a\)*\1" fails to match 'a'. When the referenced subexpression
matched more than one string, the back-referenced expression shall
refer to the last matched string. If the subexpression referenced by
the back-reference matches more than one string because of an asterisk
('*') or an interval expression (see item (5)), the back-reference
shall match the last (rightmost) of these strings.

[* with this text *]

The string matched by a contained subexpression shall be within the
string matched by the containing subexpression.
If the containing subexpression does not match, or if there is no
match for the contained subexpression within the string matched by the
containing subexpression, then back-reference expressions
corresponding to the contained subexpression shall not match. When a
subexpression matches more than one string, a back-reference
expression corresponding to the subexpression shall refer to the last
matched string. For example, the expression "^\(.*\)\1$" matches lines
consisting of two adjacent appearances of the same string, the
expression "\(a\)*\1" fails to match 'a', the expression
"\(a\(b\)*\)*\2" fails to match 'abab', and the expression
"^\(ab*\)*\1$" matches 'ababbabb' but fails to match 'ababbab'.

Also, I ran ERN-7 by Doug McIlroy and he noted that the sed substitute
command description only specifies what is substituted when a
backreference expression refers to a subexpression that matches:

    The characters "\n", where n is a digit, shall be replaced by the
    text matched by the corresponding backreference expression.

This should probably be revised to handle cases where the
backreference expression does not match:

    The characters "\n", where n is a digit, shall be replaced by the
    text matched by the corresponding backreference expression, or by
    the empty string if the corresponding backreference expression
    does not match.

-end summary

The feeling is that an interpretation will be needed: the standard is
unclear and needs to be clarified in the next revision. The relevant
sections of the standard are as follows:

    XBD p 174 9.3.6
    XCU p 846 substitute command

Glenn is taking an action to review other uses of BRE within the
standard and report back to the mailing list. This item is thus being
left open until the next meeting.

XBD ERN 11 key_t should be arithmetic type?
Accept as marked below

The group agreed with the proposal that key_t be changed to be just an
arithmetic type, to be consistent with XSH section 2.12 Data Types,
and noted that any implementation that has key_t as a pointer would be
broken by this change.

It was agreed to put this down the interpretations track. The standard
is inconsistent and no conformance distinction can be made for the
current standard. The interpretation should include the recommendation
that a future revision include the change as in the ERN. The proposed
change would correct an inconsistency which has been around since at
least XPG4.

XBD ERN 12 option handling in unistd.h
Accept as marked below

This should be treated consistently with XBD ERN 9 and go down the
interpretations track as part of the same interpretation.
Recommendations arising for a future revision are as follows:

In section: Constants for Options and Option Groups

Delete in paragraph 1:
    "If these are undefined, the fpathconf(), pathconf(), or sysconf()
    functions can be used to determine whether the option is provided
    for a particular invocation of the application."

Change in paragraph 2 from:
    If a symbolic constant is defined with the value -1, the option is
    not supported.
To:
    If a symbolic constant is not defined or is defined with the value
    -1, the option is not supported.

Change in paragraph 4 from:
    The application can check at runtime to see whether the option is
    supported by calling fpathconf(), pathconf(), or sysconf() with
    the indicated name parameter.
To:
    The application can check at runtime to see whether the option is
    supported for a particular invocation of the application by
    calling fpathconf(), pathconf(), or sysconf() with the indicated
    name parameter.

XBD ERN 13 sched.h option language
Accept as marked below

It was agreed to go with option 1 in the proposal, which is to remove
the lead-in so that the following sentence would start
"The sched_param structure...."
The current text reads:

    "In addition, if _POSIX_SPORADIC_SERVER or
    _POSIX_THREAD_SPORADIC_SERVER is defined, the sched_param
    structure defined in <sched.h> shall contain the following members
    in addition to those specified above:"

This is needed since the option constants may not be defined. This way
the margin marker notation would show the optional nature of this
requirement.

Next Steps
-----------
Andrew will update the aardvark reports with the latest inbound defect
reports. Andrew will generate some new interpretations.

There are a number of open action items outstanding:

1. Don Cragun       Pathname Resolution proposal
2. Larry Dwyer      system() and threads
3. Joerg Schilling  wording for XCU ERN 1 pax
4. Further investigation for XCU ERN 18.

The next teleconference call is scheduled for June 10, 2004.
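
Note on XBD ERN 7: the back-reference examples quoted in the summary
above can be tried against a local regcomp()/regexec() implementation.
The following is only a minimal sketch for that purpose. The names
bre_matches() and report_ern7_examples() are hypothetical helpers, not
part of the standard or of any proposal recorded in these minutes. The
sketch prints whatever the local implementation returns; it does not
assume that the implementation follows the proposed wording.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if the BRE `pattern` matches somewhere
 * in `text`, 0 if it does not, and -1 if regcomp() rejects the pattern.
 */
int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* cflags == 0 (no REG_EXTENDED) selects basic regular expressions. */
    if (regcomp(&re, pattern, 0) != 0)
        return -1;

    /* nmatch == 0: only the match / no-match result is of interest. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Prints the local result for each pattern/string pair from the summary. */
void report_ern7_examples(void)
{
    static const struct {
        const char *pattern;
        const char *text;
    } cases[] = {
        { "^\\(.*\\)\\1$",       "abab"     },
        { "\\(a\\)*\\1",         "a"        },
        { "\\(a\\(b\\)*\\)*\\2", "abab"     },
        { "^\\(ab*\\)*\\1$",     "ababbabb" },
        { "^\\(ab*\\)*\\1$",     "ababbab"  },
    };
    size_t i;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++)
        printf("%-20s %-10s -> %d\n", cases[i].pattern, cases[i].text,
               bre_matches(cases[i].pattern, cases[i].text));
}
```

Calling report_ern7_examples() from main() prints one line per case.
Under the wording proposed in the summary, the second, third and fifth
cases would be expected to report 0 and the others 1; a different
result would indicate that the implementation resolves back-references
to repeated or nested subexpressions differently.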