Minutes of the 12 Aug 2010 Teleconference          Austin-492 Page 1 of 1
Submitted by Andrew Josey, The Open Group.         August 13th, 2010

Attendees
    Andrew Josey, The Open Group
    Don Cragun, PASC OR
    Geoff Clare, The Open Group
    Nick Stoughton, USENIX, ISO/IEC OR
    Eric Blake, Red Hat
    Ulrich Drepper, Red Hat

Apologies
    Mark Brown, IBM, TOG OR

* Old Business

A proposed consent item was drafted at the last meeting and should be discussed at the next meeting when all three of the Organizational Representatives are present.

We had started discussion of the steps needed to start the technical corrigendum process. Andrew agreed to take an action to draft a summary of the next steps.

Action: Andrew draft a summary of the next steps to commence a technical corrigendum.

* New business

We picked up on the filename discussions. Don had posted a mail on issues with "." and ".." and whether filenames are terminated with a null byte. A discussion followed, including a line by line review of bug 291. Bug 291 was updated a number of times during the following discussion to capture the consensus.

Another draft Consent item was proposed:

Consent item:
-------------
Filename is not null terminated
--end consent item--

Bug 291
http://austingroupbugs.net/view.php?id=291

It was agreed that bug 291 should be closed as Accept as marked, and that an interpretation is required.

Interpretation response:
The standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
The description provided by the submitter and the notes to the editor below contain rationale for these changes.

Notes to the Editor (not part of this interpretation):
Based on points raised during the 5 Aug 2010 meeting, the proposed action needs to be reworded.
In particular, by carefully defining in which situations a filename represents a character string, and the fact that and are uniquely identifiable across all supported locales, it is possible to have a well-defined meaning behind searching for a character within a (byte) string, even when the string does not consist solely of characters. Doing this reduces the overall amount of changes needed to the rest of the standard. In particular, no term "slash byte" is needed, and no changes to XSH are needed.

After line 3589, (XBD 6.1, Portable Character Set), insert a new bullet:

    The encoded values associated with and shall be invariant across all locales supported by the implementation.

At line 3619 (XBD 6.2, Character Encoding), add a sentence:

    Likewise, the byte values used to encode and shall not occur as part of any other character in any locale.

At line 91229 (XCU iconv RATIONALE), add a new paragraph:

    The iconv utility may support the conversion between ASCII and EBCDIC based encodings, but is not required to do so. In an XSI-compliant implementation, the dd utility is the only method guaranteed to support conversion between these two charsets.

At line 91233 (XCU iconv SEE ALSO), add a link to dd.

At line 93178 (XCU locale RATIONALE), add a new paragraph:

    According to , the standard requires that all supported locales must have the same encoding for and , because these two characters are used within the locale-independent pathname resolution sequence. Therefore, it would be an error if 'locale -a' listed both ASCII and EBCDIC based locales, since those two encodings do not share the same representation for either or . Any system that supports both environments would be expected to provide two POSIX locales, one in either codeset, where only the locales appropriate to the current environment can be visible at a time. In an XSI-compliant implementation, the dd utility is the only portable means for performing conversions between the two charsets.
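
As an illustration only (not part of the notes to the editor), the property relied on above can be sketched in Python. The sketch assumes an ASCII-based system where the slash is byte 0x2F, and uses Python's own codec names ("utf-8", "ascii", "cp037" for one EBCDIC variant) as stand-ins for locale encodings; the helper names are hypothetical. It shows that a byte-level search for the slash is meaningful even when the bytes between slashes are not valid characters, and that ASCII and EBCDIC do not share a representation for the slash.

```python
# Sketch only: codec names are Python's, used here as stand-ins for locales.
SLASH = 0x2F  # assumption: slash byte value on an ASCII-based system


def slash_byte_is_unambiguous(codec: str) -> bool:
    """True if byte 0x2F appears only in the encoding of '/' itself,
    checked over every BMP code point the codec can encode."""
    for code_point in range(0x10000):
        ch = chr(code_point)
        try:
            encoded = ch.encode(codec)
        except UnicodeEncodeError:
            continue  # not representable in this codec
        if ch != "/" and SLASH in encoded:
            return False
    return True


def split_on_slash(pathname: bytes) -> list:
    """Split a byte string on the slash byte without decoding it."""
    return pathname.split(b"/")


if __name__ == "__main__":
    print(slash_byte_is_unambiguous("utf-8"))
    # The middle component is not valid UTF-8, yet the search still works.
    print(split_on_slash(b"dir/\xff\xfe/file"))
    # ASCII and EBCDIC (cp037) encode the slash differently.
    print("/".encode("ascii") != "/".encode("cp037"))
```

The check is exhaustive only over the Basic Multilingual Plane and only for whichever codecs are passed in; it is a demonstration of the idea, not a conformance test.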

At line 2145 (XBD 3.266, Pathname), change one paragraph:

    A character string that is used to identify a file. In the context of POSIX.1-2008, a pathname may be limited to {PATH_MAX} bytes, including the terminating null byte. It has an optional beginning , followed by zero or more filenames separated by characters. A pathname may optionally contain one or more trailing characters. Multiple successive characters are considered to be the same as one , except for the case of exactly two leading characters.

into two paragraphs:

    A string that is used to identify a file. In the context of , a pathname may be limited to {PATH_MAX} bytes, including the terminating null byte. It has optional beginning characters, followed by zero or more filenames separated by characters. A pathname can optionally contain one or more trailing characters. Multiple successive characters are considered to be the same as one , except for the case of exactly two leading characters.

    Note: If a pathname consists of only bytes corresponding to characters from the portable filename character set (), characters, and a single terminating character, the pathname will be usable as a character string in all supported locales; otherwise, the pathname might only be a string (rather than a character string). Additionally, since the single-byte encoding of the character is required to be the same across all locales and to not occur within a multi-byte character, references to a character within a pathname are well-defined even when the pathname is not a character string. However, this property does not necessarily hold for the remaining characters within the portable filename character set.

At line 2199 (XBD 3.276, Portable Filename Character Set), add a sentence:

    See also .

At line 1647 (XBD 3.136, Dot), change:

    In the context of naming files, the filename consisting of a single dot character ('.').

to:

    In the context of naming files, the filename consisting of a single character ('.').
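
As an illustration only (not part of the notes to the editor), the pathname structure in the XBD 3.266 change above can be sketched as a byte-level split. The sketch assumes the separator is the slash byte 0x2F and that the pathname is passed without its terminating null byte; split_pathname is a hypothetical helper, not an interface from the standard.

```python
def split_pathname(path: bytes):
    """Split a pathname into (leading, components, has_trailing_slash).

    leading is b"" (relative), b"/" or b"//". Exactly two leading
    slashes are kept distinct; any other run collapses to one.
    """
    if b"\0" in path:
        raise ValueError("embedded null byte")
    stripped = path.lstrip(b"/")
    n_leading = len(path) - len(stripped)
    if n_leading == 0:
        leading = b""
    elif n_leading == 2:
        leading = b"//"
    else:
        leading = b"/"
    # Runs of successive slashes count as one separator, so empty
    # pieces produced by the split are dropped.
    components = [c for c in stripped.split(b"/") if c]
    has_trailing_slash = stripped.endswith(b"/")
    return leading, components, has_trailing_slash


if __name__ == "__main__":
    for p in (b"/usr//lib/", b"//host/share", b"///a", b"a/./../b"):
        print(p, split_pathname(p))
```

The function never decodes the bytes, which matches the intent that a pathname need only be a string rather than a character string. Dot and dot-dot are returned as ordinary components; resolving them is left to pathname resolution.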

At line 1650 (XBD 3.137, Dot-Dot), change:

    The filename consisting solely of two dot characters ("..").

to:

    The filename consisting solely of two characters ("..").

At line 1782 (XBD 3.170, Filename), change:

    A name consisting of 1 to {NAME_MAX} bytes used to name a file. The characters composing the name may be selected from the set of all character values excluding the character and the null byte. The filenames dot and dot-dot have special meaning. A filename is sometimes referred to as a "pathname component".

to:

    A sequence of bytes consisting of 1 to {NAME_MAX} bytes used to name a file. The bytes composing the name shall not contain the or characters. In the context of a pathname, each filename shall be followed by a or a character; elsewhere, a filename followed by a character forms a string (but not necessarily a character string). The filenames dot and dot-dot have special meaning. A filename is sometimes referred to as a "pathname component". See also .

At line 107745 (XCU test RATIONALE), add a paragraph:

    It is noted that '[' is not part of the portable filename character set; however, since it is required to be encoded by a single byte, and is part of the portable character set, the name of this utility forms a character string across all supported locales.

At line 115603 (XRAT A.4.6, Filenames), change:

    The file system implementation historically deals only with bytes, not with characters, except for and the null byte.

to:

    The file system implementation historically deals only with bytes, not with characters. Limitations on valid encodings ensure that the byte sequences for the character, character, and character will not be confused with any other character in any locale.
    However, there exist common single-shift encodings where other single-byte characters from the portable filename character set can also occur as a subset of a multi-byte character, making case-folding of portable filename bytes dependent on the context of whether a shift-state is active.

At line 115615 (XRAT A.4.6 Filenames), change:

    Case folding is inconsistent with portable filename character set definition and filename definition (all characters except and null). No known implementations allowing all characters except and null also do case folding.

to:

    Case folding is inconsistent with the portable filename character set and filename definitions (all bytes except and null). No known implementations allowing all bytes except and null also do case folding.

At line 114796 (XRAT A.3 definitions Filename), delete the two existing paragraphs about filename truncation, and replace them with:

    See .

At line 115093 (XRAT A.3 definitions), add a new section:

    Pathname

    Pathnames historically allowed all bytes except for the and characters. For compatibility with existing file systems, this usage is maintained throughout the standard by noting that a pathname need not be a valid character string in all locales. However, the properties of the portable filename character set are such that a pathname using only those characters and the is portable in all locales as a character string.

At line 115933 (XRAT A.6.2 Character Encoding), add a new paragraph:

    The encoding for and are required to be the same across all locales, in part because pathname resolution requires recognition of these bytes. It is a fortunate accident that all common shift-based encodings did not use either or as a valid second byte in a multi-byte character.

At line 115802 (XRAT A.4.12 Pathname Resolution), add a new paragraph:

    Earlier versions of this standard were unclear as to whether a pathname was required to be a character string or just a string.
    The is clear that filenames are just strings, and that pathname processing is locale-independent.

Next Steps
----------
The next call will be on August 19th at 16:00 UK time/08:00 Pacific and will return to processing defect reports.
http://austingroupbugs.net

See the calendar for the list of dialup numbers.

An IRC channel will be available for the meeting
irc://irc.freestandards.org #austin

ICAL: http://www.google.com/calendar/ical/nvctqtstkuni3fab9k3jqtrt4g@group.calendar.google.com/public/basic
XML: http://www.google.com/calendar/feeds/nvctqtstkuni3fab9k3jqtrt4g@group.calendar.google.com/public/basic