Minutes of the 26 September 2019 Teleconference     Austin-971 Page 1 of 1
Submitted by Geoff Clare, The Open Group.            27th September 2019

Attendees:
    Don Cragun, IEEE PASC OR
    Nick Stoughton, USENIX, ISO/IEC JTC 1/SC 22 OR
    Joerg Schilling, FOKUS Fraunhofer
    Geoff Clare, The Open Group
    Eric Blake, Red Hat, Open Group OR
    Mark Ziegast, SHware Systems Dev.

Apologies:
    Andrew Josey, The Open Group

* General news

None

* Outstanding actions

(Please note that this section has been flushed to shorten the
minutes - to locate the previous set of outstanding actions,
look to the minutes from 13th June 2019 and earlier)

Bug 1254: "asynchronous list" description uses "command" instead of "AND-OR list" OPEN
http://austingroupbugs.net/view.php?id=1254

Action: Joerg to investigate how his shell behaves.

Bug 700 - Nick to raise this issue with the C committee
Bug 713 - Nick to raise with the C committee.
Bug 739 - Nick to raise with the C committee.

Bug 1216 - Eric to ask if The Open Group is willing to sponsor this
interface, referencing bug note 4478.

* Current Business

Bug 1190: backslash has two special meanings in the shell and only loses one of them in bracket expressions Accepted as Marked
http://austingroupbugs.net/view.php?id=1190

(This bug was resolved in the 23rd September teleconference, but was
omitted from the previous minutes.)

This item is tagged for TC3-2008.

Interpretation response
------------------------
The standard is unclear on this issue, and no conformance distinction
can be made between alternative implementations based on this. This
is being referred to the sponsor.

Rationale:
-------------
None.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
On page 184 line 6087 section 9.3.5 RE Bracket Expression, change:

    The special characters '.', '*', '[', and '\\' (<period>,
    <asterisk>, <left-square-bracket>, and <backslash>, respectively)
    shall lose their special meaning within a bracket expression.
to:

    When the bracket expression appears within a BRE, the special
    characters '.', '*', '[', and '\\' (<period>, <asterisk>,
    <left-square-bracket>, and <backslash>, respectively) shall lose
    their special meaning within the bracket expression.

    When the bracket expression appears within an ERE, the special
    characters '.', '(', '*', '+', '?', '{', '|', '$', '[', and '\\'
    (<period>, <left-parenthesis>, <asterisk>, <plus-sign>,
    <question-mark>, <left-brace>, <vertical-line>, <dollar-sign>,
    <left-square-bracket>, and <backslash>, respectively) shall lose
    their special meaning within the bracket expression; <circumflex>
    ('^') shall lose its special meaning as an anchor.

    When the bracket expression appears within a shell pattern (see
    [xref to XCU 2.13]), the special characters '?', '*', and '['
    (<question-mark>, <asterisk>, and <left-square-bracket>,
    respectively) shall lose their special meaning within the bracket
    expression; whether or not <backslash> ('\\') loses its special
    meaning as a pattern matching character is described in [xref to
    XCU 2.13.1], but in contexts where a shell-quoting <backslash> can
    be used it shall retain its special meaning (see [xref to XCU
    2.2]). For example:

        $ ls
        ! $ - \ a b c
        $ echo [a\-c]
        - a c
        $ echo [\!a]
        ! a
        $ echo ["!\$a-c"]
        ! $ - a c
        $ echo [!"\$a-c"]
        ! \ b
        $ echo [!\]\\]
        ! $ - a b c

Bug 1234: in most shells, backslash doesn't have two meaning wrt pattern matching Accepted as Marked
http://austingroupbugs.net/view.php?id=1234

This item is tagged for TC3-2008.

Interpretation response
------------------------
1. The standard clearly states in XCU 2.13.1 that backslash has an
escaping role in shell patterns that is distinct from its role as a
quoting character, and conforming implementations must conform to
this.

2. The standard states in XCU 2.13.3 that patterns in pathname
expansion are matched against existing files regardless of the pattern
contents, and conforming implementations must conform to this.
However, concerns have been raised about this which are being referred
to the sponsor.

Rationale:
-------------
1.
Although existing practice in some shells is not to treat backslash as
special in situations where shell quoting does not affect the pattern
(such as in word expansions when a pattern used in pathname expansion
is "indirect", i.e. not present in the original word but resulting
from an earlier expansion), relaxing the standard to allow this
behavior would be undesirable, as it would mean that the only way to
match a literal '?', '*' or '[' would be to put them in a bracket
expression, unlike all other contexts where these characters are
special and they can be escaped with backslash.

Application writers should be able to use an unquoted unescaped
backslash that is not inside a bracket expression in a pattern and
have it interpreted the same way across the shell (in all contexts),
find, pax, fnmatch() and glob(). This was the aim of the original
POSIX.2-1992 developers in having all of those parts of the standard,
where they talk about pattern matching, reference what is now XCU
2.13.

It is unfortunate that the issue of patterns in shell variables did
not come to light earlier, thus allowing the current discrepancy in
some shells to persist for several years instead of being corrected
long ago. However, the goal of consistency across all uses of pattern
matching is still as worthwhile now as it was in 1992.

2. Existing practice in most shells that do treat backslash as special
in "indirect" patterns in pathname expansions is only to match
patterns against existing pathnames if the pattern includes a '*', '?'
or '[' that is treated as special. This prevents accidental removal of
backslash characters in variable expansions where generating a list of
matching files is not intended and a (usually oddly named) file with a
matching name happens to exist.
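
As an illustration only of the backslash escaping that rationale item 1
expects to be interpreted consistently, the following is a minimal
sketch using fnmatch(), one of the interfaces named above. The helper
name pattern_matches is hypothetical and is not part of any standard;
only fnmatch() and FNM_NOESCAPE are taken from the text of these
minutes. The sketch does not claim to show how any particular shell
treats indirect patterns.

```c
#include <fnmatch.h>

/*
 * Hypothetical helper (not a standard interface): returns 1 if name
 * matches pattern under the given fnmatch() flags, otherwise 0.
 */
int pattern_matches(const char *pattern, const char *name, int flags)
{
    return fnmatch(pattern, name, flags) == 0;
}

/*
 * Note on C string escaping: the C literal "a\\?" is the
 * three-character pattern a\? as seen by fnmatch().
 *
 * With flags == 0 the backslash escapes the '?', so the pattern only
 * matches the two-character name "a?".
 *
 * With FNM_NOESCAPE the backslash is an ordinary character and '?'
 * stays special, so the pattern matches three-character names of the
 * form 'a', '\', any character.
 */
```

Under these assumptions, pattern_matches("a\\?", "a?", 0) returns 1 and
pattern_matches("a\\?", "ab", 0) returns 0, whereas
pattern_matches("a\\?", "a\\b", FNM_NOESCAPE) returns 1.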

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
On page 2382 line 76210 section 2.13.1, change:

    The following patterns matching a single character shall match a
    single character: ordinary characters, special pattern characters,
    and pattern bracket expressions. The pattern bracket expression
    also shall match a single collating element. A <backslash>
    character shall escape the following character. The escaping
    <backslash> shall be discarded. If a pattern ends with an
    unescaped <backslash>, it is unspecified whether the pattern does
    not match anything or the pattern is treated as invalid.

to:

    The following patterns shall match a single character: ordinary
    characters, special pattern characters, and pattern bracket
    expressions. The pattern bracket expression also shall match a
    single collating element.

    In a pattern, or part of one, where a shell-quoting <backslash>
    can be used, a <backslash> character shall escape the following
    character as described in [xref to 2.2.1], regardless of whether
    or not the <backslash> is inside a bracket expression. (The
    sequence "\\" represents one literal <backslash>.)

    In a pattern, or part of one, where a shell-quoting <backslash>
    cannot be used to preserve the literal value of a character that
    would otherwise be treated as special:

    * A <backslash> character that is not inside a bracket expression
      shall preserve the literal value of the following character,
      unless the following character is in a part of the pattern where
      shell quoting can be used and is a shell quoting character, in
      which case the behavior is unspecified.

    * For the shell only, it is unspecified whether or not a
      <backslash> character inside a bracket expression preserves the
      literal value of the following character.

    All of the requirements and effects of quoting on ordinary, shell
    special, and special pattern characters shall apply to escaping in
    this context, except where specified otherwise.
    (Situations where this applies include word expansions when a
    pattern used in pathname expansion is not present in the original
    word but results from an earlier expansion, or the argument to the
    find -name or -path primary as passed to find, or the pattern
    argument to the fnmatch() and glob() functions when FNM_NOESCAPE
    or GLOB_NOESCAPE is not set in flags respectively.)

    If a pattern ends with an unescaped <backslash>, the behavior is
    unspecified.

On page 2382 line 76216 section 2.13.1 change:

    An ordinary character is a pattern that shall match itself. It can
    be any character in the supported character set except for NUL,
    those special shell characters in [xref to 2.2] that require
    quoting, and the following three special pattern characters.
    Matching shall be based on the bit pattern used for encoding the
    character, not on the graphic representation of the character. If
    any character (ordinary, shell special, or pattern special) is
    quoted, that pattern shall match the character itself. The shell
    special characters always require quoting.

    When unquoted and outside a bracket expression, ...

to:

    An ordinary character is a pattern that shall match itself. In a
    pattern, or part of one, where a shell-quoting <backslash> can be
    used, an ordinary character can be any character in the supported
    character set except for NUL, those special shell characters in
    [xref to 2.2] that require quoting, and the three special pattern
    characters described below. In a pattern, or part of one, where a
    shell-quoting <backslash> cannot be used to preserve the literal
    value of a character that would otherwise be treated as special,
    an ordinary character can be any character in the supported
    character set except for NUL and the three special pattern
    characters described below. Matching shall be based on the bit
    pattern used for encoding the character, not on the graphic
    representation of the character.
    If any character (ordinary, shell special, or pattern special) is
    quoted, or escaped with a <backslash>, that pattern shall match
    the character itself. The application shall ensure that it quotes
    or escapes any character that would otherwise be treated as
    special, in order for it to be matched as an ordinary character.

    When unquoted, unescaped, and not inside a bracket expression, ...

On page 2383 line 76232 section 2.13.1, delete:

    When pattern matching is used where shell quote removal is not
    performed (such as in the argument to the find -name primary when
    find is being called using one of the exec functions as defined in
    the System Interfaces volume of POSIX.1-2017, or in the pattern
    argument to the fnmatch() function), special characters can be
    escaped to remove their special meaning by preceding them with a
    <backslash> character. This escaping <backslash> is discarded. The
    sequence "\\" represents one literal <backslash>. All of the
    requirements and effects of quoting on ordinary, shell special,
    and special pattern characters shall apply to escaping in this
    context.

On page 2384 line 76271 section 2.13.3, change:

    3. Specified patterns shall be matched against existing filenames
    and pathnames, as appropriate. Each component that contains a
    pattern character shall require read permission in the directory
    containing that component. Any component, except the last, that
    does not contain a pattern character shall require search
    permission.

to:

    3. If a specified pattern contains any '*', '?' or '[' characters
    that will be treated as special (see [xref to 2.13.1]), it shall
    be matched against existing filenames and pathnames, as
    appropriate. Each component that contains any such characters
    shall require read permission in the directory containing that
    component. Each component that contains a <backslash> that will be
    treated as special may require read permission in the directory
    containing that component.
    Any component, except the last, that does not contain any '*',
    '?', or '[' characters that will be treated as special shall
    require search permission.

On page 2384 line 76295 section 2.13.3, add:

    4. If a specified pattern does not contain any '*', '?' or '['
    characters that will be treated as special, the pattern string
    shall be left unchanged.

On page 3748 line 128686 section C.2.13.1, change:

    Calling a utility or function without going through a shell, as
    described for find and the fnmatch() function defined in the
    System Interfaces volume of POSIX.1-2017.

to:

    Calling a utility or function without going through a shell, as
    described for find and the fnmatch() and glob() functions defined
    in the System Interfaces volume of POSIX.1-2017, or pattern
    matching in the shell in situations where the pattern is specified
    indirectly instead of directly to the shell, such as
    ls -ld -- $pattern or case $var in ($pattern) ....

On page 3748 line 128696 section C.2.13.1 change:

    pax −r ... "*a\(\?"

to:

    pax −r ... "*a(\?"

On page 3748 line 128697 section C.2.13.1, add these new paragraphs
after the numbered list:

    The wording "In a pattern, or part of one, where a shell-quoting
    <backslash> cannot be used to preserve the literal value of a
    character that would otherwise be treated as special" has been
    carefully crafted so that for the shell it only applies to certain
    contexts. In particular:

    * The use of "or part of one" is needed because a single pattern
      can be produced partly from characters directly included in a
      word and partly from characters that result from one or more of
      the word expansions. For example, in the following command the
      <backslash> escapes the '?' character:

        dir='abc\?'
        ls -l -- $dir/*.c

    * The reference to "a shell-quoting <backslash>" rather than just
      using "where shell quoting cannot be used" is because there are
      ways that other types of shell quoting can be used where a
      shell-quoting <backslash> cannot, such as placing an expansion
      within double-quotes as in this example:

        dir='abc?'
        ls -l -- "$dir"/*.c

    * The use of "that would otherwise be treated as special" is
      needed because otherwise the condition would apply to
      <backslash> in single-quotes. For example, in the following
      command the <backslash> is not treated as escaping the '?'
      because the '?' would not be treated as special anyway:

        ls -l 'abc\?'/*.c

    In patterns specified indirectly to the shell, it is unspecified
    whether or not <backslash> is special inside bracket expressions.
    This is because there are two mutually exclusive consistency aims
    and neither is considered more important than the other. One is
    consistency with direct patterns, where <backslash> is special
    inside bracket expressions (which is, in turn, for consistency
    with the way single-quotes and double-quotes preserve the literal
    value of characters inside bracket expressions); the other is
    consistency with regular expressions, find, pax, fnmatch(), and
    glob(), where <backslash> is not special inside bracket
    expressions (not counting the extra C-string escaping in EREs in
    awk).

    Earlier versions of this standard allowed two behaviors when a
    pattern ends with an unescaped <backslash>: it could match nothing
    or be treated as an invalid pattern. However, a third behavior has
    since been observed, where the ending <backslash> is treated as a
    literal <backslash>, and therefore this standard now simply states
    that the behavior is unspecified.

On page 3748 line 128698 section C.2.13.1 change:

    Conforming applications are required to quote or escape the shell
    special characters (sometimes called metacharacters). If used
    without this protection, syntax errors can result or
    implementation extensions can be triggered. For example, the
    KornShell supports a series of extensions based on parentheses in
    patterns.

to:

    Earlier versions of this standard included the statement "The
    shell special characters always require quoting" in [xref to XCU
    2.13.1].
    It is unclear what was intended by this, since there are pattern
    matching contexts in which it is not possible to quote those
    characters, such as:

        execlp("find", "find", ".", "-name", "*[()]*", (char *)0);

    where the parentheses cannot be escaped with a <backslash> because
    <backslash> is not special in bracket expressions in that context.
    The statement is thought to have been a warning to application
    writers and interactive shell users that shell special characters
    (sometimes called metacharacters) always need quoting in patterns
    that appear directly in shell code; for example, this code:

        case $char in
        [()]) ... ;;
        esac

    is incorrect because the parentheses are parsed as operators -
    they need to be quoted in order to be treated as part of the
    pattern.

    This standard now simply requires instead that applications quote
    or escape any character that would otherwise be treated as
    special, in order for it to be matched as an ordinary character.
    If shell special characters are used without this protection in
    contexts where they are treated as special, syntax errors can
    result or implementation extensions can be triggered. Some shells
    support a series of extensions based on parentheses in patterns
    that are valid extensions in these contexts because they would
    otherwise cause syntax errors. However, this means that they are
    not allowed by this standard to be recognized in contexts where
    those syntax errors would not occur anyway, such as in:

        pattern='a*(b)'; ls -- $pattern

    which this standard requires to list files with names beginning
    'a' and ending "(b)".

    It is recommended that implementations do not extend pattern
    matching in the shell in ways that are only valid extensions
    because they would otherwise be syntax errors, in order to avoid
    inconsistency between different pattern matching contexts. One way
    to provide an extension that is consistent between different
    pattern matching contexts in the shell (although still not
    consistent with find -name, fnmatch(), etc.)
    is to enable the extension only when a non-standard shell option
    is set, or when the shell is executed using a command name other
    than sh. Consistency with non-shell contexts can then be achieved
    by enabling equivalent extensions in those other contexts by use
    of non-standard utility options or non-standard FNM_* and GLOB_*
    flags.

On page 3749 line 128725 section C.2.13.3, add a new paragraph:

    Patterns are matched against existing filenames and pathnames only
    when the pattern contains a '*', '?' or '[' character that will be
    treated as special. This prevents accidental removal of backslash
    characters in variable expansions where generating a list of
    matching files is not intended and a (usually oddly named) file
    with a matching name happens to exist. For example, a shell script
    that tries to be portable to systems that predate the introduction
    of functions and printf might use this on POSIX systems:

        myecho='printf %s\n'

    to be used as:

        $myecho args...

    If %s\n were to be matched against existing files, this would not
    work if a file called %sn happened to exist.

Bug 1284: The sense of "checksum" test is too narrow. Accepted
http://austingroupbugs.net/view.php?id=1284

This item is tagged for TC3-2008.

Bug 1285: There should be a line-break before the 2nd trap in the synopsis Accepted as Marked
http://austingroupbugs.net/view.php?id=1285

This item is tagged for TC3-2008.

Add a paragraph break (troff .P) between the two synopsis lines.

Bug 1286: positive increments *increase* the nice value in renice Accepted
http://austingroupbugs.net/view.php?id=1286

This item is tagged for TC3-2008.

Bug 1288: RE Bracket Expression item 8 should not say "rejected as an error" Accepted
http://austingroupbugs.net/view.php?id=1288

This item is tagged for TC3-2008.

The outstanding actions list was reviewed:

https://collaboration.opengroup.org/operational/mailarch.php?soph=Y&action=show&archive=austin-core-l&num=978

It was noted that the action for bug 900 was completed recently by
Eric.
Bug 663 was closed in 2015, and the associated action is no longer
relevant.

We may be able to resolve bug 789 now, depending on whether any bug
reports for the new behavior in bash 5 have been received.

Next Steps
----------
The next calls are on:

    September 30 2019 (Monday)
    This call will be for 60 minutes.

    October 3 2019 (Thursday)
    This call will be for 90 minutes.

Calls are anchored on US time. (8am Pacific)

Please check the calendar invites for dial in details.

http://austingroupbugs.net

An etherpad is usually up for the meeting, with a URL using the date
format as below:

https://posix.rhansen.org/p/201x-mm-dd
username=posix password=2115756#