Kai Henningsen wrote:
>
> > I live on Planet Earth. A place where people speak different languages
> > and have different expectations about what any given range includes.
> > Not everyone is a Unix veteran who only uses the C locale.
> >
> > You have often said that you only have U.S. locales on the systems
> > available to you. Your systems may only exhibit 1970s and 1980s
> > behavior with respect to character handling, but most of us have
> > moved way beyond that.
>
> Yes, most of us non-US types have been badly burned by [a-c] including
> upper case letters.
>
> That doesn't mean it's right.
Well, the _right thing_ is what is already written in the Standard,
namely that "Range expressions shall not be used in portable applications
because their behaviour is dependent on the collating sequence".
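As a minimal sketch of that dependence (the function name in_range is made up for illustration), the following pins the collating sequence to the C locale, where a range follows byte order, so "[a-c]" covers only lower case letters:

```shell
# in_range NAME: report whether NAME starts with a character in [a-c]
# under the C (POSIX) collating sequence. The function body is a subshell,
# so the LC_ALL assignment does not leak into the calling shell.
in_range() (
    LC_ALL=C
    case $1 in
        [a-c]*) echo "match" ;;
        *)      echo "no match" ;;
    esac
)

in_range apple     # prints: match
in_range Banana    # prints: no match ('B' sorts before 'a' in byte order)
```

In a locale whose collating sequence interleaves the cases (a A b B c C ...), the same pattern may also match "Banana"; whether it does depends on the shell, its version and the installed locale data.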
> Actually, what is really needed, IMNSHO, is the ability to select for
> either C or national locale behaviour on a case-by-case basis.
It really depends on the individual. Being strongly accustomed to the
MS-DOS/Windows behaviour, I read this sub-thread with an amused eye.
Clearly, for me the "right" behaviour is to mix upper case with lower
case. I expect "[a-z]*" to catch all files whose names start with _any_
letter (which is not possible under Unix unless you cheat somehow with
the order of the collation rules), I am strongly accustomed to seeing
README and Makefile in the middle of the list when I type ls, and last
but not least, I never issue a somewhat dangerous command like "rm [a-c]*"
without first typing "ls [a-c]*" to see what will actually happen.
And yes, I've been burned. Once for sure, perhaps twice; not thrice.
Now, I am certainly not a good representative of the typical Unix user
(although I may be more representative of the typical Linux newbie ;-)).
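The "ls first, rm afterwards" habit described above could be wrapped up roughly as follows. This is only a sketch: preview_rm is a hypothetical helper name, not an existing command. The shell expands the pattern before the function runs, so what gets listed is exactly what would be removed.

```shell
# preview_rm (hypothetical helper): list the arguments the shell expanded
# the pattern to, then ask for confirmation before removing anything.
preview_rm() {
    [ "$#" -gt 0 ] || { echo "nothing to remove" >&2; return 1; }
    # If the pattern matched nothing it is passed through literally,
    # ls fails, and we stop before reaching rm.
    ls -d -- "$@" || return 1
    printf 'Remove these %d entries? [y/N] ' "$#"
    read -r answer
    case $answer in
        y|Y) rm -- "$@" ;;
        *)   echo "aborted" ;;
    esac
}

# Usage (interactive): preview_rm [a-c]*
```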
> And only selecting between them via setting and unsetting LANG is a
> really bad interface.
I agree with you. If you cannot "forget" the POSIX locale behaviour (for
example, because you rely on habits like "vi [A-Z]*") and at the same time
want to have sensible locale settings for other collating tasks, that's
a problem.
But how many "other collating tasks which take sensible locale settings
into account" do you run?
> Both versions are actually necessary.
This is where I am not sure I agree. What about
    export LANG=de_DE
    export LC_COLLATE=POSIX
plus perhaps some aliases for "sort", etc., as needed. "grep" will
probably require some ad-hoc script (that only turns the locale on when
-i is issued).
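A rough sketch of such an ad-hoc grep wrapper, under several assumptions: the names pick_collate and lgrep are invented here, "turning the locale on" is taken to mean using the collation of $LANG, LC_ALL is assumed to be unset (it would override LC_COLLATE), and only a standalone -i or --ignore-case is recognised, not clustered options such as -ri.

```shell
# pick_collate (hypothetical): print the LC_COLLATE value to use for a
# grep invocation with the given arguments. National collation is chosen
# only when -i / --ignore-case appears before any "--".
pick_collate() {
    for arg in "$@"; do
        case $arg in
            --) break ;;
            -i|--ignore-case)
                printf '%s\n' "${LANG:-POSIX}"
                return ;;
        esac
    done
    printf '%s\n' POSIX
}

# lgrep (hypothetical): run grep with the collation chosen above,
# for this one command only.
lgrep() {
    LC_COLLATE=$(pick_collate "$@") grep "$@"
}
```

With this, "lgrep '[a-c]' file" would keep POSIX ranges, while "lgrep -i '[a-c]' file" would follow whatever collation $LANG selects, provided that locale is installed.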
Antoine