"C" UTF-8 trouble

Andy Koppe andy.koppe@gmail.com
Wed Oct 7 10:08:00 GMT 2009


2009/10/7 Corinna Vinschen:
> Urgh.  So we have to change nl_langinfo in newlib as well.  Do we have
> to return "US-ASCII" if charset is "ASCII", or is it sufficient to
> return __locale_charset() as you did, thus returning "ASCII" for "ASCII"?

I'd assume so, but WWLD?


> And what about stuff like "eucJP" vs. "EUCJP"?  The charset in newlib
> is always uppercase right now.

Hmm. There's also the KOI8s, which turn into CP2[01]866.
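
To make the naming question concrete, here is a rough sketch of the kind of lookup nl_langinfo(CODESET) could do to turn the internal, uppercase charset string into the spelling applications expect. The function name codeset_display_name and the table are invented for illustration and are not newlib code. KOI8-R and KOI8-U are the usual names for Windows code pages 20866 and 21866; the "eucJP" spelling is an assumption taken from the question above, and anything not in the table (including "ASCII", which is still undecided) is passed through unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: internal charset name -> name reported to apps. */
static const struct {
    const char *internal;
    const char *reported;
} codeset_names[] = {
    { "EUCJP",   "eucJP"  },   /* assumed spelling, see lead-in */
    { "CP20866", "KOI8-R" },
    { "CP21866", "KOI8-U" },
};

/* Returns the reported name, or the input itself if no entry matches. */
const char *codeset_display_name(const char *internal)
{
    size_t i;

    for (i = 0; i < sizeof codeset_names / sizeof codeset_names[0]; i++)
        if (strcmp(internal, codeset_names[i].internal) == 0)
            return codeset_names[i].reported;
    return internal;
}
```

With this, codeset_display_name("CP20866") gives "KOI8-R", while "ASCII" and "UTF-8" come back as they went in.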


> As for Emacs, I'm wondering if it shouldn't be changed to set its locale
> according to setlocale(LC_CTYPE,NULL) instead, given what POSIX says.

Well, yes, but good luck with that. When Ken Brown raised the ^? vs ^H
issue, they told him that sending ^H for backspace should be
considered a bug.



> I, too, think this is a good idea.  __get_locale_env() should be changed
> to return "C.UTF-8".
>
> It would be nice to check /etc/defaults/locale in __get_locale_env() as
> well, but I'm a bit reluctant to do that.  It means, every invocation of
> a Cygwin process has to open that file if the environment isn't set.
> Talking about performance...
>
> Alternatively, the first invocation of Cygwin in a process tree could
> try to read this file only.

Agreed with the last point, but I think setenv("LANG",...) at the
first invocation of Cygwin is a better and simpler solution than
changing __get_locale_env(), because:
- it solves the Emacs issue
- applications will get the same result from setlocale(,"") and
reading the environment variables themselves, so apps that do the
latter don't have to be changed
- it's more like Linux
- it doesn't require a newlib change
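
A minimal sketch of what I mean, assuming we only care about the variables that decide the charset (LC_ALL, LC_CTYPE, LANG). The helper name set_default_lang is made up and this is not actual Cygwin code; detecting "first invocation in a process tree" is left out, and the fallback value is passed in so that it could come from a file or simply be "C.UTF-8".

```c
#include <stdlib.h>

/* Hypothetical helper, not actual Cygwin code.
   If none of LC_ALL, LC_CTYPE, LANG is set to a non-empty value,
   set LANG to the given fallback.  Returns 1 if LANG was set,
   0 if the environment was left alone or setenv() failed. */
int set_default_lang(const char *fallback)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && *value != '\0')
            return 0;   /* user already chose a locale */
    }
    /* Overwrite, since LANG may exist but be empty. */
    return setenv("LANG", fallback, 1) == 0;
}
```

Child processes inherit the variable, so both setlocale(LC_ALL, "") and programs that read LANG directly would see the same value.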

Andy


