Scott Hanselman

Assert your assumptions - .NET Core and subtle locale issues with WSL's Ubuntu

October 16, 2019 Posted in DotNetCore | Linux

I thought this was an interesting and subtle bug that was not only hard to track down but also hard to pin on anyone. I wasn't sure 'whose fault it was.'

Here's the story. Feel free to follow along and see what you get.

I was running on Ubuntu 18.04 under WSL.

I made a console app using .NET Core 3.0. You can install .NET Core here http://dot.net/get-core3

I did this:

dotnet new console
dotnet add package Humanizer --version 2.6.2

Then I made Program.cs look like this. Humanizer is a great .NET Standard library that you'll learn about and think "why didn't .NET always have this!?"

using System;
using Humanizer;

namespace dotnetlocaletest
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(3501.ToWords());
        }
    }
}

You can see that I want the app to print out the number 3501 as words. Presumably in English, as that's my primary language, but you'll note I haven't indicated that here. Let's run it.


Note that this app works great, as expected, on Windows. On Ubuntu under WSL, though:

scott@IRONHEART:~/dotnetlocaletest$ dotnet run
3501

Huh. It didn't even try. That's weird.

My Windows machine is en-us (English in the USA) but what's my Ubuntu machine?

scott@IRONHEART:~/dotnetlocaletest$ locale
LANG=C.UTF-8
LANGUAGE=

Looks like it's nothing. It's "C.UTF-8" and it's nothing. C in this context means the POSIX default locale. It's the most basic. C.UTF-8 is definitely NOT the same as en_US.utf8. It's a locale of sorts, but it's not a place.
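If you want to see what .NET itself makes of this, a tiny diagnostic like the one below prints the locale environment variables next to the cultures .NET resolved. This is a sketch, not part of the original repro: `LocaleReport` and `IsInvariant` are names I made up, and my assumption (worth checking on your own machine rather than trusting me) is that .NET Core maps C.UTF-8 to the invariant culture, whose `Name` is the empty string.

```csharp
using System;
using System.Globalization;

static class LocaleReport
{
    // The invariant culture's Name is the empty string, so comparing names tells us
    // whether .NET ended up with "no particular culture at all".
    public static bool IsInvariant(CultureInfo culture) =>
        culture.Name == CultureInfo.InvariantCulture.Name;

    static void Main()
    {
        // What the OS told us. LC_ALL, when set, overrides LANG.
        string lang = Environment.GetEnvironmentVariable("LANG") ?? "(unset)";
        string lcAll = Environment.GetEnvironmentVariable("LC_ALL") ?? "(unset)";
        Console.WriteLine($"LANG={lang} LC_ALL={lcAll}");

        // What .NET resolved that to. The quotes make an empty name visible.
        Console.WriteLine($"CurrentCulture   = '{CultureInfo.CurrentCulture.Name}'");
        Console.WriteLine($"CurrentUICulture = '{CultureInfo.CurrentUICulture.Name}'");
        Console.WriteLine($"UI culture is invariant: {IsInvariant(CultureInfo.CurrentUICulture)}");
    }
}
```

Run it under both Windows and WSL and compare; two empty quotes on Ubuntu would be the "it's nothing" from the locale output, as seen from inside .NET.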

What if I tell .NET explicitly where I am?

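using System.Globalization; // CultureInfo; add next to the existing usings
using System.Threading;     // Thread

static void Main(string[] args)
{
    // Tell .NET, and therefore Humanizer, which UI culture to use
    Thread.CurrentThread.CurrentUICulture = new CultureInfo("en-US");
    Console.WriteLine(3501.ToWords());
}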

And running it.

scott@IRONHEART:~/dotnetlocaletest$ dotnet run
three thousand five hundred and one

OK, so if the app declares "hey, I'm en-US!" then Humanizer works well.

What's wrong? Seems like Ubuntu's "C.UTF-8" isn't "invariant" enough to cause Humanizer to fall back to an English default?

It seems other people have also seen unusual or subtle issues with Ubuntu installs that are using C.UTF-8 versus a more specific locale like en_US.UTF-8.

I could fix this in a few ways. One is to set the locale explicitly in Ubuntu (both commands need root):

sudo locale-gen en_US.UTF-8
sudo update-locale LANG=en_US.UTF-8

Fortunately, Humanizer 2.7.2 and above fixes this issue and falls back correctly. Whose "bug" was it? It's a tough one, but in this case Humanizer had some flawed fallback logic. I updated to 2.7.2 and now C.UTF-8 falls back to a neutral English.
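To make "flawed fallback logic" a little more concrete, here's the general shape of culture fallback in .NET: walk from the specific culture up through its parents (en-US, then en, then the invariant culture) and, if nothing has matched by the time you reach the invariant culture, use an explicit default. This is NOT Humanizer's actual code. The `Supported` set and the `"en"` default are hypothetical stand-ins for whatever languages a library happens to ship.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class CultureFallback
{
    // Hypothetical: the languages some localization library has converters for.
    static readonly HashSet<string> Supported =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "en", "de", "fr" };

    // Hypothetical default used when the parent chain runs out.
    const string DefaultCulture = "en";

    // Walk culture -> parent -> ... and return the first supported name.
    // The invariant culture has an empty Name, which ends the loop.
    public static string Resolve(CultureInfo culture)
    {
        for (var c = culture; c.Name.Length > 0; c = c.Parent)
        {
            if (Supported.Contains(c.Name))
                return c.Name;
        }
        return DefaultCulture; // reached invariant: fall back on purpose
    }

    static void Main()
    {
        Console.WriteLine(Resolve(new CultureInfo("en-US")));      // en
        Console.WriteLine(Resolve(new CultureInfo("de-AT")));      // de
        Console.WriteLine(Resolve(CultureInfo.InvariantCulture));  // en (the default)
    }
}
```

The last case is the one that bit me: the invariant culture never matches anything, so the only way to get English out of it is a deliberate default at the end of the chain.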

That said, I think it could be argued that WSL/Canonical/Ubuntu should detect my local language and/or set the locale to it on installation.

The lesson here is that your applications - especially ones that are expected to work in multiple locales in multiple languages - take "input" from a lot of different places. Phrased differently, not all input comes from the user.

System locale and language, time, time zone, and dates are all input as ambient context to your application. Make sure you assert your assumptions about what "default" is. In this case, my little app worked great on en-US but not on "C.UTF-8." I was able to explore the behavior and learn that there was both a local workaround (I could detect and set a default locale if needed) and a library fix available as well.
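The local workaround can be as small as this: check at startup whether .NET ended up with the invariant culture and, only in that case, pick an explicit default. `LocaleGuard` and `EnsureCulture` are hypothetical names, and en-US as the fallback is an assumption that suits my app; choose whatever default makes sense for yours.

```csharp
using System;
using System.Globalization;

static class LocaleGuard
{
    // Hypothetical helper: if the UI culture is the invariant culture (empty Name),
    // replace it with an explicit fallback. Returns true if anything changed.
    public static bool EnsureCulture(string fallbackName)
    {
        if (CultureInfo.CurrentUICulture.Name.Length > 0)
            return false; // a real culture is already set; respect it

        var fallback = new CultureInfo(fallbackName);

        // Current thread: number/date formatting and UI (resource) lookups.
        CultureInfo.CurrentCulture = fallback;
        CultureInfo.CurrentUICulture = fallback;

        // Default for other threads that haven't set their own culture.
        CultureInfo.DefaultThreadCurrentCulture = fallback;
        CultureInfo.DefaultThreadCurrentUICulture = fallback;
        return true;
    }

    static void Main()
    {
        bool changed = EnsureCulture("en-US");
        Console.WriteLine($"changed={changed}, UI culture is now '{CultureInfo.CurrentUICulture.Name}'");
    }
}
```

Call it first thing in `Main`, before anything formats numbers or dates. Because it leaves a real culture (say en-GB) alone, it only covers the "it's nothing" case instead of overriding people who did configure their locale.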

Assert your assumptions!



About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

October 20, 2019 11:32
C.UTF-8 as default locale and UTC as default timezone are nice things for server
October 20, 2019 13:35
Believe it or not, we've been asserting our assumptions for years now, because fallback to en-us is unwanted here.

From outside the U.S., things inside the U.S. are very hectic. I mean Americans don't use metric or A4 paper, write the month before the day in dates, and even when they say they are using metric, they still use Fahrenheit instead of Celsius. And since Microsoft is an American corporation, one must always be careful not to end up with the wrong locale settings.
October 21, 2019 9:04
Seems as though .net core should default to a locale if one is not returned by the operating system.

Per my father, the USA doesn't use metric because President Carter in the 1970s tried to force it on the country and its schools. People revolted and did not accept it; and given the federal power abuses earlier in the 1970s, the USA did not go with a metric only taught in schools.

Per my father, his much younger co-workers are very weak when using fractions as they were taught decimals and a 50/50 mix of English and metric in school. He only started using metric in his 20s at his first job.

It's up there, per him, with how FDR added the 12th grade to high school by stroke of the pen to keep tens of thousands of 18-year-olds out of the workforce during the Great Depression. His father graduated high school at 16, as he skipped an entire grade along with all of his classmates in the 1930s.



October 21, 2019 10:27
These are funny; you notice this kind of stuff a lot more if you live outside the US. For example many, many websites insist on showing translated content based on where I live, and I absolutely do not want that. I grew up in the 1970's and 80's, learned everything computer related in English because that was all that was available. And still today, with a lot of translated material available, I do not feel comfortable with my own language, when it comes to computers.
October 22, 2019 12:48
I've also come across this quite a few times!

My locale is en-GB and my time zone is GMT/BST. The problem we quite often run into is that, because it's "close enough" to en-US and UTC, problems with where the input is coming from are usually picked up quite late.

I recently ran into a very similar problem running a Windows Docker container, which defaults to en-US and UTC - p.s. still haven't actually managed to figure out how to set the locale properly on the container image! Date parsing is particularly problematic because everyone likes to write their dates slightly differently (dd-MM-yyyy vs. MM-dd-yyyy, that type of thing).

Handling the changes in code would be nice, but sometimes it's not possible, especially if you're integrating with an application that's relying on OS calls which read the system locale/timezone. That's why it's important to make sure that the underlying OS is configured "correctly".

In Linux you've got the command line to set the locale and time zone, but in Windows it seems to be a lot harder to do this via the command line!
Mo
October 23, 2019 12:24
For a lot of .NET devs who have only worked with Windows, things like this will be the biggest causes of headaches as the underlying OS where their applications are running is becoming less relevant.

I know a lot of .NET devs who have only worked with Windows, and have balked at some of the things that Linux/Unix do or how they handle seemingly "standard" things (line endings, permissions, mount points, etc.). Some of these devs have pushed back and demanded that their apps only run on Windows, or refused to support non-Windows based OSs; whereas most of them have accepted that their assumptions need to change slightly, going forward.

Then again, I think that things like this will lead to more stable software with fewer assumptions (to quote Coach Smiley, "never make an assumption, because you will look like an ass and the ump with tion you"), and more in depth tests.


Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.