messageformat is an OpenJS Foundation project that handles both pluralization and gender in applications. It helps keep messages in human-friendly formats, and can be the basis for tone and accuracy that are critical for applications. Pluralization and gender are not a simple challenge, and deciding on which message format to implement can be pushed down the priority list as development teams make decisions on resources. However, this can lead to tougher transitions later on in the process with both technology and vendor lock-in playing a role.

Quick note: The upstream spec is called ICU MessageFormat. ICU stands for International Components for Unicode: a set of portable libraries that are meant to make working with i18n easier for Java and C/C++ developers. If you’ve worked on a project with i18n/l10n, you may have used the ICU MessageFormat without knowing it.

To find out more about messageformat, I spoke with Eemeli Aro, Software Developer at Vincit, and OpenJS Cross Project Council (CPC) member. Aro maintains the messageformat libraries, and actively participates in various efforts to improve JavaScript localization. Aro spoke on “The State of the Art in Localization” at last year’s Node+JS Interactive. Aro is an active participant in ECMA-402 processes, runs the monthly HelsinkiJS meetups, and helps organise React Finland conferences.

How do formats deal with nuances in language?

It’s all about choices. Variance, e.g. how the greeting used by a program could vary from one instance to the next, gets dealt with by having the messaging format that you’re using support the ability to have choices. So you can have some random number coming in and depending on the choice of that random number, you select one of a number of choices. This functionality isn’t directly built into ICU MessageFormat, but it’s very easily implementable in a way that gets you results.

We need to decide how we deal with choices and whether you can have just a set number of different choice types. Is it a sort of generic function that you can define and then use? It’s an interesting question, but ICU MessageFormat doesn’t yet provide an easy, clear answer to that. But it provides a way of getting what you want.

What are the biggest problems with messaging formats?

Perhaps the biggest problem is that while ICU MessageFormat is the closest we have to a standard, that doesn’t mean it is in standard use by everyone. There are a number of different other standards. There are various versions that are used by a number of tools and workflows and other processes in terms of localization. The biggest challenge is that when you have some kind of interface and you want to present some messages in that interface, there isn’t one clear solution that’s always the right one for you.

And then it also becomes challenging because, for the most part, almost any solution that you end up with will solve most of the problems that you have. This is the scope in which it’s easy to get lock-in. Effectively, if you have a workflow that works with one standard or one set of tools or one format that you’re using, then you have some sort of limitation. Eventually, at some point, you will want to do something that your systems aren’t supporting. You can feel like it’s a big cost to change that system, and therefore you make do with what you have, and then you get a suboptimal workflow and a suboptimal result. Eventually, your interface and whole project may not work as well.

It’s easy to look at messageformat and go, “That’s too complicated for us, let’s pick something simpler.” You end up being stuck with “that something simpler” for the duration of whatever it is that you’re working on.

You’re forced to make a decision between two bad options. So the biggest challenge is it would be nice to have everyone agree that “this is the right thing to do” and do it from the start! (laughs)

But of course that is never going to happen. When you start building an interface like that, you start with just having a JSON file with keys and messages. That will work for a long time, for a really great variety of interfaces, but it starts breaking at some point, and then you start fixing it, and then your fix has become your own custom bespoke localization system.

#messaging #messaging apis #messaging challenges

messageformat is Working Hard to Make Themselves Obsolete
1.15 GEEK