Localization Done Right

Many US tech companies have difficulty entering foreign markets like China, Japan, and Russia. Many people attribute this to cultural differences, but few companies take the next step and actually fix the localization problem.

It’s hard to find a rockstar programmer at my level. It’s almost impossible to find a rockstar programmer at my level who can also speak five distinct languages (like me). So I feel a responsibility to explain the correct way to localize your app or website for markets outside the US.

Disclaimer
In this article I give many examples of localization failures by American companies because I assume most of my readers are American. This doesn’t mean American companies are particularly bad at localization. In fact, the localization efforts of Chinese, Japanese, and German companies are as bad as, if not worse than, those of American companies. If Alibaba and Tencent ever solve their localization problems, Amazon and Facebook may have trouble competing with them.

The Wrong Way

Before going into the details of the correct way, I want to talk about the wrong way. The wrong way is to compile a list of string literals from your codebase and databases, then hand the list over to translators. Unfortunately, the wrong way is also the most common way to localize.

In the following sections, I will try to point out why this approach is bad and how we can do localization right.

Grammar

One big problem with the string-mapping approach is grammar. If you simply translate strings, the grammar can never be correct.

First, if your strings include program-generated numbers, you may need to put the number in different places in different languages. For example, a video website may want to show how many times a video has been viewed. In English, you might write something like “${video.views} views”.

However, in Chinese, the best translation is “觀看次數:${video.views}” (“number of views: ${video.views}”).

In English the extra word goes after the number, but in Chinese it has to go before the number. Therefore, you can never get a correct Chinese translation if you only translate strings. Twitch is an example of this kind of bad translation: they simply translate “views” into Chinese and put it after the number, resulting in a terrible translation.

The example above is far from the worst grammar issue in localization. The word order issue is not difficult to solve: many localization APIs let you use placeholders for variables, so translators can put each variable wherever their language requires.
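As a minimal sketch of the placeholder approach, assuming a hand-rolled catalog rather than any particular framework (MESSAGES and t() are hypothetical names), each locale owns its whole template, so the translator decides where the number goes:

```python
# Hypothetical per-locale message catalog. Each translator owns the entire
# template, including where the {views} placeholder appears.
MESSAGES = {
    "en": {"video.views": "{views:,} views"},
    "zh-Hant": {"video.views": "觀看次數:{views:,}"},
}


def t(locale: str, key: str, **args: object) -> str:
    """Look up the template for this locale and fill in its named placeholders."""
    return MESSAGES[locale][key].format(**args)


print(t("en", "video.views", views=12345))       # 12,345 views
print(t("zh-Hant", "video.views", views=12345))  # 觀看次數:12,345
```

This fixes word order only. The English template still prints “1 views” for a single view, because plural rules are yet another per-language grammar function the catalog would need.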

What’s worse is that program-generated content might alter the strings around it. For example, to generate the sentence “There is a ${object}”, you need some extra work because the object might be “apple”: you must change “a” to “an” when the following word starts with a vowel sound.

Suppose you are a Chinese developer working for a Chinese company and you speak only Chinese. You might not even be aware of this rule. Then the program has no way to find out whether the program-generated object should take “a” or “an”. There is no way to generate a correct English translation with string mapping, even with variable placeholders.
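A hedged sketch of what the English side would need, assuming a small hypothetical exception list: the article depends on the sound of the next word, not just its first letter, so even this heuristic needs data that a string-only workflow never provides.

```python
# Hypothetical exception lists. Spelling alone cannot decide "a" vs "an",
# because the rule depends on the sound of the next word, not its letter.
VOWEL_SOUND_EXCEPTIONS = {"hour", "honest", "honor", "heir"}
CONSONANT_SOUND_EXCEPTIONS = {"university", "unicorn", "european", "one", "user"}


def english_indefinite_article(noun: str) -> str:
    """Pick "a" or "an" for an English noun (a heuristic, not a complete rule)."""
    word = noun.lower()
    if word in VOWEL_SOUND_EXCEPTIONS:
        return "an"
    if word in CONSONANT_SOUND_EXCEPTIONS:
        return "a"
    return "an" if word and word[0] in "aeiou" else "a"


def there_is_en(noun: str) -> str:
    """English rendering of the template "There is a ${object}"."""
    return f"There is {english_indefinite_article(noun)} {noun}"


print(there_is_en("apple"))       # There is an apple
print(there_is_en("pineapple"))   # There is a pineapple
print(there_is_en("university"))  # There is a university
```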

You may think this is not your problem because English grammar is much more complicated than Chinese grammar. That is true if you only localize into Chinese. But if you want to translate the same sentence into German and Japanese as well, you are out of luck.

For example, suppose you want to translate “There is a ${object}” for three objects: apple, pineapple, and man. In English, you need to do something different for apple because it starts with a vowel. In German, you need to do something different for pineapple because the German word (Ananas) is feminine. In Japanese, you need to do something different for man because a man is alive, and Japanese uses a different verb of existence (いる rather than ある) for living things.

The only way to get correct Japanese and German translations is to have functions that determine the gender of every object for German and whether an object is alive for Japanese, and then to let translators compose those functions freely to build the correct translation. Unfortunately, none of today’s third-party localization frameworks support tasks this complex.
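The following is a minimal sketch of that idea in Python, not a feature of any existing framework. The lexicon fields (de_gender, ja_animate), the TEMPLATES table, and render() are all hypothetical; in a real system translators would author the per-locale functions instead of a fixed string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    """Hypothetical lexicon entry carrying the grammatical facts translators need."""
    de: str           # German word
    de_gender: str    # "m", "f", or "n"
    ja: str           # Japanese word
    ja_animate: bool  # True for living things


LEXICON = {
    "apple": Noun("Apfel", "m", "りんご", False),
    "pineapple": Noun("Ananas", "f", "パイナップル", False),
    "man": Noun("Mann", "m", "男の人", True),
}

# After "Es gibt" the noun is accusative, so the indefinite article
# depends on grammatical gender.
GERMAN_ACCUSATIVE_INDEFINITE = {"m": "einen", "f": "eine", "n": "ein"}


def there_is_de(noun: Noun) -> str:
    return f"Es gibt {GERMAN_ACCUSATIVE_INDEFINITE[noun.de_gender]} {noun.de}"


def there_is_ja(noun: Noun) -> str:
    # Japanese uses いる for animate subjects and ある for inanimate ones.
    verb = "いる" if noun.ja_animate else "ある"
    return f"{noun.ja}が{verb}"


# Each locale's "There is a ${object}" is a function, not a string.
TEMPLATES = {"de": there_is_de, "ja": there_is_ja}


def render(locale: str, object_key: str) -> str:
    return TEMPLATES[locale](LEXICON[object_key])


print(render("de", "pineapple"))  # Es gibt eine Ananas
print(render("ja", "man"))        # 男の人がいる
```

Adding a new object means filling in its grammatical facts once in the lexicon; the per-locale templates do not change.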

Context

Even if you get the grammar right, you cannot produce a correct translation without knowing the context.

For example, in English we can use “yes” and “no” to answer all yes-no questions. However, Chinese languages use echo answers. For example, if you ask someone in Cantonese “do you like watching movies?” (“你鍾唔鍾意睇戲嫁?”), the answer is “like” (“鍾意”) or “don’t like” (“唔鍾意”). Therefore, to translate “yes” and “no” into Cantonese correctly, you need to know the context (the question itself).

For example, if you try to delete your post on Twitch, a pop-up dialog lets you choose “yes” or “no”.


The correct Chinese translation is “delete” (“刪除”) and “don’t delete” (“不刪除”), or alternatively “confirm” (“確定”) and “cancel” (“取消”). However, the Traditional Chinese translation of that website uses “是”, which means “is”, and “否”, which means “is not”. The Simplified Chinese translation is even more absurd: it uses “有”, which means “have”, and “无”, which means “do not have”. This is extremely confusing for Chinese users.


The correct solution to this problem is to tell translators the context of every translation task. Moreover, different string literals must be translated separately, even if their values are identical. For example, to get correct Chinese translations, you need to create a separate translation task for every occurrence of “yes” and tell the translators what the question is.
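A minimal sketch of a context-aware catalog, assuming hypothetical context names: each entry is keyed by the context as well as the source text, so two identical “Yes” strings become separate translation tasks. GNU gettext’s msgctxt field serves a similar purpose.

```python
# Hypothetical catalog keyed by (context, source string). Identical English
# strings in different dialogs become separate translation tasks.
CATALOG = {
    "zh-Hant": {
        ("delete-post-dialog", "Yes"): "刪除",
        ("delete-post-dialog", "No"): "不刪除",
        ("generic-confirm-dialog", "Yes"): "確定",
        ("generic-confirm-dialog", "No"): "取消",
    },
}


def translate(locale: str, context: str, source: str) -> str:
    """Return the translation for this exact context, or the source as a fallback."""
    return CATALOG.get(locale, {}).get((context, source), source)


print(translate("zh-Hant", "delete-post-dialog", "Yes"))      # 刪除
print(translate("zh-Hant", "generic-confirm-dialog", "Yes"))  # 確定
```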

Another example is formatting numbers in Chinese. Apple’s NumberFormatter class formats numbers for different locales, and one of its styles is “spell out”.

There are several reasons to use the “spell out” style. One of them is to prevent forgery, which is why we spell out the amount when writing a check.

In Chinese, there are two numeral systems: normal and financial. The normal numerals (e.g., 一二三) are for casual use. However, normal numeral characters are so simple that they are easy to alter (adding a stroke turns 一 into 二), so people invented the financial numeral system (e.g., 壹貳參). Its characters are much more complex, which makes numbers written with financial numerals extremely difficult to alter.

Thus if you want to translate a “spell out” English number to Chinese, you cannot simply use the function provided by Apple. You have to know why the “spell out” version is used. If it is used casually, you should translate it to the normal Chinese numerals. If your application is going to generate a check image, you should translate it to financial Chinese numerals.
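As a hedged sketch, here is a hand-rolled Python function (spell_out_chinese is a hypothetical name, and it only covers 0 to 9999) in which the caller must state the purpose through the financial parameter instead of letting a formatter guess:

```python
# Digits and small units for the two Chinese numeral systems.
NORMAL = ("零一二三四五六七八九", ("", "十", "百", "千"))
FINANCIAL = ("零壹貳參肆伍陸柒捌玖", ("", "拾", "佰", "仟"))


def spell_out_chinese(n: int, financial: bool) -> str:
    """Spell out 0..9999 in Chinese numerals; the caller supplies the context."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0..9999")
    digits, units = FINANCIAL if financial else NORMAL
    if n == 0:
        return digits[0]
    text = str(n)
    parts = []
    pending_zero = False
    for i, ch in enumerate(text):
        d = int(ch)
        place = len(text) - 1 - i  # 3 = thousands, 0 = ones
        if d == 0:
            pending_zero = True  # emit a single 零 only if a nonzero digit follows
            continue
        if pending_zero:
            parts.append(digits[0])
            pending_zero = False
        parts.append(digits[d] + units[place])
    result = "".join(parts)
    if not financial and 10 <= n <= 19:
        result = result[1:]  # casual style writes 十五, not 一十五
    return result


print(spell_out_chinese(1234, financial=False))  # 一千二百三十四
print(spell_out_chinese(1234, financial=True))   # 壹仟貳佰參拾肆
```

The design point is the explicit financial flag: the formatting code cannot infer the context, so whoever knows the context has to pass it in. Larger units such as 萬 and regional variants (兩 for two, 叁 in Simplified Chinese) are left out of this sketch.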

Again, without knowing the context, it’s impossible to know the correct translation.

Translation Quality

Now we know what tools to provide to translators. However, even with the right tools, it is still extremely difficult to produce a good translation.

If you speak two languages fluently, you know that perfect translation is impossible. There is no exact “equivalent word in another language” because every word is different: words with the same meaning may have different roots and different derived meanings, and they may carry different feelings.

For example, in English, “freedom” and “liberty” have almost the same meaning but slightly different connotations. Similarly, even though both words are translated as “自由” in both Teochew (pronounced [tsɨjɯ]) and Japanese (pronounced [tɕijɯ:]), “自由” has a slightly different meaning compared to “freedom” or “liberty”. And “自由” in Teochew and “自由” in Japanese have slightly different meanings even though they are cognates.

Besides differences in meaning, differences in feeling are just as important. For example, in English, if you want to express disappointment with Netflix’s Chinese translation, you can simply say “Netflix’s Chinese translation is terrible”, or you can say “Netflix’s Chinese translation is a piece of crap”. They essentially have the same meaning, but they feel very different. The latter sounds vulgar and is unsuitable for most websites and apps.

Similarly, Chinese has proper written Chinese and vulgar spoken Mandarin. They feel different, and you should never use vulgar language. For example, both “不要” and “別” mean “don’t”: the former is proper Chinese, the latter is vulgar spoken Mandarin. Both “主演” and “挑大樑” mean “to play the lead role in a show”: the former is proper Chinese, the latter is vulgar spoken Mandarin.

Unfortunately, Netflix hired some uneducated mainland Chinese sweatshop workers who use “piece of crap” equivalents in their translation.

Beyond Translation

Translating strings is not the only thing you need to do to make your website or app successful in another market.

For example, sometimes you need to filter out untranslated pictures and database entries. In the Netflix example above, only half of the videos are translated. The leadership might think providing more choices is a good thing. But for the more than 95% of the population with very limited English, showing English content is the same as showing useless content. Customers have to spend a long time finding movies they can understand, and eventually they abandon the platform altogether.
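A minimal sketch of such a filter, assuming a hypothetical catalog entry that records audio and subtitle languages: titles the user cannot understand are hidden by default, and the full catalog remains available as an explicit opt-in.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Title:
    """Hypothetical catalog entry."""
    name: str
    audio_languages: FrozenSet[str]
    subtitle_languages: FrozenSet[str]


def is_understandable(title: Title, user_languages: FrozenSet[str]) -> bool:
    """A title is useful if the user can follow its audio or its subtitles."""
    return bool((title.audio_languages | title.subtitle_languages) & user_languages)


def browse(
    catalog: List[Title],
    user_languages: FrozenSet[str],
    show_all: bool = False,
) -> List[Title]:
    """Hide titles the user cannot understand unless they explicitly opt in."""
    if show_all:
        return list(catalog)
    return [t for t in catalog if is_understandable(t, user_languages)]


catalog = [
    Title("Subtitled", frozenset({"en"}), frozenset({"zh-Hant"})),
    Title("English only", frozenset({"en"}), frozenset()),
]
print([t.name for t in browse(catalog, frozenset({"zh-Hant"}))])  # ['Subtitled']
```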

In fact, Google began supporting search in Chinese before Baidu did. However, Baidu prevailed even before Google quit the Chinese market. I believe the most important reason is that Baidu search results don’t contain any non-Chinese web pages by default. For 99% of Chinese people, that means getting more useful information faster.

Another problem is that developers and leaders without an international background are likely to make assumptions that are not universally true, without even realizing it. For example, American services often have a movie genre called “international movies” or “foreign movies”. The problem is obvious: a Hong Kong movie is a foreign movie in America, but not in Hong Kong. Yet on Netflix in Hong Kong, movies like Ip Man are classified under “foreign movies”.

This is ridiculous. Ip Man is a Cantonese movie. It is made by a Hong Kong company with Hong Kong directors and actors. It tells the story of a Hong Kong kung fu master fighting against Japanese invaders and Chinese gangsters in Canton during WWII. It is an understatement for Hong Kong people to say Ip Man is a local movie. It is a Hong Kong national movie. The error is like classifying Gone with the Wind as a foreign movie for people in Atlanta.
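One way to avoid baking an American viewpoint into the data, sketched here with a hypothetical Movie record, is to store facts such as where a title was made and derive the viewer-relative label “foreign” at display time:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    """Hypothetical catalog entry that stores facts, not viewer-relative labels."""
    title: str
    production_region: str  # a region code such as "HK" or "US"


def is_foreign(movie: Movie, viewer_region: str) -> bool:
    """"Foreign" only makes sense relative to where the viewer is."""
    return movie.production_region != viewer_region


ip_man = Movie("Ip Man", "HK")
print(is_foreign(ip_man, "US"))  # True
print(is_foreign(ip_man, "HK"))  # False
```

Under this scheme Ip Man is foreign to a viewer in the US and local to a viewer in Hong Kong, without anyone maintaining separate genre lists per market.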

Another example is how to show stock prices. In the US, a price increase is shown in green and a price decrease in red. In East Asia the meanings are reversed: red denotes a price increase, and green (or blue) denotes a price decrease. Bloomberg’s Japanese site, for example, uses the wrong colors.

Google Finance noticed the problem and uses the correct colors in its Japanese version. However, there is another difference: people in East Asia are more used to candlestick charts for stock prices. For example, Nikkei’s website uses candlestick charts by default.

If a financial website or app wants to feel native to Japanese customers, it needs to show candlestick charts by default. Unfortunately, I haven’t seen any American companies doing that.
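A minimal sketch of per-market display defaults. The region codes, color names, and chart defaults below are assumptions drawn from the examples above, to be verified with local users, not an authoritative table:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinanceDisplay:
    up_color: str
    down_color: str
    default_chart: str  # "line" or "candlestick"


# Assumed defaults; verify each market with local users before shipping.
DEFAULT = FinanceDisplay(up_color="green", down_color="red", default_chart="line")
BY_REGION = {
    "JP": FinanceDisplay(up_color="red", down_color="green", default_chart="candlestick"),
    "CN": FinanceDisplay(up_color="red", down_color="green", default_chart="candlestick"),
    "TW": FinanceDisplay(up_color="red", down_color="green", default_chart="candlestick"),
}


def display_for(region: str) -> FinanceDisplay:
    """Look up the display conventions for a market, falling back to US style."""
    return BY_REGION.get(region, DEFAULT)


def price_color(region: str, change: float) -> str:
    prefs = display_for(region)
    if change > 0:
        return prefs.up_color
    if change < 0:
        return prefs.down_color
    return "gray"  # unchanged


print(price_color("JP", 1.5), display_for("JP").default_chart)  # red candlestick
print(price_color("US", 1.5), display_for("US").default_chart)  # green line
```

The table is keyed by market region rather than by UI language, on the assumption that the convention belongs to the market being displayed; that choice itself is worth checking with local users.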

There is much more to localization beyond translation, such as colors, fonts, font sizes (e.g., Chinese characters require larger font sizes), and even layouts (e.g., Hebrew and Arabic are written from right to left, so layouts need to be mirrored accordingly). Discussing all these issues is far beyond the scope of this article, but localization teams need to identify them and solve them.
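For the right-to-left case, here is a hedged sketch of choosing the HTML dir attribute from a language tag. The RTL_LANGUAGES set is a deliberately small, hypothetical subset; a production app should take this from locale data such as CLDR.

```python
# Hypothetical subset of languages written right to left.
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}


def text_direction(locale_tag: str) -> str:
    """Return the HTML dir value ("rtl" or "ltr") for a BCP 47-style tag."""
    language = locale_tag.split("-")[0].lower()
    return "rtl" if language in RTL_LANGUAGES else "ltr"


def html_open_tag(locale_tag: str) -> str:
    """Build an opening <html> tag carrying both lang and dir."""
    return f'<html lang="{locale_tag}" dir="{text_direction(locale_tag)}">'


print(html_open_tag("he-IL"))       # <html lang="he-IL" dir="rtl">
print(html_open_tag("zh-Hant-HK"))  # <html lang="zh-Hant-HK" dir="ltr">
```

Setting dir only flips the text flow; icons, navigation order, and other layout elements still need to be mirrored separately.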

Open Your Mind

I have spent so much time on the details. But in the end, the most important thing is to open your mind.

First, you need to break out of the “US vs non-US” or “English vs non-English” dichotomy. The concept of “non-US” or “non-English” makes no sense: the difference between Dutch and Teochew is far larger than the difference between Dutch and English. When I see a company using the term “internationalization” instead of “localization”, or having a position called “head of internationalization”, I know its executives are narrow-minded and will never succeed at localization.

Second, you need to understand that translation is literature. It is an art form. You can’t hire just any bilingual person as a translator for a billion-dollar business; you need people who have mastered the target language. The vast majority of companies fail at this step. Many companies simply distrust foreigners and won’t give a competent person the freedom to lead the localization team. And if the localization team is controlled by someone in another country with no knowledge of the market or the language, how are they going to hire the right translators?

Third, even if you manage to hire a team of top-notch translators, you can’t get a good translation by emailing them a list of strings. They need to be context-aware. They need the freedom to alter variables. They need a team of programmers, especially programmers who speak the target language, to support them. And this team needs the authority to require every other engineering team to write localization-aware code.

Unfortunately, most American companies don’t understand this. In most cases, managers and programmers who only speak English do whatever they want during development. Then they ask some minimum-wage employees to compile a list of string literals from the codebase and databases. Then those employees contact an American firm that claims to be able to translate a list of strings into 20 languages for 5 dollars per word. And then the translation firm emails the list of strings to some foreign sweatshops and pays them 2 cents per word. Everyone is happy, including the CEO, who smiles at the list of languages users can choose on the website, until months later, when the website is beaten miserably in China, Japan, and Russia, and they still don’t know why.

Those American Internet companies struggling to compete in the Asian market should pay me at least 1 million dollars a year just to lead their Chinese localization team. Unfortunately, they are simply not smart enough to make that call, and I am still a rockstar programmer.
