{"id":15,"date":"2018-06-20T12:00:56","date_gmt":"2018-06-20T12:00:56","guid":{"rendered":"https:\/\/sinyalee.com\/essays\/?p=15"},"modified":"2024-09-02T03:24:05","modified_gmt":"2024-09-02T03:24:05","slug":"localization-done-right","status":"publish","type":"post","link":"https:\/\/sinyalee.com\/essays\/?p=15","title":{"rendered":"Localization Done Right"},"content":{"rendered":"<p>Many US tech companies have difficulties entering foreign markets like China, Japan, and Russia. Many people know that the reason is cultural difference. But few companies take a step forward and fix the localization problem.<\/p>\n<p>It\u2019s hard to find a rockstar programmer at my level. It\u2019s almost impossible to find a rockstar programmer at my level who can also speak 5 distinct languages (like me). I feel I have the responsibility to talk about the correct way to localize your app or website for markets outside the US.<\/p>\n<p><strong>Disclaimer<\/strong><br \/>\nIn this article I give many examples of localization failures of American companies because I suppose most of my readers are American. It doesn\u2019t mean American companies are particularly bad in localization. In fact, localization efforts of Chinese, Japanese, and German companies are as bad as, if not worse than, American companies. Actually, if Alibaba and Tencent can solve their localization problems, Amazon and Facebook may have trouble competing with them.<\/p>\n<h3>The Wrong Way<\/h3>\n<p>Before going into details of the correct way, I want to talk about the wrong way to do localization first. The wrong way is to compile a list of string literals in your codebase and databases, then hand over the list to translators. Unfortunately, the wrong way is also the most common way of localization.<\/p>\n<p>In the following sections, I will try to point out why this approach is bad and how we can do localization right.<\/p>\n<h3>Grammar<\/h3>\n<p>One big problem of the string mapping approach is grammar. 
If you simply translate strings, the grammar can never be correct.<\/p>\n<p>First, if your strings include program-generated numbers, you may need to put the number in different places in different languages. For example, on a video website, you may want to show people how many times a video has been viewed. In English, you might include something like \u201c${<em>video.views<\/em>} views\u201d.<\/p>\n<p><img decoding=\"async\" class=\" article-image\" src=\"https:\/\/farm1.staticflickr.com\/838\/41409705500_a867f1693e_m.jpg\" alt=\"\" \/><\/p>\n<p>However, in Chinese, the best translation is \u201c\u89c0\u770b\u6b21\u6578\uff1a${<em>video.views<\/em>}\u201d.<\/p>\n<p><img decoding=\"async\" class=\" article-image\" src=\"https:\/\/farm1.staticflickr.com\/926\/43170287002_47ebcc44d7_m.jpg\" alt=\"\" \/><\/p>\n<p>In English you put the extra string after the number, but in Chinese you need to put the extra string before the number. Therefore, you can never get the correct Chinese translation if you only translate strings. Here is an example of a bad translation from Twitch. Obviously they simply translated \u201cviews\u201d into Chinese and put it after the number, resulting in a terrible translation.<\/p>\n<p><img decoding=\"async\" class=\" article-image\" src=\"https:\/\/farm2.staticflickr.com\/1824\/42501366284_5c62970a46_n.jpg\" alt=\"\" \/><\/p>\n<p>The example above is far from the worst grammar issue you need to solve for localization. It\u2019s not difficult to solve the word order issue. Many localization APIs allow you to use placeholders for variables and achieve correct translations.<\/p>\n<p>What\u2019s worse is that program-generated content might alter strings nearby. For example, if you want to generate the sentence \u201cThere is a ${<em>object<\/em>}\u201d, you might need to do some extra work because <em>object<\/em> might be \u201capple\u201d. 
You need to change \u201ca\u201d to \u201can\u201d if the following word starts with a vowel sound.<\/p>\n<p>Say you are a Chinese developer who can only speak Chinese, working for a Chinese company. You might not even be aware of the rule at all. Thus the program has no way to find out whether the program-generated object should use \u201ca\u201d or \u201can\u201d. There is no way to generate the correct English translation with string matching, even with variable placeholders.<\/p>\n<p>You may think this is not your problem because English grammar is much more complicated than Chinese grammar. You are correct, if you only want to do localization for Chinese. But if you want to translate the same sentence to German and Japanese, you are out of luck.<\/p>\n<p>For example, suppose you want to translate \u201cThere is a ${<em>object<\/em>}\u201d for the following three objects: apple, pineapple and man. In English you need to do something different for apple because apple starts with a vowel sound. In German, you need to do something different for pineapple because it is feminine (for Germans). In Japanese, you need to do something different with man because it is alive.<\/p>\n<p>The only way to have correct Japanese and German translations is to have a function that figures out the gender of every object for German, and another that figures out whether an object is alive for Japanese. And then we need to allow translators to use complicated function compositions freely to achieve the correct translation. Unfortunately, none of the third-party localization frameworks today supports this kind of complicated task.<\/p>\n<h3>Context<\/h3>\n<p>Even if you get grammar right, you cannot generate a correct translation without knowing the context.<\/p>\n<p>For example, in English we can use \u201cyes\u201d and \u201cno\u201d to answer all yes-no questions. However, Chinese languages use <a href=\"https:\/\/en.wikipedia.org\/wiki\/Echo_answer\">echo answers<\/a>. 
For example, if you ask someone in Cantonese \u201cdo you like watching movies?\u201d (\u201c\u4f60\u937e\u5514\u937e\u610f\u7747\u6232\u5ac1?\u201d), the answer is \u201clike\u201d (\u201c\u937e\u610f\u201d) or \u201cdon\u2019t like\u201d (\u201c\u5514\u937e\u610f\u201d). Therefore, to translate \u201cyes\u201d and \u201cno\u201d into Cantonese correctly, you need to know the context (the question itself).<\/p>\n<p>For example, if you try to delete your post on Twitch, a pop-up box will let you choose \u201cyes\u201d or \u201cno\u201d.<\/p>\n<p><img decoding=\"async\" class=\" article-image\" src=\"https:\/\/farm2.staticflickr.com\/1805\/42501365924_ab474a2f69_n.jpg\" alt=\"\" \/><br \/>\n<img decoding=\"async\" class=\" article-image\" src=\"https:\/\/farm1.staticflickr.com\/915\/43170286482_e07877ebf4_n.jpg\" alt=\"\" \/><\/p>\n<p>The correct Chinese translation is \u201cdelete\u201d (\u201c\u522a\u9664\u201d) and \u201cdon\u2019t delete\u201d (\u201c\u4e0d\u522a\u9664\u201d), or alternatively \u201cconfirm\u201d (\u201c\u78ba\u5b9a\u201d) and \u201ccancel\u201d (\u201c\u53d6\u6d88\u201d). However, the Traditional Chinese translation of that website uses \u201c\u662f\u201d, which means \u201cis\u201d, and \u201c\u5426\u201d, which means \u201cis not\u201d. The Simplified Chinese translation is even more absurd. It uses \u201c\u6709\u201d, which means \u201chave\u201d, and \u201c\u65e0\u201d, which means \u201cdo not have\u201d. It\u2019s extremely confusing for Chinese users.<\/p>\n<p><img decoding=\"async\" class=\" article-image\" src=\"https:\/\/farm1.staticflickr.com\/844\/42501366074_57b08a52fc_n.jpg\" alt=\"\" \/><br \/>\n<img decoding=\"async\" class=\" article-image\" src=\"https:\/\/farm2.staticflickr.com\/1782\/42321880085_1d7ba00765_n.jpg\" alt=\"\" \/><\/p>\n<p>The correct solution to this problem is to inform translators about the context of every translation task. 
Moreover, different string literals, even if their values are identical, must be translated separately. For example, if you want to have a correct Chinese translation, you need to create a separate translation task for every occurrence of \u201cyes\u201d, and tell translators what the question is.<\/p>\n<p>Another example is how to format numbers into Chinese. Apple provides a helper function to format numbers in different languages. One of the format types is \u201cspell out\u201d.<\/p>\n<p><img decoding=\"async\" class=\" article-image\" src=\"https:\/\/farm2.staticflickr.com\/1765\/42501365664_9b27581149.jpg\" alt=\"\" \/><\/p>\n<p>There are several reasons to use the \u201cspell out\u201d style of numbers. One of the reasons is to prevent forgery. That\u2019s why we need to spell out the numbers when writing a check.<\/p>\n<p>In Chinese, there are two numeral systems: normal and financial. The normal numerals (e.g., \u4e00\u4e8c\u4e09) are for casual use. However, because the normal numeral characters are too simple to prevent forgery, people invented the financial numeral system (e.g., \u58f9\u8cb3\u53c3). All the characters are very complicated and it\u2019s extremely difficult to alter numbers written with financial numerals.<\/p>\n<p>Thus if you want to translate a \u201cspell out\u201d English number to Chinese, you cannot simply use the function provided by Apple. You have to know why the \u201cspell out\u201d version is used. If it is used casually, you should translate it to the normal Chinese numerals. If your application is going to generate a check image, you should translate it to financial Chinese numerals.<\/p>\n<p>Again, without knowing the context, it\u2019s impossible to know the correct translation.<\/p>\n<h3>Translation Quality<\/h3>\n<p>Now we know how to provide tools to translators. 
However, even with the correct tools, it is still extremely difficult to get a good translation.<\/p>\n<p>If you can speak two languages fluently, you will know that <em>it is impossible to make perfect translations<\/em>. There is no \u201cequivalent word in another language\u201d because every word is different. Words with the same meaning may have different roots, different derived meanings, and they may be associated with different feelings.<\/p>\n<p>For example, in English, \u201cfreedom\u201d and \u201cliberty\u201d have nearly identical but slightly different meanings and feelings. Similarly, even though both words are translated to \u201c\u81ea\u7531\u201d in both Teochew (pronounced as [ts\u0268j\u026f]) and Japanese (pronounced as [t\u0255ij\u026f:]), \u201c\u81ea\u7531\u201d has a slightly different meaning compared to \u201cfreedom\u201d or \u201cliberty\u201d. And \u201c\u81ea\u7531\u201d in Teochew and in Japanese have slightly different meanings even though they are cognates.<\/p>\n<p>Besides differences in meaning, differences in feeling are just as important. For example, in English, if you want to express your disappointment towards Netflix\u2019s Chinese translation, you can simply say \u201cNetflix\u2019s Chinese translation is terrible\u201d, or alternatively, you can say \u201cNetflix\u2019s Chinese translation is a piece of crap\u201d. They essentially have the same meaning, but they give you very different feelings. The latter gives a very vulgar feeling and is unsuitable for most websites and apps.<\/p>\n<p>Similarly, in Chinese, there is proper written Chinese and vulgar spoken Mandarin. They give you different feelings and you should never ever use any vulgar language. For example, both \u201c\u4e0d\u8981\u201d and \u201c\u5225\u201d mean \u201cdon\u2019t\u201d. The former is proper Chinese; the latter is vulgar spoken Mandarin. Both \u201c\u4e3b\u6f14\u201d and \u201c\u6311\u5927\u6a11\u201d mean \u201cleading in a show\u201d. 
The former is proper Chinese; the latter is vulgar spoken Mandarin.<\/p>\n<p>Unfortunately Netflix hired some uneducated mainland Chinese sweatshop workers who use \u201cpiece of crap\u201d equivalents in their translation.<\/p>\n<p><img decoding=\"async\" class=\" article-image\" src=\"https:\/\/farm1.staticflickr.com\/844\/42501366474_664c66977c_z.jpg\" alt=\"\" \/><\/p>\n<h3>Beyond Translation<\/h3>\n<p>Translating strings is not the only thing that you need to do to make your website or app successful in another market.<\/p>\n<p>For example, sometimes you need to filter out untranslated pictures and database entries. In the Netflix example above, only half of the videos are translated. The leadership might think it is a good thing to provide choices. But in fact, for the more than 95% of the population who have very limited English ability, showing them English content is the same as showing them useless content. Customers have to spend a long time finding movies they can understand, and eventually they will abandon the platform altogether.<\/p>\n<p>Actually, Google began to support search in Chinese before Baidu. However, Baidu still prevailed even before Google quit the Chinese market. I believe the most important reason is that Baidu search results don\u2019t contain any non-Chinese web pages by default. For 99% of the Chinese people, it means getting more useful information faster.<\/p>\n<p>Another problem is that developers and leadership without an international background are likely to make assumptions which are not universally true. And they don\u2019t even realize it. For example, in America, there is usually a movie genre called \u201cinternational movies\u201d or \u201cforeign movies\u201d. The problem is obvious: a Hong Kong movie is a foreign movie in America, but not a foreign movie in Hong Kong. 
However, on Hong Kong Netflix, movies like <em>Ip Man<\/em> are classified under \u201cforeign movies\u201d.<\/p>\n<p><img decoding=\"async\" class=\" article-image\" src=\"https:\/\/farm2.staticflickr.com\/1824\/41409705290_b26fc69b89_n.jpg\" alt=\"\" \/><\/p>\n<p>This is ridiculous. <em>Ip Man<\/em> is a Cantonese movie. It was made by a Hong Kong company with Hong Kong directors and actors. It tells the story of a Hong Kong kung fu master fighting against Japanese invaders and Chinese gangsters in Canton during WWII. It is an understatement for Hong Kong people to say <em>Ip Man<\/em> is a local movie. It is a <em>Hong Kong national movie<\/em>. The error is like classifying <em>Gone with the Wind<\/em> as a foreign movie for people in Atlanta.<\/p>\n<p>Another example is how to show stock prices. In the US, a price increase is shown in green and a price decrease in red. In East Asia, the meanings of the colors are reversed: green (or blue) denotes a price decrease and red denotes a price increase. For example, Bloomberg\u2019s Japanese site uses the wrong colors.<\/p>\n<p><img decoding=\"async\" class=\" article-image\" src=\"https:\/\/farm1.staticflickr.com\/840\/42501365464_4cc9f08535_z.jpg\" alt=\"\" \/><\/p>\n<p>Google Finance noticed the problem and uses the correct colors for its Japanese version. However, there is another difference: in East Asia people are more used to <a href=\"https:\/\/en.wikipedia.org\/wiki\/Candlestick_chart\">candlestick charts<\/a> for stock prices. For example, Nikkei\u2019s website uses candlestick charts by default.<\/p>\n<p><img decoding=\"async\" class=\" article-image\" src=\"https:\/\/farm1.staticflickr.com\/918\/41409704880_3cbbd88033.jpg\" alt=\"\" \/><\/p>\n<p>If a financial website or app wants to feel native to Japanese customers, it needs to show candlestick charts by default. 
Unfortunately I haven\u2019t seen any American companies doing that.<\/p>\n<p>There is much more to consider in a localization effort beyond translation, like colors, fonts, font sizes (e.g., Chinese characters require larger font sizes), and even layouts (e.g., Hebrew and Arabic are written from right to left and layouts need to be changed accordingly). It\u2019s far beyond the scope of this article to discuss all these issues. But localization teams need to identify these issues and solve them accordingly.<\/p>\n<h3>Open Your Mind<\/h3>\n<p>I have spent so much time on the details. But in the end, the most important thing is to <em>open your mind<\/em>.<\/p>\n<p>First, you need to break out of the \u201cUS vs non-US\u201d or \u201cEnglish vs non-English\u201d dichotomy. The concept of \u201cnon-US\u201d or \u201cnon-English\u201d makes no sense. The difference between Dutch and Teochew is way larger than the difference between Dutch and English. When I see a company using the term \u201cinternationalization\u201d instead of \u201clocalization\u201d or having a position called \u201chead of internationalization\u201d, I know their executives are narrow-minded and they will never succeed in localization.<\/p>\n<p>Second, you need to understand that translation is literature. It is an art form. You can\u2019t hire just any bilingual person as a translator for a billion-dollar business. You need people who are at the master level of the target language. The vast majority of companies fail at this step. Many companies simply distrust foreigners and won\u2019t give a competent person the freedom to lead the localization team. And if the localization team is controlled by someone in a foreign country without any knowledge of the market or the language, how are they going to hire the right people to do the translation?<\/p>\n<p>Third, even if you manage to hire a team of top-notch translators, you can\u2019t get a good translation by emailing them a list of strings. 
They need to be context-aware. They need to have freedom to alter variables. They need to have a team of programmers, especially target-language-speaking programmers, to support them. And this team needs to have the <em>power to require<\/em> any other engineering team to write localization-aware code.<\/p>\n<p>Unfortunately, most American companies don\u2019t understand this. In most cases, managers and programmers who only speak English do whatever they want during development. Then they ask some minimum-wage employees to compile a list of string literals from the codebase and databases. Then the employees contact an American firm that claims to be able to translate a list of strings into 20 languages for 5 dollars per word. And then the translation firm emails the list of strings to some foreign sweatshops and pays them 2 cents per word for translation. Everyone is happy, including the CEO, who smiles while looking at the list of languages they can choose for their website, until months later, when their website is beaten miserably in China, Japan and Russia, and they still don\u2019t know why.<\/p>\n<p>Those American Internet companies struggling to compete in the Asian market should pay me at least 1 million a year to lead the Chinese localization team only. Unfortunately they are simply not smart enough to make that call and I am still a rockstar programmer.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Many US tech companies have difficulties entering foreign markets like China, Japan, and Russia. Many people know that the reason is cultural difference. But few companies take a step forward and fix the localization problem. It\u2019s hard to find a rockstar programmer at my level. 
It\u2019s almost impossible to find a rockstar programmer at my [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-15","post","type-post","status-publish","format-standard","hentry","category-translation"],"_links":{"self":[{"href":"https:\/\/sinyalee.com\/essays\/index.php?rest_route=\/wp\/v2\/posts\/15","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sinyalee.com\/essays\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sinyalee.com\/essays\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sinyalee.com\/essays\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sinyalee.com\/essays\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=15"}],"version-history":[{"count":3,"href":"https:\/\/sinyalee.com\/essays\/index.php?rest_route=\/wp\/v2\/posts\/15\/revisions"}],"predecessor-version":[{"id":127,"href":"https:\/\/sinyalee.com\/essays\/index.php?rest_route=\/wp\/v2\/posts\/15\/revisions\/127"}],"wp:attachment":[{"href":"https:\/\/sinyalee.com\/essays\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=15"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sinyalee.com\/essays\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=15"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sinyalee.com\/essays\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=15"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}