Nov 10, 2015

Django Localization Issues and Solutions

Django has excellent support for building multi-language, multi-culture applications. However, a few quirks in its implementation of internationalization can prove to be minor annoyances when truly localizing your web-based application. This post discusses these issues and highlights solutions for them.

It matters where you run the 'makemessages' command from

makemessages, the core command that prepares an application for localization, can be launched from two locations, and it behaves slightly differently depending on which one you choose: the project root folder or an application folder.
If started from the project root, it attempts to extract all the localizable strings from all installed apps and puts them in a single *.po file. However, if started from an application subfolder, it extracts only the strings from that application's files.
The key point is that makemessages works differently from commands like makemigrations, which accepts an app name as an optional argument to generate migrations for that specific app. If you want to extract messages for a specific app, issue the command from the application folder.
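As a rough sketch of the per-app workflow, the snippet below launches makemessages with the application folder as the working directory. The helper names (makemessages_command, extract_for_app) and the injectable run parameter are hypothetical, and the sketch assumes that django-admin is on the PATH and that the app folder already contains a locale subfolder.

```python
import subprocess
from pathlib import Path


def makemessages_command(locale):
    """Build the argument list that extracts strings for one locale."""
    return ["django-admin", "makemessages", "--locale", locale]


def extract_for_app(project_root, app_name, locale, run=subprocess.run):
    """Run makemessages from inside the app folder (hypothetical helper).

    The working directory decides what gets scanned, so we pass the
    app folder as cwd instead of passing an app name as an argument.
    """
    app_dir = Path(project_root) / app_name
    return run(makemessages_command(locale), cwd=app_dir, check=True)
```

Calling extract_for_app("/path/to/project", "polls", "de") would then scan only the (hypothetical) polls app. The run parameter exists only so the command can be inspected without actually starting a process.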

The locale folder is optional for an application

Conventional wisdom says that each application stores its localizable artifacts in its own private folder. It turns out that this is optional. makemessages looks for a locale subfolder under the app folder and, if it finds one, places the .po file underneath it in the corresponding locale-name subfolder (more about this below). If the locale folder is not found, the localizable strings for the app are collected in a .po file placed under the first path specified in the LOCALE_PATHS setting.
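The lookup described above can be approximated in a few lines. This is only a simplified sketch of the behaviour as described in this post, not Django's actual code: po_file_path is a hypothetical helper, and the locale_paths argument stands in for the LOCALE_PATHS setting.

```python
from pathlib import Path


def po_file_path(app_dir, locale_name, locale_paths, domain="django"):
    """Guess where the .po file for an app ends up (hypothetical helper)."""
    app_locale = Path(app_dir) / "locale"
    if app_locale.is_dir():
        # The app has its own locale folder, so the file stays with the app.
        base = app_locale
    elif locale_paths:
        # Otherwise fall back to the first entry of LOCALE_PATHS.
        base = Path(locale_paths[0])
    else:
        raise LookupError("no locale folder in the app and LOCALE_PATHS is empty")
    # Message files live under <locale>/LC_MESSAGES/<domain>.po
    return base / locale_name / "LC_MESSAGES" / f"{domain}.po"
```

For an app without a locale folder, po_file_path(app_dir, "de", ["/srv/project/locale"]) points at /srv/project/locale/de/LC_MESSAGES/django.po (the paths here are made up for illustration).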

Language and locale names for Chinese have changed in 1.8

Earlier versions of Django referred to Chinese (Traditional) and Chinese (Simplified) as zh-tw and zh-cn. These have given way to zh-hant and zh-hans respectively. Apparently, the change reflects the fact that the language (and hence the related locale) is used well beyond Taiwan and China, in places such as Hong Kong, Malaysia and Singapore.
The interesting bit here is that 'makemessages' still accepts the old language codes as its argument and writes the messages file to a locale subfolder with that name. However, when you try out the localized strings, they won't show up.
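One way to stay out of trouble is to use the new codes consistently in settings and to translate any old code before it reaches the command line. The fragment below is a sketch only: LANGUAGES is the real Django setting, while LEGACY_LANGUAGE_CODES and modern_language_code are hypothetical names invented for this example.

```python
# settings.py (fragment)
LANGUAGES = (
    ("en-us", "English (United States)"),
    ("zh-hans", "Chinese (Simplified)"),
    ("zh-hant", "Chinese (Traditional)"),
)

# Hypothetical helper data, not a Django setting: old code -> new code.
LEGACY_LANGUAGE_CODES = {
    "zh-cn": "zh-hans",
    "zh-tw": "zh-hant",
}


def modern_language_code(code):
    """Return the current language code for a possibly outdated one."""
    code = code.lower()
    return LEGACY_LANGUAGE_CODES.get(code, code)
```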

Language names and locale names are different

Language names specified in the LANGUAGES tuple in settings are typically all lowercase. English (United States) is specified as 'en-us' and English (Great Britain) as 'en-gb'. Locale names, however, write the country part as an uppercase ISO 3166 country code. So en-us becomes en_US and en-gb becomes en_GB.
There's an exception to this as well. If the country/region part has more than 2 characters, it is written in proper-noun form, that is, with only its first letter capitalized.
Finally, locale names use an underscore to separate the language and country codes, whereas Django language names use a hyphen. So the language name 'en-us' in LANGUAGES, for English (United States), translates to the locale name en_US. Chinese (Traditional), specified as zh-hant, translates to the locale name zh_Hant, and Chinese (Simplified), specified as zh-hans, translates to the locale name zh_Hans.
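The three rules can be condensed into a small function. Django ships its own converter, django.utils.translation.to_locale; the version below is a standalone sketch of the rules as described here, under the hypothetical name language_to_locale, and is not a copy of Django's implementation.

```python
def language_to_locale(language):
    """Convert a language name such as 'en-us' to a locale name such as 'en_US'."""
    lang, sep, region = language.partition("-")
    if not sep:
        # No country/region part, e.g. 'de'.
        return lang.lower()
    if len(region) > 2:
        # Longer than 2 characters: capitalize only the first letter.
        region = region[0].upper() + region[1:].lower()
    else:
        # Two-letter country code: all uppercase.
        region = region.upper()
    return f"{lang.lower()}_{region}"


for name in ("en-us", "en-gb", "zh-hant", "zh-hans"):
    print(name, "->", language_to_locale(name))
```

The locale name is what you pass to makemessages and what the subfolder under locale is called; the language name is what goes into LANGUAGES.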

Beware of 'fuzzy' in PO files

As you tweak your website, you will invariably change some text. The change could be as simple as fixing a typo or adding a missing comma. After such a change, you regenerate the PO file using the makemessages command. This is where things get a little, shall we say, 'fuzzy' (pun intended). When you regenerate the PO file, the GNU gettext utility parses the source code and extracts each string from it while looking for a matching entry in the PO file. On finding an exact match, it leaves the entry alone, as there could already be a translated msgstr entry for that string which it shouldn't mess up.
However, if it doesn't find an exact match in the PO file, it doesn't simply create a brand new entry for the new string. Instead, it looks for a close match: an existing string in the PO file that, though not identical, is very close to the new string. In this case, makemessages updates that entry in the PO file with the new string, marks it with a #, fuzzy line, and comments out the old, closely matching string.
Such strings are, by default, not stored in the MO file when the messages are compiled, and consequently the localized text for the modified string won't show up. The system expects you to review the change and decide whether the localized text needs to be updated. As part of this review, you have to remove the #, fuzzy line and the commented-out original string, which marks the message and its localized text as properly reviewed.
This mechanism is documented in detail here.
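To make the review step less error-prone, it can help to list the fuzzy entries before compiling. The script below is a minimal sketch and not a full PO parser: it assumes entries are separated by blank lines and that each msgid fits on a single line. The sample PO content and the function name fuzzy_msgids are made up for illustration.

```python
import re

# Hypothetical PO content: the second entry was matched "closely" and flagged.
SAMPLE_PO = '''\
#: templates/home.html:12
msgid "Sign in"
msgstr "Connexion"

#: templates/home.html:20
#, fuzzy
#| msgid "Welcome to our site"
msgid "Welcome to our site!"
msgstr "Bienvenue sur notre site"
'''


def fuzzy_msgids(po_text):
    """Return the msgid of every entry that carries the fuzzy flag."""
    found = []
    # Entries in a PO file are separated by blank lines.
    for entry in re.split(r"\n\s*\n", po_text.strip()):
        lines = entry.splitlines()
        # Flag lines start with '#,' and hold a comma-separated list of flags.
        flags = [
            flag.strip()
            for line in lines
            if line.startswith("#,")
            for flag in line[2:].split(",")
        ]
        if "fuzzy" not in flags:
            continue
        for line in lines:
            match = re.match(r'msgid "(.*)"$', line)
            if match:
                found.append(match.group(1))
                break
    return found


print(fuzzy_msgids(SAMPLE_PO))  # ['Welcome to our site!']
```

In the sample, the French text still belongs to the old wording, which is exactly the situation that needs a human decision before the flag is removed.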