I've been keeping a low profile since Juliette blogged that I had joined the Cloudworks team, back in January. However, as we will shortly be launching a new feature, namely the Cloudworks site translated into Greek, it seems like a good time to share our experiences. I'll keep technical detail to a minimum.
First, it should be pointed out that it is the user interface (for example, the main navigation links and form labels) that has been localized (translated), not the dynamic content: Clouds, Cloudscapes and so on. Greek was chosen as the first language because we received European funding for this purpose, but translating into further languages will require less effort (volunteers are welcome to email email@example.com).
Cloudworks is built on top of the CodeIgniter software framework, which has support for internationalization. However, CodeIgniter uses a bespoke method which, though efficient, has drawbacks: for translators, a lack of tools; and for developers, having to edit two files simultaneously when internationalizing a source text. Both issues were a concern given the size of Cloudworks (700 texts extracted!) and the limited time available. (Tech: by default CodeIgniter stores language packs in arrays, a similar method to that used by the Moodle open-source e-learning software.)
After further research, I settled on GNU Gettext, a set of free/open-source software libraries and file formats, which is used by many Linux distributions, and by web projects including WordPress and Drupal (with subtle variations). In the Gettext method, the developer manually wraps each text string in the software in a function call (a software notation or syntax). A software tool then automatically extracts each chunk of text to a file (with a .po or .pot file extension).
The translator uses a specialised graphical editor to translate the text. Files can be merged and joined. The resulting file is converted to a binary format (.mo extension), which is deployed to the server. Gettext handles plurals (languages use suffixes differently for zero, one, two...), and to some extent dates.
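Plural handling is easiest to see in a small example. The sketch below again uses Python's gettext module rather than PHP, and the "comment" message is an assumed example. With no catalogue loaded, the English rule applies (singular for exactly one, plural otherwise); a real .mo file carries the plural rule for its own language.

```python
import gettext

# NullTranslations stands in for a loaded catalogue: it applies the
# default English rule, returning the singular form only when n == 1.
catalogue = gettext.NullTranslations()


def comment_count(n):
    # The developer supplies the singular and plural source texts;
    # Gettext picks the right form for n in the target language.
    template = catalogue.ngettext("%d comment", "%d comments", n)
    return template % n


for n in (0, 1, 2):
    print(comment_count(n))
```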
I developed a system of placeholders for dynamic parts of sentences and phrases, for example the titles of Clouds, names of contributors, dates and so on. This borrowed heavily from Drupal, with some variations to help maintain content flow, particularly in the about pages (the only content to be translated). The language will be chosen based on your browser software's configuration, with the option to override this by selecting the language from a menu.
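As a rough illustration of both ideas, the Python sketch below follows the Drupal convention in which "@name" placeholders are HTML-escaped and "!name" placeholders are inserted as-is. It is not the Cloudworks code: the function names t and choose_language, the supported language list and the sample texts are all assumptions. In real use the text would be looked up in the translation catalogue before the placeholders are filled in.

```python
import html
import re


def t(text, args=None):
    """Fill Drupal-style placeholders: '@key' is escaped, '!key' is raw."""
    if not args:
        return text

    def render(key):
        value = str(args[key])
        return value if key.startswith("!") else html.escape(value)

    # Longest keys first, so that '@title' is never mistaken for '@t'.
    keys = sorted(args, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: render(match.group(0)), text)


def choose_language(accept_language, supported=("en", "el"), override=None):
    """Pick a language from the browser's Accept-Language header value.

    A menu choice (override) wins; otherwise the highest-ranked supported
    language is used, falling back to the first supported language.
    """
    if override in supported:
        return override
    ranked = []
    for position, item in enumerate(accept_language.split(",")):
        tag, _sep, params = item.strip().partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            # Keep only the primary language, so 'el-GR' counts as 'el'.
            ranked.append((-quality, position, tag.strip().lower().split("-")[0]))
    for _quality, _position, language in sorted(ranked):
        if language in supported:
            return language
    return supported[0]


print(t("@user added the Cloud @title", {"@user": "Nick", "@title": "Maths & Stats"}))
print(choose_language("el-GR,el;q=0.9,en;q=0.8"))
```

Keeping the placeholder inside the translatable text lets the translator move it to wherever it belongs in the Greek sentence.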
What lessons have we learnt?
- Using built-in date/time functionality (Tech: the strftime PHP/C function) is not trivial, due to encoding issues on Windows servers. This is one reason why web projects like Drupal handle dates and times themselves.
- Preparing the template files for translation, writing notes/instructions for translators, and integrating the text from translators takes more time than you might think.
- Character encoding can be an issue. For historical reasons Cloudworks uses Latin1 (ISO-8859-1), and we still need to convert content with quite a lot of accents to Unicode. (Note, this may require some down time - we'll keep you posted.)
- The only drawbacks to Gettext that I have found, in the context of a web site/application, are the need to create a binary file, and the difficulty of extending a language using multiple files. Drupal deals with these issues by handling the localization files itself, bypassing the default system functionality.
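On the character encoding point above, a short Python sketch shows what the conversion involves and why Latin1 cannot hold Greek text. The sample strings and function names are made up for illustration; the real conversion would be run against the Cloudworks database rather than on strings like these.

```python
def latin1_bytes_to_utf8(raw):
    # Decode the stored bytes as ISO-8859-1, then re-encode as UTF-8.
    return raw.decode("iso-8859-1").encode("utf-8")


def fits_latin1(text):
    # True only if every character exists in ISO-8859-1.
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


stored = "Café résumé".encode("iso-8859-1")  # one byte per character
converted = latin1_bytes_to_utf8(stored)     # accented letters take two bytes

print(converted.decode("utf-8"))
print(fits_latin1("Café résumé"), fits_latin1("Ελληνικά"))
```

Accented Western European text survives the round trip, but the Greek sample cannot be encoded as Latin1 at all, which is why the move to Unicode is needed.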
Finally, my thanks to Martha Vasiliadou from Innovade LI Ltd. in Cyprus, who is doing a great job of translating the site to Greek! And we'll keep you posted as we release the new language functionality.
Posted by Nick Freear on 26 March 2010