Most efficient approach for multilingual PHP website

alexteg picture alexteg · May 11, 2010 · Viewed 12.4k times · Source

I am working on a large multilingual website and I am considering different approaches for making it multilingual. The possible alternatives I can think of are:

  1. The Gettext functions with generation of .po files
  2. One MySQL table with the translations and a unique string ID for each text
  3. PHP-files with arrays containing the different translations with unique string IDs

As far as I have understood the Gettext functions should be most efficient, but my requirement is that it should be possible to change a text string in the original reference language (English) without the other translations of that string automatically reverting back to English just because a couple of words changed. Is this possible with Gettext?

What is the least resource demanding solution?
Is using the Gettext functions or PHP files with arrays more or less equally resource demanding?
Any other suggestions for more efficient solutions?

Answer

Alec picture Alec · May 11, 2010

A few considerations:

1. Translations
Who will be doing the translations? People that are also connected to the site? A translation agency? When using Gettext you'll be working with 'pot' (.po) files. These files contain the message ID and the message string (the translation). Example:

msgid "A string to be translated would go here"  
msgstr ""

Now, this looks just fine and understandable for anyone who needs to translate this. But what happens when you use keywords, like Mike suggests, instead of full sentences? If someone needs to translate a msgid called "address_home", he or she has no clue if this is should be a header "Home address" or that it's a full sentence. In this case, make sure to add comments to the file right before you call on the gettext function, like so:

/// This is a comment that will be included in the pot file for the translators
gettext("ready_for_lost_episode");

Using xgettext --add-comments=/// when creating the .po files will add these comments. However, I don't think Gettext is ment to be used this way. Also, if you need to add comments with every text you want to display you'll a) probably make an error at some point, b) you're whole script will be filled with the texts anyway, only in comment form, c) the comments needs to be placed directly above the Gettext function, which isn't always convient, depending on the position of the function in your code.

2. Maintenance
Once your site grows (even further) and your language files along with it, it might get pretty hard to maintain all the different translations this way. Every time you add a text, you need to create new files, send out the files to translators, receive the files back, make sure the structure is still intact (eager translators are always happy to translate the syntax as well, making the whole file unusable :)), and finish with importing the new translations. It's doable, sure, but be aware with possible problems on this end with large sites and many different languages.


Another option: combine your 2nd and 3rd alternative:

Personally, I find it more useful to manage the translation using a (simple) CMS, keeping the variables and translations in a database and export the relevent texts to language files yourself:

  1. add variables to the database (e.g.: id, page, variable);
  2. add translations to these variables (e.g.: id, varId, language, translation);
  3. select relevant variables and translations, write them to a file;
  4. include the relevant language file in your site;
  5. create your own function to display a variables text:

text('var'); or maybe something like __('faq','register','lost_password_text');

Point 3 can be as simple as selecting all the relevant variables and translations from the database, putting them in an array and writing the serlialized array to a file.

Advantages:

  1. Maintenance. Maintaining the texts can be a lot easier for big projects. You can group variables by page, sections or other parts within your site, by simply adding a column to your database that defines to which part of the site this variable belongs. That way you can quickly pull up a list of all the variables used in e.g. the FAQ page.

  2. Translating. You can display the variable with all the translations of all the different languages on a single page. This might be useful for people who can translate texts into multiple languages at the same time. And it might be useful to see other translations to get a feel for the context so that the translation is as good as possible. You can also query the database to find out what has been translated and what hasn't. Maybe add timestamps to keep track of possible outdated translations.

  3. Access. This depends on who will be translating. You can wrap the CMS with a simple login to grant access to people from a translation agency if need be, and only allow them to change certain languages or even certain parts of the site. If this isn't an option you can still output the data to a file that can be manually translated and import it later (although this might come with the same problems as mentioned before.). You can add one of the translations that's already there (English or another main language) as context for the translator.

All in all I think you'll find that you'll have a lot more control over the translations this way, especially in the long run. I can't tell you anything about speed or efficiency of this approach compared to the native gettext function. But, depending on the size of the language files, I don't think it'll be a big difference. If you group the variables by page or section, you can alway include only the required parts.