Parsing gettext `.po` files with Python

alex · Mar 6, 2012

I need to extract messages from .po files. Is there a Python module to do that? I wrote a parser, but it is platform-dependent (it trips over \r\n vs. \n line endings).

Is there a better way to do this?

Answer

MestreLion · Apr 3, 2012

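In most cases you don't need to parse .po files yourself. Developers give translators a .pot template file; the translators rename it to xx_XX.po and translate the strings. Then you, as the developer, only have to "compile" the result to .mo files using GNU's gettext tools (msgfmt) or the Python equivalent, msgfmt.py, which ships in the Tools/i18n directory of the Python source distribution. (pygettext, by contrast, is the extraction tool that generates the .pot template; it does not compile catalogs.)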
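Once the .mo files exist, the application loads them through the standard gettext module instead of reading the .po text. Here is a minimal sketch of that step; the domain name "myapp", the "locale" directory and the helper load_translator are placeholders I made up for illustration, not anything from the question:

```python
import gettext


def load_translator(domain, localedir, language):
    """Return a translation function for `language` (hypothetical helper).

    gettext looks for <localedir>/<language>/LC_MESSAGES/<domain>.mo.
    With fallback=True a missing catalog does not raise; the returned
    function then simply hands back the original msgid.
    """
    translation = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )
    return translation.gettext


# "myapp" and "locale" are placeholder names for this sketch.
_ = load_translator("myapp", "locale", "pt_BR")
print(_("Hello"))
```

The fallback=True argument is a design choice for the sketch: without it, gettext.translation() raises an error when no catalog is found, which may be what you prefer in production.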

But if you want or need to parse the .po files yourself instead of compiling them, I strongly suggest using polib, a well-known Python library for handling .po files. It is used by several large-scale projects, such as Mercurial and Ubuntu's Launchpad translation engine:

PyPI package home: http://pypi.python.org/pypi/polib/

Code repository: https://github.com/izimobil/polib

(Original repository was hosted at Bitbucket, which no longer supports Mercurial: https://bitbucket.org/izi/polib/wiki/Home)

Documentation: http://polib.readthedocs.org

The importable module is a single file under the MIT license, so you can easily incorporate it into your code like this:

import polib

# Load the catalog; each entry exposes msgid and msgstr
po = polib.pofile('path/to/catalog.po')
for entry in po:
    print(entry.msgid, entry.msgstr)

It can't be easier than that ;)