How do I use data in package_data from source code?

Scott · May 5, 2011

In setup.py, I have specified package_data like this:

packages=['hermes'],
package_dir={'hermes': 'hermes'},
package_data={'hermes': ['templates/*.tpl']},

And my directory structure is roughly

hermes/
 |
 | docs/
 | ...
 | hermes/
 |   |
 |   | __init__.py
 |   | code.py
 |   | templates/
 |   |   |
 |   |   | python.tpl
 |
 | README
 | setup.py

The problem is that I need to read files from the templates directory in my source code so I can write out Python code (this project is a parser generator). I can't figure out how to properly include these files and load them from my code. Any ideas?

Answer

samplebias · May 5, 2011

The standard pkgutil module's get_data() function will calculate the path to your data, relative to your package, and retrieve the data for you via whatever module loader Python used to import the hermes package:

import pkgutil
data = pkgutil.get_data('hermes', 'templates/python.tpl')
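
To try this without the real project, here is a minimal sketch that builds a throwaway package in a temporary directory and reads a template from it. The package name hermes_demo, the helper build_demo_package and the template contents are made up for illustration; only pkgutil.get_data() comes from the answer above. Note that on Python 3 get_data() returns bytes, so decode it before treating it as text.

```python
import os
import pkgutil
import sys
import tempfile


def build_demo_package(root, name="hermes_demo"):
    """Create <root>/<name>/templates/python.tpl (hypothetical stand-in for hermes)."""
    templates = os.path.join(root, name, "templates")
    os.makedirs(templates)
    # An empty __init__.py makes the directory an importable package.
    with open(os.path.join(root, name, "__init__.py"), "wb") as f:
        f.write(b"")
    # Written in binary mode so the bytes on disk are exactly what we expect.
    with open(os.path.join(templates, "python.tpl"), "wb") as f:
        f.write(b"def parse_${rule}(): pass\n")
    return name


root = tempfile.mkdtemp()
package = build_demo_package(root)
sys.path.insert(0, root)  # make the throwaway package importable

# The resource path is relative to the package and always uses '/'.
raw = pkgutil.get_data(package, "templates/python.tpl")
template_text = raw.decode("utf-8")
print(template_text)
```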

Of course, in certain cases you could just read your data using a path calculated from hermes.__file__, but if you plan to distribute your project, consider that it may be installed in different ways on the end user's machine: as plain files, deployed in a zipped egg archive, etc. In the latter case, your hermes module will have been imported by Python using a zipimporter, so its __file__ points inside the archive and a normal open(path).read() will fail:

>>> import hermes
>>> hermes.__loader__
<zipimporter object "/home/pat/.cascade/virt/foo/lib/python2.6/site-packages/foo-0.0.0-py2.6.egg">
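
The zipped case can be reproduced with a small sketch. It packs a throwaway package into a zip archive, imports it from there, and compares a plain open() against pkgutil.get_data(). The name hermes_zip_demo and the template contents are assumptions for the demo, not part of the original project.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a zip archive containing a throwaway package (hypothetical name).
archive = os.path.join(tempfile.mkdtemp(), "hermes_zip_demo.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hermes_zip_demo/__init__.py", "")
    zf.writestr("hermes_zip_demo/templates/python.tpl", "# generated parser\n")

sys.path.insert(0, archive)  # Python will import from the archive via zipimport
import hermes_zip_demo

# __file__ points inside the archive, so this path is not a real file on disk.
naive_path = os.path.join(
    os.path.dirname(hermes_zip_demo.__file__), "templates", "python.tpl"
)
try:
    with open(naive_path) as f:
        f.read()
    opened_directly = True
except (IOError, OSError):
    opened_directly = False

# The loader-aware call asks the zipimporter for the data instead.
data = pkgutil.get_data("hermes_zip_demo", "templates/python.tpl")
print(opened_directly, data)
```

The same get_data() call works unchanged when the package is installed as plain files, which is why it is the safer default.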

If you're okay with adding a runtime dependency on the distribute codebase, you may want to consider the pkg_resources module, which gives you the same result and adds other capabilities.

import pkg_resources
data = pkg_resources.resource_string('hermes', 'templates/python.tpl')
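
Since pkg_resources is an extra runtime dependency that may not be present everywhere, one option is to use it when it imports and fall back to pkgutil otherwise. This is only a sketch: load_template is a hypothetical helper, and hermes_res_demo is a throwaway package created just for the demo.

```python
import os
import pkgutil
import sys
import tempfile


def load_template(package, resource):
    """Return the resource as bytes, preferring pkg_resources when installed.

    Hypothetical helper for illustration; not part of either library.
    """
    try:
        import pkg_resources  # provided by distribute/setuptools
    except ImportError:
        # Standard-library fallback; same package-relative resource path.
        return pkgutil.get_data(package, resource)
    return pkg_resources.resource_string(package, resource)


# Throwaway package so the sketch runs on its own (names are assumptions).
root = tempfile.mkdtemp()
templates = os.path.join(root, "hermes_res_demo", "templates")
os.makedirs(templates)
with open(os.path.join(root, "hermes_res_demo", "__init__.py"), "wb") as f:
    f.write(b"")
with open(os.path.join(templates, "python.tpl"), "wb") as f:
    f.write(b"# template body\n")
sys.path.insert(0, root)

loaded = load_template("hermes_res_demo", "templates/python.tpl")
print(loaded)
```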