Jinja2 and Mako are both apparently pretty fast.
How do these compare to string.Template, which has fewer features but is probably good enough for what I'm doing?
Here are benchmark results for the popular template engines when rendering a 10x1000 HTML table.
Measured with Python 2.6.2 on a 3 GHz Intel Core 2. Times are in milliseconds; lower is better.
Kid template                   696.89 ms
Kid template + cElementTree    649.88 ms
Genshi template + tag builder  431.01 ms
Genshi tag builder             389.39 ms
Django template                352.68 ms
Genshi template                266.35 ms
ElementTree                    180.06 ms
cElementTree                   107.85 ms
StringIO                        41.48 ms
Jinja 2                         36.38 ms
Cheetah template                34.66 ms
Mako Template                   29.06 ms
Spitfire template               21.80 ms
Tenjin                          18.39 ms
Spitfire template -O1           11.86 ms
cStringIO                        5.80 ms
Spitfire template -O3            4.91 ms
Spitfire template -O2            4.82 ms
generator concat                 4.06 ms
list concat                      3.99 ms
generator concat optimized       2.84 ms
list concat optimized            2.62 ms
The benchmark is based on code from the Spitfire performance tests, with some template engines added and the iteration count increased to improve accuracy. The list concat and generator concat entries at the end are hand-coded Python, included to get a feel for the upper limit of performance achievable by compiling to Python bytecode. The optimized versions use string interpolation in the inner loop.
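To make the hand-coded variants concrete, and to give a rough way of measuring string.Template (which is not in the table above), here is a minimal sketch. It is not the original benchmark code: the function names, the TABLE data (assumed to be 1000 rows of 10 columns), and the iteration count are my own assumptions, and it is written for Python 3 rather than the Python 2.6 used above, so the numbers will not be comparable with the table.

```python
import timeit
from string import Template

# Assumed test data: 1000 rows of 10 columns (hypothetical, not the original fixture).
TABLE = [dict(a=1, b=2, c=3, d=4, e=5, f=6, g=7, h=8, i=9, j=10)
         for _ in range(1000)]


def list_concat(table):
    """Append every fragment to a list and join once at the end."""
    out = ["<table>\n"]
    for row in table:
        out.append("<tr>\n")
        for value in row.values():
            out.append("<td>")
            out.append(str(value))
            out.append("</td>\n")
        out.append("</tr>\n")
    out.append("</table>")
    return "".join(out)


def list_concat_optimized(table):
    """Same as list_concat, but with string interpolation in the inner loop."""
    out = ["<table>\n"]
    for row in table:
        out.append("<tr>\n")
        for value in row.values():
            out.append("<td>%s</td>\n" % value)
        out.append("</tr>\n")
    out.append("</table>")
    return "".join(out)


def generator_concat_optimized(table):
    """Yield fragments from a generator and join them."""
    def parts():
        yield "<table>\n"
        for row in table:
            yield "<tr>\n"
            for value in row.values():
                yield "<td>%s</td>\n" % value
            yield "</tr>\n"
        yield "</table>"
    return "".join(parts())


# string.Template has no loops, so the looping stays in Python
# and the template only fills in one cell at a time.
CELL = Template("<td>$value</td>\n")


def string_template(table):
    out = ["<table>\n"]
    for row in table:
        out.append("<tr>\n")
        for value in row.values():
            out.append(CELL.substitute(value=value))
        out.append("</tr>\n")
    out.append("</table>")
    return "".join(out)


if __name__ == "__main__":
    runs = 20  # arbitrary; raise it for steadier numbers
    for fn in (list_concat, list_concat_optimized,
               generator_concat_optimized, string_template):
        seconds = timeit.timeit(lambda: fn(TABLE), number=runs)
        print("%-28s %.2f ms per render" % (fn.__name__, seconds / runs * 1000))
```

All four functions produce the same HTML, so any timing difference comes from how the string is assembled. Since string.Template only does substitution, where you put the loop (one substitute call per cell, per row, or per page) is a design choice that will likely affect the result, so measure the shape you actually plan to use.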
But before you run out to switch your template engine, make sure it matters. You'll need to be doing some pretty heavy caching and have really optimized code before the differences between the compiling template engines start to matter. For most applications, good abstraction facilities, compatibility with design tools, familiarity, and other things matter much, much more.