I'm trying to use utf-8 characters when rendering a template with Jinja2. Here is how my template looks like:
<!DOCTYPE HTML>
<html manifest="" lang="en-US">
<head>
<meta charset="UTF-8">
<title>{{title}}</title>
...
The title variable is set something like this:
index_variables = {'title':''}
index_variables['title'] = myvar.encode("utf8")
template = env.get_template('index.html')
index_file = open(preview_root + "/" + "index.html", "w")
index_file.write(
template.render(index_variables)
)
index_file.close()
Now, the problem is that myvar is a message read from a message queue and can contain those special utf8 characters (ex. "Séptimo Cine").
The rendered template looks something like:
...
<title>S\u00e9ptimo Cine</title>
...
and I want it to be:
...
<title>Séptimo Cine</title>
...
I have made several tests but I can't get this to work.
I have tried to set the title variable without .encode("utf8"), but it throws an exception (ValueError: Expected a bytes object, not a unicode object), so my guess is that the initial message is unicode
I have used chardet.detect to get the encoding of the message (it's "ascii"), then did the following: myvar.decode("ascii").encode("cp852"), but the title is still not rendered correctly.
I also made sure that my template is a UTF-8 file, but it didn't make a difference.
Any ideas on how to do this?
TL;DR:
template.render()
This had me puzzled for a while. Because you do
index_file.write(
template.render(index_variables)
)
in one statement, that's basically just one line where Python is concerned, so the traceback you get is misleading: The exception I got when recreating your test case didn't happen in template.render(index_variables)
, but in index_file.write()
instead. So splitting the code up like this
output = template.render(index_variables)
index_file.write(output)
was the first step to diagnose where exactly the UnicodeEncodeError
happens.
Jinja returns unicode whet you let it render the template. Therefore you need to encode the result to a bytestring before you can write it to a file:
index_file.write(output.encode('utf-8'))
The second error is that you pass in an utf-8
encoded bytestring to template.render()
- Jinja wants unicode. So assuming your myvar
contains UTF-8, you need to decode it to unicode first:
index_variables['title'] = myvar.decode('utf-8')
So, to put it all together, this works for me:
# -*- coding: utf-8 -*-
from jinja2 import Environment, PackageLoader
env = Environment(loader=PackageLoader('myproject', 'templates'))
# Make sure we start with an utf-8 encoded bytestring
myvar = 'Séptimo Cine'
index_variables = {'title':''}
# Decode the UTF-8 string to get unicode
index_variables['title'] = myvar.decode('utf-8')
template = env.get_template('index.html')
with open("index_file.html", "w") as index_file:
output = template.render(index_variables)
# jinja returns unicode - so `output` needs to be encoded to a bytestring
# before writing it to a file
index_file.write(output.encode('utf-8'))