The PyYAML package loads unmarked strings as either unicode or str objects, depending on their content.
I would like to use unicode objects throughout my program (and, unfortunately, can't switch to Python 3 just yet).
Is there an easy way to force PyYAML to always load strings as unicode objects? I do not want to clutter my YAML with !!python/unicode tags.
# Encoding: UTF-8
import yaml
menu = u"""---
- spam
- eggs
- bacon
- crème brûlée
- spam
"""
print yaml.load(menu)
Output: ['spam', 'eggs', 'bacon', u'cr\xe8me br\xfbl\xe9e', 'spam']
I would like: [u'spam', u'eggs', u'bacon', u'cr\xe8me br\xfbl\xe9e', u'spam']
Here's a version that overrides PyYAML's handling of strings so that it always outputs unicode. In practice it should give the same result as the other response I posted, only shorter (i.e. if you use custom handlers, you still need to make sure that strings in custom classes are converted to unicode, or are passed unicode strings, yourself):
# -*- coding: utf-8 -*-
import yaml
from yaml import Loader, SafeLoader
def construct_yaml_str(self, node):
    # Override the default string handling function
    # to always return unicode objects
    return self.construct_scalar(node)
Loader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)
SafeLoader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)
print yaml.load(u"""---
- spam
- eggs
- bacon
- crème brûlée
- spam
""")
(The above gives [u'spam', u'eggs', u'bacon', u'cr\xe8me br\xfbl\xe9e', u'spam'].)
I haven't tested it with LibYAML (the C-based parser), though, as I couldn't compile it, so I'll leave the other answer as it is.
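If some custom handler still hands back plain str values, one option is to normalise the loaded data afterwards. This is only a minimal sketch: to_unicode is a hypothetical helper of my own (not part of PyYAML), and it assumes any byte strings are UTF-8 encoded and that the data consists of plain lists, tuples and dicts.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not part of PyYAML): recursively decode byte strings.
# Assumes byte strings are UTF-8 encoded unless another encoding is given.


def to_unicode(value, encoding="utf-8"):
    """Return a copy of value with every byte string decoded to text."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    if isinstance(value, dict):
        return dict((to_unicode(k, encoding), to_unicode(v, encoding))
                    for k, v in value.items())
    if isinstance(value, list):
        return [to_unicode(item, encoding) for item in value]
    if isinstance(value, tuple):
        return tuple(to_unicode(item, encoding) for item in value)
    # Anything else (numbers, None, text that is already unicode) is unchanged
    return value


# Stand-in for data returned by a loader that mixes byte and unicode strings
loaded = [b"spam", b"eggs", u"cr\xe8me br\xfbl\xe9e", {b"side": b"bacon"}]
converted = to_unicode(loaded)
```

You would call it as to_unicode(yaml.load(menu)). It walks the whole structure on every load, so overriding the constructor as above is the cheaper route when it covers your case.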