How to configure ruamel.yaml.dump output?

nowox · Sep 20, 2016

With this data structure:

d = {
    (2,3,4): {
        'a': [1,2], 
        'b': 'Hello World!',
        'c': 'Voilà!'
    }
}

I would like to get this YAML:

%YAML 1.2
---
[2,3,4]:
  a:
    - 1
    - 2
  b: Hello World!
  c: 'Voilà!'

Unfortunately I get this format:

>>> print(ruamel.yaml.dump(d, default_flow_style=False, line_break=1, explicit_start=True, version=(1,2)))
%YAML 1.2
---
? !!python/tuple
- 2
- 3
- 4
: a:
  - 1
  - 2
  b: Hello World!
  c: !!python/str 'Voilà!'

I cannot configure the output I want even with safe_dump. How can I do that without manual regex work on the output?

The only ugly solution I found is something like:

import re

def rep(x):
    # Collect the integers of the block-style key and re-emit them as a flow-style list
    return repr([int(y) for y in re.findall(r'^\??\s*-\s*(\d+)', x.group(0), re.M)]) + ":\n"
print(re.sub(r'\?(\s*-\s*(\w+))+\s*:', rep,
    ruamel.yaml.dump(d, default_flow_style=False, line_break=1, explicit_start=True, version=(1,2))))

Answer

Anthon · Sep 21, 2016

New ruamel.yaml API

You cannot get what you want using ruamel.yaml.dump(), but with the new API, which has a few more controls, you can come very close.

import sys
import ruamel.yaml


d = {
    (2,3,4): {
        'a': [1,2], 
        'b': 'Hello World!',
        'c': 'Voilà!'
    }
}

def prep(d):
    if isinstance(d, dict):
        needs_restocking = False
        for k in d:
            if isinstance(k, tuple):
                needs_restocking = True
            try:
                if 'à' in d[k]:
                    d[k] = ruamel.yaml.scalarstring.SingleQuotedScalarString(d[k])
            except TypeError:
                pass
            prep(d[k])
        if not needs_restocking:
            return
        items = list(d.items())
        for (k, v) in items:
            d.pop(k)
        for (k, v) in items:
            if isinstance(k, tuple):
                k = ruamel.yaml.comments.CommentedKeySeq(k)
            d[k] = v
    elif isinstance(d, list):
        for item in d:
            prep(item)

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=2, sequence=4, offset=2)
yaml.version = (1, 2)
prep(d)  # prep() modifies d in place and returns None
yaml.dump(d, sys.stdout)

which gives:

%YAML 1.2
---
[2, 3, 4]:
  a:
    - 1
    - 2
  b: Hello World!
  c: 'Voilà!'

There is still no simple way to suppress the space after the commas between the sequence items, so you cannot get [2,3,4] instead of [2, 3, 4] without some major effort.
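If post-processing the dumped text is acceptable after all, the remaining spaces can be removed afterwards. The following is a minimal sketch and not part of ruamel.yaml: compact_flow_keys is a hypothetical helper that only touches flow-style sequences used as keys at the start of a line, and it assumes the key items are plain scalars without embedded commas or brackets. As far as I know, YAML.dump() accepts a transform callable that receives the dumped text, so the helper could be passed as transform=compact_flow_keys, but treat that as an assumption to check against your installed version.

```python
import re

# A flow-style sequence used as a mapping key at the start of a line,
# e.g. "[2, 3, 4]:". Nested brackets are deliberately not supported.
_FLOW_KEY = re.compile(r'^([ \t]*)\[([^\[\]\n]*)\]:', re.MULTILINE)


def compact_flow_keys(text):
    """Strip the whitespace around the commas inside flow-sequence keys."""
    def _compact(match):
        indent, body = match.group(1), match.group(2)
        items = [item.strip() for item in body.split(',')]
        return '{}[{}]:'.format(indent, ','.join(items))
    return _FLOW_KEY.sub(_compact, text)


# The text produced by the dump above, written out literally
dumped = (
    "%YAML 1.2\n"
    "---\n"
    "[2, 3, 4]:\n"
    "  a:\n"
    "    - 1\n"
    "    - 2\n"
    "  b: Hello World!\n"
    "  c: 'Voilà!'\n"
)
print(compact_flow_keys(dumped))
```

Flow-style sequences that appear as values (such as a: [1, 2]) are left alone, because the pattern only matches a bracket that opens a line and is directly followed by a colon.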

Original answer:


You cannot get exactly what you want as output using ruamel.yaml.dump() without major rework of the internals.

  • The output you like has indentation 2 for the values of the top-level mapping (keys a, b, etc.) and indentation 4 for the elements of the sequence that is the value for the a key (with the - pushed in 2 positions). That would at least require differentiating between indentation levels for mappings and sequences (if not for individual collections), and that is non-trivial.
  • Your sequence output is compacted from the ', ' (comma, space) that a "normal" flow style emits to just a ','. IIRC this cannot currently be influenced by any parameter, and since you have little contextual knowledge when emitting a collection, it is difficult to "not include the spaces when emitting a sequence that is a key". An additional option to dump() would require changes in several of the source files and classes.

Less difficult issues, with indication of solution:

  • Your tuple has to magically convert to a sequence to get rid of the tag !!python/tuple. As you don't want to affect all tuples, this is IMO best done by making a subclass of tuple and representing it as a sequence (optionally representing such a tuple as a list only if it is actually used as a key). You can use comments.CommentedKeySeq for that (assuming ruamel.yaml>=0.12.14); it has the proper representation support when using ruamel.yaml.round_trip_dump().
  • Your key is, when tested before emitting, not a simple key, and as such it gets a '? ' (question mark, space) to indicate a complex mapping key. You would have to change the emitter so that the SequenceStartEvent starts a simple key (if it has flow style and not block style). An additional issue is that such a SequenceStartEvent will then be "tested" for a style attribute (which might indicate an explicit need for '?' on the key). This requires changing emitter.py:Emitter.check_simple_key() and emitter.py:Emitter.expect_block_mapping_key().
  • Your scalar string value for c gets quotes, whereas your scalar string value for b doesn't. You can only get that kind of difference in output in ruamel.yaml by making them different types, e.g. by making the value for c a scalarstring.SingleQuotedScalarString() (and using round_trip_dump()).

If you do:

import sys
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap, CommentedKeySeq
assert ruamel.yaml.version_info >= (0, 12, 14)

data = CommentedMap()
data[CommentedKeySeq((2, 3, 4))] = cm = CommentedMap()
cm['a'] = [1, 2]
cm['b'] = 'Hello World!'
cm['c'] = ruamel.yaml.scalarstring.SingleQuotedScalarString('Voilà!')

ruamel.yaml.round_trip_dump(data, sys.stdout, explicit_start=True, version=(1, 2))

you will get:

%YAML 1.2
---
[2, 3, 4]:
  a:
  - 1
  - 2
  b: Hello World!
  c: 'Voilà!'

which, apart from the now consistent indentation level of 2, the extra spaces in the flow style sequence, and the required use of round_trip_dump(), is as close as you can get to what you want without major rework.

Whether the above code is ugly as well or not is of course a matter of taste.

The output will, not incidentally, round-trip correctly when loaded using ruamel.yaml.round_trip_load(preserve_quotes=True).


If control over the quotes is not needed, and neither is the order of your mapping keys important, then you can also patch the normal dumper:

def my_key_repr(self, data):
    if isinstance(data, tuple):
        # emit tuple keys as flow-style sequences instead of !!python/tuple
        return self.represent_sequence(u'tag:yaml.org,2002:seq', data,
                                       flow_style=True)
    return ruamel.yaml.representer.SafeRepresenter.represent_key(self, data)

ruamel.yaml.representer.Representer.represent_key = my_key_repr

Then you can use a normal tuple as the key in a plain dict:

data = {}
data[(2, 3, 4)] = cm = {}
cm['a'] = [1, 2]
cm['b'] = 'Hello World!'
cm['c'] = 'Voilà!'

ruamel.yaml.dump(data, sys.stdout, allow_unicode=True, explicit_start=True, version=(1, 2))

This will give you:

%YAML 1.2
---
[2, 3, 4]:
  a: [1, 2]
  b: Hello World!
  c: Voilà!

Please note that you need to explicitly allow unicode in your output (the default with round_trip_dump()) using allow_unicode=True.


¹ Disclaimer: I am the author of ruamel.yaml.