Error when parsing yaml file : found character '%' that cannot start any token

bcharfi · Feb 16, 2017 · Viewed 11.7k times

I am trying to parse data from a YAML file that contains expressions similar to jinja2 template syntax. The goal is to delete items from the file or add items to it.

AddCodesList.yaml

AddCodesList:
  body:
    list:
    {% for elt in customer %}
      - code: {{ elt.code }}
        name: {{ elt.name }}
        country: {{ elt.country }}
    {% endfor %}   
  result:
    json:
      responseCode: {{ responseCode }}
      responseMsg: {{ responseMsg }}
      responseData: {{ responseData }}

parseFile.py

import ruamel.yaml
from ruamel.yaml.util import load_yaml_guess_indent

data,indent,block_seq_indent=load_yaml_guess_indent(open('AddCodesList.yaml'), preserve_quotes=True)

#delete item
del data['body']['list']['code']
#add new item
data['parameters'].insert(2, 'ssl_password','xxxxxx')
#create new file
ruamel.yaml.round_trip_dump(data, open('missingCode.yaml', 'w'), explicit_start=True)

I have the following error when executing the parseFile.py script:

    Traceback (most recent call last):
      File "d:/workspace/TEST/manageItem.py", line 4, in <module>
        data, indent, block_seq_indent = load_yaml_guess_indent(open('AddCodesList.
...
        if self.check_token(ValueToken):
      File "C:\Python34\lib\site-packages\ruamel\yaml\scanner.py", line 1534, in ch
        self.fetch_more_tokens()
      File "C:\Python34\lib\site-packages\ruamel\yaml\scanner.py", line 269, in fet
        % utf8(ch), self.get_mark())
    ruamel.yaml.scanner.ScannerError: while scanning for the next token
    found character '%' that cannot start any token
      in "<unicode string>", line 4, column 6:
            {% for elt in customer %}
             ^ (line: 4)

Answer

Anthon · Feb 16, 2017

In YAML, '{' starts a flow-style mapping, so the '%' that follows it is taken as the start of the first key of that mapping, and '%' is not allowed as the first character of a key.

Normally you would process the templates of the file first and then apply YAML. You cannot easily reverse that process, as the value for list would have to be a valid YAML construct.

One of the solutions to make it parseable is to change the value for list to valid YAML by quoting the template expressions, like:

list:
  - '{% for elt in customer %}'
  - code: '{{ elt.code }}'
    name: '{{ elt.name }}'
    country: '{{ elt.country }}'
  - '{% endfor %}'

or:

list: |
    {% for elt in customer %}
      - code: {{ elt.code }}
        name: {{ elt.name }}
        country: {{ elt.country }}
    {% endfor %} 

but either change would mean the file is no longer templateable by jinja2.

You can change the start sequence in jinja2 from {% to something else, but that doesn't help you (i.e. you still would not get valid YAML). The only real solution I see at the moment is to drop jinja2 completely and implement the for loop using some list-like object in Python (one that gets expanded on access).
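One possible reading of "a list-like object that gets expanded on access" is sketched below. The class name ExpandedList, the template mapping, and the sample records are all hypothetical; this is not an existing ruamel.yaml or jinja2 feature, only plain Python standing in for the for loop:

```python
from collections.abc import Sequence


class ExpandedList(Sequence):
    """Read-only list whose items are built from a template on access."""

    def __init__(self, template, records):
        # template maps output keys to field names in each record
        self._template = template
        # records is the data the jinja2 loop would have iterated over
        self._records = records

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        record = self._records[index]  # raises IndexError when out of range
        return {key: record[field] for key, field in self._template.items()}


# Hypothetical sample data, for illustration only.
customers = [
    {"code": "A1", "name": "Alice", "country": "NL"},
    {"code": "B2", "name": "Bob", "country": "FR"},
]
items = ExpandedList({"code": "code", "name": "name"}, customers)

print(len(items))
print(items[0])
print(list(items))
```

Because nothing is expanded until an item is requested, the stored data stays a single template plus the records, which is roughly what the loop in the template expressed.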

If it is allowable to always preprocess before applying jinja2, you can change the file to:

AddCodesList:
  body:
    list:
    # {% for elt in customer %}
      - code: '{{ elt.code }}'
        name: '{{ elt.name }}'
        country: '{{ elt.country }}'
    # {% endfor %}   

as that would load, but you might need to change the # {% back to just {% before running your template engine.
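If the leading # does have to be stripped, that preprocessing step could look roughly like this. The helper name uncomment_jinja is made up for this sketch, and it assumes each commented-out statement sits alone on its line:

```python
import re

# A comment line whose only content is a jinja2 statement,
# e.g. "    # {% for elt in customer %}".
_COMMENTED_STMT = re.compile(r"^([ \t]*)#[ \t]*(\{%.*?%\})[ \t]*$", re.MULTILINE)


def uncomment_jinja(text):
    """Turn '# {% ... %}' comment lines back into bare '{% ... %}' lines."""
    return _COMMENTED_STMT.sub(r"\1\2", text)


yaml_safe = """\
AddCodesList:
  body:
    list:
    # {% for elt in customer %}
      - code: '{{ elt.code }}'
    # {% endfor %}
"""

print(uncomment_jinja(yaml_safe))
```

Ordinary YAML comments that do not contain a {% ... %} statement are left untouched, and the indentation of each line is preserved.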

Quote with single quotes, as between those only the single quote itself has a special meaning. With double quotes you will more often get something inserted by the pre-processor that makes the result incorrect YAML (e.g. DOS/Windows-style full file paths: 'C:\yaml\abc.yaml' is correct, but "c:\yaml\abc.yaml" will give you an error during YAML parsing).
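Since the single quote is the only character that needs care inside single-quoted YAML scalars (it is escaped by doubling it), inserted values can be made safe with a very small helper. The name yaml_single_quote is hypothetical, and the sketch assumes single-line values:

```python
def yaml_single_quote(value):
    """Wrap value in single quotes, doubling any embedded single quote."""
    return "'" + str(value).replace("'", "''") + "'"


# Backslashes pass through unchanged inside single quotes.
print(yaml_single_quote(r"C:\yaml\abc.yaml"))

# An embedded single quote is doubled.
print(yaml_single_quote("O'Brien"))
```

A function like this could be applied to each value before it is substituted into the template, so that the file stays valid YAML after templating.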