Converting JSON into newline delimited JSON in Python

Fxs7576 · Jul 12, 2018

My goal is to use Python to convert a JSON file into a format that can be uploaded from Cloud Storage into BigQuery (as described here).

I tried using the newlineJSON package for the conversion but received the following error:

JSONDecodeError: Expecting value or ']': line 2 column 1 (char 5)

Does anyone have the solution to this?

Here is the sample JSON code:

[{
    "key01": "value01",
    "key02": "value02",
    ...
    "keyN": "valueN"
},
{
    "key01": "value01",
    "key02": "value02",
    ...
    "keyN": "valueN"
},
{
    "key01": "value01",
    "key02": "value02",
    ...
    "keyN": "valueN"
}
]

And here's the existing Python script:

import newlinejson as nlj

with nlj.open(url_samplejson, json_lib="simplejson") as src_:
    with nlj.open(url_convertedjson, "w") as dst_:
        for line_ in src_:
            dst_.write(line_)

Answer

Oleh Rybalchenko · Jul 12, 2018

The answer with jq is really useful, but if you still want to do it with Python (as the question suggests), you can do it with the built-in json module.

import json
from io import StringIO
in_json = StringIO("""[{
    "key01": "value01",
    "key02": "value02",

    "keyN": "valueN"
},
{
    "key01": "value01",
    "key02": "value02",

    "keyN": "valueN"
},
{
    "key01": "value01",
    "key02": "value02",

    "keyN": "valueN"
}
]""")

result = [json.dumps(record) for record in json.load(in_json)]  # the only significant line to convert the JSON to the desired format

print('\n'.join(result))

Output:

{"key01": "value01", "key02": "value02", "keyN": "valueN"}
{"key01": "value01", "key02": "value02", "keyN": "valueN"}
{"key01": "value01", "key02": "value02", "keyN": "valueN"}

* I'm using StringIO and print here just to make the sample easier to test locally.
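
With real files instead of StringIO, the same idea could look roughly like the sketch below. The function name json_array_to_ndjson and both path parameters are made up for illustration; it assumes the input file holds a single JSON array that fits in memory.

```python
import json


def json_array_to_ndjson(src_path, dst_path):
    """Read a JSON array from src_path and write one JSON value per line to dst_path.

    Hypothetical helper for illustration; returns the number of records written.
    """
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)  # parses the whole array into memory at once

    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            # json.dumps emits no newlines by default, so each record stays on one line
            dst.write(json.dumps(record) + "\n")

    return len(records)
```

Calling json_array_to_ndjson("sample.json", "converted.json") (placeholder file names) would write one line per array element. For inputs too large to load at once, a streaming parser would be needed instead.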

As an alternative, you can use the Python jq bindings to combine this with the other answer.