Is there a memory-efficient and fast way to load big JSON files in Python?

duduklein · Mar 8, 2010 · Viewed 56.6k times

I have some JSON files that are around 500 MB each. If I use the "trivial" json.load to load the whole file at once, it consumes a lot of memory.

Is there a way to read the file partially? If it were a text file with one record per line, I would be able to iterate over the lines. I am looking for an analogous approach for JSON.

Any suggestions? Thanks

Answer

Jim Pivarski · Jun 26, 2013

There was a duplicate of this question that had a better answer. See https://stackoverflow.com/a/10382359/1623645, which suggests ijson.

Update:

I tried it out, and ijson is to JSON what SAX is to XML. For instance, you can do this:

import ijson

with open(json_file_name, "rb") as f:  # ijson works best with a binary file object
    for prefix, the_type, value in ijson.parse(f):
        print(prefix, the_type, value)

where prefix is a dot-separated index into the JSON tree (what happens if your key names contain dots? I guess that would be bad for JavaScript, too...), the_type describes a SAX-like event, one of 'null', 'boolean', 'number', 'string', 'map_key', 'start_map', 'end_map', 'start_array', 'end_array', and value is the value of the object, or None if the_type is an event like starting/ending a map/array.
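
To make that concrete, here is roughly what the loop above prints for a tiny document like {"earth": [1, 2]} (an assumed example; the exact rendering of values depends on the ijson backend, e.g. numbers may arrive as decimal.Decimal):

 start_map None
 map_key earth
earth start_array None
earth.item number 1
earth.item number 2
earth end_array None
 end_map None

(The prefix at the root is an empty string, hence the leading blanks.)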

The project has some docstrings, but not enough global documentation. I had to dig into ijson/common.py to find what I was looking for.
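
If you want whole Python objects out of one part of the tree rather than raw events, ijson.items does the reassembly for you while still streaming. A minimal sketch, assuming the big file holds one huge top-level array (the filename and the process() handler are hypothetical):

import ijson

with open("big_file.json", "rb") as f:  # hypothetical filename
    # The prefix "item" addresses each element of a top-level array;
    # only one element is materialized in memory at a time.
    for record in ijson.items(f, "item"):
        process(record)  # hypothetical per-record handler

For a file shaped differently, you would adjust the prefix to the dot-separated path of the repeating element (e.g. "results.item" for {"results": [...]}).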