I have some JSON files of 500 MB each. If I use the "trivial" json.load() to load their content all at once, it will consume a lot of memory.
Is there a way to read the file partially? If it were a text, line-delimited file, I would be able to iterate over the lines. I am looking for an analogy to that.
Any suggestions? Thanks
There was a duplicate to this question that had a better answer. See https://stackoverflow.com/a/10382359/1623645, which suggests ijson.
Update:
I tried it out, and ijson is to JSON what SAX is to XML. For instance, you can do this:
import ijson

# Open in binary mode and close the file when done; ijson streams the
# content instead of loading it all at once.
with open(json_file_name, "rb") as f:
    for prefix, the_type, value in ijson.parse(f):
        print(prefix, the_type, value)
where prefix is a dot-separated index into the JSON tree (what happens if your key names have dots in them? I guess that would be bad for JavaScript, too...), the_type describes a SAX-like event, one of 'null', 'boolean', 'number', 'string', 'map_key', 'start_map', 'end_map', 'start_array', 'end_array', and value is the value of the object, or None if the_type is an event like the start or end of a map or array.
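For illustration, here is roughly what that event stream looks like on a tiny in-memory document (this example is mine, not from the linked answer):

import io
import ijson

# A small document, just to show the events that parse() yields.
doc = io.BytesIO(b'{"name": "Alice", "tags": ["a", "b"]}')
for prefix, the_type, value in ijson.parse(doc):
    print((prefix, the_type, value))

# Output:
# ('', 'start_map', None)
# ('', 'map_key', 'name')
# ('name', 'string', 'Alice')
# ('', 'map_key', 'tags')
# ('tags', 'start_array', None)
# ('tags.item', 'string', 'a')
# ('tags.item', 'string', 'b')
# ('tags', 'end_array', None)
# ('', 'end_map', None)

Note how array elements show up under the prefix 'tags.item'.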
The project has some docstrings, but not enough global documentation. I had to dig into ijson/common.py to find what I was looking for.
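If you want whole objects rather than raw events (the closest analogy to iterating over lines), ijson also provides items(), which yields fully built Python objects found under a given prefix. A minimal sketch, assuming the file contains a single top-level JSON array (the file name here is made up):

import ijson

# Assumes big_file.json holds a top-level array: [{...}, {...}, ...]
# 'item' is ijson's prefix for the elements of a top-level array.
with open("big_file.json", "rb") as f:
    for record in ijson.items(f, "item"):
        # Each record is an ordinary Python object; only one array
        # element is materialized in memory at a time.
        print(record)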