As the title explains, I've been trying to generate a nested JSON object. In this case I use for loops to get the data out of a dictionary dic. Below is the code:
f = open("test_json.txt", 'w')
flag = False
temp = ""
start = "{\n\t\"filename\"" + " : \"" +initial_filename+"\",\n\t\"data\"" +" : " +" [\n"
end = "\n\t]" +"\n}"
f.write(start)
for i, (key,value) in enumerate(dic.iteritems()):
    f.write("{\n\t\"keyword\":"+"\""+str(key)+"\""+",\n")
    f.write("\"term_freq\":"+str(len(value))+",\n")
    f.write("\"lists\":[\n\t")
    for item in value:
        f.write("{\n")
        f.write("\t\t\"occurance\" :"+str(item)+"\n")
        #Check last object
        if value.index(item)+1 == len(value):
            f.write("}\n")
            f.write("]\n")
        else:
            f.write("},") # close occurrence object
    # Check last item in dic
    if i == len(dic)-1:
        flag = True
    if(flag):
        f.write("}")
    else:
        f.write("},") #close lists object
    flag = False
#check for flag
f.write("]") #close lists array
f.write("}")
Expected output is:
{
"filename": "abc.pdf",
"data": [{
"keyword": "irritation",
"term_freq": 5,
"lists": [{
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 2
}]
}, {
"keyword": "bomber",
"lists": [{
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 2
}],
"term_freq": 5
}]
}
But currently I'm getting an output like below:
{
"filename": "abc.pdf",
"data": [{
"keyword": "irritation",
"term_freq": 5,
"lists": [{
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 2
},] // Here lies the problem: a trailing "," before the closing "]" of the array
}, {
"keyword": "bomber",
"lists": [{
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 2
},], // Here lies the problem: a trailing "," before the closing "]" of the array
"term_freq": 5
}]
}
Please help: I've been trying to solve this, but failed. Please don't mark it as a duplicate; I have already checked the other answers and they didn't help at all.
Edit 1:
The input is taken from a dictionary dic whose mapping type is <String, List>,
for example: "irritation" => [1,3,5,7,8]
where irritation is the key, mapped to a list of page numbers.
The dictionary is read in the outer for loop, where key is the keyword and value is the list of pages on which that keyword occurs.
Edit 2:
import collections
dic = collections.defaultdict(list) # declare the dictionary
dic[key].append(value) # insert the values (shown only for context)
for key in dic:
    # dic[key] is the list stored for each key
    print key,":",dic[key],"\n" # print the data in the dictionary
What @andrea-f suggests looks good to me; here is another solution.
Feel free to pick either one :)
import json
dic = {
    "bomber": [1, 2, 3, 4, 5],
    "irritation": [1, 3, 5, 7, 8]
}
filename = "abc.pdf"
json_dict = {}
data = []
for k, v in dic.iteritems():
    tmp_dict = {}
    tmp_dict["keyword"] = k
    tmp_dict["term_freq"] = len(v)
    tmp_dict["lists"] = [{"occurrance": i} for i in v]
    data.append(tmp_dict)
json_dict["filename"] = filename
json_dict["data"] = data
with open("abc.json", "w") as outfile:
    json.dump(json_dict, outfile, indent=4, sort_keys=True)
It's the same idea: I first build one big json_dict and then save it directly as JSON. I use the with statement to write the file, so it gets closed properly even if an exception is raised.
Also, you should have a look at the documentation of json.dumps() if you need to tune your JSON output further.
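As a rough illustration of the kind of tuning json.dumps() allows, here is a minimal sketch. The sample dictionary is made up for the example and only mimics the shape of json_dict; the code uses single-argument print() calls so that it should run on both Python 2 and Python 3.

```python
import json

# Hypothetical sample data, shaped like the json_dict built above.
sample = {
    "filename": "abc.pdf",
    "data": [{"keyword": "bomber", "term_freq": 2,
              "lists": [{"occurrance": 1}, {"occurrance": 2}]}],
}

# Compact form: no extra whitespace, keys sorted for a stable result.
compact = json.dumps(sample, sort_keys=True, separators=(",", ":"))

# Human-readable form: 4-space indentation, keys sorted alphabetically.
pretty = json.dumps(sample, sort_keys=True, indent=4)

# Both strings describe the same data, so they parse back to equal objects.
assert json.loads(compact) == json.loads(pretty) == sample

print(compact)
print(pretty)
```

Because the serializer places every comma itself, the trailing "," from the hand-written version cannot appear in either form.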
EDIT
And just for fun, if you don't like the tmp variable, you can write the whole data for loop as a one-liner :)
json_dict["data"] = [{"keyword": k, "term_freq": len(v), "lists": [{"occurrance": i} for i in v]} for k, v in dic.iteritems()]
The final solution could then look something like this, which is not very readable:
import json
json_dict = {
    "filename": "abc.pdf",
    "data": [{
        "keyword": k,
        "term_freq": len(v),
        "lists": [{"occurrance": i} for i in v]
    } for k, v in dic.iteritems()]
}
with open("abc.json", "w") as outfile:
    json.dump(json_dict, outfile, indent=4, sort_keys=True)
EDIT 2
It looks like you don't need the saved JSON file itself to match the desired output, but rather want to be able to read it.
In fact, you can also use json.dumps() to print your JSON.
with open('abc.json', 'r') as handle:
    new_json_dict = json.load(handle)
# print the data that was just read back from the file
print json.dumps(new_json_dict, indent=4, sort_keys=True)
There is still one problem here though: "filename" is printed at the end of the object, because the keys are sorted alphabetically and the d of data comes before the f of filename.
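A tiny sketch of that ordering effect, using a made-up two-key object rather than the real data:

```python
import json

# Hypothetical two-key object mirroring the structure discussed above.
doc = {"filename": "abc.pdf", "data": []}

# sort_keys=True orders keys alphabetically, so "data" precedes "filename".
sorted_text = json.dumps(doc, sort_keys=True)
print(sorted_text)  # {"data": [], "filename": "abc.pdf"}
```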
To force the order, you will have to use an OrderedDict when building the dict. Be careful, the syntax is ugly (IMO) with Python 2.x.
Here is the new complete solution ;)
import json
from collections import OrderedDict
dic = {
    'bomber': [1, 2, 3, 4, 5],
    'irritation': [1, 3, 5, 7, 8]
}
json_dict = OrderedDict([
    ('filename', 'abc.pdf'),
    ('data', [ OrderedDict([
        ('keyword', k),
        ('term_freq', len(v)),
        ('lists', [{'occurrance': i} for i in v])
    ]) for k, v in dic.iteritems()])
])
with open('abc.json', 'w') as outfile:
    json.dump(json_dict, outfile)
# Now read the ordered JSON file back, keeping the key order
with open('abc.json', 'r') as handle:
    new_json_dict = json.load(handle, object_pairs_hook=OrderedDict)
print json.dumps(new_json_dict, indent=4)
Will output:
{
    "filename": "abc.pdf",
    "data": [
        {
            "keyword": "bomber",
            "term_freq": 5,
            "lists": [
                {
                    "occurrance": 1
                },
                {
                    "occurrance": 2
                },
                {
                    "occurrance": 3
                },
                {
                    "occurrance": 4
                },
                {
                    "occurrance": 5
                }
            ]
        },
        {
            "keyword": "irritation",
            "term_freq": 5,
            "lists": [
                {
                    "occurrance": 1
                },
                {
                    "occurrance": 3
                },
                {
                    "occurrance": 5
                },
                {
                    "occurrance": 7
                },
                {
                    "occurrance": 8
                }
            ]
        }
    ]
}
But be careful: most of the time, it is better to save a regular .json file that does not depend on key order, so that it stays usable across languages.
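One reason for that caution: the JSON format describes an object as an unordered collection of name/value pairs, so a consumer written in another language may not preserve the order. The sketch below uses two made-up JSON strings to show how a plain load treats both orderings as the same data, while an OrderedDict load does not.

```python
import json
from collections import OrderedDict

# Two hypothetical JSON texts with the same members in a different order.
text_a = '{"filename": "abc.pdf", "data": []}'
text_b = '{"data": [], "filename": "abc.pdf"}'

# Plain dicts compare equal regardless of key order.
same_as_dicts = json.loads(text_a) == json.loads(text_b)

# OrderedDicts take order into account, so the same texts now differ.
ordered_a = json.loads(text_a, object_pairs_hook=OrderedDict)
ordered_b = json.loads(text_b, object_pairs_hook=OrderedDict)
same_as_ordered = ordered_a == ordered_b

print(same_as_dicts)    # True
print(same_as_ordered)  # False
```

If the order only matters for display, it is probably safer to keep the stored file order-agnostic and apply the ordering when printing.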