How to read lines from a json file in scrapy

Tags: python, json, scrapy, readlines

Olivia · Dec 24, 2012

Answer

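Those are JSON Lines, as the exporter name implies: each line of the file is a separate JSON object, so the file as a whole is not a single JSON document. json.load parses the first object and then reports the rest of the file as "Extra data".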

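Take a look at scrapy.contrib.exporter (scrapy.exporters in Scrapy 1.0 and later) and compare JsonItemExporter with JsonLinesItemExporter: the first writes all items as one JSON array, the second writes one JSON object per line.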
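The difference can be reproduced with the standard json module alone. This is a minimal sketch that mimics the two output styles; it does not call Scrapy, and the items list is hypothetical sample data shaped like the records in the question. A JSON array parses in one call, while one-object-per-line text fails with "Extra data" and only parses line by line:

```python
import json

# Hypothetical sample items, shaped like the records in the question
items = [
    {"link": "https://www.example.com/user1", "id": 1, "name": "user1"},
    {"link": "https://www.example.com/user2", "id": 2, "name": "user2"},
]

# Array style (what JsonItemExporter writes): one JSON document
array_text = json.dumps(items)

# JSON Lines style (what JsonLinesItemExporter writes): one object per line
lines_text = "\n".join(json.dumps(item) for item in items) + "\n"

# A single json.loads call works on the array
from_array = json.loads(array_text)

# A single json.loads call on JSON Lines stops after the first object
# and raises because more data follows it
try:
    json.loads(lines_text)
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    whole_file_error = str(exc)
else:
    whole_file_error = None

# Parsing each non-empty line on its own recovers every item
from_lines = [json.loads(line) for line in lines_text.splitlines() if line.strip()]

print(whole_file_error)
print(from_array == from_lines == items)
```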

Parse the file one line at a time instead:

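import json

lines = []

# Each line of a .jl file is a complete JSON object, so parse them one at a time
with open('links.jl', 'r') as f:
    for line in f:
        if line.strip():  # skip blank lines, which json.loads would reject
            lines.append(json.loads(line))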
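Since the goal in the question is to scrape each user's page, the next step is to pull the link field out of every record. A minimal sketch, assuming every line has a "link" key as in the sample data; the helper name read_links is hypothetical, and it accepts any iterable of lines so it works on an open file as well as on test data:

```python
import io
import json


def read_links(lines):
    """Collect the "link" value from each JSON object in an iterable of JSON Lines.

    `lines` can be an open file object or any iterable of strings.
    """
    links = []
    for line in lines:
        if not line.strip():  # blank lines are not valid JSON, so skip them
            continue
        record = json.loads(line)
        links.append(record["link"])
    return links


if __name__ == "__main__":
    # In-memory stand-in for links.jl, using the sample records from the question
    sample = io.StringIO(
        '{"link": "https://www.example.com/user1", "id": 1, "name": "user1"}\n'
        '{"link": "https://www.example.com/user1", "id": 2, "name": "user2"}\n'
    )
    print(read_links(sample))
    # With the real file: with open("links.jl") as f: urls = read_links(f)
```

The resulting list could then be used as a spider's start_urls.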

Question

I have a JSON file storing some user information, including id, name and URL. The file looks like this:

{"link": "https://www.example.com/user1", "id": 1, "name": "user1"}
{"link": "https://www.example.com/user1", "id": 2, "name": "user2"}

This file was written by a Scrapy spider. Now I want to read the URLs from the JSON file and scrape each user's web page, but I cannot load the data from the file.

At the moment I have no idea how to get these URLs. I think I should read the lines from the JSON file first, so I tried the following code in the Python shell:

import json
f = open('links.jl')
line = json.load(f)

I got the following error message:

raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 138 column 497 (char 498 - 67908)

I did some searching online, and the results suggested that the JSON file may have formatting issues. But the file was created and populated with items by a Scrapy pipeline. Does anybody know what caused the error, and how to solve it? Any suggestions for reading the URLs?

Thanks a lot.