I am trying to download image in via scrapy. Here are my different files :
items.py
class DmozItem(Item):
title = Field()
image_urls = Field()
images = Field()
settings.py
BOT_NAME = 'tutorial'
SPIDER_MODULES = ['tutorial.spiders']
NEWSPIDER_MODULE = 'tutorial.spiders'
ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline']
IMAGES= '/home/mayank/Desktop/sc/tutorial/tutorial'
spider
class DmozSpider(BaseSpider):
name = "wikipedia"
allowed_domains = ["wikipedia.org"]
start_urls = [
"http://en.wikipedia.org/wiki/Pune"
]
def parse(self, response):
hxs = HtmlXPathSelector(response)
items = []
images=hxs.select('//a[@class="image"]')
for image in images:
item = DmozItem()
link=image.select('@href').extract()[0]
link = 'http://en.wikipedia.com'+link
item['image_urls']=link
items.append(item)
In spite of all these setting I my pipeline is not getting activated.Please help. I am new to this framework.
First, settings.py
: IMAGES -> IMAGES_STORE
Second, spider
: You should return an item
so that ImagesPipeline
could download those images.
item = DmozItem()
image_urls = hxs.select('//img/@src').extract()
item['image_urls'] = ["http:" + x for x in image_urls]
return item