Scraping an HTML file saved on the local system

Shiva Krishna Bavandla · Jun 5, 2012 · Viewed 16.9k times

For example, I have a site, "www.example.com". I want to scrape its HTML from a copy saved on my local system, so for testing I saved the page on my desktop as example.html.

I wrote the following spider for it:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector


class ExampleSpider(BaseSpider):
    name = "example"
    start_urls = ["example.html"]

    def parse(self, response):
        print response
        hxs = HtmlXPathSelector(response)

But when I run the above code, I get the following error:

ValueError: Missing scheme in request url: example.html

My intention is to scrape the example.html file, which contains the HTML of www.example.com saved on my local system.

Can anyone suggest how to reference that example.html file in start_urls?

Thanks in advance

Answer

iodbh · Mar 5, 2014

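Scrapy expects every entry in start_urls to include a scheme, which is why the bare filename fails. You can crawl a local file using a URL of the following form, with the absolute path to the file: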

 file:///path/to/file.html
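
If you would rather not type the absolute path by hand, the URL can be built from the file's path. This is only a sketch: it assumes Python 3 (the spider in the question uses Python 2 syntax), and `local_file_url` is a made-up helper name, not part of Scrapy.

```python
from pathlib import Path


def local_file_url(path):
    """Return a file:// URL for a local path (hypothetical helper)."""
    # as_uri() only works on absolute paths, so resolve a relative
    # path against the current working directory first.
    return Path(path).resolve().as_uri()


# Possible use in the spider from the question (assumes the spider is
# run from the directory that contains example.html):
#     start_urls = [local_file_url("example.html")]
print(local_file_url("example.html"))
```

The printed value depends on the directory you run it from, but it always starts with file:/// and ends with the file name, which is the form start_urls needs.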