For example i had a site "www.example.com"
Actually i want to scrape the html of this site by saving on to local system.
so for testing i saved that page on my desktop as example.html
Now i had written the spider code for this as below
class ExampleSpider(BaseSpider):
name = "example"
start_urls = ["example.html"]
def parse(self, response):
print response
hxs = HtmlXPathSelector(response)
But when i run the above code i am getting this error as below
ValueError: Missing scheme in request url: example.html
Finally my intension is to scrape the example.html
file that consists of www.example.com
html code saved in my local system
Can any one suggest me on how to assign that example.html file in start_urls
Thanks in advance
You can crawl a local file using an url of the following form:
file:///path/to/file.html