How to combine Scrapy and HtmlUnit to crawl URLs with JavaScript

HjySix · Nov 8, 2011 · Viewed 8.9k times

I'm using Scrapy to crawl pages, but I can't handle pages that rely on JavaScript. People have suggested that I use HtmlUnit, so I installed it, but I don't know how to use it at all. Can anyone give me an example (Scrapy + HtmlUnit)? Thanks very much.

Answer

reclosedev · Nov 17, 2011

To handle pages that rely on JavaScript, you can use WebKit or Selenium.

Here are some snippets from snippets.scrapy.org:

Rendered/interactive javascript with gtk/webkit/jswebkit

Rendered Javascript Crawler With Scrapy and Selenium RC
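
For a rough idea of how the Selenium route can look, here is a minimal sketch of a Scrapy downloader middleware that loads a page in a browser and hands the rendered HTML back to the spider. It is not the code from the snippets listed above: it assumes the current Selenium WebDriver API (not Selenium RC) with Firefox and geckodriver installed, and the names `SeleniumRenderMiddleware`, `render_html` and the `render_js` meta key are made up for illustration.

```python
def render_html(driver, url):
    """Load url in the browser and return (final_url, rendered_html)."""
    driver.get(url)  # returns once the browser reports the page as loaded
    return driver.current_url, driver.page_source


class SeleniumRenderMiddleware:
    """Downloader middleware sketch: render selected requests with Selenium."""

    def __init__(self, driver=None):
        # A driver can be passed in; otherwise one is started on first use.
        self._driver = driver

    def _get_driver(self):
        if self._driver is None:
            # Imported lazily so the class can be loaded without Selenium.
            from selenium import webdriver

            options = webdriver.FirefoxOptions()
            options.add_argument("-headless")  # run Firefox without a window
            self._driver = webdriver.Firefox(options=options)
        return self._driver

    def process_request(self, request, spider):
        # "render_js" is a hypothetical meta key chosen for this sketch.
        # Returning None lets Scrapy download the request the normal way.
        if not request.meta.get("render_js"):
            return None

        # Imported lazily so render_html can be tried without Scrapy.
        from scrapy.http import HtmlResponse

        final_url, html = render_html(self._get_driver(), request.url)
        # Returning a Response here skips Scrapy's own downloader.
        return HtmlResponse(
            url=final_url, body=html, encoding="utf-8", request=request
        )

    def close(self):
        # Shut the browser down; call this when the crawl finishes.
        if self._driver is not None:
            self._driver.quit()
            self._driver = None
```

To try it, you would register the class under DOWNLOADER_MIDDLEWARES in settings.py (for example with the hypothetical path myproject.middlewares.SeleniumRenderMiddleware) and pass meta={"render_js": True} on the requests that need a browser. The sketch does not wait for content loaded after the initial page load and does not connect close() to the spider_closed signal, so a real crawler would probably need both.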