I use Scrapy and I am trying to scrape a site that uses Incapsula.

I already asked a question about this issue 2 years ago, but that method (Incapsula-Cracker) no longer works.

I tried to understand how Incapsula works, and I tried the following to bypass it:

def start_requests(self):
    yield Request('', cookies={'store': 92}, dont_filter=True, callback=self.init_shop)

def init_shop(self, response):
    result_content      = response.body
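    RE_ENCODED_FUNCTION = re.compile(r'var b="(.*?)"', re.DOTALL)
    RE_INCAPSULA        = re.compile(r'(_Incapsula_Resource\?SWHANEDL=.*?)"')
    INCAPSULA_URL       = ''
    # Assumed reconstruction: the right-hand sides of encoded_func and
    # incapsula_params are missing, so the two regexes above are applied here.
    encoded_func        = RE_ENCODED_FUNCTION.search(result_content).group(1)
    # Decode the hex-encoded script two characters at a time.
    decoded_func        = ''.join([chr(int(encoded_func[i:i+2], 16)) for i in xrange(0, len(encoded_func), 2)])
    incapsula_params    = RE_INCAPSULA.search(decoded_func).group(1)
    incap_url           = INCAPSULA_URL % incapsula_params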
    yield Request(incap_url)
def parse(self, response):
    print response.body

But I'm redirected to a reCAPTCHA page.

So first of all, there are no foolproof solutions to such problems. Even as an actual user, I end up having to solve captchas while answering on StackOverflow, which means a bot will definitely get captchas.

Now, there are a few rules which I try to follow to decrease the chances of a captcha:

  • Never use shared proxies for such projects. Using Tor is a big no.
  • Use Chrome + Selenium + Proxy
  • Use Chrome with an existing profile. I prefer profiles that have browsing history on different websites, plus cookies and trackers from many other sites, going back months. You don't know how a user is told apart from a bot, so you want to look as much like a real user as possible.
  • Never scrape at fast rates; use delays that are as long and as random as possible.
  • Always use a visible browser and keep watching for captchas. When one appears, solve it manually or use DeathByCaptcha or a similar service. Try not to abandon captcha pages, as doing so may raise the site's estimate of how likely you are to be a bot.
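
The rules above could be wired together roughly like this. It is only a minimal sketch, not a guaranteed bypass: the helper names (build_chrome_arguments, random_delay, crawl), the delay range, the profile path and the proxy address are all placeholders I made up for illustration. It assumes Selenium and a matching Chrome driver are installed.

```python
import random
import time


def build_chrome_arguments(profile_dir, proxy_url):
    """Return Chrome command-line switches for an existing profile and a proxy.

    Both values are placeholders supplied by the caller.
    """
    return [
        "--user-data-dir=" + profile_dir,  # reuse a profile with real history and cookies
        "--proxy-server=" + proxy_url,     # a private (non-shared) proxy
    ]


def random_delay(min_seconds=5.0, max_seconds=30.0):
    """Pick a random pause length so requests are not evenly spaced.

    The default range is an arbitrary assumption; tune it for the target site.
    """
    return random.uniform(min_seconds, max_seconds)


def crawl(urls, profile_dir, proxy_url):
    """Visit each URL in a visible Chrome window, pausing randomly in between."""
    # Imported here so the helpers above work even without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in build_chrome_arguments(profile_dir, proxy_url):
        options.add_argument(argument)
    # No "--headless" switch: the window stays visible so that a captcha
    # can be noticed and solved by hand.
    driver = webdriver.Chrome(options=options)
    try:
        for url in urls:
            driver.get(url)
            html = driver.page_source
            print(url, len(html))
            time.sleep(random_delay())
    finally:
        driver.quit()
```

You would call it with something like crawl(["https://example.com/"], "/path/to/chrome-profile", "http://127.0.0.1:8080"), where all three values are stand-ins for your own. Keeping the argument building and the delay in small separate functions makes it easy to change the profile, the proxy or the pacing without touching the browser loop.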

This is a cat-and-mouse game where you don't know what defenses the other party has, so you try to play nice and easy.