How to avoid being detected as a bot with Puppeteer and PhantomJS?

Felipe S. Fernandes · Aug 7, 2018

Puppeteer and PhantomJS behave similarly here: the issue happens with both, and the code I use is almost the same for each.

I'd like to scrape some information from a website that requires authentication to view it. I can't even access the home page, because my visit is flagged as "suspicious activity", as shown in this screenshot: https://i.imgur.com/p69OIjO.png

I found that the problem does not occur when I test with Postman and send a Cookie header whose value is the cookie taken from a regular browser, but that cookie expires after some time. So my guess is that Puppeteer and PhantomJS never obtain a valid cookie, because the site denies access to headless browsers.
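Since the Postman request only works while the copied cookie is valid, one stopgap is to load that same Cookie header into the headless browser. The sketch below assumes Puppeteer's `page.setCookie()`; `parseCookieHeader` is a hypothetical helper that turns a raw `name=value; name2=value2` string copied from a real browser's dev tools into cookie objects. It does not solve the expiry problem: once the cookie expires, a fresh one has to be copied again.

```javascript
// Hypothetical helper: convert a raw Cookie header (as copied from a real
// browser's dev tools) into cookie objects for Puppeteer's page.setCookie().
function parseCookieHeader(header, domain) {
  return header
    .split(';')
    .map((part) => part.trim())
    // Skip empty fragments and fragments without a name=value pair.
    .filter((part) => part.length > 0 && part.includes('='))
    .map((part) => {
      // Split only on the first '=' because cookie values may contain '='.
      const eq = part.indexOf('=');
      return {
        name: part.slice(0, eq).trim(),
        value: part.slice(eq + 1).trim(),
        domain: domain,
        path: '/',
      };
    });
}
```

For example, before navigating: `await page.setCookie(...parseCookieHeader(rawCookie, 'www.expertflyer.com'));` and then `await page.goto(url);`, where `rawCookie` stands for the string copied from the browser.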

What can I do to bypass this?

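// Simple PhantomJS example: open the home page and save a screenshot
var page = require('webpage').create();
var url = 'https://www.expertflyer.com';

page.open(url, function (status) {
    if (status === 'success') {
        page.render('home.png');
    } else {
        console.log('Failed to load ' + url + ': ' + status);
    }
    // Exit in every case so the script does not hang when loading fails
    phantom.exit();
});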

Answer

Grubshka · Jan 25, 2019

Things that can help in general:

  • Headers should be similar to those sent by common browsers, including:
  • If you make multiple requests, put a random delay between them
  • If you open links found in a page, set the Referer header accordingly
  • Images should be enabled
  • JavaScript should be enabled
    • Check that "navigator.plugins" and "navigator.language" are set in the client-side JavaScript page context
  • Use proxies
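A minimal Puppeteer sketch of the header advice, including the Referer point. The User-Agent string and the Accept-Language value are example assumptions, not values any particular site requires; `buildBrowserLikeHeaders` and `applyBrowserLikeHeaders` are hypothetical helpers built on Puppeteer's `page.setUserAgent()` and `page.setExtraHTTPHeaders()`.

```javascript
// Example desktop Chrome User-Agent (an assumption: ideally keep it consistent
// with the Chromium version you actually run, since a mismatch may be noticed).
const DEFAULT_USER_AGENT =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

// Hypothetical helper: build browser-like header settings.
// Pass `referer` when following a link found on a previously loaded page.
function buildBrowserLikeHeaders({
  userAgent = DEFAULT_USER_AGENT,
  acceptLanguage = 'en-US,en;q=0.9',
  referer,
} = {}) {
  const extraHeaders = { 'Accept-Language': acceptLanguage };
  if (referer) {
    extraHeaders.Referer = referer;
  }
  return { userAgent, extraHeaders };
}

// Apply the settings to a Puppeteer page; call this before page.goto().
async function applyBrowserLikeHeaders(page, options) {
  const { userAgent, extraHeaders } = buildBrowserLikeHeaders(options);
  await page.setUserAgent(userAgent);
  await page.setExtraHTTPHeaders(extraHeaders);
}
```

Note that headers set through `page.setExtraHTTPHeaders()` are sent with every request the page makes, not only the navigation. Puppeteer's `page.goto(url, { referer })` option is an alternative for setting the Referer on a single navigation.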
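A minimal sketch of the random-delay advice, assuming a Puppeteer `page` (any object with an async `goto(url)` method works). `pickDelay`, `randomDelay` and `visitSequentially` are hypothetical helpers, and the default range of 2 to 6 seconds is an arbitrary assumption, not a threshold known to work for any particular site.

```javascript
// Hypothetical helper: pick a whole number of milliseconds in [minMs, maxMs].
// `rng` defaults to Math.random and can be replaced for testing.
function pickDelay(minMs, maxMs, rng = Math.random) {
  return minMs + Math.floor(rng() * (maxMs - minMs + 1));
}

// Wait a random amount of time; resolves with the delay that was used.
function randomDelay(minMs, maxMs) {
  const ms = pickDelay(minMs, maxMs);
  return new Promise((resolve) => setTimeout(() => resolve(ms), ms));
}

// Visit several URLs one after another, pausing randomly after each one.
async function visitSequentially(page, urls, minMs = 2000, maxMs = 6000) {
  for (const url of urls) {
    await page.goto(url);
    await randomDelay(minMs, maxMs);
  }
}
```

Using a random pause rather than a fixed one avoids a perfectly regular request rhythm; the range itself is something to tune.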
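A small Puppeteer sketch of the JavaScript and navigator checks: it reads navigator.plugins and navigator.language from inside the page with `page.evaluate()` and reports anything missing. `checkNavigator` and `assessNavigator` are hypothetical names, and the sketch only detects the problem; it does not patch these properties. In Puppeteer, JavaScript and images are enabled by default, so the main point is not to disable them (for example by blocking image requests); in PhantomJS the corresponding settings are `page.settings.javascriptEnabled` and `page.settings.loadImages`.

```javascript
// Hypothetical helper: list which navigator values look wrong.
// `info` is the plain object returned by checkNavigator's page.evaluate().
function assessNavigator(info) {
  const problems = [];
  if (!info.pluginsLength) problems.push('navigator.plugins is empty');
  if (!info.language) problems.push('navigator.language is not set');
  return problems;
}

// Read the values inside the page, i.e. in the client-side JavaScript context.
// Call this after page.goto() so the site's own page is loaded.
async function checkNavigator(page) {
  const info = await page.evaluate(() => ({
    pluginsLength: navigator.plugins ? navigator.plugins.length : 0,
    language: navigator.language || '',
  }));
  return { info, problems: assessNavigator(info) };
}
```

For example: `const { problems } = await checkNavigator(page); if (problems.length) console.warn(problems);`.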
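A sketch of the proxy advice for Puppeteer, assuming a rotating list of HTTP proxies. The proxy hosts are placeholders, `pickProxy` and `buildLaunchOptions` are hypothetical helpers, and the only real mechanism relied on is Chromium's `--proxy-server` switch passed through `puppeteer.launch({ args })`.

```javascript
// Hypothetical proxy list (placeholders, not real servers).
const PROXIES = ['proxy1.example.com:8080', 'proxy2.example.com:8080'];

// Round-robin: the n-th browser launch uses the n-th proxy, wrapping around.
function pickProxy(proxies, n) {
  if (proxies.length === 0) throw new Error('no proxies configured');
  return proxies[n % proxies.length];
}

// Build options for puppeteer.launch() using Chromium's --proxy-server switch.
function buildLaunchOptions(proxy) {
  return {
    headless: true,
    args: [`--proxy-server=${proxy}`],
  };
}
```

For example: `const browser = await puppeteer.launch(buildLaunchOptions(pickProxy(PROXIES, attempt)));`. If a proxy requires credentials, Puppeteer's `page.authenticate({ username, password })` can supply them.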