"Failed to decode response from marionette" message in Python/Firefox headless scraping script

Reily Bourne picture Reily Bourne · Apr 9, 2018 · Viewed 27.4k times · Source

Good Day, I've done a number of searches on here and google and yet to find a solution that address this problem.

The scenario is:

I have a Python script (2.7) that loops through an number of URLs (e.g. think Amazon pages, scraping reviews). Each page has the same HTML layout, just scraping different information. I use Selenium with a headless browser as these pages have javascript that needs to execute to grab the information.

I run this script on my local machine (OSX 10.10). Firefox is the latest v59. Selenium is version at 3.11.0 and using geckodriver v0.20.

This script locally has no issues, it can run through all the URLs and scrape the pages with no issue.

Now when I put the script on my server, the only difference is it is Ubuntu 16.04 (32 bit). I use the appropriate geckodriver (still v0.20) but everything else is the same (Python 2.7, Selenium 3.11). It appears to randomly crash the headless browser and then all of the browserObjt.get('url...') no longer work.

The error messages say:

Message: failed to decode response from marionette

Any further selenium requests for pages return the error:

Message: tried to run command without establishing a connection


To show some code:

When I create the driver:

    options = Options()
    options.set_headless(headless=True)

    driver = webdriver.Firefox(
        firefox_options=options,
        executable_path=config.GECKODRIVER
    )

driver is passed to the script's function as a parameter browserObj which is then used to call specific pages and then once that loads it is passed to BeautifulSoup for parsing:

browserObj.get(url)

soup = BeautifulSoup(browserObj.page_source, 'lxml')

The error might be pointing to the BeautifulSoup line which is crashing the browser.

What is likely causing this, and what can I do to resolve the issue?


Edit: Adding stack trace which points to the same thing:

Traceback (most recent call last):
  File "main.py", line 164, in <module>
    getLeague
  File "/home/ps/dataparsing/XXX/yyy.py", line 48, in BBB
    soup = BeautifulSoup(browserObj.page_source, 'lxml')
  File "/home/ps/AAA/projenv/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 670, in page_source
    return self.execute(Command.GET_PAGE_SOURCE)['value']
  File "/home/ps/AAA/projenv/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 312, in execute
    self.error_handler.check_response(response)
  File "/home/ps/AAA/projenv/local/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
WebDriverException: Message: Failed to decode response from marionette

Note: This script used to work with Chrome. Because the server is a 32bit server, I can only use chromedriver v0.33, which only supports Chrome v60-62. Currently Chrome is v65 and on DigitalOcean I don't seem to have an easy way to revert back to an old version - which is why I am stuck with Firefox.

Answer

myol picture myol · Mar 17, 2019

For anyone else experiencing this issue when running selenium webdriver in a Docker container, increasing the container size to 2gb fixes this issue.

I guess this affects physical machines too if the OP fixed their issue by upgrading their server RAM to 2Gb, but could be coincidence.