Can't capture HAR using Python Selenium Script with BrowserMob-Proxy

Civic_Matt picture Civic_Matt · Sep 25, 2014 · Viewed 8.4k times · Source

Goal: I want to run a Selenium Python script through BrowserMob-Proxy, which will capture and output a HAR file capture.

Problem: I have a functional (very basic) Python script (shown below). When it is altered to utilize BrowserMob-Proxy to capture HAR however, it fails. Below I provide two different scripts that both fail, but for differing reasons (details provided after code snippets).

BrowserMob-Proxy Explanation: As mentioned before, I am using both 0.6.0 AND 2.0-beta-8. The reasoning for this is that A) LightBody (lead designer of BMP) recently indicated that his most current release (2.0-beta-9) is not functional and advises users to use 2.0-beta-8 instead and B) from what I can tell from reading various site/stackoverflow information is that 0.6.0 (acquired through PIP) is used to make calls to the Client.py/Server.py, whereas 2.0-beta-8 is used to initiate the Server. To be honest, this confuses me. When importing BMP's Server however, it requires a batch (.bat) file to initiate the server, which is not provided in 0.6.0, but is with 2.0-beta-8...if anyone can shed some light on this area of confusion (I suspect it is the root of my problems described below), then I'd be most appreciative.

Software Specs:

  • Operating System: Windows 7 (64x) -- running in VirtualBox
  • Browser: FireFox (32.0.2)
  • Script Language: Python (2.7.8)
  • Automated Web Browser: Selenium (2.43.0) -- installed via PIP
  • BrowserMob-Proxy: 0.6.0 AND 2.0-beta-8 -- see explanation below

Selenium Script (this script works):

"""This script utilizes Selenium to obtain the Google homepage"""
from selenium import webdriver

driver = webdriver.Firefox()       # Opens FireFox browser.
driver.get('https://google.com/')  # Gets google.com and loads page in browser.

driver.quit()                      # Closes Firefox browser

This script succeeds in running and does not produce any errors. It is provided for illustrative purposes to indicate it works before adding BMP logic.

Script ALPHA with BMP (does not work):

"""Using the same functional Selenium script, produce ALPHA_HAR.har output"""
from browsermobproxy import Server
server = Server('C:\Users\Matt\Desktop\\browsermob-proxy-2.0-beta-8\\bin\\browsermob-proxy')
server.start()
proxy = server.create_proxy()

from selenium import webdriver
driver = webdriver.Firefox()           # Opens FireFox browser.

proxy.new_har("ALPHA_HAR")             # Creates a new HAR
driver.get("https://www.google.com/")  # Gets google.com and loads page in browser.
proxy.har                              # Returns a HAR JSON blob

server.stop()

This code will succeed in running the script and will not produce any errors. However, when searching the entirety of my hard drive, I never succeed in locating ALPHA_HAR.har.

Script BETA with BMP (does not work):

"""Using the same functional Selenium script, produce BETA_HAR.har output"""
from browsermobproxy import Server
server = Server("C:\Users\Matt\Desktop\\browsermob-proxy-2.0-beta-8\\bin\\browsermob-proxy")
server.start()    
proxy = server.create_proxy()

from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)

proxy.new_har("BETA_HAR")             # Creates a new HAR
driver.get("https://www.google.com/") # Gets google.com and loads page in browser.
proxy.har                             # Returns a HAR JSON blob

server.stop()

This code was taken from http://browsermob-proxy-py.readthedocs.org/en/latest/. When running the above code, FireFox will attempt to get google.com, but will never succeed in loading the page. Eventually it will time out without producing any errors. And BETA_HAR.har can't be found anywhere on my hard drive. I have also noticed that, when trying to use this browser to visit any other site, it will similarly fail to load (I suspect this is due to the proxy not being configured properly).

Answer

user2415430 picture user2415430 · Jan 15, 2015

I use phantomJS, here is an example of how to use it with python:

import browsermobproxy as mob
import json
from selenium import webdriver
BROWSERMOB_PROXY_PATH = '/usr/share/browsermob/bin/browsermob-proxy'
url = 'http://google.com'

s = mob.Server(BROWSERMOB_PROXY_PATH)
s.start()
proxy = s.create_proxy()
proxy_address = "--proxy=127.0.0.1:%s" % proxy.port
service_args = [ proxy_address, '--ignore-ssl-errors=yes', ] #so that i can do https connections
driver = webdriver.PhantomJS(service_args=service_args)
driver.set_window_size(1400, 1050)
proxy.new_har(url)
driver.get(url)
har_data = json.dumps(proxy.har, indent=4)
screenshot = driver.get_screenshot_as_png()
imgname = "google.png"
harname = "google.har"
save_img = open(imgname, 'a')
save_img.write(screenshot)
save_img.close()
save_har = open(harname, 'a')
save_har.write(har_data)
save_har.close()
driver.quit()
s.stop()