Goal: I want to run a Selenium Python script through BrowserMob-Proxy and have it capture the traffic and write it out as a HAR file.
Problem: I have a functional (very basic) Python script (shown below). However, when I alter it to use BrowserMob-Proxy to capture a HAR, it fails. Below I provide two different scripts that both fail, but for different reasons (details follow each code snippet).
BrowserMob-Proxy Explanation: As mentioned before, I am using both 0.6.0 AND 2.0-beta-8. The reasons are: A) LightBody (lead designer of BMP) recently indicated that his most recent release (2.0-beta-9) is not functional and advised users to use 2.0-beta-8 instead, and B) from what I can tell from various sites and Stack Overflow posts, 0.6.0 (installed through pip) is used to make calls to Client.py/Server.py, whereas 2.0-beta-8 is used to start the server. To be honest, this confuses me. When importing BMP's Server, however, it requires a batch (.bat) file to start the server, which is not provided in 0.6.0 but is in 2.0-beta-8. If anyone can shed some light on this area of confusion (I suspect it is the root of my problems described below), I'd be most appreciative.
Software Specs:
Selenium Script (this script works):
"""This script utilizes Selenium to obtain the Google homepage"""
from selenium import webdriver
driver = webdriver.Firefox() # Opens Firefox browser.
driver.get('https://google.com/') # Gets google.com and loads page in browser.
driver.quit() # Closes Firefox browser
This script succeeds in running and does not produce any errors. It is provided for illustrative purposes to indicate it works before adding BMP logic.
Script ALPHA with BMP (does not work):
"""Using the same functional Selenium script, produce ALPHA_HAR.har output"""
from browsermobproxy import Server
server = Server(r'C:\Users\Matt\Desktop\browsermob-proxy-2.0-beta-8\bin\browsermob-proxy')
server.start()
proxy = server.create_proxy()
from selenium import webdriver
driver = webdriver.Firefox() # Opens Firefox browser.
proxy.new_har("ALPHA_HAR") # Creates a new HAR
driver.get("https://www.google.com/") # Gets google.com and loads page in browser.
proxy.har # Returns a HAR JSON blob
server.stop()
This script runs without producing any errors. However, after searching my entire hard drive, I cannot find ALPHA_HAR.har anywhere.
Script BETA with BMP (does not work):
"""Using the same functional Selenium script, produce BETA_HAR.har output"""
from browsermobproxy import Server
server = Server(r"C:\Users\Matt\Desktop\browsermob-proxy-2.0-beta-8\bin\browsermob-proxy")
server.start()
proxy = server.create_proxy()
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
proxy.new_har("BETA_HAR") # Creates a new HAR
driver.get("https://www.google.com/") # Gets google.com and loads page in browser.
proxy.har # Returns a HAR JSON blob
server.stop()
This code was taken from http://browsermob-proxy-py.readthedocs.org/en/latest/. When I run the above code, Firefox attempts to get google.com but never succeeds in loading the page. Eventually it times out without producing any errors, and BETA_HAR.har can't be found anywhere on my hard drive. I have also noticed that when I use this browser window to visit any other site, it similarly fails to load (I suspect this is because the proxy is not configured properly).
I use PhantomJS; here is an example of how to use it with Python:
import browsermobproxy as mob
import json
from selenium import webdriver
BROWSERMOB_PROXY_PATH = '/usr/share/browsermob/bin/browsermob-proxy'
url = 'http://google.com'
s = mob.Server(BROWSERMOB_PROXY_PATH)
s.start()
proxy = s.create_proxy()
proxy_address = "--proxy=127.0.0.1:%s" % proxy.port
service_args = [proxy_address, '--ignore-ssl-errors=yes'] # ignore SSL errors so that HTTPS pages load through the proxy
driver = webdriver.PhantomJS(service_args=service_args)
driver.set_window_size(1400, 1050)
proxy.new_har(url)
driver.get(url)
har_data = json.dumps(proxy.har, indent=4)
screenshot = driver.get_screenshot_as_png()
imgname = "google.png"
harname = "google.har"
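with open(imgname, 'wb') as save_img: # PNG data is bytes, so write in binary mode; 'wb' overwrites instead of appending
    save_img.write(screenshot)
with open(harname, 'w') as save_har: # 'w' replaces any earlier capture, so the file stays valid JSON
    save_har.write(har_data)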
driver.quit()
s.stop()
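The part that matters for your missing files is the write step. As far as I know, proxy.har only returns the HAR as a Python dict, and the string you pass to proxy.new_har() is a page reference, not a file name, so nothing lands on disk unless you write it yourself. Here is a rough sketch of that step pulled out into a helper. The names save_har, requested_urls and sample_har are my own, not part of browsermobproxy, and sample_har is a made-up stand-in for proxy.har so the snippet runs without a browser:

```python
import json
import os
import tempfile


def save_har(har, path):
    """Write a HAR dict (the kind of object proxy.har returns) to `path` as JSON."""
    # 'w' overwrites any previous capture instead of appending to it.
    with open(path, "w") as fh:
        json.dump(har, fh, indent=4)


def requested_urls(har):
    """Return the request URL of every entry recorded in the HAR."""
    return [entry["request"]["url"] for entry in har["log"]["entries"]]


# Hypothetical stand-in for proxy.har (assumption: standard HAR 1.2 layout).
sample_har = {
    "log": {
        "version": "1.2",
        "entries": [
            {"request": {"method": "GET", "url": "http://google.com/"}},
        ],
    }
}

har_path = os.path.join(tempfile.gettempdir(), "google.har")
save_har(sample_har, har_path)
print(requested_urls(sample_har))
```

In the real script you would call save_har(proxy.har, "google.har") after driver.get(url) and before s.stop(). Printing the requested URLs is just a quick way to check that traffic actually went through the proxy; an empty list would suggest the browser was never pointed at it.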