Selenium: How to Inject/execute a Javascript in to a Page before loading/executing any other scripts of the page?

Alex picture Alex · Jul 11, 2015 · Viewed 20.5k times · Source

I'm using selenium python webdriver in order to browse some pages. I want to inject a javascript code in to a pages before any other Javascript codes get loaded and executed. On the other hand, I need my JS code to be executed as the first JS code of that page. Is there a way to do that by Selenium?

I googled it for a couple of hours, but I couldn't find any proper answer!

Answer

Mattwmaster58 picture Mattwmaster58 · Aug 26, 2019

Since version 1.0.9, selenium-wire has gained the functionality to modify responses to requests. Below is an example of this functionality to inject a script into a page before it reaches a webbrowser.

import os
from seleniumwire import webdriver
from gzip import compress, decompress
from urllib.parse import urlparse

from lxml import html
from lxml.etree import ParserError
from lxml.html import builder

script_elem_to_inject = builder.SCRIPT('alert("injected")')

def inject(req, req_body, res, res_body):
    # various checks to make sure we're only injecting the script on appropriate responses
    # we check that the content type is HTML, that the status code is 200, and that the encoding is gzip
    if res.headers.get_content_subtype() != 'html' or res.status != 200 or res.getheader('Content-Encoding') != 'gzip':
        return None
    try:
        parsed_html = html.fromstring(decompress(res_body))
    except ParserError:
        return None
    try:
        parsed_html.head.insert(0, script_elem_to_inject)
    except IndexError: # no head element
        return None
    return compress(html.tostring(parsed_html))

drv = webdriver.Firefox(seleniumwire_options={'custom_response_handler': inject})
drv.header_overrides = {'Accept-Encoding': 'gzip'} # ensure we only get gzip encoded responses

Another way in general to control a browser remotely and be able to inject a script before the pages content loads would be to use a library based on a separate protocol entirely, eg: DevTools Protocol. A Python implementation is available here: https://github.com/pyppeteer/pyppeteer2 (Disclaimer: I'm one of the main authors)