Scrape dynamic HTML (YouTube comments)

shubham agarwal · Oct 31, 2017 · Viewed 8.3k times

With Beautiful Soup and the Requests library I can scrape static HTML content, but not content that is loaded by JavaScript or AJAX calls.

How do I mimic this in my Python script? YouTube comments only load when the page is scrolled. I found two methods, one using Selenium and another using lxml with requests, but I couldn't understand either of them.

Example (this is the video):

import requests
from bs4 import BeautifulSoup as soup

url = 'https://www.youtube.com/watch?v=iFPMz36std4'
response = requests.get(url)
page_html = response.content
# print(page_html)

# Parse the static HTML returned by the server
page_soup = soup(page_html, "html.parser")
print(page_soup)

Answer

Aaditya Ura · Oct 31, 2017

You need to use Selenium.

Here is a trick: YouTube only loads comments when you scroll to just below the video. If you jump straight to the bottom or anywhere else, the comments will not load. So first scroll to that area, wait for the comments to load, and then scroll to the bottom or wherever you want:

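from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get('https://www.youtube.com/watch?v=iFPMz36std4')

# Scroll to just below the video so YouTube starts loading the comments
driver.execute_script('window.scrollTo(1, 500);')

# Now wait for the comments to load
time.sleep(5)

# Scroll further down the page
driver.execute_script('window.scrollTo(1, 3000);')

comment_div = driver.find_element(By.XPATH, '//*[@id="contents"]')
# Note: an XPath starting with "//" searches the whole document, not only comment_div
comments = comment_div.find_elements(By.XPATH, '//*[@id="content-text"]')
for comment in comments:
    print(comment.text)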

Part of the output:

# can't post the full output, it's too long
I love Kygo's Stranger Things and Netflix's Stranger Things <3
Stranger Things, Kygo and OneRepublic, could it be better?
Amazing Vibe!!!!!!!!!🔥🔥🔥🔥
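
If you want more than the first batch of comments, one possible extension is to keep scrolling until the number of comment elements stops growing. The following is only a rough sketch under a few assumptions: `load_all_comments`, `pause`, `max_rounds` and `sleep` are names made up for this illustration, the `content-text` id is the one used in the answer's code and may change whenever YouTube changes its markup, and `driver` is expected to have already done the first scroll to just below the video as described above.

```python
import time

# XPath taken from the answer's code; YouTube's markup may change.
COMMENT_XPATH = '//*[@id="content-text"]'


def load_all_comments(driver, pause=2.0, max_rounds=20, sleep=time.sleep):
    """Scroll down repeatedly until no new comment elements appear.

    `driver` only needs `execute_script` and `find_elements`,
    both of which a Selenium WebDriver provides.
    """
    seen = 0
    for _ in range(max_rounds):
        # Jump to the current bottom so the page requests the next batch.
        driver.execute_script(
            "window.scrollTo(0, document.documentElement.scrollHeight);"
        )
        # Give the page some time to fetch and render the new comments.
        sleep(pause)
        # "xpath" is the string value behind Selenium's By.XPATH.
        count = len(driver.find_elements("xpath", COMMENT_XPATH))
        if count == seen:
            # Nothing new arrived during this round, so stop scrolling.
            break
        seen = count
    return [element.text for element in driver.find_elements("xpath", COMMENT_XPATH)]
```

You would call it as `comments = load_all_comments(driver)` after the first scroll and wait. `max_rounds` caps the loop so a video with a very large number of comments does not keep the script scrolling indefinitely, and a slow connection may need a larger `pause`.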