With Beautiful Soup and Request Library I am able to scrape HTML content, but not what loads by JavaScript or AJAX calls.
How do I mimic this through my Python script? Because YouTube comments load when we scroll the page. I found 2 methods; one using Selenium and another using lxml requests, which I couldn't understand a bit.
Example (this is the video):
import requests
from bs4 import BeautifulSoup as soup
url = 'https://www.youtube.com/watch?v=iFPMz36std4'
response = requests.get(url)
page_html = response.content
#print page_html
page_soup=soup(page_html,"html.parser")
print page_soup
You need to use selenium :
Here is a trick , Youtube only load comments when you scroll just down of video , if you scroll bottom or elsewhere, comments will not load , so first scroll to that down part and wait for loading comments after that scroll to bottom or whenever you want :
from selenium import webdriver
import time
driver=webdriver.Chrome()
driver.get('https://www.youtube.com/watch?v=iFPMz36std4')
driver.execute_script('window.scrollTo(1, 500);')
#now wait let load the comments
time.sleep(5)
driver.execute_script('window.scrollTo(1, 3000);')
comment_div=driver.find_element_by_xpath('//*[@id="contents"]')
comments=comment_div.find_elements_by_xpath('//*[@id="content-text"]')
for comment in comments:
print(comment.text)
some part of output:
#can't post full output its too long
I love Kygo's Stranger Things and Netflix's Stranger Things <3
Stranger Things, Kygo and OneRepublic, could it be better?
Amazing Vibe!!!!!!!!!🔥🔥🔥🔥