I was new to web scraping and was trying to build a scraper that takes a playlist link and extracts the list of songs and their artists.
But the site kept rejecting my connection because it thought I was a bot, so I used fake_useragent's UserAgent to generate a fake User-Agent string and try to bypass the filter.
It sort of worked? But a new problem appeared: when I visited the website in a browser I could see the contents of the playlist, but when I fetched the HTML with requests, the playlist contents were just a big blank space.
Maybe I have to wait for the page to load? Or is there a stronger bot filter?
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

ua = UserAgent()
melon_site = "http://kko.to/IU8zwNmjM"

# Send a browser-like User-Agent so the server does not reject us as a bot
headers = {"User-Agent": ua.random}
result = requests.get(melon_site, headers=headers)
print(result.status_code)

src = result.content
soup = BeautifulSoup(src, "html.parser")
print(soup)
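One quick way to tell the two causes apart is to check whether text you can see in the browser actually appears in the raw HTML that requests receives; if it does not, the playlist is filled in by JavaScript after the page loads, and requests alone will never see it. This is only a diagnostic sketch, and "SONG_TITLE" is a placeholder for any title visible in the browser, not a real value from the page:

```python
import requests


def content_is_in_raw_html(html, needle):
    """True if text visible in the browser already exists in the raw HTML."""
    return needle in html


if __name__ == "__main__":
    headers = {"User-Agent": "Mozilla/5.0"}  # any browser-like UA works here
    html = requests.get("http://kko.to/IU8zwNmjM", headers=headers).text
    # "SONG_TITLE" is a placeholder: substitute a title you can see in the browser.
    print(content_is_in_raw_html(html, "SONG_TITLE"))
```

If the check prints False for text that is clearly visible in the browser, the content is rendered client-side and a plain HTTP fetch cannot retrieve it, no matter how good the User-Agent is.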
POINTS TO REMEMBER WHILE SCRAPING
1) Use a good User-Agent. ua.random may be returning a user agent that the server is already blocking.
2) If you are doing too much scraping, slow your pace: use time.sleep() between requests so the server is not overloaded by your IP address, otherwise it will block you.
3) If the server still blocks you, try rotating IPs.
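The three tips above can be sketched together in one helper. This is a minimal outline, not a guaranteed bypass; the delay range is an arbitrary choice, and the proxy address shown in the comment is a placeholder you would replace with a real rotating proxy:

```python
import random
import time

import requests

# A small pool of common browser User-Agent strings; fake_useragent's
# ua.random serves the same purpose with a much larger pool (tip 1).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
]


def polite_get(url, proxies=None, min_delay=1.0, max_delay=3.0):
    """Fetch url with a rotated User-Agent, then pause before returning."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}  # tip 1: rotate UAs
    response = requests.get(url, headers=headers, proxies=proxies)
    time.sleep(random.uniform(min_delay, max_delay))  # tip 2: pace requests
    return response


if __name__ == "__main__":
    # Tip 3: pass a proxies dict to route through another IP
    # (placeholder address, not a working proxy):
    # proxies = {"http": "http://203.0.113.10:8080"}
    resp = polite_get("http://kko.to/IU8zwNmjM")
    print(resp.status_code)
```

The proxies argument uses requests' standard proxies dict format, so swapping in entries from a rotating proxy list is enough to change the IP the server sees between requests.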