Using Python and urllib to get data from Yahoo Finance

ng150716 · Apr 16, 2014

I was using urllib in Python to get stock prices from Yahoo Finance. Here is my code so far:

import urllib
import re

name = raw_input(">")

htmlfile = urllib.urlopen("http://finance.yahoo.com/q?s=%s" % name)

htmltext = htmlfile.read()

# The problem area
regex = '<span id="yfs_l84_%s">(.+?)</span>' % name

pattern = re.compile(regex)

price = re.findall(pattern, htmltext)

print price

The idea is that I enter a ticker symbol and the stock price comes out. But so far I can't get it to display a price, just an empty list, [ ]. I have added a comment above the line where I believe the problem is. Any suggestions? Thanks.

Answer

shaktimaan · Apr 16, 2014

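You have not escaped the forward slash in your regex. (Python's re module treats / as an ordinary character, so the escape is optional there, but it is harmless.) Change your regex from: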

<span id="yfs_l84_%s">(.+?)</span>

to

<span id="yfs_l84_goog">(.+?)<\/span>

This should fix your problem, assuming you enter the company's listing code as the input to your code, in lowercase as it appears in the span id, e.g. goog for Google.
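The two patterns above differ in two ways: the escaped slash and the lowercase ticker in the id. A quick offline check can help tell which one matters. The sketch below runs against a hand-written HTML fragment; the sample markup, the price in it, and the helper name find_price are made up for illustration and are not real Yahoo output. It uses print() with a single argument so it should run on both Python 2 and Python 3.

```python
import re

# Hypothetical fragment imitating the price span; the value is invented.
sample_html = '<span id="yfs_l84_goog">556.54</span>'

def find_price(html, ticker, escape_slash=False):
    """Return every captured price for the given ticker id."""
    # Build the closing tag with or without an escaped slash.
    closing = r'<\/span>' if escape_slash else '</span>'
    pattern = '<span id="yfs_l84_%s">(.+?)%s' % (re.escape(ticker), closing)
    return re.findall(pattern, html)

print(find_price(sample_html, 'goog'))                     # unescaped slash
print(find_price(sample_html, 'goog', escape_slash=True))  # escaped slash
print(find_price(sample_html, 'GOOG'))                     # uppercase ticker
```

On this fragment the first two calls return the same one-element list and the third returns an empty list, which suggests that the case of the ticker, not the slash, decides whether the pattern matches.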

That said, regex is a bad choice for what you are trying to do. As suggested by others, explore BeautifulSoup, a Python library for pulling data out of HTML. With BeautifulSoup your code can be as simple as:

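from bs4 import BeautifulSoup
import requests

name = raw_input('>')
url = 'http://finance.yahoo.com/q?s={}'.format(name)
r = requests.get(url)
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(r.text, 'html.parser')
# The id needs a {} placeholder, otherwise format() drops the ticker
data = soup.find('span', attrs={'id': 'yfs_l84_{}'.format(name)})
print data.text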
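
One caveat with the snippet above: soup.find returns None when no matching element exists, so data.text raises an AttributeError for a symbol whose span is not on the page. A minimal sketch of a guarded lookup follows, run against a hand-written fragment instead of the live page. The helper name extract_price and the sample markup (including the price) are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page; the price is invented.
sample_html = '<html><body><span id="yfs_l84_goog">556.54</span></body></html>'

def extract_price(html, name):
    """Return the text of the price span for `name`, or None if it is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    span = soup.find('span', attrs={'id': 'yfs_l84_{}'.format(name)})
    # find() returns None when nothing matches, so check before reading .text
    return span.text if span is not None else None

print(extract_price(sample_html, 'goog'))  # span present
print(extract_price(sample_html, 'msft'))  # span absent, returns None
```

Returning None leaves it to the caller to decide whether a missing price is an error or simply an unknown symbol.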