How to write a python script to search a website html for matching links

GeminiDNK picture GeminiDNK · Mar 4, 2010 · Viewed 16k times · Source

I am not too familiar with python and have to write a script to perform a host of functions. Basically the module i still need is how to check a website code for matching links provided beforehand.

Answer

Nick Presta picture Nick Presta · Mar 4, 2010

Matching links what? Their HREF attribute? The link display text? Perhaps something like:

from BeautifulSoup import BeautifulSoup, SoupStrainer
import re
import urllib2

doc = urllib2.urlopen("http://somesite.com").read()
links = SoupStrainer('a', href=re.compile(r'^test'))
soup = [str(elm) for elm in BeautifulSoup(doc, parseOnlyThese=links)]
for elm in soup:
    print elm

That will grab the HTML content of somesite.com and then parse it using BeautifulSoup, looking only for links whose HREF attribute starts with "test". It then builds a list of these links and prints them out.

You can modify this to do anything using the documentation.