Get the code from inspect element using Python

EspenG · Aug 10, 2014

In the Safari browser, I can right-click and select "Inspect Element", and a lot of code appears. Is it possible to get this code using Python? The best solution would be to get a file with the code in it.

More specifically, I am trying to find the links to the images on this page: http://500px.com/popular. I can see the links from "Inspect Element" and I would like to retrieve them with Python.

Answer

gary · Aug 10, 2014

One way to get at the source code of a web page is to use the Beautiful Soup library. A tutorial on this is shown here. The code from the tutorial is shown below; the comments are mine. This particular code no longer works because the contents of the site it uses as an example have changed, but the concept should help you do what you want. Hope it helps.

from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib2 import urlopen

BASE_URL = "http://www.chicagoreader.com"

def get_category_links(section_url):
    # Download the page's HTML source into a variable called html. This is the
    # markup the server sends; Inspect Element can also show content that
    # JavaScript adds later, which will not be included here.
    html = urlopen(section_url).read()
    # Parse the HTML.
    soup = BeautifulSoup(html, "lxml")
    # The next two lines will change depending on what you're looking for. This
    # line looks for <dl class="boccat">.
    boccat = soup.find("dl", class_="boccat")
    # This line collects the links found inside the element above into a list
    # of absolute URLs.
    category_links = [BASE_URL + dd.a["href"] for dd in boccat.find_all("dd")]
    return category_links
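
Since the question is about image links rather than category links, here is a minimal sketch of the same idea applied to <img> tags. To keep it dependency-free it uses the standard library's HTMLParser instead of Beautiful Soup. The class ImageLinkParser, the function get_image_links, and the sample markup are all hypothetical names and data made up for illustration; they do not reflect the real structure of 500px.com.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.image_links.append(urljoin(self.base_url, src))


def get_image_links(html, base_url):
    """Return the absolute URLs of all images found in an HTML string."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_links


# Made-up markup standing in for a downloaded page.
SAMPLE_HTML = """
<div class="photo"><img src="/images/one.jpg" alt="first"></div>
<div class="photo"><img src="http://cdn.example.com/two.jpg"></div>
<div class="photo"><img alt="no source"></div>
"""

links = get_image_links(SAMPLE_HTML, "http://example.com/popular")
print(links)
```

To try it on a real page, pass in the decoded text of the downloaded HTML (for example, the result of urlopen(url).read().decode("utf-8")) together with the page URL.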

EDIT 1: The solution above provides a general way to web-scrape, but I agree with the comments to the question. The API is definitely the way to go for this site. Thanks to yuvi for providing it. The API is available at https://github.com/500px/PxMagic.


EDIT 2: There is an example that addresses your question about getting links to popular photos. The Python code from that example is pasted below. You will need to install the API library first.

import fhp.api.five_hundred_px as f
import fhp.helpers.authentication as authentication
from pprint import pprint
key = authentication.get_consumer_key()
secret = authentication.get_consumer_secret()

client = f.FiveHundredPx(key, secret)
results = client.get_photos(feature='popular')

PHOTOS_NEEDED = 2
# Print only the first PHOTOS_NEEDED photos, then stop iterating.
for i, photo in enumerate(results, start=1):
    pprint(photo)
    if i == PHOTOS_NEEDED:
        break