How can I unshorten a URL?

Andrew · Nov 17, 2010

I want to be able to take a shortened or non-shortened URL and return its unshortened form. How can I write a Python program to do this?

Additional Clarification:

  • Case 1: shortened --> unshortened
  • Case 2: unshortened --> unshortened

For example, bit.ly/silly in the input array should become google.com in the output array,
while google.com in the input array should stay google.com in the output array.

Answer

Adam Rosenfield · Nov 17, 2010

Send an HTTP HEAD request to the URL and look at the response code. If the code is 3xx (a redirect), read the Location header to get the unshortened URL. If the code is 2xx, the URL is not redirected. You will probably also want to handle error codes (4xx and 5xx) in some way. For example:

# This is for Py2k.  For Py3k, use http.client and urllib.parse instead, and
# use // instead of / for the division
import httplib
import urlparse

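def unshorten_url(url):
    # Assume http:// when no scheme is given (e.g. "bit.ly/silly")
    full = url if '://' in url else 'http://' + url
    parsed = urlparse.urlparse(full)
    # Pick HTTPS or plain HTTP to match the URL's scheme
    if parsed.scheme == 'https':
        h = httplib.HTTPSConnection(parsed.netloc)
    else:
        h = httplib.HTTPConnection(parsed.netloc)
    # Request the path plus any query string; an empty path means "/"
    resource = parsed.path or '/'
    if parsed.query:
        resource += '?' + parsed.query
    h.request('HEAD', resource)
    response = h.getresponse()
    # 3xx with a Location header: the URL redirects somewhere else
    if response.status/100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        # 2xx (not redirected) or an error code: return the URL as given
        return url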
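The comment at the top of the code outlines the Python 3 port (http.client, urllib.parse, and // for the division). Below is a sketch of that port under a few assumptions that go beyond the original answer: it follows a chain of up to max_hops redirects (a shortener can point at another shortener), it resolves a relative Location header against the current URL with urljoin, and it treats scheme-less input as http://. The helper names next_hop and head are hypothetical. Any non-3xx status, including 4xx and 5xx errors, simply ends the chain, which is only one of several reasonable ways to handle errors.

```python
import http.client
import urllib.parse


def next_hop(url, status, location):
    """Return the URL to visit next, or None if `url` is the final one.

    3xx with a Location header -> follow it, resolved against `url`
    because Location may be relative. Anything else (2xx, 4xx, 5xx) -> stop.
    """
    if status // 100 == 3 and location:
        return urllib.parse.urljoin(url, location)
    return None


def head(url, timeout=10):
    """Send one HEAD request; return (status, Location header or None)."""
    parsed = urllib.parse.urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parsed.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parsed.netloc, timeout=timeout)
    try:
        # Request target is the path plus query string; empty path means "/".
        target = parsed.path or "/"
        if parsed.query:
            target += "?" + parsed.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def unshorten_url(url, max_hops=10):
    """Follow up to `max_hops` redirects and return the last URL reached."""
    # Assumption: input without a scheme (e.g. "bit.ly/silly") is http://.
    current = url if "://" in url else "http://" + url
    for _ in range(max_hops):
        status, location = head(current)
        nxt = next_hop(current, status, location)
        if nxt is None:
            return current
        current = nxt
    return current  # gave up after max_hops redirects
```

Splitting the status-code decision into next_hop keeps the rule from the answer (follow 3xx, stop otherwise) testable without any network access. Note that some servers reject HEAD (for example with 405 Method Not Allowed); falling back to GET in that case is not handled here.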