How to extract the meta description from URLs using Python?

Technologic27 · Jun 24, 2016 · Viewed 15.1k times

I want to extract the title and description from the following website:

view-source:http://www.virginaustralia.com/au/en/bookings/flights/make-a-booking/

with the following snippet of source code:

<title>Book a Virgin Australia Flight | Virgin Australia
</title>
<meta name="keywords" content="" />
<meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />

I want the title and meta content.

I tried Goose, but it does not extract them reliably. Here is my code:

website_title = [g.extract(url).title for url in clean_url_data]

and

website_meta_description = [g.extract(url).meta_description for url in clean_url_data]

Both results come back empty.

Answer

linpingta · Jun 24, 2016

BeautifulSoup is a good fit for this.

For the page in the question, the following code extracts the content of the "description" meta tag:

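import requests
from bs4 import BeautifulSoup

url = 'http://www.virginaustralia.com/au/en/bookings/flights/make-a-booking/'
response = requests.get(url)
# Name the parser explicitly; without it bs4 picks one itself and emits a warning
soup = BeautifulSoup(response.text, 'html.parser')

# Collect every <meta> tag, then keep the content of those named "description"
metas = soup.find_all('meta')

print([meta.attrs['content'] for meta in metas if 'name' in meta.attrs and meta.attrs['name'] == 'description'])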

Output:

['Search for and book Virgin Australia and partner flights to Australian and international destinations.']
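
The question also asks for the page title, which the code above does not print. A minimal sketch of getting both values in one pass might look like the following. The helper name extract_title_and_description is hypothetical, and the HTML is the snippet quoted in the question rather than a live download, so the example runs without network access.

```python
from bs4 import BeautifulSoup


def extract_title_and_description(html):
    """Return (title, description) from an HTML string; either may be None.

    Hypothetical helper for illustration, not part of bs4 or Goose.
    """
    soup = BeautifulSoup(html, 'html.parser')

    # Text of the <title> tag, with surrounding whitespace and newlines stripped
    title = soup.title.get_text(strip=True) if soup.title else None

    # First <meta name="description" ...> tag, if the page has one
    meta = soup.find('meta', attrs={'name': 'description'})
    description = meta.get('content') if meta else None

    return title, description


# The snippet of page source quoted in the question
html = '''<title>Book a Virgin Australia Flight | Virgin Australia
</title>
<meta name="keywords" content="" />
<meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />'''

title, description = extract_title_and_description(html)
print(title)
print(description)
```

For a live page, the same helper could be called with requests.get(url).text. Returning None for a missing tag, instead of raising, is a design choice that makes it easier to run over a list of URLs where some pages lack a description.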