Top "Scrape" questions

DO NOT USE THIS TAG.

Extract / Identify Tables from PDF python

Are there any open source libraries that support table identification & extraction? By this I mean: Identify a table structure …

python pdf scrape pdf-parsing pdf-scraping

Parse Web Site HTML with Java

I want to parse a simple web site and scrape information from that web site. I used to parse XML …

java html scrape

curl 302 redirect not working (command line)

In the browser, navigating to this URL triggers a 302 (moved temporarily) redirect, which in turn downloads a file. http://www.…

bash curl scrape

How can I input data into a webpage to scrape the resulting output using Python?

I am familiar with BeautifulSoup and urllib2 to scrape data from a webpage. However, what if a parameter needs to …

python scrape
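A minimal sketch of the usual approach, assuming Python 3 (where `urllib2` became `urllib.request`): encode the form parameters with `urllib.parse.urlencode` and send them as a query string (GET) or as a form-encoded body (POST), then parse the returned HTML as before. The helper names, the URL and the field names below are hypothetical; the real ones come from the form's `action` attribute and its `<input name=...>` fields.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_form_request(url: str, fields: dict, method: str = "POST") -> Request:
    """Build a request that submits `fields` the way an HTML form would (hypothetical helper).

    For GET the fields are appended as a query string; for POST they are
    sent as an application/x-www-form-urlencoded body.
    """
    encoded = urlencode(fields)
    if method.upper() == "GET":
        return Request(f"{url}?{encoded}", method="GET")
    return Request(
        url,
        data=encoded.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_result_html(req: Request, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


if __name__ == "__main__":
    # Hypothetical endpoint and field names; only builds the request, no network call.
    req = build_form_request("https://example.com/search", {"q": "scraping", "page": 1})
    print(req.get_method(), req.full_url, req.data)
```

The string returned by `fetch_result_html` can then be handed to BeautifulSoup exactly like a page fetched without parameters. Pages that build their results with JavaScript will not return the data this way.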
PHP Curl following redirects

I'm trying to be a bit sneaky and, as part of a learning process, try to improve my page scraping …

php curl scrape

Export Google search to a spreadsheet

Is it possible for me to create a list of Google search results from a specific query and export it …

excel google-search scrape

Scrapy, only follow internal URLs but extract all links found

I want to get all external links from a given website using Scrapy. Using the following code the spider crawls …

python scrapy web-crawler scrape scrapy-spider
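The core decision in that question is classifying each link as internal or external before choosing whether to follow it. A minimal sketch of that check in plain Python (standard library only, not Scrapy's own API): resolve each `href` against the page URL and compare its host with the site's domain. The function name `split_links` and the `allowed_domain` parameter are hypothetical, loosely mirroring a spider's `allowed_domains`.

```python
from urllib.parse import urljoin, urlparse


def split_links(page_url: str, hrefs: list[str], allowed_domain: str):
    """Resolve hrefs against page_url and split them into internal and external URLs.

    A link counts as internal if its host equals allowed_domain or is a
    subdomain of it. Non-HTTP(S) links (mailto:, javascript:, ...) are skipped.
    """
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # handles relative links like "b.html" or "/c"
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue
        host = (parts.hostname or "").lower()
        if host == allowed_domain or host.endswith("." + allowed_domain):
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external
```

Inside a spider callback, one would typically follow only the `internal` list (for example with `response.follow`) and record or yield the `external` list as items, so the crawl stays on the site while every outbound link is still collected.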
Simple script to check if a webpage has been updated

There is some information that I am waiting for on a website. I do not wish to check it every …

bash web scrape
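One common way to detect a change is to store a hash of the page body and compare it on each poll. A minimal Python sketch under that assumption (the question is tagged `bash`, where the same idea is often done with `curl` plus `md5sum` from cron); the function names, state file and polling interval are hypothetical.

```python
import hashlib
import time
from pathlib import Path
from urllib.request import urlopen


def content_digest(body: bytes) -> str:
    """Return a SHA-256 hex digest of the page body."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, state_file: Path) -> bool:
    """Compare body with the digest saved in state_file, then save the new digest.

    Returns False on the first run, when there is no previous digest to compare.
    """
    new = content_digest(body)
    old = state_file.read_text().strip() if state_file.exists() else None
    state_file.write_text(new)
    return old is not None and old != new


def watch(url: str, state_file: Path, interval_s: float = 600.0) -> None:
    """Poll url every interval_s seconds and print a message when its body changes."""
    while True:
        with urlopen(url, timeout=30) as resp:
            body = resp.read()
        if has_changed(body, state_file):
            print(f"{url} has changed")
        time.sleep(interval_s)
```

Hashing the whole body reports a change whenever anything on the page differs, including timestamps or rotating ads; hashing only the relevant fragment (extracted with a parser first) avoids those false alarms.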
Reading data from PDF files into R

Is that even possible!?! I have a bunch of legacy reports that I need to import into a database. However, …

linux r pdf scrape pdf-scraping

Find next siblings until a certain one using BeautifulSoup

The webpage is something like this: <h2>section1</h2> <p>article</p> &…

python find beautifulsoup scrape siblings
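A minimal sketch of the "siblings until the next heading" pattern, assuming the third-party `beautifulsoup4` package is installed: walk `find_next_siblings()` from each `<h2>` and stop at the next `<h2>`. The sample HTML is hypothetical, extended from the truncated snippet in the question, and `siblings_until` is an illustrative helper name.

```python
from bs4 import BeautifulSoup


def siblings_until(start, stop_name: str):
    """Collect tag siblings after `start`, stopping before the next `stop_name` tag."""
    collected = []
    for sib in start.find_next_siblings():  # tags only, in document order
        if sib.name == stop_name:
            break
        collected.append(sib)
    return collected


# Hypothetical sample page with two sections.
html = """
<h2>section1</h2>
<p>article 1a</p>
<p>article 1b</p>
<h2>section2</h2>
<p>article 2a</p>
"""

soup = BeautifulSoup(html, "html.parser")
for h2 in soup.find_all("h2"):
    texts = [p.get_text() for p in siblings_until(h2, "h2")]
    print(h2.get_text(), texts)
```

Breaking out of the loop at the first matching tag keeps each section's content separate without needing to know how many paragraphs it contains.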