If given a website like e.g. http://www.barchart.com/historicaldata.php, is there a way to fill in the text box and then click the submit button to download the data?
I'm use to using urllib
to download entire pages, but can seem to figure out how to submit text into the text box and then click the button, from my script.
There are two paths I can think of:
Selenium
It's possible to directly simulate filling in data and clicking the button using a great library called Selenium Webdriver
Using Selenium, you can open up a programmatic browser session and do all manner of things that a user would do. Combined with ghost browser, this can be done behind the scenes in a browser-independent way (useful if this is going to run on a server, which won't have chrome installed).
While an awesome library (fantastic for testing web pages) Selenium requires learning quite a bit. It's required if you specifically want to perform the action of filling out and clicking. But I think there might be an easier way to accomplish what you're trying to do using Python requests.
Requests
Python's requests library is another library for requesting data from pages. You can use it to submit a GET request (what the browser would be doing in the event of just visiting the page) or a POST request (where the browser sends its form data off to after you click submit).
To know which fields you want to send off data to, look at the page's HTML for each form field, and grab the "name" attribute.
If it weren't for the fact that your content seems to be paywalled, you could accomplish this pretty easily. For example, let's say your form has 3 fields to fill in, with name attributes consisting of 'start_date', 'end_date', and 'type'. You could accomplish this with the following:
import requests
url = "http://www.barchart.com/historicaldata.php/"
r = requests.post(url, data = {
'item1': 'one of the form fields',
'color': 'green',
'location': 'Boston, MA',
...
}
)
with open("~~DESIRED FILE LOCATION~~", "wb") as code:
code.write(r.content)
Because of the paywall, you'll have to log in first, and retain that session data. I defer explanation of how to do that to this excellent answer
EDIT: Possibly one more thing to note regarding where you should be submitting your data to. The url for where you should submit your POST data might be the same as the barchart url that you gave, but it also might not be. To find out, look at the "action" attribute of the HTML form object itself. 9 times out of 10, that's where the data is getting submitted. If the site does something wonky with Javascript, you might have to open up a console and examine where exactly data is getting sent upon submission. But that bridge can be crossed if/when needed.