Following a tutorial, I found out how to retrieve the HTML page for a topic from Google search. This was the code given in the tutorial:
import mechanize
br = mechanize.Browser()
br.open('http://www.google.co.in')
br.select_form(nr = 0)
I understand that up to this point it retrieves the form. The tutorial then continues with:
br.form['q'] = 'search topic'
br.submit()
br.response().read()
This does output the HTML of the results page for the search topic. My question is: what should the parameter in br.form[parameter] be? I tried the same code on Google News and it also worked. Can someone help me out?
It's the name of the form field (the name attribute of the control), as given in the page source.
You can list the available field names like so:
import mechanize
br = mechanize.Browser()
br.open("http://www.google.com/")
for f in br.forms():
    print(f)
which gives me:
<f GET http://www.google.ca/search application/x-www-form-urlencoded
  <HiddenControl(ie=ISO-8859-1) (readonly)>
  <HiddenControl(hl=en) (readonly)>
  <HiddenControl(source=hp) (readonly)>
  <TextControl(q=)>
  <SubmitControl(btnG=Google Search) (readonly)>
  <SubmitControl(btnI=I'm Feeling Lucky) (readonly)>
  <HiddenControl(gbv=1) (readonly)>>
which says that:
There is only one form on the page
The hidden fields are named ie (page encoding), hl (language code), source (value hp; I don't know what it is for), and gbv (also don't know).
The only field you fill in is q, a text input that holds the search text. The remaining two controls, btnG and btnI, are the submit buttons.
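To see where those names come from, here is a minimal sketch that uses only the Python 3 standard library instead of mechanize. The HTML snippet is hypothetical, modelled on the form printed above rather than copied from Google's real page, and the InputCollector class is an illustrative helper, not part of any library. It lists the name attribute of each input and then builds the query string a GET form would send once q is filled in (submit buttons are left out for simplicity).

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical HTML, modelled on the form printed above (not Google's real markup).
SAMPLE_HTML = """
<form name="f" action="/search" method="get">
  <input type="hidden" name="ie" value="ISO-8859-1">
  <input type="hidden" name="hl" value="en">
  <input type="hidden" name="source" value="hp">
  <input type="text" name="q" value="">
  <input type="submit" name="btnG" value="Google Search">
  <input type="hidden" name="gbv" value="1">
</form>
"""


class InputCollector(HTMLParser):
    """Collect (type, name, value) for every <input> tag that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if "name" in attr:
            self.fields.append(
                (attr.get("type", "text"), attr["name"], attr.get("value", ""))
            )


collector = InputCollector()
collector.feed(SAMPLE_HTML)

# These name attributes are the keys you would use in br.form[...].
names = [name for _, name, _ in collector.fields]
print(names)

# Fill in the text field and build the query string for a GET submission.
# Submit buttons are skipped here to keep the sketch simple.
values = {name: value for kind, name, value in collector.fields if kind != "submit"}
values["q"] = "search topic"
query = urlencode(values)
print(query)
```

For any other site, the same idea applies: look at the name attribute of the text input in the page source (or in the output of br.forms()) and use that as the key.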