How to construct a complex Google Web Search query?

JeanValjean · Apr 6, 2013 · Viewed 10.1k times

Searching the Web with the Google search engine is a de facto standard for Internet users. Google provides a basic and an advanced form for preparing a query string for its search engine. If one prefers not to use the web form, one can simply issue an HTTP GET request to the specific URL with a query string constructed from the search conditions.

For instance, I can search for results containing the word "hello" by issuing an HTTP request to:

http://www.google.com/search?q=hello

I can add another word, e.g. "world", as follows:

http://www.google.com/search?q=hello+world
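
As a minimal sketch, the two URLs above could be assembled in Python with the standard library. The helper name search_url is hypothetical and only serves this illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.google.com/search"


def search_url(terms):
    """Join the search terms with spaces and encode them as the q parameter."""
    # urlencode uses quote_plus by default, so each space becomes '+'.
    return BASE_URL + "?" + urlencode({"q": " ".join(terms)})


print(search_url(["hello"]))
print(search_url(["hello", "world"]))
```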

The search can be made more "complicated" by specifying additional parameters such as:

  • or condition(s)
  • exact phrase(s)
  • search on specific domain(s)
  • avoid specific word(s)
  • search in a specific language
  • limit search by geographical area
  • search for document type
  • etc.

How can I modify the query string to account for the above search parameters?

Answer

JeanValjean · Apr 7, 2013

I carefully examined the answers by Pratik Chowdhury and Robbie Vercammen. They provide links to web documents listing the textual filters that can be used in the Google search form. Although this is interesting, it does not answer the question. Hence, I studied the problem further and found the following solution.

Suppose that you need to make a one-off HTTP call (e.g. from a PHP class run via cron once a month) to Google Search in order to retrieve the results for a particular query, e.g. all the pages of your website (mywebsite.com) that contain certain words ("hello" and "world"). You can then issue an HTTP GET request to the following address:

http://www.google.com/search?q=hello+world+site:mywebsite.com
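
A small sketch of how this query string could be generated programmatically, assuming Python's urllib.parse. The function name site_query is an assumption made for the example.

```python
from urllib.parse import urlencode


def site_query(words, site):
    """Build the q parameter for words restricted to a single site."""
    q = " ".join(words) + " site:" + site
    # safe=":" keeps the colon literal, as in the URL above. Without it,
    # urlencode emits %3A, which is an equivalent encoding of the colon.
    return urlencode({"q": q}, safe=":")


print("http://www.google.com/search?" + site_query(["hello", "world"], "mywebsite.com"))
```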

The q parameter can contain the whole search query; however, Google also defines a set of dedicated, foolproof parameters for the individual conditions.

Notice that the implicit AND (all words must appear) can also be expressed with the as_q parameter, e.g. as_q=hello+world.

To get pages that contain either "hello" or "world" (i.e. an OR), change the q parameter to:

q=hello+OR+world

while a more compact representation uses the as_oq parameter:

as_oq=hello+world

If one looks for the exact phrase "hello world", the q parameter is:

q="hello+world"

while, again, another compact representation uses the as_epq parameter:

as_epq=hello+world

To get all the results that contain neither "hello" nor "world", the q parameter is:

q=-hello+-world

while, again, another compact representation uses the as_eq parameter:

as_eq=hello+world
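
The correspondence between the four conditions above and the operator notation inside q can be sketched as follows. The function build_q and its argument names are hypothetical. It returns the unencoded query text, whose spaces become + once it is URL-encoded.

```python
def build_q(all_words=(), any_words=(), exact_phrase="", none_words=()):
    """Compose the text of the q parameter from the four basic conditions."""
    parts = list(all_words)                       # AND: words simply listed
    if any_words:
        parts.append(" OR ".join(any_words))      # OR between alternatives
    if exact_phrase:
        parts.append('"' + exact_phrase + '"')    # exact phrase in quotes
    parts.extend("-" + w for w in none_words)     # excluded words get a minus
    return " ".join(parts)


print(build_q(any_words=["hello", "world"]))
print(build_q(exact_phrase="hello world"))
print(build_q(none_words=["hello", "world"]))
```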

Of course, as_q, as_oq, as_epq, as_eq, etc. can be combined in a single search query as usual (i.e. by using the & character). For instance, I can search for both words "hello" and "world" plus either "programming" or "code" as follows:

q=hello+world&as_oq=programming+code
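
Combining several parameters is plain query-string encoding. A minimal sketch, assuming Python and a hypothetical helper advanced_query that drops any parameter left empty:

```python
from urllib.parse import urlencode


def advanced_query(**fields):
    """Encode the given parameters, skipping the ones that are empty."""
    params = {name: value for name, value in fields.items() if value}
    # urlencode joins the pairs with '&' and turns spaces into '+'.
    return urlencode(params)


print(advanced_query(q="hello world", as_oq="programming code", as_eq=""))
```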

One can restrict the search to a specific domain (e.g. mydomain.com) as follows:

as_sitesearch=mydomain.com

However, if you want to exclude a specific domain (e.g. because it is a spam source), you must resort to the standard notation. E.g.:

q=hello+-site:mydomain.com

returns all the pages with the word "hello" that are not on the site mydomain.com.

To search for a specific file type, e.g. PDF, you can use as_filetype:

as_filetype=pdf

More complex search operators can be used, as described in the Google support docs. For instance, to also get results containing a synonym of a word, simply put the ~ operator in front of the word, e.g.

q=~hello

Moreover, if you want to use wildcards, e.g. to get all the exact phrases that start with "hello" and end with "world", you should use the * operator:

q="hello+*+world"

which will probably return something like "hello to the world" and "hello sweet world".
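
The quotes and the asterisk in the wildcard query are shown unencoded above. A short sketch, assuming Python's urllib.parse, of the strict percent-encoded form next to the readable one (both decode to the same query text):

```python
from urllib.parse import quote_plus, unquote_plus

phrase = '"hello * world"'

strict = quote_plus(phrase)               # quotes and asterisk percent-encoded
readable = quote_plus(phrase, safe='"*')  # keeps them literal, as written above

print("q=" + strict)
print("q=" + readable)

# Both forms decode back to the same query text.
print(unquote_plus(strict) == unquote_plus(readable) == phrase)
```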

One can also search for specific words inside the page title or in the page url by using the following keywords (read here for more details):

  • intitle
  • allintitle
  • inurl
  • allinurl

For instance, the following returns all the pages whose URL contains both words "hello" and "world":

q=allinurl:hello+world

To set the language of the Google interface (not that of the results), pass the language code (e.g. en for English, fr for French, it for Italian) in the hl parameter. In other words, to search with the English version of Google, the URL becomes:

http://www.google.com/search?hl=en&q=hello+world+site:mywebsite.com

To restrict the results to a specific language, e.g. Italian, use the lr query parameter:

lr=lang_it

One can also select pages published in a specific geographical region by using the cr parameter. E.g., to find all the pages published in Italy:

cr=countryIT
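
Putting hl, lr and cr together, a full URL could be built as in the sketch below. The function localized_search_url and its argument names are assumptions for the example. The lang_ and country prefixes follow the values shown above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def localized_search_url(query, ui_lang=None, result_lang=None, country=None):
    """Build a search URL with optional interface language, result language and region."""
    params = {"q": query}
    if ui_lang:
        params["hl"] = ui_lang                       # language of the Google page
    if result_lang:
        params["lr"] = "lang_" + result_lang         # language of the results
    if country:
        params["cr"] = "country" + country.upper()   # region of publication
    return "http://www.google.com/search?" + urlencode(params)


url = localized_search_url("hello world", ui_lang="en", result_lang="it", country="it")
print(url)

# Parsing the URL back is a quick way to check what was actually encoded.
print(parse_qs(urlsplit(url).query))
```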