Google image search: How do I construct a reverse image search URL?

maks · Sep 28, 2011 · Viewed 16.8k times

How can I programmatically, in Java, convert an image to the string that Google image search takes as a parameter? I have tried Base64-encoding the image, but the result differs from what Google produces in its image search engine. This is my conversion (Java 7):

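import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.bind.DatatypeConverter;
...
            Path p = Paths.get("my_photo.JPG");
            try (PrintWriter writer = new PrintWriter("base64.txt")) {
                // Read the whole file at once; InputStream.available() followed by
                // a single read() is not guaranteed to return every byte.
                byte[] bytes = Files.readAllBytes(p);
                // Encode the raw bytes as Base64 text and write them out
                String base64 = DatatypeConverter.printBase64Binary(bytes);
                writer.println(base64);
            } catch (IOException ex) {
                ex.printStackTrace();
            }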

The output of this simple program differs from the string in Google's URL. I mean the string that follows tbs=sbi:AMhZZ...

Answer

mikerobi · Sep 28, 2011

This is my best guess for how the image search works:

The data in the URL is not an encoded form of the image. The data is an image fingerprint used for fuzzy matching.

Notice that uploading an image for searching is a two-step process. First, the image is uploaded to the URL http://images.google.com/searchbyimage/upload and the Google server returns the fingerprint. Then the browser is redirected to a search page with a query string based on the fingerprint.

Unless Google publishes the algorithm for generating the fingerprint, you will be unable to generate the search query string from within your application. Until then, you can have your application post the image to the upload URI. You should be able to parse the response and construct the query string.
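
A minimal sketch of the parsing step, assuming the upload response is an HTTP redirect whose target URL carries the fingerprint in a tbs query parameter (as the tbs=sbi:... string in the question suggests). The class name, the helper name, and the example URL are hypothetical, and the fingerprint value is a placeholder, not a real one.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

class RedirectQuery {
    /** Returns the decoded value of the named query parameter, or null if absent. */
    static String queryParam(String url, String name) {
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return null;
        }
        try {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                String key = eq < 0 ? pair : pair.substring(0, eq);
                if (URLDecoder.decode(key, "UTF-8").equals(name)) {
                    return eq < 0 ? "" : URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                }
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen
            throw new IllegalStateException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical redirect target; "sbi:PLACEHOLDER" stands in for a real fingerprint
        String location = "http://www.google.com/search?tbs=sbi:PLACEHOLDER&hl=en";
        System.out.println(queryParam(location, "tbs"));
    }
}
```

The extracted value can then be appended to a search URL as the tbs parameter. Whether the server accepts a replayed fingerprint is not something I have verified.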

EDIT

These are the keys and values sent to the server when I uploaded a file.

image_url       =
btnG            = Search
encoded_image   = // the binary image content goes here
image_content   =
filename        =
hl              = en
bih             = 507
biw             = 1920

"bih" and "biw" look like dimensions, but they do not correspond to the uploaded file.

Use this information at your own risk. It is an undocumented API that could change and break your application.
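
For illustration, here is a rough sketch of posting those fields from Java with HttpURLConnection. It assumes the form is sent as multipart/form-data, that encoded_image is the file part, and that the server answers with a redirect whose Location header is the search URL. None of that is documented, and the class and method names are made up for this example. The text fields are written before the file part, which differs from the order listed above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

class ImageUploadSketch {
    static final String UPLOAD_URL = "http://images.google.com/searchbyimage/upload";

    /** Builds a multipart/form-data body from text fields plus one file part. */
    static byte[] buildBody(String boundary, Map<String, String> fields,
                            String fileField, String filename, byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            write(out, "--" + boundary + "\r\n"
                    + "Content-Disposition: form-data; name=\"" + f.getKey() + "\"\r\n\r\n"
                    + f.getValue() + "\r\n");
        }
        write(out, "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fileField
                + "\"; filename=\"" + filename + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n");
        out.write(data, 0, data.length);
        write(out, "\r\n--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
    }

    /** Posts the image and returns the Location header, or null if there is none. */
    static String upload(byte[] image, String filename) throws IOException {
        // Field names and values as observed in the browser upload
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("image_url", "");
        fields.put("btnG", "Search");
        fields.put("image_content", "");
        fields.put("filename", "");
        fields.put("hl", "en");
        fields.put("bih", "507");
        fields.put("biw", "1920");
        String boundary = "----sketch" + System.nanoTime();
        byte[] body = buildBody(boundary, fields, "encoded_image", filename, image);

        HttpURLConnection conn = (HttpURLConnection) new URL(UPLOAD_URL).openConnection();
        conn.setInstanceFollowRedirects(false); // keep the redirect so it can be inspected
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body);
        }
        return conn.getHeaderField("Location");
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ImageUploadSketch <image-file>");
            return;
        }
        byte[] image = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(upload(image, args[0]));
    }
}
```

Redirect following is turned off so the redirect URL can be read directly instead of fetching the results page.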