I'm looking for a batch or PowerShell script that searches Google Images for similar images, using a local image as input.
My research so far
The syntax for an image search using a URL rather than a local file is as follows:
https://www.google.com/searchbyimage?image_url=TEST
where TEST can be replaced with any image URL you have.
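As a minimal sketch of that substitution in PowerShell: the helper name Get-SearchByImageUrl and the sample image URL are made up for illustration, and percent-encoding the value is my assumption about what keeps an image URL with its own query string intact (it is the usual convention for query parameters, not something Google documents here).

```powershell
# Hypothetical helper: build a search-by-image URL from a publicly reachable image URL.
function Get-SearchByImageUrl {
    param([Parameter(Mandatory = $true)][string] $ImageUrl)

    # Percent-encode the image URL so its own '?', '&' and '/' stay inside one query value
    $encoded = [uri]::EscapeDataString($ImageUrl)
    "https://www.google.com/searchbyimage?image_url=$encoded"
}

# Sample input is invented; pass the result to Start-Process to open it in a browser
Get-SearchByImageUrl 'http://example.com/a b.jpg?x=1&y=2'
```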
I played with cURL for Windows, using imgur as a temporary image host. I was able to upload a file to imgur from a batch file, and then used the resulting image URL to search for similar images on Google.
But I wonder if it is possible without using any temporary cache like imgur or any other online picture service. Just a batch, curl, Google and me.
Just a thought: could a VBS script search Google Images with a local file as input?
Or are similar web services like TinEye better suited for that task?
This PowerShell snippet will open Google's Image Search.
$IE= new-object -com InternetExplorer.Application
$IE.navigate2("https://www.google.com/imghp?hl=en")
while ($IE.busy) {
    sleep -milliseconds 50
}
$IE.visible=$true
The next steps would be to get the IDs of some buttons and click them programmatically to select the local file. But here I'm not experienced enough to achieve this.
Cool question! I spent far too much time tinkering with this, but I think I finally got it :)
In a nutshell, you have to upload the raw bytes of your image, embedded and properly formatted along with some other stuff, to images.google.com/searchbyimage/upload. The response to that request will contain a new URL which sends you to the actual results page.
This function returns the results page URL. You can do whatever you want with it, but to simply open the results in a browser, pass it to Start-Process.
Of course, Google could change the workflow for this at any time, so don't expect this script to work forever.
function Get-GoogleImageSearchUrl
{
    param(
        [Parameter(Mandatory = $true)]
        [ValidateScript({ Test-Path $_ })]
        [string] $ImagePath
    )

    # resolve to a full path, since .NET file APIs do not follow PowerShell's current location
    $fullPath = (Resolve-Path $ImagePath).ProviderPath
    # extract the image file name, without path
    $fileName = Split-Path $fullPath -Leaf

    # the multipart boundary; each delimiter line in the body is this string prefixed with '--'
    $boundary = '---------------------------7dd2db3297c2202'
    $crlf = "`r`n"

    # the request body has some boilerplate before the raw image bytes (part1) and some after (part2)
    # note that $fileName is included in part1, and that a blank line must follow each set of field headers
    $part1 = "--${boundary}${crlf}" +
             "Content-Disposition: form-data; name=`"encoded_image`"; filename=`"$fileName`"${crlf}" +
             "Content-Type: image/jpeg${crlf}" +
             $crlf
    $part2 = "${crlf}--${boundary}${crlf}" +
             "Content-Disposition: form-data; name=`"image_content`"${crlf}" +
             $crlf +
             $crlf +
             "--${boundary}--${crlf}"

    # grab the raw bytes composing the image file
    $imageBytes = [IO.File]::ReadAllBytes($fullPath)

    # the request body should sandwich the image bytes between the 2 boilerplate blocks
    $encoding = New-Object Text.ASCIIEncoding
    [byte[]] $data = $encoding.GetBytes($part1) + $imageBytes + $encoding.GetBytes($part2)

    # create the HTTP request, populate headers
    $request = [Net.HttpWebRequest] ([Net.HttpWebRequest]::Create('http://images.google.com/searchbyimage/upload'))
    $request.Method = "POST"
    $request.ContentType = "multipart/form-data; boundary=$boundary" # must match the delimiter in the body, above
    $request.ContentLength = $data.Length

    # don't automatically redirect to the results page, just take the response which points to it
    $request.AllowAutoRedirect = $false

    # populate the request body
    $stream = $request.GetRequestStream()
    $stream.Write($data, 0, $data.Length)
    $stream.Close()

    # get the response, which should contain a 302 redirect to the results page
    $response = $request.GetResponse()
    $reader = New-Object IO.StreamReader ($response.GetResponseStream())
    $body = $reader.ReadToEnd()
    $reader.Close()
    $response.Close()

    # pluck out the results page link that you would otherwise be redirected to
    if ($body -match 'HREF\="([^"]+)"') { $matches[1] }
}
Usage:
$url = Get-GoogleImageSearchUrl 'C:\somepic.jpg'
Start-Process $url
Here's some more detail. I'll basically just take you through the steps I took as I figured this out.
First, I just went ahead and did a local image search.
The URL it sends you to is very long (~1500 chars in the case of longcat), but not nearly long enough to fully encode the image (60KB). So you can tell right off the bat that it's more complex than simply doing something like a base64 encoding.
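To put a number on that: base64 emits 4 characters for every 3 input bytes, so a 60KB file would need tens of thousands of characters. A quick check, using an all-zero byte array as a stand-in for the image (only the length matters here):

```powershell
# 60 KB stand-in for the image (60 * 1024 bytes); the content is irrelevant, only the size counts
$bytes = New-Object byte[] 61440
$base64Length = [Convert]::ToBase64String($bytes).Length
$base64Length   # 81920 characters, versus roughly 1500 in the observed URL
```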
Next, I fired up Fiddler and looked at what's actually going on when you do a local image search. After browsing/selecting the image, you see some traffic to images.google.com/searchbyimage/upload. Viewing that request in detail reveals the basic mechanism.
The request body is encoded as multipart/form-data, and you need to specify what string of characters is separating the different fields (red boxes). If you Bing/Google around, you will find that multipart/form-data is a web standard (RFC 7578), but the details really don't matter for this example.
The raw bytes of the image go in the encoded_image field (green box).
There are a few fields not shown here, way at the bottom. They aren't super interesting.
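For readers who want to see the byte layout spelled out, here is a rough sketch of a body with just the two fields used in this answer. The helper name New-MultipartBody and its parameters are invented for illustration; the structure (a `--boundary` line before each field, a blank line between field headers and value, and a closing `--boundary--` line, all separated by CRLF) follows the multipart/form-data format, but treat it as a sketch rather than the exact bytes a browser sends.

```powershell
# Hypothetical helper: wrap raw file bytes in a two-field multipart/form-data body.
function New-MultipartBody {
    param(
        [Parameter(Mandatory = $true)][string] $Boundary,
        [Parameter(Mandatory = $true)][string] $FileName,
        [Parameter(Mandatory = $true)][byte[]] $FileBytes
    )

    $crlf = "`r`n"
    # Field 1 headers; the trailing blank line marks where the raw bytes begin
    $head = "--${Boundary}${crlf}" +
            "Content-Disposition: form-data; name=`"encoded_image`"; filename=`"$FileName`"${crlf}" +
            "Content-Type: image/jpeg${crlf}" +
            $crlf
    # CRLF ending the file bytes, then an empty image_content field, then the closing delimiter
    $tail = "${crlf}--${Boundary}${crlf}" +
            "Content-Disposition: form-data; name=`"image_content`"${crlf}" +
            $crlf +
            $crlf +
            "--${Boundary}--${crlf}"

    $ascii = [Text.Encoding]::ASCII
    # The leading comma stops PowerShell from unrolling the array on output
    , ([byte[]] ($ascii.GetBytes($head) + $FileBytes + $ascii.GetBytes($tail)))
}

# Tiny made-up payload, just to inspect the layout
$body = New-MultipartBody -Boundary 'XyZ' -FileName 'a.jpg' -FileBytes ([byte[]] (1, 2, 3))
$body.Length
```

The boundary passed here must be the same string that goes into the request's Content-Type header (`multipart/form-data; boundary=XyZ` for this toy example).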
Once I figured out the basic workflow, it was only a matter of coding it up. I just copied the web request I saw in Fiddler as closely as I could, using standard .NET web request APIs. The answers to this SO question demonstrate the APIs you need in order to properly encode and send body data in a web request.
From some experimentation, I found that you only need the two body fields I included in my code (encoded_image and image_content). Going through the web UI includes more, but apparently they are not required.
More experimentation revealed that none of the other headers or cookies shown in Fiddler are really required.
For our purposes, we don't actually want to access the results page, only get a pointer to it. Thus we should set AllowAutoRedirect to $false. That way, Google's 302 redirect is given to us directly and we can extract the results page URL from it.
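The extraction step on its own can be sketched like this. The helper name Get-RedirectTarget and the sample HTML are made up for illustration (the real response body may look different), and decoding HTML entities is my addition, on the assumption that a link embedded in HTML will carry `&amp;` in place of `&`. A 302 normally also carries the same target in its Location header, which would be an alternative to parsing the body.

```powershell
# Hypothetical helper: pull the redirect target out of the HTML body of a 302 response.
function Get-RedirectTarget {
    param([Parameter(Mandatory = $true)][string] $Html)

    # -match is case-insensitive, so this finds HREF="..." or href="..."
    if ($Html -match 'HREF="([^"]+)"') {
        # The link sits inside HTML, so entities such as &amp; need decoding
        [Net.WebUtility]::HtmlDecode($Matches[1])
    }
}

# Invented sample body, shaped like a typical "document has moved" page
$sample = '<A HREF="http://www.google.com/search?tbs=sbi:EXAMPLE&amp;hl=en">here</A>'
Get-RedirectTarget $sample
```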
While writing this edit, I slapped my forehead and realized that PowerShell v3 has the Invoke-WebRequest cmdlet, which could potentially eliminate the need for the .NET web API calls. Unfortunately, I could not get it to work properly after tinkering for 10 minutes, so I gave up. It seems like some issue with the way the cmdlet encodes the data, though I could be wrong.