In my previous post I provided details of how to use my JavaScript-based bookmarklet tool to extract a list of URLs from a Google web search. That’s all well and good, but what about extracting URLs from a Google Image search? Is this possible? That’s exactly what David Radicke asked me over on the High Position blog.
I decided to accept David’s challenge and modify the JavaScript bookmarklet to create a version capable of extracting Image Search results. The result is a second bookmarklet which extracts the following information from an image search:
- Image Domain: the domain on which the image resides
- Image URL: the full URL of the image
- Source Domain: the domain which references the image
- Source URL: the page on which the image is referenced
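Each row pairs the image’s URL with the URL of the page that references it, and the two "domain" columns are derived from those URLs. Here is a minimal sketch of that derivation, assuming you already have both URLs decoded. The helper name `toRow` and the example URLs are hypothetical. The sketch uses the standard `URL` class, whereas the bookmarklet code further down splits each URL on `/`:

```javascript
// Hypothetical helper: build the four columns described above from two decoded URLs.
function toRow(imageUrl, sourceUrl) {
  return {
    imageDomain: new URL(imageUrl).origin, // protocol + host, e.g. "https://cdn.example.net"
    imageUrl: imageUrl,
    sourceDomain: new URL(sourceUrl).origin,
    sourceUrl: sourceUrl,
  };
}

// Example with made-up URLs: an image on a CDN, referenced from a blog post.
const row = toRow(
  'https://cdn.example.net/images/cat.jpg',
  'https://blog.example.com/posts/cats.html'
);
console.log(row.imageDomain);
console.log(row.sourceDomain);
```

One design difference from splitting on `/`: `new URL` throws a `TypeError` when handed something that is not an absolute URL, such as a still-encoded string, so encoding problems surface immediately instead of producing odd output.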
How to Extract Google’s Image Search URLs
- Drag and drop the following bookmarklet to your bookmarks toolbar. I’m using Chrome to do this, but you should be able to do the same in most modern browsers.
- Go to Google Image Search and perform your search.
- With the list of image search results in front of you, click on the bookmarklet which you placed in your toolbar in step 1. As a result you should see a new window open containing something similar to:
- Mission complete!
And that’s all there is to it. With any luck you’ll now be able to extract a list of Google Image Search URLs as well as a bunch of other relevant information.
Please note that Google often changes the way in which Image results are served, so whenever they make changes the bookmarklet needs to be updated. I aim to keep it updated where possible, but if you notice anything not functioning as expected please let me know through the comments below.
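The link you drag in step 1 is simply a bookmark whose address starts with `javascript:`. Clicking it runs that code against the page you are viewing. If you adapt the code, here is a minimal sketch of turning readable source into such an address. It assumes you percent-encode the source with `encodeURIComponent`, which browsers decode before running a `javascript:` URL. The `toBookmarklet` name is hypothetical:

```javascript
// Hypothetical helper: wrap script source in an IIFE and turn it into a javascript: URL.
function toBookmarklet(source) {
  // The IIFE keeps the bookmarklet's variables out of the page's global scope.
  const wrapped = '(function(){' + source + '})();';
  // Percent-encode so braces, semicolons, spaces, '#' and '%' survive inside the URL.
  return 'javascript:' + encodeURIComponent(wrapped);
}

const bookmarkletUrl = toBookmarklet('alert(document.title);');
console.log(bookmarkletUrl);
```

Paste the resulting string into the URL field of a new bookmark, and clicking that bookmark runs the wrapped code.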
The Extraction Bookmarklet Code
In case anyone wants to adapt the code here is the JavaScript code which I used to create the bookmarklet.
javascript:(function(){
/* Build the report page: inline styles, logo and heading */
var output='<html><head><title>SEO Image SERP Extraction Tool</title><style type=\'text/css\'>body,table{font-family:Tahoma,Verdana,Segoe,sans-serif;font-size:11px;color:#000}h1,h2,th{color:#405850}th{text-align:left}h2{font-size:11px;margin-bottom:3px}table.data{table-layout: fixed;width: 100%;border-collapse: collapse;}table.data th, table.data td {overflow: hidden;border-bottom:1px solid #9eb8b0;padding:4px;}table.data th.id, table.data td.id {width: 50px;}</style></head><body>';
output+='<table><tbody><tr><td><a href=\'https://www.chrisains.com\'><img src=\'https://www.chrisains.com/wp-content/uploads/2015/06/chrisains.com-logo1.png\'></a></td><td><h1>Google Image Search URL Extractor</h1></td></tr></tbody></table>';
/* Each image result is an anchor with the class rg_l linking to /imgres?imgurl=...&imgrefurl=... */
var pageAnchors=document.getElementsByClassName('rg_l');
var linkcount=0;
var imgURLList='';
/* Current Google host (e.g. www.google.co.uk), so any Google TLD works */
var geoDomain=window.location.host;
output+='<table class=\'data\'><tr><th class=\'id\'>ID</th><th>Image Domain</th><th>Image URL</th><th>Source Domain</th><th>Source URL</th></tr>';
for(var i=0;i<pageAnchors.length;i++){
 var vars=pageAnchors[i].href.split('&');
 /* Reset per result so values never carry over from the previous anchor */
 var imgurl='';
 var sourceurl='';
 for(var t=0;t<vars.length;t++){
  var pair=vars[t].split('=');
  /* Accept HTTPS and HTTP result links; Google double-encodes the values, hence the nested decodes */
  if(pair[0]=='https://'+geoDomain+'/imgres?imgurl'||pair[0]=='http://'+geoDomain+'/imgres?imgurl'){
   imgurl=decodeURIComponent(decodeURIComponent(pair[1]));
  }else if(pair[0]=='imgrefurl'){
   sourceurl=decodeURIComponent(decodeURIComponent(pair[1]));
  }
 }
 /* Skip anchors missing either parameter instead of throwing on undefined.split() */
 if(!imgurl||!sourceurl){continue;}
 linkcount++;
 imgURLList+=imgurl+'<br />';
 var imgParts=imgurl.split('/');
 var srcParts=sourceurl.split('/');
 output+='<tr><td class=\'id\'>'+linkcount+'</td>';
 output+='<td style=\'width:80%;\'>'+imgParts[0]+'//'+imgParts[2]+'</td>';
 output+='<td>'+imgurl+'</td>';
 output+='<td>'+srcParts[0]+'//'+srcParts[2]+'</td>';
 output+='<td>'+sourceurl+'</td></tr>\n';
}
output+='</table><br/><h2>Image URL List</h2><div>'+imgURLList+'</div><br/><br/><p align=center><a href=\'https://www.chrisains.com\'>www.chrisains.com</a></p>';
/* Write the report into a new window */
var w=window.open();
w.document.write(output);
w.document.close();
})();
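The loop above splits each result link on `&` and `=` by hand. Here is a sketch of the same step using the standard `URLSearchParams` API (via `URL.searchParams`). It assumes result links still carry `imgurl` and `imgrefurl` query parameters, as they did when the bookmarklet was written. The `parseResultLink` name and the example links are hypothetical:

```javascript
// Hypothetical helper: pull the image URL and the referring page URL out of one result link.
function parseResultLink(href) {
  const link = new URL(href);
  // Only /imgres links carry the parameters we want; ignore anything else.
  if (link.pathname !== '/imgres') return null;
  // get() already decodes once; the extra decodeURIComponent below mirrors the
  // bookmarklet's second decode for double-encoded values (no-op if no '%' remains).
  const imgurl = link.searchParams.get('imgurl');
  const imgrefurl = link.searchParams.get('imgrefurl');
  if (!imgurl || !imgrefurl) return null;
  return {
    imageUrl: decodeURIComponent(imgurl),
    sourceUrl: decodeURIComponent(imgrefurl),
  };
}

// Made-up double-encoded result link.
const parsed = parseResultLink(
  'https://www.google.com/imgres?imgurl=https%253A%252F%252Fexample.com%252Fcat.jpg' +
  '&imgrefurl=https%253A%252F%252Fexample.com%252Fcats.html&h=600&w=800'
);
console.log(parsed.imageUrl, parsed.sourceUrl);
```

Because the parameters are looked up by name, their order in the link and any extra parameters (such as `h` and `w` here) no longer matter.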
Many thanks for reading!
Postscript – July 2015
Unfortunately the bookmarklet script was initially configured to extract image URLs from Google.co.uk only, inevitably meaning that usage on any Google TLD outside of the UK would cause the script to fail.
Fortunately I have now updated the tool to read the Google host from the page you are on and use it automatically, which should resolve the issue.
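The fix described above boils down to building the expected parameter name from the host of the current page rather than from a fixed `www.google.co.uk`. Here is a minimal sketch, assuming the same split-on-`&`-and-`=` parsing the bookmarklet uses. In the bookmarklet the host comes from `window.location.host`. Here it is passed in as an argument so the sketch runs outside a browser. The helper names and the example link are hypothetical:

```javascript
// Hypothetical helper: the key the bookmarklet compares against, for a given Google host.
function imgurlKey(host) {
  return 'https://' + host + '/imgres?imgurl';
}

// First "key" of a result link when split on '&' and then '=', as the bookmarklet does.
function firstKey(href) {
  return href.split('&')[0].split('=')[0];
}

const resultHref = 'https://www.google.com/imgres?imgurl=https%3A%2F%2Fexample.com%2Fcat.jpg&imgrefurl=https%3A%2F%2Fexample.com%2F';

// A key hard-coded for the UK site never matches links served from google.com...
console.log(firstKey(resultHref) === imgurlKey('www.google.co.uk')); // false
// ...whereas a key built from the current host does.
console.log(firstKey(resultHref) === imgurlKey('www.google.com')); // true
```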
If anyone does notice any failures please let me know! :)
I tried this using Firefox and Chrome. It doesn’t work in either browser.
Hi Dee,
I assume you are using the tool from outside of the UK? If so, apologies but the tool was configured for UK use only.
I have now updated the code and it should work in any version of Google. If you follow the instructions again to place the newer bookmarklet in your toolbar and then run the script, it should hopefully function as desired.
If the problem persists please let me know.
Thanks.
Hi Chris,
Now that I found the latest version of the bookmarklet, I have installed it. Unfortunately, it simply doesn’t work.
I have tried it in both IE and Firefox.
No URLs, no results. It doesn’t even open a window (popup blockers have been deactivated of course).
Any idea what could be done about it?
Wouldn’t it be much easier to put a website online that could do this?
Thanks again,
Jens
Hi Jens,
Apologies for the inconvenience. The problem is that whenever Google makes a change to the way in which they serve image results (i.e. the underlying code structure) this tool needs updating. Whether this were an online tool on the website or a bookmarklet, it would still require manual updates to mirror Google’s methods.
Unfortunately there are not enough hours in the day to monitor Google for every minor change (although I do try where possible) so it’s only through your valuable feedback I’m able to identify changes and modify the script accordingly.
The issue which you experienced is due to Google switching between serving results via HTTP and HTTPS. I’ve modified the script so that it now works with both rather than one or the other.
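Comparing the whole `https://host/imgres?imgurl` prefix as a string is what makes the script sensitive to the protocol. One alternative to checking both prefixes is to parse the link and compare only its host and path. Here is a sketch of that approach. The `isImageResultLink` name and the example links are hypothetical:

```javascript
// Hypothetical helper: true for /imgres links on the given Google host, over HTTP or HTTPS.
function isImageResultLink(href, host) {
  const link = new URL(href);
  return link.host === host && link.pathname === '/imgres';
}

console.log(isImageResultLink('https://www.google.com/imgres?imgurl=x', 'www.google.com')); // true
console.log(isImageResultLink('http://www.google.com/imgres?imgurl=x', 'www.google.com')); // true
console.log(isImageResultLink('https://www.google.com/search?q=cats', 'www.google.com')); // false
```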
I hope that will suffice, for the time being at least! :)
Chris
Hi Chris,
I can only get 6 pages (600 URLs) to be displayed. I have tried this on multiple sites, but there seems to be a limit set somewhere that only displays the first 600 URLs.
Any thoughts?
Thanks
Jerry
Hi Jerry,
Very strange. With Google’s infinite scrolling on image SERPs I have not known there to be a particular limit on the number of images/URLs returned. For example, an image search for “Chris Ainsworth” yields 840 results, all of which the bookmarklet returns. I’ll experiment with a few queries, though, and analyse the results.
When you say you’ve “tried this on multiple sites” can you elaborate at all?
Kind regards,
Chris
Hi Chris,
Would it be possible to get the same result by passing the search string or the search URL to the script, instead of having to perform the search before running it?
Thanks!
Unfortunately not. The script relies on effectively scraping search results. There’s no way in this instance it can be done without first performing the search.
The tool is very useful!!
Thank you for implementing the feature; it saves me time when downloading images :)
It’s very kind of you to make the code public for everyone.
Unfortunately it is broken again; all URLs are encoded like this:
https%3A%2F%2Fi.ytimg.com%2Fvi%2FFJmpxdkWXhM%2Fmaxresdefault.jpg//undefined
I cannot figure out how to decode them and remove the //undefined part at the end.
Hi Simba,
Apologies for the delayed reply. I’ve tested the tool and it appears to function correctly. Can you elaborate on the process you took which resulted in the encoded URLs?
Thanks.
Hello
The urls are still encoded.
Do a simple search for “cats”:
https://www.google.com/search?q=cats&tbm=isch
Result:
http://prntscr.com/b1ml4a
Thanks.
Hi Simba,
I see what you mean. Looks like Google have changed something again. Leave it with me, I’ll try to update the tool ASAP.
Many thanks,
Chris
Hi everyone,
Thanks to Simba for pointing out the encoded URLs. It looks like Google was doing some strange double URL encoding.
The script has now been updated to reflect the changes, using nested decodeURIComponent calls.
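The `//undefined` in Simba’s output is consistent with a URL that was decoded one time too few. An encoded URL contains no literal `/`, so splitting it on `/` yields a single piece, and the missing third piece prints as `undefined` when the domain column is built. Here is a sketch reproducing that with the URL from Simba’s comment and showing how two decodes recover it. The `%25` form is an assumption about what a double-encoded parameter looks like. `domainOf` is a hypothetical helper that mirrors the bookmarklet’s `arr[0] + '//' + arr[2]`:

```javascript
// Mirrors how the bookmarklet builds its "domain" columns.
function domainOf(url) {
  const arr = url.split('/');
  return arr[0] + '//' + arr[2];
}

// A double-encoded value: the '%' signs themselves were encoded again as '%25'.
const doubleEncoded = 'https%253A%252F%252Fi.ytimg.com%252Fvi%252FFJmpxdkWXhM%252Fmaxresdefault.jpg';

const once = decodeURIComponent(doubleEncoded); // still percent-encoded, so it has no '/'
const twice = decodeURIComponent(once); // a normal URL again

console.log(domainOf(once)); // https%3A%2F%2Fi.ytimg.com%2Fvi%2FFJmpxdkWXhM%2Fmaxresdefault.jpg//undefined
console.log(domainOf(twice)); // https://i.ytimg.com
```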
Please keep me posted and continue to let me know if you have any issues.
Thanks.
Chris
Hello Chris,
Thanks very much for sharing.
This method really helps for extraction from Google search!
Just wondering, is there any open-source code available that does what the extraction bookmarklet does, but in a language other than JavaScript (like Java or Python)?
Thanks a lot !
Yufen
Not as far as I’m aware Yufen, but if you find any please let me know!
Hello Chris, I love your script. I have used it many times to download my favorite images from Google. By the way, do you have a script to extract Bing’s Image Search URLs? I would love that too.
Regards,
Girga
Hi Girga,
Thanks for the feedback.
As it stands I don’t have a similar tool for Bing, but that’s not to say that I couldn’t create one. I’ll add it to my to-do list :)
Thanks.
Thank you Chris. The bookmarklet is really helpful and saved me lots of time.
Thank you again,
Shoaib
Hey Chris,
The tool works amazingly, love it!!
Thanks a lot.
Hello Chris,
Right now your tool is not working; please update this image extractor tool.
Please update ASAP.
Thanks
Hi Jai,
I’ve run the bookmarklet this morning and it appears to work fine.
Can you provide any more information regarding your problem?
If you can provide that information I’ll happily take a closer look.
Many thanks.
Hey, I think it may be down again!
Hey Chris,
Just wanted to let you know, your Script still WORKS!!!!
Way to make a great little bookmarklet. And so useful too.
Hello Chris, thanks for the tool, it looks very useful. I tried it with words like “cats”, “dogs” and “birds” and it works perfectly in Chrome, but when I try words like “countries” or “countries america” it simply doesn’t work. Any idea why?
Hi, I have a list of products in a Google Sheet and want to extract the Google Image Search URLs for each product, ideally all done automatically. Can you help me, please?
Chris, many thanks from Spain; my whole community is thankful to you for your work. Thanks a lot for creating this tool.
Many thanks for this tool. It however doesn’t seem to be working for me. I’ve tried multiple browsers and systems. It only displays the headings without any content.
Dear Chris, your great tool was working brilliantly, but I think it needs your help again. Please update this great tool.