Extract URLs from Google’s Image SERPs

In my previous post I provided details of how to use my JavaScript-based bookmarklet tool to extract a list of URLs from a Google web search. That’s all well and good, but what about extracting URLs from a Google Image search? Is this possible? That’s exactly what David Radicke asked me over on the High Position blog.

I decided to accept David’s challenge and modify the JavaScript bookmarklet to create a version capable of extracting Image Search results. As a result I have created another bookmarklet which extracts the following information from an image search:

  • Image Domain – the domain on which the image resides
  • Image URL – the full URL of the image
  • Source Domain – the domain which references the image
  • Source URL – the page on which the image is referenced
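
As a rough illustration of how the two domain fields relate to the two URL fields, here is a minimal sketch using the standard URL constructor. The function name describeResult and the sample URLs are hypothetical and are not part of the bookmarklet itself.

```javascript
// Hypothetical helper: derive the four reported fields from two URLs.
function describeResult(imageUrl, sourceUrl) {
  return {
    imageDomain: new URL(imageUrl).origin,   // scheme + host of the image
    imageUrl: imageUrl,
    sourceDomain: new URL(sourceUrl).origin, // scheme + host of the referencing page
    sourceUrl: sourceUrl
  };
}

// Made-up sample values for illustration only.
var row = describeResult(
  'https://cdn.example.com/img/cat.jpg',
  'https://www.example.org/pets/cats.html'
);
console.log(row.imageDomain);  // https://cdn.example.com
console.log(row.sourceDomain); // https://www.example.org
```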

How to Extract Google’s Image Search URLs

  1. Drag and drop the following bookmarklet to your bookmarks toolbar. I’m using Chrome to do this but you should be able to do the same in most modern browsers.
  2. Drop Image SERPS Bookmarklet

  3. Go to Google Image Search and perform your search.
  4. With the list of image search results in front of you, click on the bookmarklet which you placed in your toolbar in step 1. As a result you should see a new window open containing something similar to:
    Image SERP Extractor

  5. Mission complete!

And that’s all there is to it. With any luck you’ll now be able to extract a list of Google Image Search URLs as well as a bunch of other relevant information.

Please note that Google often change the way in which Image results are served, so whenever they make changes the bookmarklet needs to be updated. I aim to keep it updated where possible but if you notice anything not functioning as expected please let me know through the comments below.

The Extraction Bookmarklet Code

In case anyone wants to adapt the code, here is the JavaScript which I used to create the bookmarklet.

javascript:(function(){
	/* Build the report page as a single HTML string. */
	var output='<html><head><title>SEO Image SERP Extraction Tool</title><style type=\'text/css\'>body,table{font-family:Tahoma,Verdana,Segoe,sans-serif;font-size:11px;color:#000}h1,h2,th{color:#405850}th{text-align:left}h2{font-size:11px;margin-bottom:3px}table.data{table-layout: fixed;width: 100%;border-collapse: collapse;}table.data th, table.data td {overflow: hidden;border-bottom:1px solid #9eb8b0;padding:4px;}table.data th.id, table.data td.id {width: 50px;}</style></head><body>';
	output+='<table><tbody><tr><td><a href=\'https://www.chrisains.com\'><img src=\'https://www.chrisains.com/wp-content/uploads/2015/06/chrisains.com-logo1.png\'></a></td><td><h1>Google Image Search URL Extractor</h1></td></tr></tbody></table>';

	/* Image result links carry the class rg_l; Google may change this at any time. */
	var pageAnchors=document.getElementsByClassName('rg_l');

	var linkcount=0;
	var imgURLList='';
	/* Read the host from the current page so any Google TLD works. */
	var geoDomain=window.location.host;

	output+='<table class=\'data\'><tr><th class=\'id\'>ID</th><th>Image Domain</th><th>Image URL</th><th>Source Domain</th><th>Source URL</th></tr>';

	for(var i=0;i<pageAnchors.length;i++){
		var vars=pageAnchors[i].href.split('&');
		/* Reset for every link so values cannot carry over from the previous one. */
		var imgurl='';
		var sourceurl='';
		var arr;

		for(var t=0;t<vars.length;t++){
			var pair=vars[t].split('=');

			/* The first pair still includes the /imgres prefix, served over HTTPS or HTTP. */
			if(pair[0]=='https://'+geoDomain+'/imgres?imgurl' || pair[0]=='http://'+geoDomain+'/imgres?imgurl'){
				/* Values are double-encoded, hence the nested decode. */
				imgurl=decodeURIComponent(decodeURIComponent(pair[1]));
			}
			else if(pair[0]=='imgrefurl'){
				sourceurl=decodeURIComponent(decodeURIComponent(pair[1]));
			}
		}

		/* Skip links that lack either parameter instead of failing on split(). */
		if(!imgurl || !sourceurl){
			continue;
		}

		linkcount++;
		imgURLList+=imgurl+'<br />';

		output+='<tr>';
		output+='<td class=\'id\'>'+linkcount+'</td>';

		/* Scheme plus host, e.g. https://example.com */
		arr=imgurl.split('/');
		output+='<td style=\'width:80%;\'>'+arr[0]+'//'+arr[2]+'</td>';
		output+='<td>'+imgurl+'</td>';

		arr=sourceurl.split('/');
		output+='<td>'+arr[0]+'//'+arr[2]+'</td>';
		output+='<td>'+sourceurl+'</td>';
		output+='</tr>\n';
	}

	output+='</table><br/><h2>Image URL List</h2><div>';
	output+=imgURLList;
	output+='</div><br/><br/><p align=center><a href=\'https://www.chrisains.com\'>www.chrisains.com</a></p>';

	/* Write the report into a new window. */
	var win=window.open();
	win.document.write(output);
	win.document.close();
})();
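
For anyone adapting the code, the parameter parsing could also be written with the standard URL and URLSearchParams interfaces instead of splitting on & and =. This is only a sketch under assumptions: it presumes result links still point at an /imgres URL with imgurl and imgrefurl parameters, as the script above expects, and the function name parseImgresHref and the sample link are hypothetical.

```javascript
// Hypothetical alternative to the manual split('&') / split('=') parsing.
function parseImgresHref(href) {
  var params = new URL(href).searchParams;
  var imgurl = params.get('imgurl');       // get() already decodes once
  var sourceurl = params.get('imgrefurl');
  if (imgurl === null || sourceurl === null) {
    return null; // not an image result link
  }
  return { imgurl: imgurl, sourceurl: sourceurl };
}

// Made-up link in the shape the bookmarklet looks for.
var sample = 'https://www.google.co.uk/imgres?imgurl=' +
  encodeURIComponent('https://cdn.example.com/a b.jpg') +
  '&imgrefurl=' + encodeURIComponent('https://www.example.org/page?id=1');

var parsed = parseImgresHref(sample);
console.log(parsed.imgurl);    // https://cdn.example.com/a b.jpg
console.log(parsed.sourceurl); // https://www.example.org/page?id=1
```

Because get() decodes each value once, a value that Google has encoded twice would still need one further decodeURIComponent call.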

Many thanks for reading!

Postscript – July 2015

Unfortunately the bookmarklet script was initially configured to extract image URLs from Google.co.uk only, inevitably meaning that usage on any Google TLD outside of the UK would cause the script to fail.

Fortunately I have now updated the tool to quickly grab the host and automatically set this as the default which should resolve the issue.
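
The idea behind the fix can be sketched as follows: rather than comparing against a fixed Google.co.uk address, the string that identifies an image result link is built from whichever host the page was loaded from. The function name imgresKey and the sample hosts are hypothetical; in the bookmarklet the host comes from window.location.host.

```javascript
// Hypothetical helper: build the key the script compares against,
// using the host passed in rather than a fixed UK host.
function imgresKey(host) {
  return 'https://' + host + '/imgres?imgurl';
}

// In a browser the argument would be window.location.host.
console.log(imgresKey('www.google.co.uk')); // https://www.google.co.uk/imgres?imgurl
console.log(imgresKey('www.google.de'));    // https://www.google.de/imgres?imgurl
```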

If anyone does notice any failures please let me know! 🙂

29 thoughts on “Extract URLs from Google’s Image SERPs”

    • Hi Dee,

      I assume you are using the tool from outside of the UK? If so, apologies but the tool was configured for UK use only.

      I have now updated the code and it should work in any version of Google. If you follow the instructions again to place the newer bookmarklet in your toolbar and then run the script, hopefully it’ll function as desired.

      If the problem persists please let me know.

      Thanks.

  • Hi Chris,

    Now that I found the latest version of the bookmarklet, I have installed it. Unfortunately, it simply doesn’t work.
    I have tried it in IE as well as Firefox.
    No URLs, no results. It doesn’t even open a window (popup blockers have been deactivated of course).
    Any idea what could be done about it?
    Wouldn’t it be much easier to put a website online which could do this?

    Thanks again,

    Jens

    • Hi Jens,

      Apologies for the inconvenience. The problem is that whenever Google make a change to the way in which they serve image results (i.e. the underlying code structure) this tool needs updating. Whether this were an online tool via the website or a bookmarklet it would still require manual updates to mirror Google’s methods.

      Unfortunately there are not enough hours in the day to monitor Google for every minor change (although I do try where possible) so it’s only through your valuable feedback I’m able to identify changes and modify the script accordingly.

      The issue which you experienced is due to Google switching how they serve results via HTTP or HTTPS. I’ve modified the script now so it works with both rather than one or the other.

      I hope that will suffice, for the time being at least! 🙂

      Chris

  • Hi Chris,

    I can only get 6 pages (600 URLs) to be displayed. I have tried this on multiple sites but there seems to be a limit set somewhere to only display the first 600 URLs.

    Any thoughts?

    Thanks
    Jerry

    • Hi Jerry,

      Very strange. With Google’s infinite scrolling on image SERPS I have not known there to be a particular limit on the number of images/URLs returned. For example an image search for “Chris Ainsworth” yields 840 results which the bookmarklet returns. I’ll experiment with a few queries though and analyse the results.

      When you say you’ve “tried this on multiple sites” can you elaborate at all?

      Kind regards,

      Chris

  • Hi Chris,

    Would it be possible to get the same result by passing the search string or the search url to the script instead of having to perform the search before running it ?

    Thanks !

    • Unfortunately not. The script relies on effectively scraping search results. There’s no way in this instance it can be done without first performing the search.

  • The tool is very useful!!

    Thank you for implementing the feature; it saves me time when downloading images 😀

    You are so kind that the code is public to everyone.

  • Unfortunately it is broken again, all URLs are encoded like this:
    https%3A%2F%2Fi.ytimg.com%2Fvi%2FFJmpxdkWXhM%2Fmaxresdefault.jpg//undefined

    I cannot figure out how to decode them and remove that //undefined part at the end

  • Hi everyone,

    Thanks to Simba for pointing out the encoded URLs. It looks like Google was doing some strange double URL encoding.

    The script has now been updated to reflect the changes, using nested decodeURIComponent calls.
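
    As a minimal sketch of what the nested calls do, take the ytimg URL reported above. The doubly encoded string is constructed here for illustration and was not captured from Google.

```javascript
// Build a doubly encoded value for illustration.
var original = 'https://i.ytimg.com/vi/FJmpxdkWXhM/maxresdefault.jpg';
var doubleEncoded = encodeURIComponent(encodeURIComponent(original));

// One decode still leaves percent escapes behind...
var once = decodeURIComponent(doubleEncoded);
console.log(once);  // https%3A%2F%2Fi.ytimg.com%2Fvi%2FFJmpxdkWXhM%2Fmaxresdefault.jpg

// ...so a second decode is needed to recover the plain URL.
var twice = decodeURIComponent(once);
console.log(twice); // https://i.ytimg.com/vi/FJmpxdkWXhM/maxresdefault.jpg
```

    With only one decode there is no literal / in the string, so split('/') returns a single element and arr[2] is undefined, which is where the //undefined suffix came from.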

    Please keep me posted and continue to let me know if you have any issues.

    Thanks.

    Chris

  • Hello Chris,

    Thanks very much for your sharing.
    This method really helps for extraction from Google search!
    Just wondering, is there any open source code available for the extraction bookmarklet function in a language other than JavaScript (such as Java or Python)?

    Thanks a lot !

    Yufen

  • Hello Chris, I love your script. I have used it many times to download my favorite images from Google. By the way, do you have a script to extract Bing’s Image Search URLs? I would love that too.

    Regards,
    Girga

    • Hi Girga,

      Thanks for the feedback.

      As it stands I don’t have a similar tool for Bing, but that’s not to say that I couldn’t create one. I’ll add it to my to-do list 😉

      Thanks.

    • Hi Jai,

      I’ve run the bookmarklet this morning and it appears to work fine.

      Can you provide any more information regarding your problem i.e.

      • Country/origin of search
      • Search query
      • Sample URL

      If you can provide that information I’ll happily take a closer look.

      Many thanks.

  • Hey Chris,

    Just wanted to let you know, your Script still WORKS!!!!

    Way to make a great little bookmarklet. And so useful too.

  • Hello Chris, thanks for the tool, it looks very useful. I tried it for words like “catds”, “dogs” and “birds” and it works perfectly in Chrome, but when I try words like “countries” or “countries america” it simply doesn’t work. Any idea?

  • Hi, I have a list of products in a Google Sheet and want to extract Google image search URLs for each product, all done automatically. Can you help me please?

  • Many thanks for this tool. It however doesn’t seem to be working for me. I’ve tried multiple browsers and systems. It only displays the headings without any content.

  • Dear Chris, your great tool was working awesomely, but I think it needs your help again. Please update this great tool.
