Every now and then you may want to extract a list of URLs from a Google web search for a particular search query.
The most common reason for this (in my experience at least) is to obtain a list of all URLs which Google has indexed for your particular domain. This can be useful for a number of reasons, from analysing the visible titles and meta descriptions to checking for the indexation of rogue or redundant URLs.
Power users and webmasters will know that it is difficult to get a definitive list of indexed URLs directly from Google. Google Search Console (previously Webmaster Tools) offers an ‘Index Status’ report which provides insight into the number of URLs indexed, historic trends and various filters. It also shows the proportion of URLs indexed from any submitted sitemaps. But neither report actually provides a definitive list of the URLs which Google has indexed for your domain.
You can of course extract the data manually using the ‘site:’ search operator and copy/pasting the results. That’s fine if you’re working with a relatively small number of URLs, but adopting this approach for any more than 10 results can become somewhat tiresome – even extracting 10 results manually can be a bore!
So what if I told you it’s possible to extract a list of URLs from SERPs with a few clicks? You wouldn’t believe me, right? Wrong! With this little bookmarklet, which I originally adapted for High Position from a similar tool by Liam Delahunty, you’ll be able to extract URL and anchor text information with minimal effort.
How to Extract Google’s Web Search URLs
Here’s how to do it.
- Use Chrome. If you’re not using Chrome you can download it here. If you’d prefer not to use Chrome, the bookmarklet itself will work with other browsers, but we’ll be using a plugin not available in other browsers (to my knowledge!).
- Install the gInfinity plugin for Chrome. This removes the restriction on the number of search results per page by seamlessly appending the next page of search results to the current list of SERPs – in essence creating infinite scrolling of SERPs.
- Go to Google and perform your search query. If you are extracting URLs for your domain use the Site search operator e.g. ‘site:chrisains.com’.
- If you’re working with a large website with hundreds of URLs you’ll probably benefit from increasing the number of search ‘Results Per Page’. By default Google is set to return 10 results per page, but you can override this to a maximum of 100. To do this, open Google’s ‘Search Settings’, set ‘Results per page’ to 100 and save your settings.
This will limit the number of queries against Google search results. Let’s say, for example, that your domain has 200 pages indexed. At 10 results per page you would need to query Google 20 times, but at 100 results per page you’ll only query Google twice. This will limit the chances of warning messages about ‘unusual traffic from your computer network’, which you can receive if you persistently query Google.
- Next, go back to the Google search results page for your particular query. If your query returns more than 100 results you should see gInfinity append the next page of search results to the current page:
Keep scrolling until you have a single page containing all search results for your query.
- Drag and drop the following bookmarklet to your ‘Bookmarks’ Toolbar in Chrome (you can also do this with most modern browsers).
- Mission Complete!
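As a side note, the results-per-page arithmetic from the steps above is simple enough to sketch in a couple of lines (the 200-page domain is the example figure used earlier):

```javascript
// Number of SERP requests needed to cover all indexed pages
// at a given results-per-page setting.
function queriesNeeded(indexedPages, resultsPerPage) {
  return Math.ceil(indexedPages / resultsPerPage);
}

// For a domain with 200 indexed pages:
// queriesNeeded(200, 10)  -> 20 queries
// queriesNeeded(200, 100) -> 2 queries
```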
That’s it. You should now be able to extract a list of all website URLs indexed within Google for your particular domain.
Please bear in mind that Google are continually modifying the way in which they display search results. This means that the code used to create the bookmarklet may cease to function as and when Google release changes. When this occurs I’ll try my best to update the code, but if anyone notices it not working please let me know!
The Extraction Bookmarklet Code
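Here’s a sketch of how the bookmarklet works. Note that the `'#search a > h3'` selector is an assumption about Google’s current SERP markup – as mentioned above, Google change this regularly, so the selector is the part most likely to need updating. The logic is split into small functions for readability; as an actual bookmarklet it would all be minified into a single `javascript:` URL.

```javascript
// Pull URL + anchor text pairs out of a SERP document.
// The '#search a > h3' selector is an assumption about Google's
// current markup and may need updating when Google change it.
function extractResults(doc) {
  var results = [];
  doc.querySelectorAll('#search a > h3').forEach(function (h3) {
    var link = h3.closest('a');
    if (link && link.href) {
      results.push({ url: link.href, text: h3.textContent });
    }
  });
  return results;
}

// Render the extracted pairs as a simple HTML table.
function toTableHtml(results) {
  var rows = results.map(function (r) {
    return '<tr><td>' + r.url + '</td><td>' + r.text + '</td></tr>';
  });
  return '<table border="1"><tr><th>URL</th><th>Anchor text</th></tr>' +
         rows.join('') + '</table>';
}

// Entry point: open the results table in a new window.
// As a one-line bookmarklet this would be wrapped as
// javascript:(function(){ ... })();
function runBookmarklet() {
  var w = window.open('', '_blank');
  w.document.write(toTableHtml(extractResults(document)));
  w.document.close();
}
```

Once dragged to the bookmarks toolbar, clicking the bookmarklet on the (gInfinity-extended) results page opens a new window containing a table of every result URL and its anchor text, ready to copy into a spreadsheet.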
Thanks for reading.