Every now and then you may want to extract a list of URLs from a Google web search for a particular search query.
The most common reason for this (in my experience at least) is to obtain a list of all URLs which Google has indexed for your particular domain. This process can be useful for a number of reasons, from analysing the visible titles and meta descriptions to searching for indexation of rogue or redundant URLs.
Power users and webmasters will know that it is difficult to get a definitive list of indexed URLs directly from Google. Google Search Console (previously Webmaster Tools) offers an ‘Index Status’ report which provides insight into the number of URLs indexed, historic trends and various filters. It also provides insight into the proportion of URLs indexed from any submitted sitemaps. But neither actually provides a definitive list of the URLs which Google has indexed for your domain.
You can of course extract the data manually using the ‘site:’ search operator and copy/pasting the results. That’s fine if you’re dealing with a relatively small number of URLs, but adopting this approach for any more than 10 results quickly becomes tiresome – even extracting 10 results manually can be a bore!
So what if I told you it’s possible to extract a list of URLs from SERPs with a few clicks? You wouldn’t believe me, right? Wrong! With this little bookmarklet, which I originally adapted for High Position from a similar tool by Liam Delahunty, you’ll be able to extract URL and anchor text information with minimal effort.
How to Extract Google’s Web Search URLs
I’ll start by saying there is nothing magic or malicious about this approach. We’ll be utilising a JavaScript bookmarklet to process the search results provided by Google, in combination with a nifty Chrome plugin to seamlessly scroll multiple pages of search results.
Here’s how to do it.
- Use Chrome. If you’re not using Chrome you can download it here. If you’d prefer not to use Chrome the bookmarklet itself will work with other browsers, but we’ll be using a plugin not available in other browsers (to my knowledge!).
- Install the gInfinity plugin for Chrome. This removes the restriction on the number of search results per page by seamlessly appending the next page of search results to the current list – in essence creating infinite scrolling of SERPs.
- Go to Google and perform your search query. If you are extracting URLs for your domain use the ‘site:’ search operator, e.g. ‘site:chrisains.com’.
- If you’re working with a large website with hundreds of URLs you’ll probably benefit from increasing the number of search ‘Results Per Page’. By default Google is set to return 10 results, but you can override this to a maximum of 100. To do this:
This will limit the number of queries against Google search results. Let’s say, for example, that your domain has 200 pages indexed. At 10 results a page you would need to query Google 20 times, but at 100 results a page you’ll only query Google twice. This reduces the chance of receiving warnings about ‘unusual traffic from your computer network’, which can appear if you query Google persistently.
- Next, go back to the Google search results page for your particular query. If you are indeed querying over 100 results you should see gInfinity append the next page of search results to the current page:
Keep scrolling until you have a single page containing all search results for your query.
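The query-count saving described above is just a simple ceiling division, sketched here as a quick illustration (the numbers are the same illustrative ones used in the step above):

```javascript
// Sketch of the query-count arithmetic: how many requests Google receives
// for a given number of indexed pages at a given page size.
function queriesNeeded(indexedPages, resultsPerPage) {
  return Math.ceil(indexedPages / resultsPerPage);
}

console.log(queriesNeeded(200, 10));  // 20 requests at the default page size
console.log(queriesNeeded(200, 100)); // 2 requests at the 100-result maximum
```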
- Drag and drop the following bookmarklet to your ‘Bookmarks’ Toolbar in Chrome (you can also do this with most modern browsers).
- Once you’ve placed the bookmarklet in your toolbar, and making sure you have your list of Google SERPs in front of you, click on the bookmarklet. The JavaScript-based bookmarklet will open a new window and display all URLs and anchor text in a list.
- Mission Complete!
That’s it. You should now be able to extract a list of all website URLs indexed within Google for your particular domain.
Please bear in mind that Google are continually modifying the way in which they display search results. This means that the coding used to create the bookmarklet may cease to function as and when Google release changes. When this occurs I’ll try my best to update the code, but if anyone notices it not working please let me know!
The Extraction Bookmarklet Code
In case anyone wants to adapt the code, here is the JavaScript which I used to create the bookmarklet.
javascript:(function(){
  // Build the report document as one HTML string.
  var output = '<html><head><title>SEO SERP Extraction Tool</title><style type="text/css">body,table{font-family:Tahoma,Verdana,Segoe,sans-serif;font-size:11px;color:#000}h1,h2,th{color:#405850}th{text-align:left}h2{font-size:11px;margin-bottom:3px}</style></head><body>';
  output += '<table><tbody><tr><td><a href="https://www.chrisains.com"><img src="https://www.chrisains.com/wp-content/uploads/2015/06/chrisains.com-logo1.png"></a></td><td><h1>SEO SERP Extraction Tool</h1></td></tr></tbody></table>';

  var pageAnchors = document.getElementsByTagName('a');
  var linkcount = 0;
  var linkLocation = '';
  var linkAnchorText = '';

  output += '<table><th>ID</th><th>Link</th><th>Anchor</th>';

  for (var i = 0; i < pageAnchors.length; i++) {
    // Skip the hyperlinked display URLs (class 'iUh30') which Google adds for HTTPS sites.
    if (pageAnchors[i].parentNode.parentNode.getAttribute('class') != 'iUh30') {
      var anchorText = pageAnchors[i].textContent;
      if (anchorText === undefined) anchorText = pageAnchors[i].innerText; // fallback for older IE
      var anchorLink = pageAnchors[i].href;
      var anchorID = pageAnchors[i].id;

      if (anchorLink != '') {
        // Exclude Google's own links and other known non-result URLs.
        if (anchorLink.match(/^((?!google\.|cache|blogger.com|\.yahoo\.|youtube\.com\/\?gl=|youtube\.com\/results|javascript:|api\.technorati\.com|botw\.org\/search|del\.icio\.us\/url\/check|digg\.com\/search|search\.twitter\.com\/search|search\.yahoo\.com\/search|siteanalytics\.compete\.com|tools\.seobook\.com\/general\/keyword\/suggestions|web\.archive\.org\/web\/|whois\.domaintools\.com|www\.alexa\.com\/data\/details\/main|www\.bloglines\.com\/search|www\.majesticseo\.com\/search\.php|www\.semrush\.com\/info\/|www\.semrush\.com\/search\.php|www\.stumbleupon\.com\/url|wikipedia.org\/wiki\/Special:Search).)*$/i)) {
          // Exclude Google's search-tools menu anchors by element ID.
          if (anchorID.match(/^((?!hdtb_more|hdtb_tls|uh_hl).)*$/i)) {
            linkLocation += anchorLink + '<br />';
            linkAnchorText += anchorText + '<br />';
            linkcount++;
            output += '<tr>';
            output += '<td>' + linkcount + '</td>';
            output += '<td>' + anchorLink + '</td>';
            output += '<td>' + anchorText + '</td>';
            output += '</tr>\n';
          }
        }
      }
    }
  }

  output += '</table><br/><h2>URL List</h2><div>';
  output += linkLocation;
  output += '</div><br/><h2>Anchor Text List</h2><div>';
  output += linkAnchorText;
  output += '<br/>&nbsp;<br/><p align=center><a href="https://www.chrisains.com">www.chrisains.com</a></p>';

  // Write the report into a new window.
  with (window.open()) {
    document.write(output);
    document.close();
  }
})();
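The heart of the bookmarklet is simply a filter that keeps genuine result links and drops Google’s own navigation links. Here is a stripped-down, standalone sketch of that idea — note the exclusion pattern below is a shortened stand-in, not the full exclusion list the bookmarklet uses:

```javascript
// Simplified stand-in for the bookmarklet's link filter. EXCLUDED covers
// only a few illustrative patterns; the real bookmarklet excludes many more.
const EXCLUDED = /google\.|webcache|javascript:|youtube\.com\/results/i;

function isResultLink(href) {
  // Keep non-empty hrefs that don't match any excluded pattern.
  return href !== "" && !EXCLUDED.test(href);
}

console.log(isResultLink("https://www.chrisains.com/seo/"));     // true
console.log(isResultLink("https://www.google.com/preferences")); // false
```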
Thanks for reading.
Hey Chris
this was great info.
I request you to help me with the knowledge more on
How to exactly use the extractor bookmarker code ?
I will wait for your kind help.
regards
jitesh
Hi Jitesh,
If you’re using/adapting the code provided (as opposed to using the pre-made bookmarklet provided) you’ll simply need to create a bookmark in a browser of your choice and enter the source code as the trigger URL.
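To illustrate what “enter the source code as the trigger URL” means: a bookmarklet is just JavaScript stored as a bookmark’s URL, prefixed with `javascript:`. This sketch uses a placeholder alert, not the extractor code itself:

```javascript
// Illustrative only: wrapping a snippet of JavaScript as a bookmarklet URL.
// The alert body is a placeholder, not the SERP extractor.
const body = "(function(){alert('Hello from a bookmarklet');})();";
const bookmarkUrl = "javascript:" + encodeURIComponent(body);

console.log(bookmarkUrl.startsWith("javascript:")); // true
```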
If you need any more information please let me know.
Thanks,
Chris
Omg Chris, this works great, you have no idea how much work you saved me!
There are so many shady companies trying to get you to install their plugins to do something so simple (while catching your email and hopefully your purse too) and then it’s that easy.
Big thumbs up for giving this away for free, you’re my hero! 🙂
You’re very welcome Stefan, I’m glad you find the tool useful 🙂
Hey Chris, Its not working anymore, google changed the algo, could you please fix it?
Thanks you for your effort!
Hi Manuel,
Can you provide any further information at all? I have just tried the bookmarklet via various search criteria on various Google locales (e.g. Google.ca, Google.com.au etc.) and it seems to work fine?
Thanks in advance.
Chris
Hello Chris,
I am trying to get all URLs into a txt file or array from a Google search. My code does the search part, but I also need the URLs. Could you help me with this? Here is my code:
Google Web Search API
Thank you.
This worked perfectly. Thanks, Chris!
No problem Louise. Happy to help 🙂
Great bookmarklet – saves me a lot of time 🙂 And thanks for sharing the code; I think I’ll try to add some custom code to it to meet my needs.
Thanks Chris 🙂 It works and works super awesome. Simple & quick 😉
Nice one Chris!
This has become one of my favourite SEO tools. MANY THANKS!
Awesome little tool, You’ve made my day Chris, thanks
Wow this saved me a LOT of time!
I’m a bit confused however about how Google is adding things up. When I search site:www.notmyrealsite.com it says “About 802 results”.
I click through copying each page of 10 results (I left results at 10 to be sure what was going on) and when I get to page 19 it says “Page 19 of 187 results”.
I copy/pasted links from each page into excel and got 200 results.
So what’s going on? Is that “About 802 results” an unreliable number? I was feeling good about 800 or so links to my site, only to find out it was only 200… pfft…
To try and validate the numbers above I went to Google Search Console > Search Traffic > Internal Links > Gives me 307 results.
I’m migrating an old website to a new one and want to keep as many of my Google links working as possible. Any advice appreciated!
Hi Grant,
Thanks for the comment. There’s a couple of good questions there.
Firstly, the estimated number of results (i.e. “About 802 Results”) is exactly that – an estimate. There have been numerous discussions surrounding this over the years from both industry experts and Google themselves. I would recommend watching this video from Google’s Matt Cutts in 2010 as it provides a bit more information – https://www.youtube.com/watch?v=2ix3mHeL7hg.
Google may also opt not to show all indexed URLs, often because similar results are contained within the supplemental index (often masked by the “In order to show you the most relevant results, we have omitted some entries very similar to the XX already displayed. If you like, you can repeat the search with the omitted results included.” message) or because Google just don’t want to show all results!
For an accurate indication of the number of pages indexed I would use Google Search Console; however, it looks like you’re viewing the wrong report here Grant…
The ‘Internal Links Report’ which you viewed (Google Search Console > Search Traffic > Internal Links) provides information on your internal linking structure – not the number of indexed URLs. Here’s Google’s information on the ‘Internal Links Report’:
“The Internal Links page lists a sample of pages on your site that have incoming links from other internal pages.”
You can read more here – https://support.google.com/webmasters/answer/138752?hl=en.
What you need is the ‘Index Status Report’ (Google Search Console > Google Index > Index Status). This will provide a more accurate representation of the number of pages that Google has indexed for your domain (subject to the domain being verified).
“Shows the total URLs available to appear in search results, along with other URLs Google might discover by other means.”
Again you can read more on this here – https://support.google.com/webmasters/answer/2642366?hl=en
However, whilst the ‘Index Status Report’ will provide an indication of the number of URLs indexed, Google will (rather stupidly) not provide a list of those URLs – hence why I built this tool to help 🙂
Another way to get a list of URLs, especially if you’re migrating websites, is to use the data available via Google Analytics. Website migration is a complex process, so if you need any help give me a shout.
Let me know if you have any other questions.
Thanks.
Chris
Great tool. Is it possible to add an export to a .csv file containing the title and URL, the same as shown on the extraction page? That would make it even easier to extract URLs.
Hi Rajesh,
I can certainly look at creating a CSV download option in the future. For the time being though, you can just copy and paste to Excel 🙂
Thanks.
Thanks Chris. If you create that button, you are going to save me lots and lots of time, as I search more than 500 keywords using your tool. It will save me at least 4 or 5 hours a day. Thanks. 🙂 🙂
Hi Rajesh,
the Cool Seo Tool extension for Chrome can export SERPs to .csv.
Hope it helps.
Will this work on a Mac?
As long as you’re using Chrome or a browser capable of running bookmarklets then yes.
Hi Chris, I have tried to install this on Chrome, but it will not download when I hit the green download button. I have tried before, but gave up. Now I am trying again, but still nothing.
Hey Chris, Thanks for this amazing tutorial. Please keep posting such amazing hacks. Thanks a ton! 🙂
Just posted this on SEO G+ community and recommended a client to try it out. Will have a go when I find some time to dig into some long SERPs.
Looks promising!
Super-duper Chris, finally had the time and need for this, works very well!
Hi Chris, I just want to thank you. Smart and useful. A CSV option would be great. Thanks again for sharing with us.
Chris, nice one, easy and it works, I have been spreading it around and also summarised it on my tech blog ( properly attributed to you of course )
So very rarely comment on things like this, but it was so useful that I really wanted to say thank you.
Giving an ID number and the title tag, making it very easy to copy-paste into Excel, and not cluttering the output make me even more grateful.
Thank you!
Hi Chris.
Your article and especially the Google SERPs Extractor are amazing!
Sometimes after an SEO audit I send clients to programmers so they can parse the results and find out exactly which URLs are the problem. Otherwise it takes a long time of manual work.
Now I will use your tool! Thanks.
But you didn’t mention one more important purpose for the SERPs Extractor – online research tasks!
For example, now I have 30 keywords and use your tool to save a list of TOP ranking competitors in Google for each keyword.
I’ll not explain the rest, guess you know what I’m talking about 🙂
So in this case it would be great if you add more options in this tool.
Just some useful ideas for future:
– Ignore Ads (only show Ads) – it’s very noisy cleaning them now 🙂
– Ignore Google Knowledge Graph results (only show those results)
– Only show URLs which have rich snippets (or ignore)
I hope those options will be useful for different analytics purposes.
Does the Extraction Tool still work? I’m using Chrome and it’s giving me the results for your website. Thanks.
Hi Brian,
Yes it still works. I’ve tested it and it works fine for me. The only time it should produce results for my website is if you perform the ‘site:’ search using my domain, so I’m not sure how you got those results?
Thanks.
Hi Chris, great tool. It would have been easy if you can make a video demo. 😀
Hey Chris – great tool works like a charm. Thanks for sharing!!
Excellent post Chris, thank you very much for the contribution.
I tested it today and it seems like it’s not working again…
Hi Collen,
Can you confirm whether this is still the case? I’ve tested it in Google UK and .com and it appears to work correctly to me.
Thanks.
Tried with google uk and still the same thing.
That’s strange. I’ve tested the bookmarklet script against various queries this morning and it works fine for me.
Can you provide the search query through which you are experiencing the problems?
Hi Chris! I just want to thank you!
Hello Chris. I am unfamiliar with bookmarklet syntax. Could you publish a normal JavaScript implementation? I am trying to adapt this code to insert the URL list in a page.
Hi Nate,
The bookmarklet is in “normal” JavaScript syntax, but runs in browser bookmarklet format to allow for direct extraction from SERPs. Have you reviewed the code utilised?!
Doing this in-page would mean actually scraping search results, which is not the objective of this post; and I don’t think you could do this in JavaScript alone – you’d need to rely on a server-side language such as PHP and lean on a function such as ‘file_get_contents’.
Anyway, if you read the post and use the bookmarklet provided, it will insert the URLs into a page 😛
Thanks.
really useful!
thanks a lot
Hi all,
FYI I have published an update to the script today.
In summary, Google now hyperlinks certain display URLs when the site is running under HTTPS. This was causing duplicate and truncated URLs to be served via the bookmarklet. I’ve now stripped this second entry so it should no longer appear in the URL lists.
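For anyone adapting the code, the general idea of removing the duplicates can be sketched like this (an assumed approach for illustration — the bookmarklet’s actual fix filters out the second hyperlinked display URL rather than de-duplicating after the fact):

```javascript
// Minimal de-duplication sketch: a Set keeps only the first occurrence
// of each extracted URL, preserving insertion order.
function dedupe(urls) {
  return [...new Set(urls)];
}

const extracted = [
  "https://example.com/page-a",
  "https://example.com/page-a", // duplicate from the hyperlinked display URL
  "https://example.com/page-b"
];

console.log(dedupe(extracted)); // two unique URLs remain
```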
Thanks to Sam Pennington for reporting the issue.
Let me know if anyone notices any further issues!
Many thanks 🙂
Awesome – works perfectly again – thank you Chris!
Hey Chris! Are you noticing that a bunch of extra things are being extracted? For example, YouTube and paid search results?
Yes, looks like Google have changed the mark-up again. I’ll work on revising the tool ASAP.
Hey Chris.
Just stumbled across this site and tried it out, and I can confirm that it works perfectly on https://www.google.de/. Just used it today for my latest project before going live to gather old URLs and create proper URL rewrites.
Thanks a bunch, cheers
Btw, used this on Firefox 52.0
Thanks, Chris. it was helpful.
Thank you Chris for this tool and for keeping it updated!
This post seriously saved my life. My server was hacked and had to bulk remove urls from google for over one hundred sites.
This post together with https://github.com/noitcudni/google-webmaster-tools-bulk-url-removal really saved me from hell!
Thanks!!
Thank you, Chris for this wonderful bookmarklet. It is working fine.
Chris, you are a legend. This is exactly what I needed. I was thinking of creating something myself as this was driving me crazy :).
Thanks Chris, works like a charm 🙂 Weird that major SEO tools lack this useful feature.
Thank you Chris, very helpful
Holy amazing! This made my afternoon sooooo much better!!
Thank you!
It’s Awesomeeeeeee :* Thanks a lot for this great tool… Looking forward to seeing more interesting stuff from you 🙂
Chris!! You bloody beauty!
This is fantastic…
Thanks for taking the time to help us all with such an excellent product
Awesome. Thanks Chris! I hope you don’t mind if I make this into a Safari Plugin 🙂
Good tips. This is very helpful for me. And l like your sharing very much.
Awesome Chris! I hope you can extract the description too in your next update.
Hi,Chris,
“Drag and drop the following bookmarklet to your ‘Bookmarks’ Toolbar in Chrome (you can also do this with most modern browsers).
Google SERPs Extractor”
After I did this, there are no bookmarks in Chrome.
Please tell me what to do?
Thank you!
If you’re using Chrome, pressing Ctrl+Shift+B should display the bookmark bar (on a Windows machine), then you should simply be able to drag-and-drop the ‘Google SERPs Extractor’ link into your bookmark bar.
Let me know if you have any further issues.
Hi all,
In case anyone is interested, it looks like Google may be finally addressing this issue directly with the new “Index Status” reports in Search Console.
See my tweets for more information.
Hi,
do you know if this is only in the US for the moment and if will be released in Europe too? I cannot see it on Search Console. Do you know of any articles about it?
Thanks
Hi Teresa,
It’s a limited trial at the moment – I’m not even sure I was meant to share it publicly (whoops!).
I’m sure it’ll probably roll-out in due course.
Thanks.
Thank you Chris for your valuable blog. It saved me a lot of time while creating a placement targeting campaign.
Hi Chris,
Is it possible to have a fourth column showing the snippets (i.e. the result in grey text) if there is a snippet?
Thanks
Tom
Thank you!
You are awesome man, you helped me a lot. No other software is able to do this.
Thank you very much Chris.
I lost this tool during a chrome update…so pleased to have the bookmarklet back.
Cheers Chris!
Chris, Thank You, Thank You, Thank You! Perfect! Thank You! Best, Jack
Chris,
We’ve had an indexing problem with our site and your tool has proved invaluable in getting it sorted out. One problem I’m hitting is that site: displays over 700 entries (8 pages at 100/page) but the tool only lists 415 URLs. Similarly, if we filter the results (-AUX/ site:) we get 3 pages, but only 134 URLs. The problems we’ve had involve “duplicate” links causing the links we want to not show up in the index. Any ideas on things I can check/change in the JavaScript to see why the numbers of entries don’t match up would be appreciated.
Thanks,
Bill
I had this on an older PC I wiped and lost the bookmark. Just spent 15 minutes looking for it and found it again – very useful and great script. Thank you!
When I try to save the Google SERPs based on the tutorial, it doesn’t work from step 5.
The website I searched has more than 2000 pages; after scrolling to page 4, it stops.
I have no idea how to “drag and drop” the following bookmarklet to my “Bookmarks”.
Anyone meet the same situation?
I tried too without success; obviously I’m not doing something right, or there has been some change in the Google algorithm. Chris, tell me if it works for you.
Me too, maybe ’cause I’m on a Mac?
Hello Chris,
Thank you so much, this is a great tool, and easy to set up too!
The only problem I’m having when using it is that I systematically get a duplicated link after each original link in the list.
The Anchor is not duplicated.
Might there be some redundant search in the JavaScript?
Thanks again!
Fabrice
Hi Fabrice ,
I had a routine in place which stripped the duplicates, but as Google is constantly changing the underlying code of their search results it’s a job to keep up. I’ve updated the bookmarklet so it should de-duplicate the results…for the time being at least!
Thanks,
Chris
Thank You for your awesomeness! Greatest tool, so super easy it’s crazy. I also have the dupes but I stuck it in textmechanic.com/text-tools/basic-text-tools/remove-duplicate-lines/ — takes 1 second.
THANK YOU!
Steve
Hi Steve,
The results should be de-duplicated now, subject to Google changing the coding again!
Thanks,
Chris
Good job at keeping this bookmarklet updated, as most similar scraping techniques stop working in just a few months. Keeping up with Google is not an easy task 🙂
Dear Chris
Thanks for the awesome solution.
The code is not working normally now, though. The problem is duplicated links and missing links.
Hey Chris,
Was looking for a tool exactly like this one!
Yours is awesome and super fast!
Many thanks!
Best
G.
From one SEO Brit to another thanks so much for this, worked a treat!
Cool !!! It’s working great! Thank you, Chris!
Just weighing in to say that this was the most helpful result I could have asked for. THANK YOU!!
Thanks, Chris. It is one of the useful tools.
Thank you. It works well. Duplicates are only for the first page (first 10 results or so). Weird.
Anyway good work, Chris!
you’re an absolute legend m8
Really great tool, I love it!
You have saved me many days with this.
Is there a new limit now from Google? I can only see the first 3 pages at 100 each, even if it’s maybe 10K+ total results according to Google.
Time for an update again? 🙂
Um… sweet tool. It has really made my work easy… I really appreciate you giving it out for free, you are a legend… thanks.
Really nice tool… you’re really a legend, thanks.
It seems Google has caught on. I used site:*.heritageparts.com/* , which shows 470,000 results. However, I am only able to see 300 of those results and no more. Any idea around that?
Hi Cameron,
It’s common for Google to only provide a subset of results within search when using the site: operator.
To be fair, this post is a little dated now. You can now use the newer version of Google Search Console to see various lists of URLs that Google has identified for your domain and whether they’re indexed or not. I’d recommend checking it out if you haven’t already done so.
Thanks.
Very useful! Thanks! 🙂
Great tool. I’m working now on something similar, but your option is still a great solution in 2019!
This is really very helpful and works great. Thanks so much Chris for this nice tiny tool that works big 🙂
Thanks for the utility, Chris! It works perfectly for my use except for one thing:
The URL list that’s generated contains URLs both “plain” and what I might call “extended.” (I’m sure there are proper terms for what I’m referring to!) I just want the plain ones.
The list might, for example, contain:
http://www.xyzcatering.ca/
http://www.xyzcatering.ca/menu/dinner/
http://www.xyzcatering.ca/services/
How do I get only http://www.xyzcatering.ca/ to show up in the list?
Hi Ian,
Can you clarify exactly what you’re trying to achieve with the tool?
The core purpose of the tool is to extract each of the URLs listed in search results for a given query; so naturally, if the “extended” URLs are present in search results, the tool will extract those.
If you want to know the domains listed for a given search query (i.e. if you wanted to strip ‘/menu/dinner/’, leaving only ‘xyzcatering.ca’), you could easily achieve this by pasting the list of URLs into Excel and running a formula to extract the domain (I can provide the formula if required).
I could adapt the tool to do that for you, but the tool is designed to extract results for a single domain (using the ‘site:’ operator), so it would kinda defeat the object of the tool if I changed it 🙂
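For what it’s worth, the same domain-stripping can be done outside Excel too; here is a minimal JavaScript sketch (the URL is just the example from the comment above):

```javascript
// Strip an extracted result URL down to its host using the built-in URL API.
function domainOf(url) {
  return new URL(url).hostname;
}

console.log(domainOf("http://www.xyzcatering.ca/menu/dinner/")); // "www.xyzcatering.ca"
```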
Thanks,
Chris
Very cool… thanks Chris.
good work my dude thx
Thanks for the detailed instructions. I stopped at step 6; I can’t understand how to bookmark the code, or how to drag and drop for step 7 or the final stage. Please make it clear, Chris!
Can you confirm what operating system and web browser you are using?
If you’re using Windows and Chrome, Ctrl+Shift+B will toggle the ‘bookmarks’ bar on and off. You should literally be able to drag the bookmarklet in this post (where it says ‘Google SERPs Extractor’ in the rectangular box) and drop it into the bookmark bar.
It may vary slightly on other operating systems and web browsers, but it’ll be a similar principle.
Hi – Installing the bookmarklet doesn’t work on mac.
@Chris
First let me thank you for sharing this magic wand.
I first tried this on Vivaldi but it didn’t work, then tried it on Firefox (v65.0.2) and then, lo and behold, the fairies appeared!
Chris you’re a wizard. Thank you!
Thank you very much for sharing this work Chris. It was really helpful for me and it’s still working like a charm.
Cheers!
Thanks for the great article!
But I have a question, is this method still valid in 2019?
Excellent post Chris! Keep up the good work!
This is a really helpful article, Chris, on extracting URLs from Google. A smart and easy-to-use tool.
Hi Chris,
Thank you for this code; now I can use my own URLs to make them show as 410.
Worked perfectly for me. Thanks for sharing, Chris! No longer need to buy yet another $x/month tool for a simple URL scrape. Very beneficial for email marketers/link builders. This is magic. Owe you a ton!
Very useful post! Thanks!
Very useful man!! Thank you!
This tool is super helpful for our specific need. Thank you!
Hi Chris,
This tool is very helpful and has saved me lots of time. Thanks for that!
Could you please help me extract the crawled date for those extracted URLs? How do I do that?
Chris, THANK YOU! This is a huge time saver. Well done. Oh, and did I say THANK YOU.
You just saved me a ton of manual work! Thank you 😀
Hi Chris,
Thank you so much. Using the functionality of the URL extractor, we could extract much more data for our clients. Let us know if we can help you in any way.
Hi Chris,
Thanks a lot for this tool.
This tool saves lots of time and effort.
THANK YOU.