Introduction

This post walks through how to find every URL on a website that Google has indexed. As an advanced step, it also shows how to get a list of the broken ones so you can create redirects for them.

Get list of indexed URLs

The easiest way to get a list of the URLs Google has indexed is to search Google for your domain with the site: operator. Chrome works best for this, since the later steps use a Chrome bookmarklet. Search for the following, replacing yourdomainname.com with your own domain: site:yourdomainname.com
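To build that search link yourself (for example, to reuse it for several domains), you could use a small helper like the one below. It is a sketch that assumes Google's standard https://www.google.com/search endpoint with the query in the q parameter. The name siteSearchUrl is hypothetical.

```javascript
// Hypothetical helper: build a Google search URL for a site: query.
// Assumes the standard https://www.google.com/search endpoint with the query in "q".
function siteSearchUrl(domain) {
  const url = new URL("https://www.google.com/search");
  // URLSearchParams percent-encodes the colon, so "site:" becomes "site%3A".
  url.searchParams.set("q", `site:${domain}`);
  return url.toString();
}

console.log(siteSearchUrl("example.com"));
// https://www.google.com/search?q=site%3Aexample.com
```

Open the printed link in Chrome and continue with the steps below.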

Enable Continuous Scroll

Next, you need to load all the search results onto the page. To do this, make sure Continuous Scroll is enabled by following the steps below. If it's already enabled, skip ahead to Load all results.

1. Navigate to Google.com

2. Click the Settings link in the bottom right-hand corner

A screenshot of the Google homepage with a yellow arrow pointing at Settings

3. On the Settings page, click Other Settings in the left sidebar

A screenshot of Google.com's settings page with a yellow arrow pointing to Other Settings

4. Toggle the Continuous Scroll option

A screenshot of Google.com settings page with a yellow arrow pointing to the Continuous Scroll toggle.

Load all results

Now that Continuous Scroll is enabled, you can load all the search results. Keep scrolling until you reach the bottom of the page and can't scroll any further, then click the More results button. Repeat this until the button no longer appears.

Extract URLs

Now that all the URLs are loaded onto the page, you can extract them using Chris Ainsworth's Google SERP Extractor. In his article he provides a JavaScript snippet that extracts all the links (also shown below).

```javascript
javascript:(function(){
  /* Start the report page: title, inline styles and the tool's header */
  var output='<html><head><title>SEO SERP Extraction Tool</title><style type=\'text/css\'>body,table{font-family:Tahoma,Verdana,Segoe,sans-serif;font-size:11px;color:#000}h1,h2,th{color:#405850}th{text-align:left}h2{font-size:11px;margin-bottom:3px}</style></head><body>';
  output+="<table><tbody><tr><td><a href='https://www.chrisains.com'><img src='https://www.chrisains.com/wp-content/uploads/2015/06/chrisains.com-logo1.png'></a></td><td><h1>SEO SERP Extraction Tool</h1></td></tr></tbody></table>";

  var pageAnchors=document.getElementsByTagName('a');
  var linkcount=0;
  var linkLocation='';
  var linkAnchorText='';

  output+='<table><tr><th>ID</th><th>Link</th><th>Anchor</th></tr>';

  for(var i=0;i<pageAnchors.length;i++){
    /* Skip anchors whose grandparent has the class 'iUh30' (a Google SERP class name that may have changed since) */
    if(pageAnchors[i].parentNode.parentNode.getAttribute('class')!='iUh30'){
      var anchorText=pageAnchors[i].textContent;
      if(anchorText===undefined) anchorText=pageAnchors[i].innerText;
      var anchorLink=pageAnchors[i].href;
      var anchorID=pageAnchors[i].id;

      /* Skip empty links, links to Google, cached pages and known SEO/search tool sites */
      if(anchorLink!='' && anchorLink.match(/^((?!google\.|cache|blogger.com|\.yahoo\.|youtube\.com\/\?gl=|youtube\.com\/results|javascript:|api\.technorati\.com|botw\.org\/search|del\.icio\.us\/url\/check|digg\.com\/search|search\.twitter\.com\/search|search\.yahoo\.com\/search|siteanalytics\.compete\.com|tools\.seobook\.com\/general\/keyword\/suggestions|web\.archive\.org\/web\/|whois\.domaintools\.com|www\.alexa\.com\/data\/details\/main|www\.bloglines\.com\/search|www\.majesticseo\.com\/search\.php|www\.semrush\.com\/info\/|www\.semrush\.com\/search\.php|www\.stumbleupon\.com\/url|wikipedia.org\/wiki\/Special:Search).)*$/i)){
        /* Skip Google's own toolbar links, identified by their ids */
        if(anchorID.match(/^((?!hdtb_more|hdtb_tls|uh_hl).)*$/i)){
          linkcount++;
          linkLocation+=anchorLink+'<br />';
          linkAnchorText+=anchorText+'<br />';
          output+='<tr><td>'+linkcount+'</td><td>'+anchorLink+'</td><td>'+anchorText+'</td></tr>\n';
        }
      }
    }
  }

  /* Append the plain URL and anchor text lists, then the footer */
  output+='</table><br/><h2>URL List</h2><div>'+linkLocation+'</div>';
  output+='<br/><h2>Anchor Text List</h2><div>'+linkAnchorText+'</div>';
  output+='<br/>&nbsp;<br/><p align=center><a href=\'https://www.chrisains.com\'>www.chrisains.com</a></p></body></html>';

  /* Show the report in a new tab */
  var reportWindow=window.open();
  reportWindow.document.write(output);
  reportWindow.document.close();
})();
```

Create a bookmarklet

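Now take the code above and turn it into a bookmarklet in Google Chrome. Open Chrome's Bookmark Manager, click the vertical ellipsis menu, and choose Add new bookmark. Give the bookmark a name and paste the whole snippet, starting with javascript:, into its URL field.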

Screenshot of Google Chrome's bookmark manager with yellow arrow pointing to vertical ellipsis menu
Screenshot of Google Chrome's bookmark manager with yellow arrow pointing to Add new bookmark on dropdown menu
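A bookmarklet is just a bookmark whose URL uses the javascript: scheme. The SERP extractor snippet above already starts with javascript: and can be pasted as-is. If you keep other snippets in readable form, a hypothetical toBookmarklet helper like the sketch below (run it in Node.js or the browser console) turns one into a single-line URL. Percent-encoding the source keeps line breaks intact, so // comments and statements without semicolons should survive being stored as a URL.

```javascript
// Hypothetical helper: turn readable JavaScript into a one-line bookmarklet URL.
function toBookmarklet(source) {
  // Wrap in an IIFE so the snippet's variables don't leak into the page.
  // The newline before "})" keeps a trailing // comment from swallowing it.
  const wrapped = `(function(){\n${source}\n})();`;
  // Percent-encode everything, including line breaks, so the code fits in a URL.
  return "javascript:" + encodeURIComponent(wrapped);
}

const bookmarkUrl = toBookmarklet("alert(document.title); // show the page title");
console.log(bookmarkUrl);
```

Paste the printed value into the bookmark's URL field, just like the extractor snippet.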

Run bookmarklet

Bookmarklets work like any other bookmark, except that instead of taking you to a website, they run JavaScript on the page you're viewing. Once the bookmarklet is created, click it while you're on the Google search results page to extract the URLs.

Screenshot of Google search results with a yellow arrow pointing to a bookmarklet in the Bookmarks dropdown list

Once you've run the bookmarklet, a new tab opens showing all the extracted URLs.

Screenshot of a list of URLs extracted from Google using Chris Ainsworth's SEO SERP Extraction Tool
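The URL List section of that page has one URL per line, so you can copy it out as plain text. Depending on the results, the list may contain duplicates or links to other sites. The sketch below keeps only unique URLs on your own domain. The ownSiteUrls name and the rule that treats www.example.com and example.com as the same site are assumptions; adjust them if you use other subdomains.

```javascript
// Hypothetical helper: keep unique URLs whose hostname matches your site.
// Assumes one URL per line, as copied from the extractor's "URL List" section.
function ownSiteUrls(text, hostname) {
  const target = hostname.replace(/^www\./, "");
  const seen = new Set();
  for (const line of text.split(/\r?\n/)) {
    const candidate = line.trim();
    if (!candidate) continue;
    let parsed;
    try {
      parsed = new URL(candidate);
    } catch {
      continue; // skip lines that are not absolute URLs
    }
    // Treat "www.example.com" and "example.com" as the same site.
    if (parsed.hostname.replace(/^www\./, "") === target) {
      seen.add(parsed.href); // a Set drops exact duplicates, keeping first-seen order
    }
  }
  return [...seen];
}

const pasted = `https://example.com/about
https://www.example.com/blog/post-1
https://example.com/about
https://other-site.com/page`;
console.log(ownSiteUrls(pasted, "example.com"));
```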

Quickly Find Broken URLs (Advanced)

You now have a list of all the URLs Google has indexed for your website. The next step is to check them and find any that are broken.

To make finding broken URLs less tedious, I built a simple tool that automatically checks every URL in a list and returns only the ones that have issues. You can find it here:

https://github.com/zpthree/bad-url-checker

Just download the code, spin up the app, and enter your list of URLs.

Sorry to any non-developers who don't feel comfortable setting something like this up. I may host it somewhere eventually, but for now I'm only providing the code 🙃.
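If you'd rather not run the app, the core idea (request each URL and keep the ones that fail) fits in a short script. This is a minimal sketch for Node.js 18 or later, which has a built-in fetch; it is not the code from bad-url-checker. The findBrokenUrls name is hypothetical, and treating any final status of 400 or above, or a network error, as broken is an assumption you may want to adjust. Redirects are followed, so a URL that already redirects to a working page is not reported.

```javascript
// Minimal sketch (not the bad-url-checker implementation): report URLs that fail.
// Assumes Node.js 18+ with a global fetch. "Broken" here means a final status >= 400
// or a network error. The optional fetchFn parameter lets you swap in another client.
async function findBrokenUrls(urls, fetchFn = globalThis.fetch) {
  const results = await Promise.all(
    urls.map(async (url) => {
      try {
        // fetch follows redirects, so res.status is the status of the final page.
        const res = await fetchFn(url, { redirect: "follow" });
        return res.status >= 400 ? { url, status: res.status } : null;
      } catch (err) {
        // DNS failures, refused connections, TLS errors, etc.
        return { url, status: null, error: String(err) };
      }
    })
  );
  return results.filter((r) => r !== null);
}

// Example (makes real network requests):
// findBrokenUrls(["https://example.com/", "https://example.com/old-page"]).then(console.log);
```

All requests run in parallel, which is fine for a small site; for a long list you may want to check the URLs in batches so you don't flood your own server.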

Final Thoughts

Hopefully you learned a thing or two, but if not, here's a video to make up for it! If you have any questions, please don't hesitate to ask!
