Introduction
This post walks through how to find all of a website's URLs that Google has indexed and, as an advanced step, how to get a list of the ones that are broken so you can create redirects for them.
Get list of indexed URLs
The easiest way to get a list of the URLs Google has indexed is to search Google for your domain using the site: search operator. This is easiest to do in the Chrome browser. Enter the following query to find all the URLs Google has indexed for a given website:
site:yourdomainname.com
Just make sure you replace yourdomainname.com with your own domain.
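If you'd rather open the search directly, the query can also be built as a URL. The helper below is a hypothetical sketch (the function name is illustrative, not part of any tool mentioned in this post), and it assumes Google's standard https://www.google.com/search?q= endpoint.

```javascript
// Hypothetical helper: builds a Google search URL using the site: operator.
// Strips a protocol and trailing slashes in case a full URL is passed in.
function buildSiteSearchUrl(domain) {
  const bare = domain.replace(/^https?:\/\//, "").replace(/\/+$/, "");
  // URLSearchParams percent-encodes the colon in "site:" for us.
  const params = new URLSearchParams({ q: `site:${bare}` });
  return `https://www.google.com/search?${params.toString()}`;
}

console.log(buildSiteSearchUrl("https://yourdomainname.com/"));
// https://www.google.com/search?q=site%3Ayourdomainname.com
```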
Enable Continuous Scroll
Next, you need to load all the search results onto the page. To do this, make sure that Continuous Scroll is enabled by following the steps below. If you already have Continuous Scroll enabled, you can skip ahead to Load all results.
1. Navigate to Google.com
2. Click the Settings link in the bottom right-hand corner
3. On the Settings page, click Other Settings in the left sidebar
4. Toggle the Continuous Scroll option
Load all results
Now that Continuous Scroll is enabled, you can load all the search results. Keep scrolling to the bottom of the page until you can't scroll any further. Once you reach the bottom, click the More results button. Repeat this until the button no longer appears.
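If clicking through many pages gets tedious, the scroll-and-click loop can be automated from Chrome's DevTools console. This is a rough sketch under a few assumptions: the button is located by its visible text "More results", the delays are guesses, and Google changes its markup often, so the lookup may need adjusting. The function names are hypothetical.

```javascript
// Assumption: the "More results" button can be found by its visible text.
// Google's markup changes often, so this lookup may need adjusting.
function findMoreResultsButton(doc) {
  const candidates = doc.querySelectorAll("a, button, span");
  return Array.from(candidates).find(
    (el) => el.textContent.trim() === "More results"
  ) || null;
}

// Browser APIs are passed in as functions so the loop can run without a browser.
async function loadAllResults({
  scrollToBottom,
  getPageHeight,
  findButton,
  wait,
  delayMs = 1500, // guessed delay; raise it on slow connections
  maxRounds = 100, // safety cap so the loop always ends
}) {
  let clicks = 0;
  let lastHeight = -1;
  for (let round = 0; round < maxRounds; round++) {
    scrollToBottom();
    await wait(delayMs); // let Continuous Scroll fetch the next batch
    const button = findButton();
    if (button) {
      button.click();
      clicks++;
      await wait(delayMs);
    } else if (getPageHeight() === lastHeight) {
      break; // no button and nothing new loaded: assume all results are in
    }
    lastHeight = getPageHeight();
  }
  return clicks;
}

// Usage in the DevTools console on the search results page:
// loadAllResults({
//   scrollToBottom: () => window.scrollTo(0, document.body.scrollHeight),
//   getPageHeight: () => document.body.scrollHeight,
//   findButton: () => findMoreResultsButton(document),
//   wait: (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
// }).then((n) => console.log(`Clicked "More results" ${n} times`));
```

Stopping only when there is no button *and* the page height stopped changing avoids quitting early while Continuous Scroll is still loading the next batch.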
Extract URLs
Now that all the URLs are loaded onto the page, you can extract them using Chris Ainsworth's Google SERP Extractor. In his article, he provides a JavaScript snippet that extracts all the links from the page.
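Ainsworth's snippet isn't reproduced here. To give a sense of what such an extractor does, here is a separate sketch (not his code): it takes the href of every link on the results page, keeps the ones on your domain or its subdomains, strips #fragments, and removes duplicates. The function name and the handling of Google /url?q= redirect links are assumptions.

```javascript
// Sketch (not Ainsworth's snippet): keep links on `domain`, deduplicated.
function extractSiteUrls(hrefs, domain) {
  const seen = new Set();
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href);
    } catch {
      continue; // skip relative or malformed hrefs
    }
    // Assumption: some result links may be Google redirects (/url?q=...);
    // unwrap them to their target when present.
    if (/(^|\.)google\./.test(url.hostname) && url.pathname === "/url" && url.searchParams.has("q")) {
      try {
        url = new URL(url.searchParams.get("q"));
      } catch {
        continue;
      }
    }
    const host = url.hostname;
    if (host === domain || host.endsWith(`.${domain}`)) {
      url.hash = ""; // fragments point at the same page
      seen.add(url.href);
    }
  }
  return [...seen];
}

// In the browser console on the results page:
// const hrefs = Array.from(document.querySelectorAll("a[href]"), (a) => a.href);
// console.log(extractSiteUrls(hrefs, "yourdomainname.com").join("\n"));
```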
Create a bookmarklet
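Next, take his code and turn it into a bookmarklet in Google Chrome. To add a bookmarklet, open your bookmarks in Chrome and click Add new bookmark. Give it a name, then paste the code into the URL field, making sure it begins with the javascript: prefix.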
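If you're adapting your own code rather than pasting a ready-made bookmarklet, a small helper can do the conversion. This hypothetical sketch wraps the code in an immediately invoked function (so its variables don't leak into the page and the page isn't replaced by a return value) and percent-encodes it behind the javascript: prefix. It assumes the input is plain, unencoded JavaScript.

```javascript
// Hypothetical helper: turns plain JavaScript into a bookmarklet URL.
function toBookmarklet(code) {
  // Drop an existing prefix so it isn't encoded into the body twice.
  const body = code.replace(/^\s*javascript:/, "");
  // The IIFE returns undefined, so the browser stays on the current page.
  const wrapped = `(function () {\n${body}\n})();`;
  return `javascript:${encodeURIComponent(wrapped)}`;
}

console.log(toBookmarklet("alert(document.title)"));
```

Paste the printed string into the bookmark's URL field.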
Run bookmarklet
Bookmarklets work like any other bookmark, but instead of taking you to a website, they run JavaScript code. Once you've created the bookmarklet, click it while you're on the Google search results page to extract the URLs.
After the bookmarklet runs, you'll be taken to a page that lists all the extracted URLs.
Quickly Find Broken URLs (Advanced)
Now you have a list of all the URLs that Google has indexed for your website. Next, you need to go through them and find any that are broken.
To make finding all the bad URLs less of a tedious task, I built a simple tool that will automatically check all the URLs in a given list and return only those that have issues. That tool can be found here:
https://github.com/zpthree/bad-url-checker
Just download the code, spin up the app, and enter your list of URLs.
Sorry to any non-developers who don't feel comfortable setting something like this up. I may host it somewhere eventually, but for now I'm only providing the code 🙃.
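For a sense of what a checker like this does under the hood, here is a minimal Node.js sketch. It is not the bad-url-checker code; it assumes Node 18+ (for the global fetch), treats any 4xx/5xx response or network error as broken, and sends HEAD requests with a GET fallback for servers that reply 405. The names findBadUrls, fetchFn, and concurrency are hypothetical.

```javascript
// Minimal sketch, not the bad-url-checker implementation.
// Assumes Node 18+ for the global fetch; fetchFn can be swapped for testing.
async function findBadUrls(urls, { fetchFn = fetch, concurrency = 5 } = {}) {
  const bad = [];
  const queue = [...urls];

  // Each worker pulls URLs off the shared queue until it is empty.
  async function worker() {
    while (queue.length > 0) {
      const url = queue.shift();
      try {
        // Redirects are followed, so a 301 to a live page counts as OK.
        let res = await fetchFn(url, { method: "HEAD", redirect: "follow" });
        if (res.status === 405) {
          // Some servers reject HEAD; retry with GET.
          res = await fetchFn(url, { method: "GET", redirect: "follow" });
        }
        if (res.status >= 400) bad.push({ url, status: res.status });
      } catch (err) {
        // DNS failures, refused connections, and similar network errors.
        bad.push({ url, status: null, error: String(err) });
      }
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return bad;
}

// Usage:
// findBadUrls(["https://yourdomainname.com/", "https://yourdomainname.com/old-page"])
//   .then((bad) => console.table(bad));
```

Following redirects means a URL that already redirects to a working page isn't reported, which fits the goal of finding pages that still need a redirect.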
Final Thoughts
Hopefully you learned a thing or two, but if not, here is a video to make up for it! If you have any questions, please don't hesitate to ask!