Many reasons exist for moving a site from HTTP to HTTPS: security, a potential Google ranking boost, and soon (with Chrome 62) Chrome will warn users that the site is not secure when they fill out a form (such as a search box or an email signup). That newest alert, coupled with the Google SEO benefits, has accelerated the migration timeframe for many sites.
Numerous detailed checklists exist for the engineering side of things, but sometimes SEO specifics can get overlooked. Below is an HTTP to HTTPS checklist that focuses just on SEO considerations: planning, migration, and post-migration monitoring. (This post doesn’t include all of the non-SEO specific tasks required to migrate from HTTP to HTTPS. It’s intended to augment your existing engineering migration plan, not replace it.)
The checklist is followed by an explanation of each task and a list of frequently asked questions.
You can also copy this Google Sheets template to create your own internal checklist and QA test plan.
The last section of this post talks about how Keylime Toolbox can help monitor the migration. If you already have a Keylime Toolbox account, learn more about adding https property data to an existing reporting group in your Keylime Toolbox account. If you don’t have an account, you can check out a free trial.
- With Keylime Toolbox Query Analytics, you can aggregate HTTP and HTTPS query data from Google Search Console and track trends in clicks, ranking, impressions, and click through rate over time (in aggregate, at the individual query level, and for categories of queries).
For example, in the chart below, you can see that when filtering for just unbranded queries on mobile devices, rankings are down slightly post-migration (so should be monitored), but so far the decline has averaged less than one ranking position.
And below you can see an example of ranking trends for top keywords over the same date range.
- Keylime Toolbox Crawl Analytics provides daily log analysis to help you monitor the Googlebot crawl, assess how long a full crawl and reindexing will take, and identify any issues.
For example, in the chart below, you can see that the migration seems to be going well from a Googlebot crawl perspective: the majority of the URLs crawled return a 200, although the number of 301s spiked when the migration began, and there was no increase of error status codes (such as 404s or 500s).
Each link in the checklist is to the section of this document with more details.
- Crawl Efficiency: If the site is large and the crawl is particularly inefficient, consider implementing crawl efficiency elements before beginning the migration (so the reindexing process happens as quickly as possible)
- HTTP to HTTPS Redirects: 301 redirect HTTP URLs to HTTPS URLs
- Existing Redirects: As much as possible, incorporate all existing redirect rules in initial redirect to final target (this is mostly unrelated to the HTTP to HTTPS redirect, but that redirect will add to any existing redirect chains and Google will only crawl up to 5 redirects in a chain)
- Internal Links: Update all internal links to HTTPS URLs, if possible
- Meta Data and Structured Markup: Update meta data and structured markup, including:
- canonical attribute values
- pagination attribute values
- hreflang and rel alternate media values (if applicable)
- structured markup like breadcrumbs and videos (if applicable)
- Resources: Ensure all resources are moved to HTTPS and the references are updated in the page source code
- Google Search Console: Verify the https property in Google Search Console and do the following:
- Create a property set that combines the HTTP and HTTPS properties (for monitoring purposes)
- Configure parameter handling for the HTTPS version of the domain in Google Search Console (to match the HTTP settings)
- Set international targeting (if applicable)
- Set the crawl rate (if this was set on the HTTP)
- Upload any disavow files that had been uploaded to the HTTP property
- Set the preferred domain (www or non-www) and country (if applicable)
- Robots Exclusion Protocol:
- Ensure that the HTTPS robots.txt file contains the same content as was previously served for the HTTP (and doesn’t disallow all)
- Ensure that the HTTP robots.txt file either redirects or 404s (and make sure it doesn’t disallow all)
- Ensure the HTTPS pages don’t have meta noindex attributes
- Web Analytics: Ensure web analytics (and other third-party) tagging is in place (this may happen automatically)
- XML Sitemaps:
- Submit an XML Sitemap for the URLs on the HTTPS
- Leave the existing XML Sitemap in place for the HTTP (to monitor indexing metrics for both)
- Remove the old Sitemap reference from the robots.txt file (now hosted on the HTTPS) and replace it with a reference to the new one(s)
- Ensure both the HTTP and HTTPS versions of the XML Sitemaps are comprehensive and canonical
What Not To Do
- Don’t block the HTTP URLs with robots.txt – if you block the HTTP URLs with robots.txt, search engines won’t be able to see the redirects to the HTTPS URLs and PageRank and other value signals won’t be transferred.
- Don’t remove the HTTP URLs from Google’s index using the URL removal tool – as Google recrawls the HTTP URLs and gets the 301 status code, they’ll begin crawling the HTTPS URLs and replacing the HTTP URLs in the index. If you request removal of the HTTP, Google will remove the HTTPS as well. (Note that the same thing happens if you try to remove the non-www URLs using this tool. The www URLs also are removed in that case.) The URL removal tool should only be used when you want the page (and all URL variations that load that page) to be removed from the index.
- Google Search Console Indexing Status: Monitor Google Search Console XML Sitemap indexing stats to ensure indexing is declining on the HTTP and increasing on the HTTPS. (Or, if comprehensive, canonical XML Sitemaps aren’t available, monitor the Google Search Console Index Status reports.)
- Google Search Console Crawl Errors: Monitor the Google Search Console Crawl Errors reports on both the HTTP and HTTPS
- Manual Checks: Spot check that everything is in place (redirects, canonical attributes, and so on) by crawling the site, using the Google Search Console Fetch as Google tool, and by manually loading pages.
- Traffic and Ranking: Monitor organic search traffic in web analytics to ensure traffic is steady or only temporarily dips as the HTTP URLs are deindexed and HTTPS URLs are indexed (this is a rolling process that does not happen all at once). Monitor rankings and other SEO-focused data in Google Search Console.
- Search Results Display: Manually check the search results display to ensure everything looks correct as the HTTPS URLs are indexed.
Keylime Toolbox Additional Monitoring Steps
With Keylime Toolbox, you can also monitor the following:
- Googlebot Crawl Behavior: Monitor how Googlebot is crawling both the HTTP and HTTPS (what URLs are being crawled, what response codes Googlebot receives). This data is only available if you are using Crawl Analytics and upload server log files for processing.
- Google-Reported Errors: Download the full list of Google-provided crawl errors for both the HTTP and HTTPS (vs. the smaller sample available from the Google Search Console user interface), along with link source data for each error
- SEO-Focused Query Metrics: Monitor aggregated HTTP and HTTPS ranking and traffic over time (for branded vs. unbranded queries, for specific query topics, and for individual queries), along with other SEO data, such as individual and aggregated click through rates and the aggregated number of queries the site appears for
Implementing Crawl Efficiency Elements
If the site is large and the crawl is inefficient, it may take some time for Google to re-crawl all of the HTTP URLs and replace them in the index with the HTTPS versions. To help speed up this process, you can make the crawl as efficient as possible. Log files are a great resource for pinpointing crawl efficiency issues. (Keylime Toolbox Crawl Analytics offers log file processing, as do other third-party tools.)
Review our post on how Google crawls sites for comprehensive details on crawl efficiency, but some key elements that make the crawl efficient include:
- 301 redirecting rather than returning a 200 status code for all URL variations (such as URLs with a trailing slash and without, URLs with www and without, case variations, and so on).
- Configuring parameters that don’t change content in the Google Search Console URL Parameters tool.
301 redirect HTTP URLs to HTTPS URLs
301 redirects are permanent (compared to 302 redirects, which are temporary).
Although Google has said that they now treat PageRank flow the same for 301s and 302s, they still handle these redirects differently in other ways. Since a 302 redirect is technically “temporary”, Google continues to index a URL that 302 redirects (and also indexes the 302 target URL). With a 301 redirect, however, Google drops the redirecting URL from the index and only indexes the 301 target URL. (Note that this may be less of an issue with HTTP to HTTPS redirects vs. other types of URL changes, but it’s still a best practice to help ensure the migration goes as smoothly as possible.)
To ensure the HTTP URLs are removed from the index and replaced with the HTTPS URLs, ensure the redirects are all 301.
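As a quick illustration, each HTTPS target should change only the scheme and preserve the host, path, and query string. A minimal sketch of computing the expected redirect target (the URL here is a placeholder):

```python
from urllib.parse import urlsplit, urlunsplit

def https_target(url):
    """Return the expected HTTPS redirect target for an HTTP URL,
    changing only the scheme and preserving host, path, and query."""
    parts = urlsplit(url)
    return urlunsplit(("https",) + tuple(parts[1:]))

print(https_target("http://www.example.com/page1?sort=asc"))
# https://www.example.com/page1?sort=asc
```

Comparing each redirect's actual `Location` header against this expected target (and confirming the status code is 301, not 302) makes a simple automated QA check.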
As Much as Possible, Consolidate Redirect Rules
Googlebot only follows up to 5 redirects and as URLs change over time and canonicalization rules are added, redirect chains become common. In addition, redirect chains slow page load time (especially on mobile devices).
Google says: “Avoid chaining redirects. While Googlebot and browsers can follow a “chain” of multiple redirects (e.g., Page 1 > Page 2 > Page 3), we advise redirecting to the final destination. If this is not possible, keep the number of redirects in the chain low, ideally no more than 3 and fewer than 5. Chaining redirects adds latency for users, and not all browsers support long redirect chains.”
In many cases, HTTP/HTTPS and www/non-www redirects are done at the server level and all other redirects are done at the application level. In this case, the ideal scenario is to use one (server level) 301 redirect to account for both HTTP/HTTPS and non-www/www and one (application level) 301 redirect to account for all other redirects.
This latter redirect would include rules such as:
- Older URL patterns to current ones (ensure all older rules are updated with the current final targets)
- Case normalization (for instance, from example.com/Page1 to example.com/page1) and trailing slash (for instance, from example.com/page1 to example.com/page1/). In this example, example.com/Page1 would 301 redirect directly to example.com/page1/ in one redirect.
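The consolidation step can be modeled as collapsing any chains in a table of redirect rules so every source points directly at its final target. This is a simplified sketch (real rules are usually patterns rather than a literal URL-to-URL map):

```python
def flatten_redirects(rules):
    """Collapse redirect chains so each source maps directly to its
    final target. `rules` maps source URL -> redirect target URL."""
    flat = {}
    for source in rules:
        seen = set()
        target = rules[source]
        # Follow the chain until we reach a URL that doesn't redirect
        # (guarding against redirect loops with `seen`).
        while target in rules and target not in seen:
            seen.add(target)
            target = rules[target]
        flat[source] = target
    return flat

rules = {
    "http://example.com/old": "http://example.com/new",
    "http://example.com/new": "https://example.com/new/",
}
print(flatten_redirects(rules)["http://example.com/old"])
# https://example.com/new/
```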
When reviewing all of the older rules and updating and consolidating them, also make sure that all redirects are 301 and not 302. Although, as noted above, Google has said that 302 redirects transfer value signals like PageRank in the same way as 301 redirects, URLs that 302 redirect may remain indexed, leading to unexpected search results display elements. Not only might the wrong URL appear in search results, but other undesired behavior might result. For instance, if meta data like sitelinks is associated with the current URL but the old URL appears in search results, no sitelinks will appear.
Update all internal links to canonical URLs
This is an optional step. But it’s useful to update the internal links for a couple of reasons:
- Redirects add to page load time, especially on mobile, so when a user clicks on a link on the site, the resulting page load is slower than it otherwise would be.
- Search engines use internal links as a signal in determining how to crawl the site so the crawl efficiency may be slightly less if internal links are to URLs that redirect (search engines will crawl the HTTP URLs since they are linked internally and the HTTPS URLs based on the redirects vs. reducing the crawl of the HTTP URLs over time).
Ideally, internal links should be absolute and not relative and should be updated to the HTTPS URLs. That means that if a page has an internal link to http://www.example.com/page1, you should update that internal link to be https://www.example.com/page1.
You could also use relative links rather than absolute links, which would eliminate the need to update the internal links. This is OK but it’s not ideal. This is because internal links are a strong canonical signal to search engines, so if any URLs are misconfigured to not redirect (or the site gets accidentally duplicated on a subdomain, or scraped, etc.), all the links on those pages will be to non-canonical versions.
If you use absolute links and don’t update the internal links, they’ll all redirect to the HTTPS URLs. Google has said that this is OK — that it may not be worth the effort of updating. Although that statement was really about external incoming links, not about internal links.
It would be a lot of work to try to get external links updated, but it shouldn’t be much work at all to update internal links in most cases. Often, internal links can be updated via a configuration setting, programmatically, or all at once via a script.
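For example, a one-time scripted rewrite might upgrade absolute links to your own domain while leaving external links untouched. A hedged sketch (the domain and HTML are placeholders, and a real implementation should handle edge cases like subdomains):

```python
import re

def upgrade_internal_links(html, domain):
    """Rewrite absolute http:// links to the given domain to https://.
    Links to other domains are left untouched."""
    pattern = re.compile(r"http://((?:www\.)?" + re.escape(domain) + r")")
    return pattern.sub(r"https://\1", html)

html = '<a href="http://www.example.com/page1">One</a> <a href="http://other.com/x">Ext</a>'
print(upgrade_internal_links(html, "example.com"))
# <a href="https://www.example.com/page1">One</a> <a href="http://other.com/x">Ext</a>
```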
Update Meta Data and Structured Markup
- Update canonical attribute values to HTTPS URLs (if you 301 redirect from HTTP to HTTPS but the HTTPS URLs have canonical attributes to the HTTP, Google will see an infinite loop and you may see unpredictable indexing results)
- Update pagination attribute values to HTTPS URLs.
- Update hreflang attribute values to HTTPS URLs.
- Update rel alternate values (if using for separate mobile URLs) to HTTPS URLs.
- Update structured markup, such as breadcrumbs, videos, carousels, and the sitelinks searchbox to HTTPS URLs (if applicable).
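One way to QA these updates is to parse rendered pages and flag any `link` elements whose `rel` values (canonical, pagination, alternate) still reference HTTP URLs. A minimal sketch using the standard-library HTML parser (the sample page is a placeholder):

```python
from html.parser import HTMLParser

class LinkRelAuditor(HTMLParser):
    """Collect <link> href values for canonical/prev/next/alternate
    attributes that still point at http:// URLs."""
    def __init__(self):
        super().__init__()
        self.stale = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = ("canonical", "prev", "next", "alternate")
        if tag == "link" and attrs.get("rel") in rels:
            href = attrs.get("href", "")
            if href.startswith("http://"):
                self.stale.append((attrs["rel"], href))

page = '<link rel="canonical" href="http://www.example.com/page1/">'
auditor = LinkRelAuditor()
auditor.feed(page)
print(auditor.stale)
# [('canonical', 'http://www.example.com/page1/')]
```

Anything in `stale` after a crawl indicates a page whose meta data wasn't updated during the migration.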
Ensure all resources are moved to HTTPS
Google has recommended tools for finding “mixed content” on your site.
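Alongside Google's recommended tools, a rough first pass can scan page source for resources requested over HTTP. A simplified sketch (the URLs are placeholders; a thorough mixed-content check should inspect the rendered page, since scripts can inject resources at runtime):

```python
import re

def find_mixed_content(html):
    """Return http:// URLs referenced by src or href attributes
    of resource tags (script, img, link) in the page source."""
    return re.findall(r'<(?:script|img|link)[^>]+(?:src|href)="(http://[^"]+)"', html)

page = '<script src="http://cdn.example.com/app.js"></script><img src="https://cdn.example.com/logo.png">'
print(find_mixed_content(page))
# ['http://cdn.example.com/app.js']
```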
Verify the HTTPS version of the domain in Google Search Console
Depending on the verification method, you may lose verification of the HTTP. If possible, keep HTTP verification or re-verify using another method.
It’s worthwhile to verify http://, http://www, https://, and https://www. This way you can detect if any URL on any variation is getting traffic (and so may not be redirecting), set configuration options (like preferred domain), and be alerted to any errors Google encounters when crawling redirects from those variations.
Configure Google Search Console
- Create a property set that contains both the HTTP and HTTPS versions of the domain (for monitoring as described in the next section).
- Configure parameter handling for the HTTPS version of the domain in Google Search Console (to match what had been set for the HTTP).
- Set international targeting, if applicable (to match what had been set for the HTTP).
- Update the crawl rate if you had set this for the HTTP.
- Upload any disavow files you had uploaded for the HTTP.
- Set the preferred domain (www or non-www).
Set the Robots Exclusion Protocol for Both the HTTP and HTTPS
- Ensure HTTP robots.txt file redirects or 404s.
- Ensure the HTTPS robots.txt file has the same content as the previous HTTP version (other than the Sitemaps reference, as outlined below).
- Ensure HTTPS pages don’t have meta noindex attribute.
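A quick sanity check for the new HTTPS robots.txt is to parse its content and confirm a normal page is still fetchable (and a deliberately disallowed path is still blocked). A sketch using the standard library, parsing from a string rather than fetching over the network (the rules and URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

def allows_crawling(robots_txt, url, agent="Googlebot"):
    """Parse robots.txt content and report whether the given URL
    may be fetched by the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

robots = "User-agent: *\nDisallow: /private/\n"
print(allows_crawling(robots, "https://www.example.com/page1/"))    # True
print(allows_crawling(robots, "https://www.example.com/private/x")) # False
```

A robots.txt that accidentally contains `Disallow: /` would return False for every URL, which is exactly the "disallow all" mistake this checklist warns about.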
Ensure Web Analytics (and Other) Tagging is Still in Place
In many cases, the site will continue to use the same web analytics tagging (such as the Google Analytics property ID). But if you change this, be sure the pages of the site are updated. In addition, make sure that the source code that contains the tagging doesn’t get dropped from the pages as part of the migration.
You can use a third-party tool that verifies the tagging is in place, or configure a crawler such as Screaming Frog to check for it.
Submit Comprehensive and Canonical XML Sitemaps for the HTTPS Property
Ideally, you already have a comprehensive and canonical set of XML Sitemaps for the HTTP property. If you do, you can leave those Sitemaps in place and add corresponding Sitemaps for the HTTPS property.
Leaving the HTTP XML Sitemaps in Place
The reasons for leaving the HTTP Sitemaps in place are twofold:
- URLs listed in an XML Sitemap are a signal for Googlebot to (re)crawl those URLs, and you want to provide as many signals as possible for Google to quickly recrawl the HTTP URLs (to get the 301 redirects and then crawl and index the HTTPS URLs)
- If the XML Sitemaps are added to Google Search Console, you can use the Google Search Console XML Sitemaps reports to monitor the indexing decrease (see more on this in the monitoring section below).
Creating HTTPS XML Sitemaps
The reasons for creating Sitemaps for the HTTPS property are similar:
- You jumpstart Googlebot’s crawling of the HTTPS URLs when they are in an XML Sitemap
- You can monitor the indexing increase using the Google Search Console XML Sitemaps indexing reports (as described later in the monitoring section)
Submitting Comprehensive and Canonical XML Sitemaps
If possible, always create XML Sitemaps that are comprehensive and canonical (not just for the purposes of an HTTP to HTTPS migration).
- Canonical URLs – by including only canonical URLs, you provide a useful signal to search engines about what URLs to crawl and you also make it possible to use the Google Search Console XML Sitemaps indexing reports. (If non-canonical URLs are included, the indexing reports will provide skewed percentages.)
- Comprehensive URLs – by including all URLs you want indexed (or recrawled, in the case of an HTTP to HTTPS migration), you provide a complete view of the site, provide a signal of which URLs should be crawled regularly (even if they are already indexed), and enable the indexing reports to be accurate. As described more fully in the monitoring section below, the XML Sitemaps indexing reports can only provide the total number of URLs indexed from the Sitemap, so you need a comprehensive list of URLs in order to get comprehensive indexing statistics.
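Generating the HTTPS Sitemap from the canonical URL list can be scripted. A minimal sketch with the standard library (the URLs are placeholders; large sites would also split files at the 50,000-URL protocol limit):

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Build an XML Sitemap containing only the given (canonical) URLs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for url in urls:
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")

xml = build_sitemap(["https://www.example.com/", "https://www.example.com/page1/"])
print(xml)
```

Feeding this function only canonical HTTPS URLs (no redirecting or parameterized variants) keeps the Google Search Console indexing percentages meaningful.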
Replacing the XML Sitemaps Reference in the robots.txt File
The XML Sitemaps reference in your robots.txt file lets search engines (like Google and Bing) know the files exist. When you submit the XML Sitemaps to Google via Google Search Console, the robots.txt directive is redundant. But it’s useful to add the reference for other search engines and as a backup in case the Google Search Console account loses verification or otherwise becomes unavailable.
The robots.txt file is specific to the HTTP or HTTPS property and the content of the file applies only to that property. That means you can’t list both the HTTP and HTTPS XML Sitemaps location in the robots.txt file for the HTTPS property.
Instead, what you’ll likely need to do is move or copy the robots.txt file from the HTTP property to the HTTPS property and then update the XML Sitemaps reference (most directives in the robots.txt file are relative, but the Sitemaps reference is absolute).
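Updating the Sitemap reference when copying the robots.txt file over can be a simple substitution that leaves every other directive untouched. A sketch (the file content is a placeholder):

```python
def update_sitemap_refs(robots_txt):
    """Rewrite absolute Sitemap: references from http:// to https://,
    leaving all other directives untouched."""
    lines = []
    for line in robots_txt.splitlines():
        if line.lower().startswith("sitemap:"):
            line = line.replace("http://", "https://", 1)
        lines.append(line)
    return "\n".join(lines)

robots = "User-agent: *\nDisallow: /private/\nSitemap: http://www.example.com/sitemap.xml"
print(update_sitemap_refs(robots))
```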
Monitor Google Search Console XML Sitemap Indexing Stats
As noted in the migration steps, you should submit an XML Sitemap for the URLs on the HTTPS and leave the existing XML Sitemap in place for the HTTP. This will enable you to monitor the indexing decrease for the HTTP property and the indexing increase for the HTTPS property.
- All URLs in HTTP sitemap should have status code of 301 and indexing should decline over time.
- All URLs in HTTPS sitemap should have status code of 200 and indexing should increase over time.
This process may take some time and you may find that some HTTP URLs are still indexed months later. The most common reasons for this are:
- The HTTP URLs are blocked with robots.txt (so Googlebot can’t crawl the redirect) and are partially indexed
- The HTTP URLs are “non-canonical” (before the 301, had a canonical attribute to a different URL, for instance) and aren’t crawled very often
- The HTTP URLs don’t return a 301 (instead return a 302 or an error)
If you find that some URLs are still indexed months later but the Google Search Console Search Analytics reports don’t show any clicks to the HTTP property, then this issue may not be worth investigating. The URLs don’t rank for queries and aren’t causing problems.
However, if you see clicks to the HTTP property, then the URLs are ranking for queries. The easiest way to start the investigation is to review the server logs (see more on this below) to see exactly what Googlebot gets when crawling. The Keylime Toolbox Crawl Analytics log detail Excel files include a tab that lists all URLs with a 200 response code, all URLs with a 302 response code, and so on.
You can also crawl the site with a tool like Screaming Frog to get a list of URLs that return a 200, are blocked by robots.txt, and the like.
Also take a look at Google Search Console > Crawl > robots.txt Tester for the HTTP property to see if Google is seeing a robots.txt file and if so, if it’s blocking any URLs.
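The server-log review can be sketched as tallying the response codes Googlebot receives. A simplified example assuming combined-format access log lines (the sample lines are placeholders):

```python
import re
from collections import Counter

def googlebot_status_counts(log_lines):
    """Tally response codes for Googlebot requests in combined-format
    access log lines."""
    counts = Counter()
    pattern = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3})')
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = pattern.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts

logs = [
    '66.249.66.1 - - [01/Oct/2017] "GET /page1/ HTTP/1.1" 301 0 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [01/Oct/2017] "GET /page1/ HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '10.0.0.5 - - [01/Oct/2017] "GET /page1/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(googlebot_status_counts(logs))
```

Seeing 302s or errors where you expect 301s in this tally is a signal to dig into the redirect configuration. (Note that a user-agent string alone can be spoofed; production analysis should also verify Googlebot by IP.)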
Monitor the Google Search Console Indexing Status
These reports are less useful than the XML Sitemaps indexing metrics detailed above, but if you are not able to submit comprehensive and canonical XML Sitemaps, then the indexing status trends may be the best option.
Monitor the Google Search Console Crawl Errors Reports
Monitor the Google Search Console crawl error reports for both the HTTP and HTTPS property. If you see an increase of errors (such as 404s or 500s) on either property, something may be going wrong with the redirects, for instance.
You can click on any URL to see the “linked from” sources. These source URLs may actually be redirects or canonical attributes so can surface issues other than broken links.
If you are using Keylime Toolbox (see below), you can get a full list of all URLs and linked from sources in a CSV file for easier review.
Manually Check That Everything is Working Correctly
You can also set up automated testing scripts for this, but I find that a quick manual check can sometimes provide good insight as well. A QA review can surface issues before they appear in other places (crawl error reports, log files, and so on).
- Crawl both the HTTP and HTTPS properties (such as using Screaming Frog) to check elements like canonical attributes and internal links.
- Manually check a few HTTP URLs (such as with the Firefox HTTP Headers Add-on) to ensure the redirects are functioning as expected.
- Manually check a few HTTP URLs using the Google Search Console Fetch as Google feature to ensure Google sees the redirects correctly.
Monitor Organic Search Traffic in Google Search Console and Web Analytics
Google organic traffic should either remain steady or dip temporarily during the migration. Since the reindexing is a rolling process that doesn’t happen all at once, you shouldn’t see a sharp decline in traffic (even if something has gone wrong with the redirects).
You can do more granular monitoring of traffic (by branded/unbranded traffic and categories of queries) with Keylime Toolbox as described later.
Web Analytics Data
If you don’t change your analytics tagging, you will only be able to see the aggregated trended traffic data (vs. traffic for the HTTP and HTTPS properties separately). Aggregated data is useful for monitoring any dips or continued declines that may need further investigation.
Google Search Console Search Analytics
You should see clicks increasing on the HTTPS property and decreasing on the HTTP property. These trends will likely be gradual over a period of weeks (and potentially months) as Googlebot recrawls the already indexed HTTP URLs, then crawls the HTTPS URLs, and reindexes accordingly.
Since Google Search Console query data is typically 2-3 days behind and it will take at least a few days before you start seeing much traffic on the https property (due to the time to crawl and index the pages), you won’t be able to monitor this right away.
You should review the click data for the HTTP and HTTPS properties separately and can also monitor aggregated clicks with a property set.
Monitoring Rankings in Google Search Console
It is difficult to monitor large scale rankings for all queries in Google Search Console. Third-party tools exist that may help, including Keylime Toolbox (as described more fully below).
However, you can compare individual query rankings and that can be useful for spot checking high traffic pages.
Look at the periods before and after the migration for both the HTTP and HTTPS properties and compare:
- Traffic to high value pages
- Ranking for high traffic queries
Ideally, the newly indexed HTTPS URL will rank in the same position for a query as the previously indexed HTTP URL. The example below shows ranking, CTR, impressions, and clicks for a single query over a period of five days. The first graph is the HTTP property and the second graph is the HTTPS property.
The final ranking for the HTTPS is the same as the initial ranking position for the HTTP.
Manually Check the Search Results Display to Ensure Everything Looks Correct as the HTTPS URLs are Indexed
Looking at ranking reports alone can’t tell you what the search results display looks like. Make sure nothing has gone wrong (with the ranking URL, related sitelinks and rich markup, titles and descriptions, and so on).
- Ensure branded searches return correct home page URL.
- Ensure top queries return correct URL. If they don’t, the HTTPS version of the URLs may not yet be indexed. Do an inurl: search for the HTTPS URL. If it’s not indexed, you can check the server logs (see details later in this document) to see if it’s been crawled. If it has, make sure nothing is keeping the URL from being indexed, such as:
- noindex attribute
- robots.txt disallow
- canonical attribute value to another URL
- part of a pagination attribute chain
- URL response code other than 200
The spin bike company Peloton recently changed its domain name from pelotoncycle.com to onepeloton.com. While this isn’t exactly the same as an HTTP to HTTPS migration, the process is very similar.
In this case, initially the redirect was a 302. As noted above, URLs that return a 302 redirect remain indexed.
So while Google began indexing onepeloton.com, pelotoncycle.com URLs remained indexed.
A search for [peloton] returned pelotoncycle.com (even though it 302 redirected to onepeloton.com).
Once the redirect was changed from a 302 to a 301, Google dropped pelotoncycle.com and began ranking the new domain, onepeloton.com, for a search for [peloton].
One type of monitoring that you can’t do is monitoring Google’s indexing of HTTP vs. HTTPS URLs using the site: search command. Google ignores the protocol in that command, so a site: query returns the same results whether you specify the HTTP or HTTPS version of the domain.
Using Keylime Toolbox
With Keylime Toolbox, you can monitor SEO metrics (ranking, click through rate, impressions, and clicks) from Google Search Console over time and can filter by device as well as by type of query (branded vs. unbranded, topic area, and so on).
In addition, if you upload server logs, Keylime Toolbox can provide details on exactly what Googlebot is crawling and provide insight on technical SEO issues with the migration.
Monitoring What URLs Googlebot is Crawling and What Errors Googlebot Received
Monitor the Keylime Toolbox Crawl Analytics reports for both the HTTP and HTTPS properties (to ensure search engine bots are crawling the redirects correctly on the original site and crawling the correct URLs on the new site).
Google Search Console crawl errors provide a list of errors (like 404s and 500s), but the following is missing from that data (though available through server logs):
- What percentage of the crawl resulted in an error (are the errors a small or large percentage of the crawl?)
- What URLs Googlebot is crawling that return a 200 (are these the URLs you want and expect to be crawled?)
- URLs that return a 302 (when you may be expecting a 301)
Initially, you should see a large number of 301s, but that should decline over time.
You can choose to have Excel detail files generated separately for the HTTP and HTTPS properties. If you do this, ideally, the tabs for the HTTP should look like this:
And the tabs for the HTTPS should look like this:
If you see other tabs for the HTTP, you can look at the URLs listed in that tab for additional investigation.
Monitor Errors Googlebot Reports
Keylime Toolbox provides a CSV file with a complete list of URLs Googlebot had an issue crawling, along with linked from sources. (We get this data from Google’s API.) This data is similar to what you find in the Google Search Console user interface and downloadable CSV, but you only see a sample of URLs when looking at Google Search Console directly and you can only see linked from sources for one URL at a time (through a popup).
Note that the linked from source might not be a link. It could be a redirect, a canonical attribute, a pagination attribute, structured markup (such as from breadcrumb markup), and so on.
Below is an example:
In this example:
- The first URL is the result of an invalid redirect from the HTTP property
- The second URL is the result of a broken link on the HTTPS property
- The third URL is the result of an invalid URL listed in the XML Sitemap
Monitoring Indexing, Ranking, and Traffic
With Keylime Toolbox Query Analytics, you can monitor aggregate Google organic traffic, as well as aggregated ranking and click through rate details (for sitewide averages, branded vs. unbranded queries, categories of queries, and individual queries).
If traffic, ranking, or click through rate are declining, you can investigate whether it’s down for all queries or for a particular subset.
But hopefully, the trended reports available in Keylime Toolbox will give you peace of mind that the migration is progressing well and you can provide these reports to management to assure them as well.
Here’s an example from the Traffic Dashboard that shows the top queries haven’t lost ranking during the migration:
Here’s an example of the SEO Trends report that shows steady traffic and ranking in August (pre-launch) compared to September (post-launch) for both branded and unbranded Google organic traffic, as well as the specific query categories of clothing and shoes:
Here’s an example of the aggregated HTTP and HTTPS trends during the migration period for the clothing category and the shoe category of queries, filtered to just mobile devices:
Identifying Specific Query Issues
Using the Keylime Toolbox Filter and Download Query functionality, you can choose a before and after date range, a subset of queries (or all queries), and a device type (or all). Keylime Toolbox will generate Excel files that you can use to pinpoint any changes in traffic, ranking, or click through rate for further investigation.
Query Analytics Set Up
Keylime Toolbox can aggregate Google Search Console query data from multiple properties (into a “reporting group”). This functionality is useful in monitoring an HTTP to HTTPS migration because you can see full click, ranking, and click through rate trends beginning with HTTP only data (before the migration begins), then combined HTTP and HTTPS data (while the reindexing is in process), then HTTPS only data (once the reindexing is complete).
Keylime Toolbox’s aggregation functionality is similar to Google Search Console’s property sets, but differs in that:
- Google Search Console property sets are user-specific. You can’t share them with other users. Each user has to create the property set. The historical data varies based on when the property set is created, so a user who creates a property set later may not have access to all historical data. With Keylime Toolbox, an unlimited number of users can be given access to a reporting group.
- If the Google Search Console account loses verified access to a property in the set, the entire property set becomes inaccessible. This behavior can be problematic in cases where the verification is no longer possible for the HTTP property once it’s been migrated to HTTPS (in some cases, the HTTP can remain verified and that is recommended for a variety of reasons as described above, but in reality, it’s not always possible). With Keylime Toolbox reporting groups, the reporting group and data remain available (data for the HTTP is reported up to the date the property lost verification).
- Google property sets have a limit of 200 properties. Keylime Toolbox reporting groups have no limit (and the subscription price per reporting group doesn’t change based on the number of properties it contains).
- Google property set data aggregates clicks but filters impressions, whereas Keylime Toolbox reporting group data aggregates clicks and impressions. What this means is that if only part of the site has migrated from HTTP to HTTPS, then both types of URLs may appear in a single search (for instance, due to sitelinks) and in those cases, Keylime Toolbox will report an impression for both properties. If you are moving only part of the site, you can contact us at email@example.com and we can talk more about the best configuration for your situation.
- Google Search Console data goes back a maximum of 90 days. Keylime Toolbox initially imports those 90 days of data, then stores all data going forward for longer historical trends.
- Keylime Toolbox provides additional reporting and analysis, including trends for configurable categories of queries (such as branded/unbranded and topics).
See our user guide for more details on adding the https property to an existing reporting group or creating a new reporting group that contains both the HTTP and HTTPS property.
At a high level, the reporting group imports Google Search Console data for both the HTTP and HTTPS properties (along with any subfolder or subdomain properties), as well as the Google Analytics data.
If you don’t yet use Keylime Toolbox, you can sign up for a free trial. If you haven’t begun the HTTPS migration, you can start storing data for the HTTP property.
Crawl Analytics Set Up
See our user guide for more details on ensuring that the server log files you upload contain the protocol from the fetch request and that all log files (covering requests for both HTTP and HTTPS) are uploaded nightly to Keylime Toolbox.
If you don’t yet use Keylime Toolbox, email us at firstname.lastname@example.org to let us know you’d like to check out a trial of Crawl Analytics.
Is it OK to run HTTP and HTTPS in parallel to make sure everything is working correctly before adding the redirect from HTTP to HTTPS?
If you choose to do this in production, block the HTTPS version of the domain with robots.txt until you are ready for the migration. This will ensure Google doesn’t crawl and index the HTTPS URLs as a duplicate of the HTTP site. Google has said that when they only see links to HTTP URLs, they will try to crawl the HTTPS version of those URLs to see if they exist. If they find HTTPS URLs, in most cases, they’ll index those instead of the HTTP URLs (if the HTTPS URLs are indexable) as Google considers the HTTPS the “canonical” URL in a duplicate cluster.
If you block the HTTPS URLs with robots.txt, make sure you remove the block once you redirect from HTTP to HTTPS!
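As a sanity check during the parallel-run period, you can verify the block with Python's standard-library robots.txt parser. This is a sketch: the two-line Disallow-all file shown is the simplest form of the temporary block, and the hostname is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served ONLY on the HTTPS host during the
# parallel-run period: block all crawling until the migration cutover.
https_robots_lines = [
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(https_robots_lines)

# While the temporary block is in place, no HTTPS URL should be fetchable.
print(parser.can_fetch("Googlebot", "https://www.example.com/any-page"))  # False
```

Once the redirects from HTTP to HTTPS go live and you remove the block, the same check should start returning True.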
Should I do the migration all at once or one section at a time?
Google recommends that you move the entire site to HTTPS at once (although Google says that you can generally make site moves a section at a time). Google says moving the entire site from HTTP to HTTPS at once can help them better understand the move. (Probably what they mean is that they handle a site-wide switch from HTTP to HTTPS differently than they handle individual URL changes. Google's comments on HTTP to HTTPS migrations not being exactly the same as URL structure migrations suggest this as well.)
Moving only one section of the site to HTTPS can also add complications: you have to ensure that the HTTPS URLs serve only HTTPS resources, that the correct redirects and metadata are in place, and so on. However, if circumstances require that you move only a section, Google has noted that it's "not great" but is "fine".
How long will it take for Google to reindex the HTTPS URLs?
The process goes basically like this:
- Google crawls an HTTP URL, gets a 301 response, and notes the redirect target (the HTTPS URL) and adds that HTTPS URL to the list of URLs to crawl (Google doesn’t crawl the redirect target immediately).
- At some later time (probably within a day), Google crawls the redirect target (the HTTPS URL) and gets a 200 response code (as long as the HTTPS URL isn’t blocked by robots.txt).
- Google extracts all meta data and content from the https URL, including (but not limited to):
- canonical attribute
- meta robots noindex attribute (if present)
- pagination attributes
- If the HTTPS URL doesn’t have a canonical attribute value pointing elsewhere and doesn’t have a meta robots noindex attribute, Google compares the content on the HTTPS URL to what was previously indexed for the HTTP URL and if the content is generally similar, Google removes the HTTP URL from the index and replaces it with the HTTPS URL. (If the canonical attribute value points elsewhere or the page has a meta robots noindex attribute, Google does not index the URL and things get more complicated.)
That process generally only takes a day or two, but is only kicked off when Google crawls the HTTP URL. (Google may also discover and crawl the HTTPS URLs independently, but the process to deindex the HTTP URL and transfer any value signals from it to the HTTPS URL can only begin once Google crawls the HTTP URL and receives the 301.)
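A simple QA helper follows from step 1 above: in a same-host protocol migration, the 301 target should be the same URL with only the scheme changed. This sketch (with a hypothetical URL) computes the expected target so you can compare it against the Location header your server actually returns.

```python
from urllib.parse import urlsplit, urlunsplit

def expected_https_target(http_url: str) -> str:
    """Return the HTTPS URL an HTTP URL should 301 to in a same-host
    protocol migration: the scheme changes and nothing else does."""
    parts = urlsplit(http_url)
    return urlunsplit(("https",) + tuple(parts[1:]))

# Hypothetical QA check against one URL on the site being migrated.
print(expected_https_target("http://www.example.com/shoes?color=red"))
# https://www.example.com/shoes?color=red
```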
This means that how long it will take for the entire HTTP site to be replaced in the index by the HTTPS site depends on how long it takes for Google to crawl all HTTP URLs on the site. As noted in the earlier post about how Google crawls sites, it can take some time for Google to crawl an entire site, based on a number of factors, including how valuable Google finds the overall site and the individual pages of the site, how robust the server is, and how efficient the crawl is.
It can take 60 days or longer for Google to completely re-crawl a site and even then, you may find that some HTTP URLs remain indexed (although these are generally URLs that are non-canonical or otherwise low value and likely won’t rank for any queries).
Google says: “As a general rule, a medium-sized website can take a few weeks for most pages to move in our index; larger sites can take longer. The speed at which Googlebot and our systems discover and process moved URLs largely depends on the number of URLs and your server speed. Submitting a sitemap can help make the discovery process quicker.”
If you don’t have a comprehensive XML Sitemap of the HTTP URLs or a comprehensive internal link structure (for instance, some pages are only available through search functionality vs. a browsable navigation structure), the complete recrawl may take longer as Google uses XML Sitemaps and internal links as a signal for recrawling.
You can estimate how long the recrawl will take by noting how many unique URLs Google crawls per day and what percentage of those URLs are high value URLs that are generally recrawled each day. You can calculate a more accurate estimate by comparing a full list of unique URLs crawled over a period of time (such as 60 days) to a comprehensive list of URLs (for instance, from the XML Sitemap). Were all URLs crawled during that time period? How long did it take for all of them to be recrawled?
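As a rough illustration of that estimate (all numbers below are made up for the example), divide the total URL count by the number of newly seen URLs Googlebot fetches per day:

```python
# Back-of-the-envelope recrawl estimate; illustrative numbers only.
total_urls = 120_000        # e.g., URL count from the comprehensive XML Sitemap
crawled_per_day = 8_000     # unique URLs Googlebot fetches per day (from server logs)
new_share = 0.25            # share of the daily crawl that is newly seen URLs;
                            # the rest is high-value pages recrawled every day

new_urls_per_day = crawled_per_day * new_share
estimated_days = total_urls / new_urls_per_day
print(f"~{estimated_days:.0f} days for a full recrawl")  # ~60 days for a full recrawl
```

Comparing the list of unique URLs actually crawled over a 60-day window against the Sitemap, as described above, gives a more reliable figure than this arithmetic alone.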
You can also check the cache dates of URLs that don’t get much traffic to see how long ago they were crawled to get a general sense of how long Google keeps low value URLs in the cache before recrawling.
If your site is particularly large, it’s worthwhile to make sure the crawl is as efficient as possible before beginning the migration.
Will my site lose ranking and traffic during the migration?
Maybe, but probably not if everything goes right. As described in the reindexing process above, Google crawls the HTTP URL and gets the 301, then crawls the HTTPS URL at a later time. It’s possible that in some cases, Google will remove the http URL from the index before indexing the HTTPS URL (although this generally doesn’t happen). It’s also possible that Google will temporarily rank the HTTPS URL lower than the HTTP URL during the comparison/verification process (but this is also not common).
Even if either of these things happens, sites often experience little to no traffic loss during the migration since reindexing is a rolling process as URLs are crawled. You might see fluctuations if you're looking at individual queries or pages, but ranking and traffic to the site overall should be minimally impacted.
Google says: "Expect temporary fluctuation in site ranking during the move. With any significant change to a site, you may experience ranking fluctuations while Google recrawls and reindexes your site." Google has also said that there shouldn't be any long-term negative impact.
Google has also said that the site shouldn’t lose traffic. So if you do see a serious negative impact to ranking or traffic, then something has probably gone wrong with the migration. There may be a technical issue with the indexing of the HTTPS URLs, with the redirects, or with the content generation.
Will I see a big rankings and traffic boost?
Probably not. Google has said that HTTPS URLs “receive a small rankings boost, but don’t expect a visible change”.
What if I don’t switch to HTTPS? Will my site stop ranking in Google?
No, your site can still rank well. Google has said that HTTPS is only a “lightweight” signal that impacts fewer than 1% of queries globally. However, Chrome version 62 (and later) will show a “not secure” warning when the user enters data on an HTTP page. This may influence user behavior, so if your site includes forms (including a search box), monitor your site’s data to see if users are leaving forms incomplete as a result of the warning.
Google has said that eventually Chrome will show the “not secure” warning for all HTTP pages.
But I got a warning about HTTPS in Google Search Console. Doesn’t that mean my site has a penalty?
No. Google began sending these warnings to sites late last year. In some cases, the warnings identify pages that should always be served over HTTPS (such as pages with credit card entry fields), but others flag WordPress (and similar) admin login pages on HTTP sites with no user login area, where only the admin login page is affected. (Of course, you probably want the admin login to your own site to be secure, but that is unrelated to the user experience.) The latest warnings cover any pages that use any forms.
In all of these cases, Chrome will display a “not secure” warning for users of these pages, but this warning is unrelated to ranking.
- Google help center: Securing Your Site with HTTPS
- Google help center: Site Moves
- Google help center: Site Move Recommendations
- Google: Incoming links that redirect are OK
- Google tries to crawl https versions of HTTP URLs (even with no other discovery source)
- Google’s http to HTTPS FAQ
- Google News http to HTTPS FAQ
21 thoughts on “Moving from HTTP to HTTPS: A Step by Step Guide for Avoiding SEO Pitfalls and Maximizing Google Organic Traffic”
The ULTIMATE GUIDE – thanks Vanessa 🙂
Great comprehensive Guide Vanessa! We are in the middle of an https migration now, and this is the best list I’ve seen! One thing that wasn’t clear though – and that no one is clear on – is on which Google Search Console property to submit the different site maps? As I understand it, we should create new properties in Search Console – we have one now for http://www.domain.com and so we need to create new properties https://www.domain.com and also I read somewhere we should also have a 3rd property https://domain.com. So…with regard to the XML sitemaps, you say we need to keep the old one for http and add a new one for https. That makes sense…but…should both sitemaps be submitted within all 3 properties? Or do we keep the old sitemap in the old http property, and then in the new https://www property, put the new https XML sitemap? Or submit both sitemaps in https://www property? What about the third property – https://domain.com (without www)? This wasn’t clear in your Guide…but don’t worry, I can’t find anyone else that explains this either!
I will take a stab at clearing all of this up in the guide, but to answer your questions:
- If possible, you should have all four variations verified in Google Search Console: http://www.domain.com, https://www.domain.com, http://domain.com, and https://domain.com.
In your scenario, https://www.domain.com is the current "canonical," so the other three variations should 301 redirect to it.
The main reason you want to add all 4 variations to Google Search Console is so that you can be alerted to any issues with the other three variations.
For instance, if some of the URLs aren't redirecting to the https://www.domain.com version, then you might see traffic for the other versions. Of course, for a time after the migration, you will see traffic to the http://www.domain.com property, but this should decline over time as I outlined in the article. But if everything is set up right, you shouldn't see any traffic going to http://domain.com or https://domain.com.
Also, if you see any crawl errors listed for the http://domain.com or https://domain.com properties, then there may be technical issues preventing Googlebot from crawling the redirects. (Etc.)
The other reason to verify the https://domain.com property is so you can set the preferred domain (basically, you can tell Google whether you want the site indexed with the www or without). You can only set this if you have verified both the https://www.domain.com and https://domain.com properties.
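The redirect rule described above (the three non-canonical variations all 301 to the https://www version) can be sketched as a small canonicalization function. Here domain.com stands in for the real domain, and the canonical host is an assumption matching the scenario in this thread:

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.domain.com"  # the chosen canonical host (assumption)

def canonical_url(url: str) -> str:
    """Map any of the four scheme/host variations to the canonical
    https://www URL; each 301 redirect should point at this target."""
    parts = urlsplit(url)
    return urlunsplit(("https", CANONICAL_HOST, parts.path, parts.query, parts.fragment))

for variant in ("http://www.domain.com/page",
                "http://domain.com/page",
                "https://domain.com/page"):
    print(canonical_url(variant))
# each line prints: https://www.domain.com/page
```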
Now, about the XML Sitemaps, you should leave the old XML Sitemap that lists the http://www.domain.com URLs submitted where it is (under the http://www.domain.com property). And you should submit the new XML Sitemap that lists the https://www.domain.com under the https://www.domain.com property.
You don’t need to submit an XML Sitemap for the https://domain.com property since you only need to submit XML Sitemaps for URLs you want crawled (and that you want to monitor indexing metrics for).
Technically, you could submit both XML Sitemaps under the same property (in this case, probably https://www.domain.com), but that gets into advanced territory. In your case, I recommend just leaving the http://www.domain.com XML Sitemap where it is and then submitting the new one under the https://www.domain.com property.
Thank you for your reply! Very good…and about the sitemaps I WOULD do exactly as you said…but…not that easy in my case. My CMS (Drupal 7, and more specifically, the XML Sitemap module), puts the sitemap at /sitemap.xml when it generates it, and doesn’t have an option to create a second one at a different path. And since I just went live with https, now the sitemap.xml file shows https urls only.
So…based on what you said, the new https property is fine – it has the new sitemap.xml file which is https.
But in my case, I am worried about the http property since it also has the exact same sitemap.xml file (since that is the location of the sitemap it has indexed all along), and that is the same https sitemap.
“Technically, you could submit both XML Sitemaps under the same property (in this case, probably https://www.domain.com), but that gets into advanced territory. ”
So I guess I need to ask about this “advanced territory”…
To clarify my situation right now…
1. I have just gone to https so new site is https://www.domain.com
2. Sitemap has always been at /sitemap.xml but the URLs in that sitemap now are all https
3. Therefore, the “old” Google SC property http://www.domain.com has this sitemap (and that sitemap was automatically updated to https links with the migration).
4. I submitted the same sitemap.xml on the new GSC property as well.
5. Therefore, right now both of these GSC properties have the same sitemap.xml, and it has https links (and not much I can do about that).
So my follow-up questions…
A. Is this OK for the old http:// GSC property to have the https sitemap?
- If you say NO, then I'm not sure how I can change it without a Drupal code change
B. Would you recommend that I – on the old http:// GSC property – manually create and submit an http sitemap (showing what it was just before the https switch), maybe something like http-sitemap.xml? That would mean that the old http:// GSC property would have both.
In this case, does the http://www.domain.com/sitemap.xml URL redirect to https://www.domain.com/sitemap.xml?
The reasons for leaving the http XML Sitemap submitted are two-fold and both are optional:
I think your best bet here is to not worry about creating an XML Sitemap with the http URLs and just skip these two benefits — the work required would not be worth the trade-off.
Instead what you should do is:
That way, you’ll cleanly have just an https XML Sitemap.
If you can’t redirect the http robots.txt file and http sitemap.xml file to the https versions, let me know and we can go further into advanced territory.
Awesome – thank you, Vanessa!
To answer your question, yes in this case the http://www.domain.com/sitemap.xml URL redirects to https://www.domain.com/sitemap.xml. But the real complication is that the actual sitemap.xml file contents have changed: it now contains only https URLs. So the "old" sitemap no longer exists as it was.
I will follow your recommended steps, starting with removing the XML sitemap submission from the http property.
Yes, that's perfect. The old http Sitemap doesn't exist anymore. The only XML Sitemap that exists is the one with the https URLs in it. The http://www.domain.com/sitemap.xml URL redirects to the https://www.domain.com/sitemap.xml URL.
So basically, you now just have the one XML Sitemap with the https URLs. That should work great.
You’ll just have fewer methods of monitoring the migration — one key thing will be to watch the search analytics report in Google Search Console – you should see the clicks decline for the http and the clicks increase for the https.
Hi Vanessa, this is a fantastic guide on moving to HTTPS, many things that most site owners forget!
If it helps, for the fixing mixed content & finding insecure links over a site, this app can take care of that: https://httpschecker.net/guides/https-checker
From my experience, it is important to check for canonical loops, where page A has a canonical to B and B has a canonical to A. They are very difficult to find, but worth checking.
I remember Googlebot crawling non-existent pages over and over after I migrated my website to https, and those pages were never there in the first place or afterwards, which all ended with manual 301 redirections. Is this behavior of the bots natural?
Yes, Googlebot crawls URLs based on all kinds of signals, so in cases that Googlebot is crawling URLs that don’t exist (and never existed), it could be that other sites linked to your site incorrectly (or you could have broken internal links).
If you find the URLs listed in the Google Search Console Web Crawl Errors Not Found report, click on them to see if any “Linked from” URLs are listed. If the Linked from URLs are from your own site, check those pages for broken links (fix the broken links and 301 the URLs to the correct place). If the Linked from URLs are from other sites, 301 redirect the URLs.
Googlebot also might try using standard web patterns to see if other pages exist on your site that you didn't link to. For instance, Google sometimes tries to fill out forms or adds common URL patterns to your domain name (like index.html). Or, when Google detects a site migration has taken place, Googlebot might try recrawling very old URLs (even from an older version of the site before you owned it, if applicable) to see if those URLs have migrated as well.
If the URLs never existed, it’s fine to return a 404 in those cases. Google won’t keep re-crawling those URLs.
You can read more on Googlebot’s crawl here: https://www.keylimetoolbox.com/news/googlebots-crawl-budget-impacts-sites-visibility-search-results-can-improve/
Great guide… the best I’ve found after a lot of searching 🙂
But this section is confusing:
“The robots.txt file is specific to the HTTP or HTTPS property and the content of the file applies only to that property. That means you can’t list both the HTTP and HTTPS XML Sitemaps location in the robots.txt file for the HTTPS property.
Instead, what you’ll likely need to do is move or copy the robots.txt file from the HTTP property to the HTTPS property and then update the XML Sitemaps reference (most directives in the robots.txt file are relative, but the Sitemaps reference is absolute).”
This doesn’t make sense.
Assuming you are not moving domains at the same time as the http>https migration, there is only ONE robots.txt. It is a single file in the server’s file system and exists in the root directory of the domain. Regardless of whether this file is served over http or https protocol, it remains a SINGLE file. It is therefore not possible to edit or copy http or https “versions” of the file.
Could you please confirm this understanding is correct?
As I understand it, what I should do is check that http://domain.com/robots.txt is 301 redirected to https://domain.com/robots.txt and replace the old reference in the robots.txt file to the http sitemap with a reference to the new https sitemap.
It depends on the server set up. But typically the http://domain.com/robots.txt URL would 301 redirect to the https://domain.com/robots.txt version. So as long as your setup is typical, you are correct. All you would need to do is make sure of that and then update the robots.txt reference to the https XML Sitemap (leave the reference to the http Sitemap and add the reference to the https Sitemap so that Googlebot will quickly recrawl the http URLs and get the redirects).
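That robots.txt update can be sketched as a small helper: it keeps each existing absolute http Sitemap reference and adds the matching https one, which is the configuration recommended above. This is a sketch against a made-up robots.txt, not a drop-in tool.

```python
def add_https_sitemap_refs(robots_txt: str) -> str:
    """After each absolute http:// Sitemap reference, add a matching
    https:// reference (keeping the http line so Googlebot quickly
    recrawls the redirecting http URLs)."""
    out = []
    for line in robots_txt.splitlines():
        out.append(line)
        if line.lower().startswith("sitemap:") and "http://" in line:
            out.append(line.replace("http://", "https://"))
    return "\n".join(out)

print(add_https_sitemap_refs(
    "User-agent: *\nSitemap: http://www.domain.com/sitemap.xml"))
# User-agent: *
# Sitemap: http://www.domain.com/sitemap.xml
# Sitemap: https://www.domain.com/sitemap.xml
```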
Wow, thanks so much for the quick reply Vanessa! 🙂
I’ve been preparing for this move for nearly 2 weeks and finally hoping to do it today (get it in before the deadline!)
I was wondering if there was one more thing I could ask your expertise on?
Prepping for this move has uncovered a big issue with my existing http site map (basically it’s rubbish).
ACTUAL SITUATION: my site has 500 main urls and 10,000 forum urls (topics)
The http sitemap lists 100,000 urls (most of which don’t exist or direct to the same content).
Luckily Google has been smart enough to figure this out and in GSC I can see it has indexed 6,000 urls for my site and shows all the erroneous ones in the sitemap as 'excluded'.
Now, I’ve created a clean and accurate https site map with all 10,500 urls.
So you see, there are now around 4000 urls I need to ‘tell’ Google about.
My concern is whether I should update the sitemap on the old http property with a new clean/accurate version before doing the https migration? This would mean telling Google about the extra new 4000 urls to index over http (old buried forum posts presumably) and I was worried this might take some time to crawl and throw off Google’s ‘detection’ of the migration….?
So, should I just keep the old corrupt sitemap as-is and just list all the urls in my https sitemap, or should I add the new urls to my http property and clean it up first?
If the latter, should I leave any time gap to allow crawling before I do the migration?
(trying to get this done before the July ‘deadline’ 🙂
I would probably just update the new https XML Sitemap with the full set of URLs.
The value of having the http XML Sitemap post-migration is just to get those URLs (and 301s) crawled as quickly as possible.
I would remove the http Sitemap now since it’s inaccurate.
You *could* submit an http XML Sitemap *after* the migration with the same set of URLs as you have in the https Sitemap (but with http URLs). That would also help you monitor the migration: the http Sitemap should eventually show 0 indexed and the https Sitemap should eventually show 100% indexed. But it's not necessary. If you do decide to do that, I wouldn't submit the http Sitemap until the migration, since you aren't looking to index more http URLs at this point; you would just want the 301s picked up quickly once they're in place.
Good luck with the migration!
You are the best Vanessa.
One more quick Q – is there any value doing a “Fetch As Google” on the old http property after the migration to get it to crawl the 301s as quick as poss.?
No, “fetch as Google” doesn’t trigger a crawl for the index. You would have to also use the “submit to index” function, but that has a fairly small limit. The way to ensure the http 301s get picked up quickly would be to submit an http XML Sitemap, but if you don’t do that, the recrawl should still happen in a fairly efficient manner (unless your crawling has other issues). 🙂
Sorry to keep badgering you with this……
I am going to do as you suggest and submit a new http sitemap after I put the 301’s in place.
Is there a period of time (minutes, hours, days?) I should wait between setting up the New HTTPS property in GSC with the new https sitemap and submitting the new http sitemap on the old property?
If I’m understanding this right, the idea is that we first index the https pages and then re-crawl the http pages so that google will “map” the old page onto the new page. How can this work if it hasn’t had time to index the new https pages yet (which it wouldn’t have if I submit both sitemaps immediately after I do the 301’s).
Or maybe that doesn’t matter and I’m over thinking it.
So close now….. Thanks 🙂
It doesn’t matter, really. You can submit the http Sitemap at the same time the migration happens. Google doesn’t need to first index the https URLs. In fact, submitting the http Sitemap can help get the https URLs crawled and indexed (as Google can discover those URLs when crawling the http 301s).
Google's systems don't map the value of a 301 in real time. The information is stored, and then the rest of the process follows: if the 301 target hasn't been crawled and indexed yet, that happens; the system compares the content of the original URL to the new URL to see if they are basically the same; and so on.