Migrating domains from wikia.com to fandom.com

In the first few months of 2019, we switched all our communities to HTTPS and migrated them to new domains. That switch is the end of a year-long project during which we touched several pieces of our infrastructure. This blog post summarizes the project, from planning through execution up to the results of the switch.

Rationale
The decision to switch to HTTPS was a no-brainer for us. We've been serving our login pages over HTTPS for quite a long time, but the wikis themselves were still served over the unencrypted HTTP protocol. With the whole Web moving towards encrypted connections, browsers showing warnings on HTTP pages, HTTP/2 available only over HTTPS, and possibly some search ranking drawbacks of not having HTTPS by default, we decided it was time to switch all pages to HTTPS.

There were also reasons to change our main domain from wikia.com to fandom.com. We've been operating under the Fandom brand since 2016, but our pages remained under "wikia." We also knew that, because our non-English wikis operated under multiple subdomains, we had to restructure our domains during the HTTPS migration anyway, and there are potential SEO penalties for changing domains too often. So if we were going to change URLs anyway, it made sense to complete the rebranding at the same time and make all URL changes at once.

The migration project looked big and exciting: important, challenging, but also risky. We rely heavily on people discovering our content in search engines and knew that switching domains could end badly. There are sites which gained traffic after doing similar switches, but there are also examples of sites which lost traffic and never fully recovered after a domain change.

Challenges
After reading some blog posts about domain migrations, we knew we needed to get everything right. After the switch, there is no coming back. You have to serve valid 301 redirects to the new locations, update the robots files, take care of the sitemaps, and set up new canonical URLs. That seems pretty obvious and simple, but our unique setup made things complicated.

Certificates and non-English language subdomains
In order to serve HTTPS traffic you need a valid certificate for your domain. We have over 390K communities, each with its own subdomain. While you can get a wildcard certificate for your domain, it is only valid for first-level subdomains. So the "*.fandom.com" certificate covers the "gta.fandom.com" wiki, but it does not cover the German "de.gta.fandom.com" version. This meant that we would have to obtain a wildcard certificate for every community that had language versions. We have over 130K language communities, and users can create new ones at any moment. Obtaining thousands of certificates was not an option.
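To make the coverage rule concrete, here is a minimal sketch (not our production code) of the TLS wildcard-matching rule, under which "*" matches exactly one DNS label:

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Check whether a certificate wildcard pattern covers a hostname.

    Under the TLS hostname-matching rules, "*" matches exactly one
    label, so "*.fandom.com" covers first-level subdomains only.
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    return all(p == "*" or p == h for p, h in zip(p_labels, h_labels))

print(wildcard_matches("*.fandom.com", "gta.fandom.com"))     # True
print(wildcard_matches("*.fandom.com", "de.gta.fandom.com"))  # False
```

The label-count check is exactly why the second-level language subdomains fell outside the wildcard.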

Mostly due to the need to support HTTPS certificates (but also for some SEO benefits), after considering four proposals we decided to go with the best but most expensive solution: put the other language communities under the same domain as the English wiki. This means that "gta.fandom.com" is the English wiki, with articles served under "gta.fandom.com/wiki", and "gta.fandom.com/de" points to the German wiki. This itself was a challenge, because over the years we had produced code that relied on these assumptions:
 * Articles are always served under the "/wiki" path and the API entrypoints are at the root of the domain. This was not true anymore.
 * A domain name can be uniquely mapped to a single community. This was not true anymore, because suddenly several communities could share the same domain.
 * All extensions work in the context of a single wiki. After the domain change, extensions like RobotsTxt (responsible for serving robots rules for "gta.fandom.com") had to be aware of other language communities under the same domain, because a domain needs a single robots.txt file.

There were also some product decisions to make, such as: if there is a non-English community under a given domain but no English version at the root URL, how do we handle that, and what should be served at the root path of the domain?

We had to untangle a lot of code to get rid of those hardcoded ideas.
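As an illustration of the routing change, here is a simplified sketch (the `resolve_community` helper and the language-code subset are hypothetical, not our actual MediaWiki code) of mapping a request to a community once several communities share one domain:

```python
# Illustrative subset; the real platform supports many more languages.
KNOWN_LANG_CODES = {"de", "es", "fr", "ja", "pl", "pt-br", "ru", "zh"}

def resolve_community(host: str, path: str):
    """Map a request to (community, language, remaining path).

    Under the new scheme, gta.fandom.com/wiki/... is the English
    wiki while gta.fandom.com/de/wiki/... is the German one, so the
    domain alone no longer identifies a single community.
    """
    community = host.split(".")[0]
    parts = path.lstrip("/").split("/", 1)
    if parts[0] in KNOWN_LANG_CODES:
        lang = parts[0]
        rest = "/" + (parts[1] if len(parts) > 1 else "")
        return community, lang, rest
    return community, "en", path

print(resolve_community("gta.fandom.com", "/wiki/Main_Page"))      # ('gta', 'en', '/wiki/Main_Page')
print(resolve_community("gta.fandom.com", "/de/wiki/Hauptseite"))  # ('gta', 'de', '/wiki/Hauptseite')
```

Every code path that assumed "one domain, one wiki" effectively needed this extra language-prefix step.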

User submitted content
On Fandom, users are in control of their communities. They not only create content, they also customize the look and feel of their wikis by using their own fonts, CSS rules, images, embedded videos, or even JavaScript code. All of this meant that we had to make sure those customizations still worked after switching to HTTPS and changing the URL structure.

Several thousand community domains
Having almost 400K communities not only means a lot of work while switching them to HTTPS (which had to be done one by one), it also means each of those communities is different. Each community looks different and has different features enabled, and the fact that the previous 1,000 communities migrated to the new protocol/domain just worked does not mean there will be no issues when migrating the remaining 300K+ domains.

Ads and ad networks
We offer our product free of charge and cover the costs by serving ads through several ad networks. We had to make sure these ads were HTTPS-ready and that we could still make money after the switch.

Several applications
Changing our MediaWiki setup was not enough; there are other applications serving traffic on our community domains. Our mobile-wiki stack is based on EmberJS and serves wiki pages for mobile devices. Our Discussions application takes over the /d paths and serves discussions posts. "Feeds and Posts" is another form of lightweight contributions with its own backend. The list goes on. We had to make sure support for HTTPS and the new domain was introduced everywhere and that redirects behaved consistently across all the applications.

Internal (proxied) requests
HTTPS connections are established between user browsers and our CDN layer (this reduces roundtrips when establishing connections). Behind the scenes, our services talk to each other over plain HTTP. We had to distinguish requests coming from the outside (and redirect them to HTTPS when needed) from internal calls: when another internal application connects to a MediaWiki instance to get the content of an article, it should not be redirected to HTTPS.
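A minimal sketch of that decision, assuming (the header name is illustrative, not necessarily what our CDN actually sent) that the CDN tags external traffic with an X-Forwarded-Proto header:

```python
def should_redirect_to_https(headers: dict) -> bool:
    """Decide whether a request needs an HTTP -> HTTPS redirect.

    TLS terminates at the CDN, so the origin always sees plain HTTP.
    We assume the CDN marks external traffic with X-Forwarded-Proto;
    internal service-to-service calls arrive without it and must
    never be redirected, or internal article fetches would break.
    """
    proto = headers.get("X-Forwarded-Proto")
    if proto is None:
        return False  # internal call: serve the content directly
    return proto.lower() != "https"

print(should_redirect_to_https({"X-Forwarded-Proto": "http"}))  # True: external HTTP
print(should_redirect_to_https({}))                             # False: internal call
```

The key point is that "no forwarding header" means "internal", so redirects only ever apply to traffic that crossed the CDN.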

Caching
We have several layers of cache: the CDN, internal Varnish servers, and Memcached servers memoizing configurations or database query results. At some point, purging becomes challenging. When you switch a domain to HTTPS or to a new domain, you would like all systems to use the new setup immediately, not only to have consistent responses but also to avoid redirect loops. As an example, imagine you're about to switch your domain to HTTPS. Previously, each page under https://yourdomain.wikia.com has been redirecting to http://yourdomain.wikia.com. If you suddenly roll out HTTPS, pages under http://yourdomain.wikia.com will start redirecting to the HTTPS protocol, but the HTTPS->HTTP responses you've been serving so far are still cached somewhere, and as a result you end up in nasty redirect loops.
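A toy simulation of that failure mode (purely illustrative: an in-memory redirect table stands in for the various cache layers):

```python
def follow(url: str, redirects: dict, limit: int = 10):
    """Follow a table of cached redirect responses, detecting loops."""
    seen = []
    while url in redirects and len(seen) < limit:
        seen.append(url)
        url = redirects[url]
        if url in seen:
            return seen + [url], True  # we came back to a URL we left
    return seen + [url], False

cached = {
    # Stale cached rule from before the switch:
    "https://yourdomain.wikia.com": "http://yourdomain.wikia.com",
    # Freshly rolled-out HTTPS rule:
    "http://yourdomain.wikia.com": "https://yourdomain.wikia.com",
}
chain, looped = follow("http://yourdomain.wikia.com", cached)
print(looped)  # True: the stale and the new rule chase each other
```

One stale cached entry is enough to turn the rollout into an infinite bounce, which is why all layers had to be purged together.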

Anonymous vs logged-in users
We decided to roll out HTTPS to logged-in users much earlier. This gave the benefits of HTTPS to our contributors sooner and allowed us to test HTTPS before switching the whole site and affecting our search rankings. However, this decision had consequences. A page served over HTTPS should have HTTPS links to other articles (given they exist on HTTPS-compatible domains). For anonymous users and crawlers, however, the links should use HTTP. This meant we had to produce different links for anons and logged-in users and take this into account when using cache. While using protocol-relative links simplified things, it was not a silver bullet. You cannot generate a protocol-relative link if it is going to be used in an email. We also could not use protocol-relative links in cases where a page like gta.wikia.com included a link to an article on de.gta.wikia.com, as the latter did not have a valid HTTPS certificate.
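The link-generation rules above can be sketched as follows (a simplified illustration; the helper name and parameters are hypothetical):

```python
def article_link(host: str, path: str, *, https_ready: bool, in_email: bool) -> str:
    """Build an article link whose URL form depends on the context.

    Protocol-relative links ("//host/path") let one cached page work
    for both HTTP (anonymous) and HTTPS (logged-in) readers, but they
    are useless in emails, and targets without a valid HTTPS
    certificate must stay on absolute HTTP.
    """
    if not https_ready:
        return f"http://{host}{path}"
    if in_email:
        # Email clients cannot resolve protocol-relative URLs.
        return f"https://{host}{path}"
    return f"//{host}{path}"

print(article_link("gta.wikia.com", "/wiki/GTA_V", https_ready=True, in_email=False))
print(article_link("de.gta.wikia.com", "/wiki/GTA_V", https_ready=False, in_email=False))
```

The second call shows the certificate constraint: a link to a second-level language subdomain had to stay on absolute HTTP until the domain restructuring made it HTTPS-capable.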

Development/staging domains
To add to the complexity of our domain setup, our communities are also available under staging/sandbox/devbox environments used during development and automated testing. Obviously, we did not want a scenario where we could not check whether our changes were HTTPS-ready before going to production. That meant we needed wildcard HTTPS certificates for each of our dozens of environments. Getting official certificates on the fly (for example, when setting up a machine for a new developer) was costly and too slow, so we set up our own internal certificate authority.

“wikia.com” hardcoded everywhere
As you can imagine, we did not have the “wikia.com” string in a single place per app. This was hardcoded in several URL-generating factories, regular expressions, articles stored in Memcached and so on. Making sure we did not miss anything when switching to fandom.com was fun :)

Confidentiality/PR
We started planning and preparing the domain migration to fandom.com in Q2 2018, and although some engineering work started then, we did not want our users to learn about it from commits to our public MediaWiki repo before official announcements and conversations with the community. This meant some of the work could not be committed, or had to be done in other parts of our codebase.

Planning
We began thinking about the HTTPS and domain switches in Q3 2017 when setting goals for 2018. At that time, there were several unknowns and competing ideas. Take the certificate issue: should we move the language wikis to paths like gta.wikia.com/de, or should we just replace the dot with a hyphen, like de-gta.wikia.com? Should we switch from wikia.com to fandom.com? How much traffic could we lose when switching, and how much time would it take to recover? Behind each of these decisions were several criteria. What are the pros and cons from a business perspective? Are the options equally risky? What is the cost of implementation and maintenance? Does it increase or decrease the complexity of our platform or our technical debt?

Some decisions were fairly simple. In other cases, we had several brainstorming sessions and back-and-forth meetings. Sometimes you just lack knowledge, and it makes sense to try to implement something to get a feeling for its cost and complexity, so we did run some time-boxed proof-of-concept projects.

We enumerated all the areas and components that needed to be changed and used our knowledge to propose rough estimates for the individual parts (mostly in weeks or months). After adding some margins and slack before milestones, we estimated the whole project should take around three quarters, provided a dedicated team worked only on this project and could ask for help in case changes to applications owned by other teams turned out to be too complex.

We knew we wanted to switch in the first quarter of the year. Examples of other sites switching domains suggested we could lose around 20-60% of our traffic while switching, and that it could take weeks to recover. Q4 is our highest revenue quarter and Q1 is the lowest. Given we estimated the project to take three quarters to complete and started the work mid-Q1 2018, the first weeks of 2019 were chosen as the "big switch" date.

We decided to split the project into the following milestones:
 * HTTPS support for logged-in users on English wikis
 * Support for language code in the URL path
 * Make sure ads are HTTPS-ready
 * Migrate a test batch of wikis from wikia.com to fandom.com, create new wikis on fandom.com
 * Prepare the migration scripts and test everything
 * Migrate!

HTTPS support for logged-in users on English wikis
This was the first stage of the project. For enabling HTTPS on English wikis, we did not have to worry about the certificates, language paths under domains, or sharing domains between several communities. It gave us a chance to roll out HTTPS to most of our users. We serve ads mostly to anonymous users, so our revenue would not be affected in case our ad stack or networks were not HTTPS-ready. It also allowed us to test HTTPS in a safe setup with limited caching (as logged-in users are served pages generated on-the-fly). At this stage, we also enabled Content-Security-Policy reporting, which proved to be very useful: when a page was loaded over HTTPS by a logged-in user and it included some non-HTTPS-compatible resources, we would receive and aggregate such reports. This allowed us to fix errors which we would not have been able to reproduce or spot ourselves.

Support for language code in the URL path
In this milestone, we wanted to implement the possibility of loading language wikis under paths like gta.wikia.com/de. This required fundamental changes in the core parts of our applications and also prepared us for serving several communities under shared domains. While we were not planning to roll out this change to users after this stage (as it would require migrating the wikis to new locations), we could still create new test wikis under a language path and make sure everything worked.

Make sure ads are HTTPS-ready
After that milestone, we wanted to be sure all the ad networks and campaigns we were using were serving HTTPS-compatible assets.

Migrate a test batch of wikis from wikia.com to fandom.com, create new wikis on fandom.com
This milestone was about untangling all the hardcoded wikia.com strings and regular expressions. It was also the point for delivering our autologin solution. Over the years, we had been giving out auth cookies for the wikia.com domain. As we wanted the switch to be seamless for the users, we did not want to force them to log in again when visiting our communities on fandom.com. This meant we wanted to create and push authorization cookies to the new domain, ideally even before users visited those wikis. This proved to be more challenging than we initially thought, as browsers are becoming better at blocking tracking and third-party cookies, so one domain cannot simply generate cookies for another domain. But eventually we managed to transfer most of our users' cookies.

This was also the point at which we had to have all our redirects correct. It is important from an SEO perspective that when switching domains you set up valid 301 redirects to your new location and limit the number of chained redirects. Initially, that was not the case for us. For example, we had a wiki alias http://wookiepedia.wikia.com redirecting to the primary domain at http://starwars.fandom.com, then the HTTPS addon was redirecting it to https://starwars.fandom.com, and eventually MediaWiki was doing another redirect to the main page at https://starwars.fandom.com/wiki/Main_Page. From the beginning of the project, we wanted to get all SEO-critical redirects down to a single 301 redirect. We were making progress, but at some point we started running in circles, where fixes were causing regressions by breaking previously fixed redirect scenarios. Over time, we created over 100 redirect test scenarios, each including the initial URL, the expected number of redirects, and the expected target URL. These tests were run on a regular basis to make sure we caught and fixed regressions on the spot and to verify that this important part of the domain migration worked as expected. We knew everything had to be working: if you get the switch wrong (by messing up sitemaps or canonical URLs, or by mis-redirecting), you may never recover your position in search engines.

Prepare the migration scripts and test everything
We knew that if we wanted to switch at the beginning of 2019, all the code had to be ready in Q3 2018 so that we could spend 2-3 months just making sure everything was ready and fixing minor edge cases.

Migrate!
Well, this was the final and exciting moment when, after all the hard work, it was time to push the button and migrate everything. While working on the project, we figured out we could not migrate everything at once; each subdomain had to be switched separately. This was actually good: we could split the migration into smaller batches. We could also migrate smaller and less risky communities first, before pulling the lever on our biggest and most valuable communities.
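Our redirect test scenarios can be sketched roughly like this (a simplified illustration; the chain format and helper name are hypothetical, not our actual test harness):

```python
def check_scenario(chain, scenario):
    """Validate a recorded redirect chain against one test scenario.

    `chain` is a list of (status, url) hops ending with the final
    200 response. Every SEO-critical hop must be a permanent 301,
    and both the hop count and the final URL must match.
    """
    problems = []
    hops = chain[:-1]
    if len(hops) != scenario["max_redirects"]:
        problems.append(f"expected {scenario['max_redirects']} redirect(s), got {len(hops)}")
    if any(status != 301 for status, _ in hops):
        problems.append("non-301 redirect in chain")
    if chain[-1][1] != scenario["target"]:
        problems.append(f"landed on {chain[-1][1]}, expected {scenario['target']}")
    return problems

scenario = {
    "start": "http://wookiepedia.wikia.com",
    "max_redirects": 1,
    "target": "https://starwars.fandom.com/wiki/Main_Page",
}

# The broken pre-fix behavior described above: three hops instead of one.
bad_chain = [
    (301, "http://wookiepedia.wikia.com"),
    (301, "http://starwars.fandom.com"),
    (301, "https://starwars.fandom.com"),
    (200, "https://starwars.fandom.com/wiki/Main_Page"),
]
print(check_scenario(bad_chain, scenario))  # flags the extra hops
```

Running such checks against every scenario on a regular basis is what let regressions be caught on the spot.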

Execution
At the beginning of the project, the plan was not very detailed, but the roadmap looked very similar to the one presented above. We split the work into milestones and used rough estimates for each of them. This helped us along the way. We knew Q1 2019 was a hard deadline, so we used the milestones as checkpoints. In case things went slower than expected, the plan was either to change the scope or to pull in more resources early.

While some parts of the project turned out to be more time-consuming than predicted, there were also parts that went more smoothly. In the end, we were able to hit the estimated milestone completion dates with 1-3 weeks of accuracy. We also front-loaded the must-have or risky parts of the project, so in case those took longer, we still had the option to cut some of the optional parts scheduled for later (like the autologin service).

While working on the project, we were also doing some research on safe ways of migrating domains. Over time, we gained confidence that we could migrate some small wikis, or create new communities, before doing the big switch. That possibility was very reassuring, as all the changes could be tested during those initial migrations before hitting the button on our biggest wikis.

Migrations/Final switch
We completed the work at the beginning of Q4 2018 and focused mostly on testing. In October, we started migrating communities to HTTPS and fandom.com to check how long (and if!) it would take them to regain traffic.

We started migrating the test batch on October 5th with five communities and followed on October 9th with 45 more. In the following week, we migrated 500 communities, and we finished the test batch around October 22 with an additional 800 communities. We picked a wide variety of communities: from small communities with little traffic and risk, through middle-sized domains like gameofthrones.fandom.com, up to some of our bigger communities like witcher.fandom.com or runescape.fandom.com. As most of our traffic comes from Google Search, we noticed around a 30% drop in traffic after migration, which slowly recovered over the course of two months. This is the average search rank of communities migrated in October 2018 (lower is better):


As you can see, they recovered nicely, with the new domain replacing the old one in the search results. The traffic returned to previous levels in December, which gave us the confidence to migrate everything in January 2019.

The main migration happened between January 16 and January 25th, when we migrated most of the domains. In the beginning, we were migrating a few thousand communities per day, but we sped up at the end once we figured out we could safely do larger batches of subdomains with lower traffic.

Just for fun, this is the actual button we smashed when starting the migration script:



This is our page views graph from Google Analytics, most of the traffic was migrated in the week of January 21:



As expected, the traffic started recovering after the initial drop. On February 20th, our page views were around 13% lower than before the migration. By the end of March, our search visibility had recovered and we can more or less say our traffic is back. By that we mean we're around the same levels as before migrating to fandom.com. There are some differences, but they may not be related to the migration itself: over time, more and more traffic comes from mobile, and some communities gained traction due to new releases and TV seasons (like the Game of Thrones wiki).

The graph below shows the average search ranking of our communities (lower is better). As you can see, after a few weeks the fandom.com subdomains replaced the previous wikia.com entries.



Of course, there were some surprises. For example, we noticed some ad problems on the new domains, and after a long investigation it turned out there was a DMCA sanction imposed on the fandom.com domain that predated our ownership. We were able to figure it out and solve it with help from the Google team. After Google whitelisted our new domain, ad traffic started recovering along with our user traffic.

Conclusions and lessons learned

 * To help with the estimation, don’t be afraid to spend some time (2 days, 1 week) and just hack around and learn. You will know so much more after that time.
 * If possible, do the must-haves and critical parts of the project first. With hard deadlines, you may need to cut some scope at the end of the project, so leave the nice-to-haves for last. In other words: prioritize.
 * Split the work into smaller parts/milestones. Ideally, you want to be able to roll out your changes on a regular basis.
 * Automated tests are key. We struggled a lot with regressions, where one change to our redirection code would break other test scenarios, and without tests we would have been running in circles. We ended up with over 100 test cases just for making sure we serve a correct, single 301 redirect in each scenario.
 * For the HTTPS switch itself, monitoring and reporting HTTPS issues with CSP reporting is a must, especially on a diversified platform with user-generated content.
 * Read a lot and learn from other projects. We spent a lot of time upfront reading about the Stack Overflow migration and learning from the wired.com migration. It allowed us to learn best practices and avoid repeating mistakes, and it helps you spot risks or parts of the project you would otherwise miss while planning. It also helps with estimation, by comparing the complexity of your own setup and project to others.
 * Estimates were a self-fulfilling prophecy. Knowledge of the time left until the end of a milestone or project and of the amount of work remaining tends to influence technical decisions, consciously or unconsciously. There were certainly parts of the project that could easily have turned into full-blown epics or projects of their own, but we did not even consider the most expensive options. Most often, we did the best we could without increasing technical debt, keeping the project within its time constraints.
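As a concrete example of the CSP-reporting point above, here is a minimal sketch of aggregating standard report-uri payloads (illustrative only; our actual reporting pipeline was more involved):

```python
import json
from collections import Counter

def aggregate_csp_reports(raw_reports):
    """Tally mixed-content violations from CSP report-uri payloads.

    Each payload follows the standard CSP report shape: a JSON body
    with a top-level "csp-report" object. Counting (directive,
    blocked URI) pairs surfaces the most common non-HTTPS resources.
    """
    counts = Counter()
    for raw in raw_reports:
        report = json.loads(raw).get("csp-report", {})
        key = (report.get("violated-directive", "?"),
               report.get("blocked-uri", "?"))
        counts[key] += 1
    return counts

sample = json.dumps({"csp-report": {
    "document-uri": "https://gta.fandom.com/wiki/Main_Page",
    "violated-directive": "img-src",
    "blocked-uri": "http://example.com/banner.png",
}})
print(aggregate_csp_reports([sample, sample]).most_common(1))
```

Aggregation like this is what turns browser-side reports into a ranked to-do list of mixed-content fixes across user-generated pages.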

Summary
As of today, we’re serving all the traffic over HTTPS and we’ve switched our top domain from wikia.com to fandom.com. From the business point of view, the migration was successful as after the migration we’ve regained our search traffic. From the engineering point of view, it was an interesting adventure. We’ve touched all parts of our infrastructure. We collaborated with all Fandom teams while working on this project. We did break some things and we improved others. And we’ve managed to plan a year-long project and deliver it on time!