How to Canonicalize a Domain in Elixir/Phoenix

What Is a Canonical Domain?

A canonical domain name is the single source of truth for your website's domain. Canonicalizing a domain means picking the www or non-www version, and the https or http version, of your root domain. This matters because web crawlers treat each version of your domain as a separate site: www is technically a subdomain and could point to another site, and https is a different scheme from http. For example, http://yourappname.com, https://yourappname.com, http://www.yourappname.com, and https://www.yourappname.com are four distinct addresses.

Web crawlers will treat each of those examples as unique and unrelated, and we have very little control over how someone types or copies and pastes a URL from our site.
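To make that concrete, here is a small sketch using only the Elixir standard library. The module name OriginExample is made up for illustration, and yourappname.com is just a placeholder domain. It parses the four common spellings of one site and counts how many distinct scheme and host pairs come out.

```elixir
defmodule OriginExample do
  # Hypothetical helper for illustration only.
  # Returns the unique {scheme, host} pairs found in a list of URLs.
  def distinct_origins(urls) do
    urls
    |> Enum.map(&URI.parse/1)
    |> Enum.map(fn uri -> {uri.scheme, uri.host} end)
    |> Enum.uniq()
  end
end

urls = [
  "http://yourappname.com",
  "https://yourappname.com",
  "http://www.yourappname.com",
  "https://www.yourappname.com"
]

# Each spelling is its own origin, so nothing is collapsed by Enum.uniq/1.
urls
|> OriginExample.distinct_origins()
|> length()
|> IO.inspect(label: "distinct scheme/host pairs")
```

All four survive the uniqueness check, which is roughly how a crawler sees them until you tell it otherwise.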

Canonicalization also extends down to the URL path, and in some cases you will need a canonical link tag to make sure the web crawler records the correct URL and attributes the content to it. You can read more about canonical domains and canonical URLs on Moz.com.

Next time you're checking out a major site, try both their www and non-www addresses. They will have chosen one or the other as their canonical domain, and they will have canonical link tags for each unique URL on their site. The jury is still out on which one is better, but most people agree that you should pick one, stick with it, and work towards consistency. Web crawlers reward consistency.

Why Canonicalize Your Domain?

Proper site SEO requires the canonicalization of your domains. Major search engines are usually good at sorting out duplicate data and dealing with the web as it is. They try to figure out which page is the primary source of the content and where it was first seen. If we make this easier for them, though, we avoid potentially losing ranking, or "SEO juice" as they say, by making their job harder than it already is.

How to Canonicalize Your Domain

Recently I needed to handle this in a graceful way on a Phoenix/Elixir project. It can be accomplished a number of ways, using DNS, Nginx, Apache, or application logic, and the choice really comes down to how complex your redirect rules need to be. I usually do this at the application level, since that's where the majority of our routing rules live and it keeps the behavior from being hidden from the software engineers. So I will focus on how to do this with application code, and not on the other methods.

Coming from a Ruby and Rails background, I expected to find a library that would handle this sort of thing for me, but I was surprised not to find any existing libraries for handling this in Phoenix. Phoenix does have a built-in force_ssl option, which is a handy switch for enforcing SSL across the site. That led me to believe there might be a canonical: true option somewhere in the config. Maybe one will be added to the framework later, but it currently doesn't exist. So for now, if I want all my traffic to go to a single domain, I have to write something to make that happen.

Here is the production configuration for my Phoenix application; test and development have similar configurations. I set the host to the domain I want to be the primary domain. This is also the domain Phoenix uses to build URLs across the site when you call the built-in Phoenix URL helpers.

# ./config/prod.exs

config :your_app_name, YourAppNameWeb.Endpoint,
   http: [port: {:system, "PORT"}],
   url: [scheme: "https", host: "yourappname.com", port: 443],
   force_ssl: [rewrite_on: [:x_forwarded_proto]],
   cache_static_manifest: "priv/static/cache_manifest.json",
   secret_key_base: "SuperSecretKeyBase"

Once we have the canonical domain selected and entered in our config, we can use that config option to tell a simple plug where to redirect our traffic, and so enforce a single domain. If we wanted to work some redirect magic, like sending specific domains to certain locations on the site, we could handle that here, but for now we'll keep it simple.

# ./lib/your_app_name/canonical_domain.ex

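defmodule YourAppName.Plugs.CanonicalDomain do
  import Plug.Conn

  def init(options) do
    options
  end

  # Sends a permanent (301) redirect to the canonical domain when the
  # request arrived on any other host; otherwise passes the conn through.
  def call(conn, _options) do
    if not_canonical_domain?(conn.host) do
      conn
      |> put_status(:moved_permanently)
      |> Phoenix.Controller.redirect(external: canonical_domain())
      |> halt()
    else
      conn
    end
  end

  # Base URL of the canonical site, e.g. "https://yourappname.com".
  defp canonical_domain() do
    "#{canonical_scheme()}://#{canonical_host()}"
  end

  # Host and scheme both come from the endpoint's :url config.
  defp canonical_host() do
    YourAppNameWeb.Endpoint.config(:url)[:host]
  end

  defp canonical_scheme() do
    YourAppNameWeb.Endpoint.config(:url)[:scheme]
  end

  # A request is non-canonical when its host differs from the configured
  # host. Hostnames are case-insensitive, so compare them lowercased.
  defp not_canonical_domain?(host) do
    String.downcase(host) != String.downcase(canonical_host())
  end
end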
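The plug above sends every non-canonical request to the root of the canonical domain. If you would rather keep visitors on the page they asked for, one option is to append the request path and query string to the redirect target. The sketch below is a hypothetical helper, not something Phoenix or Plug ships; the module and function names are made up, and it assumes you pass in conn.request_path and conn.query_string, which is an empty string when the request has no query.

```elixir
defmodule CanonicalRedirect do
  # Hypothetical helper for illustration; not part of Phoenix or Plug.
  #
  # base  - canonical base URL, e.g. "https://yourappname.com"
  # path  - the request path, e.g. conn.request_path
  # query - the raw query string, e.g. conn.query_string ("" when absent)
  def target(base, path, query \\ "")

  # No query string: just join the base and the path.
  def target(base, path, ""), do: base <> path

  # Query string present: carry it over unchanged.
  def target(base, path, query), do: base <> path <> "?" <> query
end

IO.puts(CanonicalRedirect.target("https://yourappname.com", "/blog", "page=2"))
```

Inside the plug, the result of a helper like this could be passed to redirect(external: ...) in place of the bare canonical domain.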

Now that we have a plug, we need to make sure it runs before any of our other routing logic, and on every request. So we need to add it to the application endpoint.

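# ./lib/your_app_name_web/endpoint.ex

defmodule YourAppNameWeb.Endpoint do
  use Phoenix.Endpoint, otp_app: :your_app_name

  # This check is evaluated at compile time, so the plug is only
  # included when the app is compiled for production.
  if Mix.env() == :prod do
    plug YourAppName.Plugs.CanonicalDomain
  end
  .....
end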

I did find that this solution can cause some weirdness and false positive errors with controller tests in Phoenix, and I didn't think it was ultra important to test redirection in each and every test case. So I turned it off in all environments except production, and I test the plug directly using unit tests. However, it is safe to turn it on in dev, and it shouldn't cause you any issues if you prefer to develop with it on.

Finally, to round off the canonicalization process, you need to add a link tag to the head of every page clearly stating the canonical URL for that page.

<html>
    <head>
       <link rel="canonical" href="<%= YourAppNameWeb.Router.Helpers.url(@conn) <> @conn.request_path %>">
    </head>
</html>

This is a decent first pass and should work for most sites, but there may be edge cases that require this to be pulled out into a more robust function or series of functions as your site grows. Just remember, consistency is the key.
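As one possible starting point for that more robust function, here is a minimal sketch of a helper that builds the canonical URL from a base URL and a request path. The module name CanonicalLink and the rule it applies (dropping trailing slashes so /about and /about/ report the same canonical URL) are my assumptions, not something Phoenix provides, so adjust the rule to whatever your site treats as the same page.

```elixir
defmodule CanonicalLink do
  # Hypothetical view helper for illustration; not part of Phoenix.
  #
  # base         - canonical base URL, e.g. "https://yourappname.com"
  # request_path - the path of the current request, e.g. "/about/"
  def url(base, request_path) do
    base <> normalize(request_path)
  end

  # Assumed rule: strip trailing slashes, but keep "/" for the root page.
  defp normalize(path) do
    case String.trim_trailing(path, "/") do
      "" -> "/"
      trimmed -> trimmed
    end
  end
end

IO.puts(CanonicalLink.url("https://yourappname.com", "/about/"))
```

In a template, the base could come from the same url(@conn) helper call used in the link tag above, with @conn.request_path as the second argument.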

If you have a solution to this problem or suggestions for improvements, please comment below. I would love to hear from you.