Elixir Is the Language AI Codes Best

Jared Smith May 27, 2026 8 min read Elixir AI tools engineering

A Tencent benchmark across 20 languages found Elixir at the top of LLM code-completion rates — Claude Opus 4 hit 80.3% on Elixir vs 74.9% on C#. The reasons aren't an accident; they're the same boring properties that have always made Elixir pleasant, now compounded by AI.

Editorial card titled 'The Language AI Codes Best' with a glowing Elixir drop above a benchmark bar chart

TL;DR: A Tencent benchmark across 20 languages found Elixir had the highest LLM code-completion rate of any mainstream language — 97.5% of problems solved by at least one model, with Claude Opus 4 hitting 80.3% on Elixir vs 74.9% on C# and 72.5% on Kotlin. Dashbit broke down why. The reasons aren't a coincidence — they're the same boring properties that have always made Elixir pleasant to work in, now compounded by the fact that AI agents are writing more of your code every quarter. The strategic takeaway isn't "rewrite everything in Elixir." It's that the cost of choosing the trendy stack now includes "your AI tools will be measurably worse at it."

The benchmark nobody saw coming

I have a small reflex when a benchmark says my preferred stack wins: assume someone cooked the books. So when Dashbit pointed at a Tencent study showing Elixir at the top of an LLM coding benchmark, my first move was to look for the catch.

I didn't find one. The result is real. Across 20 languages, 97.5% of Elixir problems were solved by at least one model — the highest of any language tested. Claude Opus 4 scored 80.3% on Elixir, against 74.9% on C# and 72.5% on Kotlin. Those are not borderline gaps; in a benchmark where a few points decides the order, eight points is a moat.

The puzzle is why. Elixir is not the most popular language. It is not in the top ten for Stack Overflow answers or GitHub commits. Models should have a lot less Elixir to learn from than, say, Python or JavaScript. And yet here we are.

Dashbit's answer is the one I find convincing: the same language design choices that make Elixir nice to work in for humans turn out to be a force multiplier for the next decoder predicting the next token. None of it is about how much training data exists. It's about what the data looks like when there is some.

What Dashbit's argument actually says

The full piece is worth reading at the source, but the load-bearing claims are four.

Immutability gives models local reasoning. In mutable languages, a function can quietly mutate an object passed in from a caller, and now reasoning about what the function does requires understanding everyone who might have touched the object first. Dashbit calls this "spooky action at a distance." Elixir doesn't allow it: anything a function needs is given as input, anything a function changes is given as output. A model — or a human — can predict the next few lines from the function signature alone. The pipe operator (|>) then makes the flow of transformations literal in the source. Local reasoning is cheap, and cheap is what models are good at.

Documentation is a first-class language feature. Elixir distinguishes @moduledoc and @doc from inline comments. The doc strings are part of the language; they ship to HexDocs; they support iex> examples that are also executed as tests via doctests. The training corpus for Elixir is therefore unusually clean: a function's docs include a literal demonstration of its inputs and outputs, verified by CI to still be true. That is high-signal data per token. The recent addition of TypeSense-backed mix hex.search makes those docs version-aware, which matters more than it sounds — models trained on stale docs are wrong in confident, hard-to-detect ways.

Stability means training data ages well. Elixir 1.0 shipped in 2014. It is still on 1.x. Phoenix is on 1.8; Ecto is on 3. The community treats deprecation warnings as the upgrade path, not breaking changes. Compare against any JavaScript framework you can name. Every blog post and tutorial about Elixir from the last decade is still mostly correct, which means the training data isn't polluted with contradictions between v2 and v15 advice for the same library. A model trained on "how to do X in Elixir" has one answer to learn, not five.

Tooling closes the agent feedback loop. Compiled language, parallel compilation and tests, type inference (not full annotation) that catches the usual class of bugs without forcing ceremony, warnings rather than errors so iteration isn't blocked, and — critically — runtime introspection. You can inspect a live BEAM process's state, mailbox, and ancestry from a shell or programmatically. Tidewave's MCP server exposes that introspection to coding agents directly. The runtime is legible to an AI in a way Python or Node aren't, because it was built legible for humans first.

That last property is the one nobody else is replicating.

Why this compounds

Pretend for a second the benchmark is right and accept the explanation. What does it mean for a team picking a stack in 2026?

It means a tax that didn't used to exist now does. Every language choice already had a hiring cost, a library cost, a hosting cost. Now it has an agent-effectiveness cost, and that cost is not flat across stacks.

If a model is eight points better at completing Elixir tasks than the runner-up — and your engineers now spend a meaningful chunk of their day reviewing model output, asking for changes, and watching agents do work autonomously — that delta walks straight into your team's throughput. It compounds because every PR your humans don't have to babysit becomes time spent on the next one. It compounds again because the AI getting things right means less context-switching back into "I'd better write this from scratch, the agent's confused." Friction tax dropped twice.

This is the angle that's mostly missing from "which language should we use" debates. People still talk about ergonomics and ecosystems and hiring. Those matter. They no longer fully describe the cost function. The boring, stable, well-documented language that AI models reason about cleanly is now strictly more valuable than it used to be, and the trendy framework with twelve breaking changes a year is strictly more expensive.

The kind of code Elixir produces — small, isolated, well-typed by signature, documented in a format that's part of the language — is what AI is good at:

defmodule Inventory do
  @moduledoc "Stock-level operations for product SKUs."

  @doc """
  Decrements stock for a SKU by `qty`. Returns `{:ok, new_stock}` or
  `{:error, :insufficient_stock}` when the request would go negative.

  ## Examples

      iex> Inventory.decrement(%{"WIDGET-1" => 5}, "WIDGET-1", 2)
      {:ok, 3}

      iex> Inventory.decrement(%{"WIDGET-1" => 1}, "WIDGET-1", 5)
      {:error, :insufficient_stock}
  """
  def decrement(stock, sku, qty) when qty > 0 do
    case Map.get(stock, sku, 0) do
      current when current >= qty -> {:ok, current - qty}
      _ -> {:error, :insufficient_stock}
    end
  end
end

Ten lines of executable spec — the docstring is the test, the signature is the contract, the function is pure, the failure case has a tagged tuple instead of an exception. A model reading this sees a complete, verifiable unit. There is nothing offstage. That is the shape of Elixir at rest, and the shape AI handles best.

The honest costs

If Elixir-for-AI were free, every team would already be on it. It isn't. We ship Elixir in production for clients, and the costs are real.

The hiring pool is still smaller. This was the leading objection in 2018 and it's still the leading objection in 2026. You will interview fewer senior Elixir engineers than senior Go or Node engineers, full stop. The mitigation — "strong devs pick it up in two weeks" — is true and we've watched it work, but you cannot run a hiring strategy on it if you need three engineers next month.

Ecosystem corners still gap. Phoenix, LiveView, and Ecto are excellent. Outside that core: certain cloud SDKs are thinner than the Python or Go equivalents, niche protocol clients sometimes don't exist, and you'll occasionally write a NIF or shell out to another runtime. None of that is fatal; all of it is work you wouldn't have on a mainstream stack.

The BEAM isn't a number-crunching runtime. This is the one the AI conversation specifically muddles. Elixir is excellent for orchestrating AI workloads — calling models, streaming tokens to a LiveView UI, supervising long-running inference jobs, fanning out across providers. It is not where you run your matrix math. Nx and EXLA give you a real answer for numerical work via XLA, and they're impressive, but they're an escape hatch into compiled native code — not a claim that pure Elixir is fast at tensor ops. Get the framing right or your benchmarks will lie to you.

Erlang-isms leak through. Stack traces drop into Erlang term syntax. Docs split across Elixir and Erlang/OTP. Eventually you read Erlang source to understand a library. That tax is paid by every engineer, forever — not just at onboarding.

Verdict

The Tencent number is real, the Dashbit explanation is the right one, and the strategic implication is bigger than the benchmark headline suggests. You are not choosing a language for humans anymore. You are choosing a language for humans and the increasingly autonomous tools they work alongside. The properties that have always made Elixir pleasant — immutability, doc culture, version stability, runtime introspection — turn out to be the exact properties that make AI good at it, too. That is not a coincidence; it is a design philosophy paying off twice.

Pick Elixir for the reasons you would have picked it anyway: a stateful backend where things must fail independently, real-time UI without three layers of glue, a team that wants to ship boring code that ages well. Then notice that your AI assistants are also measurably better at it. Then pay the hiring tax with your eyes open, keep it away from your matrix math, and let it compound.

Source: "Why Elixir is the best language for AI" — Dashbit.

All writing