Why Every AI Agent Framework Is Written in Go (And What That Costs You)

Open the repos behind the agent tooling you run — Ollama, the MCP SDKs, the orchestration engines — and it's all Go. Not because Go is good at AI. Because an agent tool is a concurrent network daemon that ships as one binary.

TL;DR: Open the repos behind the agent tooling you actually run — Ollama, the MCP SDKs, the orchestration engines — and you keep landing on the same language: Go. Not because Go is good at AI; it isn't, particularly. Because the thing an agent tool actually is — a long-lived concurrent network daemon that has to ship as one binary your users can run without a runtime — is exactly what Go was built for. That's a great trade for the daemon and a bad one for the reasoning layer, which is why the ecosystem is quietly splitting in two: Go for the process, Python/TS for the prompt logic. Here's the honest version, with a stdlib-only MCP server you can compile to a single file.

The roll-call

Go look at what you're running.

If you've followed along with the Claude Code Resource Bible, you've already installed half of this list without noticing the pattern. Ollama, the thing you brew install to run a local model, is Go — a single binary, no Python environment to break (github.com/ollama/ollama). The Model Context Protocol's official Go SDK, maintained in collaboration with Google, is Go (github.com/modelcontextprotocol/go-sdk). The most-used community MCP library, mark3labs/mcp-go, is Go (github.com/mark3labs/mcp-go). CloudWeGo's Eino — billed flatly as "the ultimate LLM/AI application development framework in Go" — is Go (github.com/cloudwego/eino). LangChainGo is Go (github.com/tmc/langchaingo). Drop down a layer to the orchestration substrate agents run on and it's the same story: Temporal, the durable-execution engine a lot of agent workflows are built on, is Go (github.com/temporalio/temporal); Dagger, the automation engine that grew agent modules, is Go (github.com/dagger/dagger); Docker and Kubernetes — the things the whole circus is shipped and scheduled in — are Go.

The reasoning content of all of this — the models, the research, the prompt engineering, the eval harnesses — is overwhelmingly Python. But the machinery that delivers it to you is overwhelmingly Go. That split is not an accident and it is not taste. It's an architectural tell, and once you see it you can't unsee it.

The title of this post is the argument. The rest of it earns the claim, and then tells you what believing it costs you — because the same properties that make Go the right call for the daemon make it an actively annoying place to write the part of an agent that thinks.

Why Go fits the shape of the problem

Start from what an agent tool actually is, mechanically, once you strip the word "AI" off it.

It's a process that starts up, holds open some connections (stdio to a host, a socket, an HTTP listener), waits for requests, fans each one out to a few concurrent things — a model call, a tool call, a file read, a subprocess — collects the results, and writes a response. It does that for a long time without being restarted. It needs to be installed by people who do not have, and do not want, your development environment. Describe that to a backend engineer without ever saying "LLM" and they will tell you the language: it's a network daemon, and Go is a language designed, deliberately and narrowly, to write network daemons.

It ships as one file. This is the single biggest reason and it has nothing to do with AI. The Go FAQ states it plainly: "The linker in the gc toolchain creates statically-linked binaries by default. All Go binaries therefore include the Go runtime" (go.dev/doc/faq). A Go MCP server is one executable. No pip install that resolves differently on the user's machine than yours. No "works on my Python 3.11, your 3.13 broke a transitive dep." No asking a user to manage a virtualenv to run your tool. You cross-compile it for three platforms in CI and the install instruction is "download this file." For a tool whose entire job is to be installed into other people's agent setups, that property is worth more than any language feature. It's why Ollama feels like a native app and most Python AI tooling feels like a science experiment you have to host.

Goroutines are the right concurrency primitive for fan-out. An agent step is a fan-out: call the model, and while that's in flight maybe pre-warm a tool, read a file, hit a cache. Go's answer to "do these concurrently" is go f() and a channel, and the runtime cost is genuinely low. From the same FAQ: goroutines "can be very cheap: they have little overhead beyond the memory for the stack, which is just a few kilobytes," the CPU overhead "averages about three cheap instructions per function call," and "it is practical to create hundreds of thousands of goroutines in the same address space" (go.dev/doc/faq). You do not need an async framework, an event-loop mental model, or function coloring. A goroutine per in-flight tool call, a context.Context for cancellation when the model returns early, done. (It is worth knowing where this model stops helping you — a panic in one goroutine is not isolated the way people assume; I went deep on exactly that failure mode in Elixir's concurrency model, and it applies directly to long-lived agent daemons.)
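
Here's the shape of that in code — a minimal sketch, not a real SDK; callModel and prewarmTool are stand-ins for whatever your tool actually does:

package main

import (
	"context"
	"fmt"
	"strings"
	"time"
)

// Stand-ins for real work; only the shape matters here.
func callModel(ctx context.Context, prompt string) string {
	time.Sleep(50 * time.Millisecond) // pretend the model took a moment
	return "model says: " + strings.ToUpper(prompt)
}

func prewarmTool(ctx context.Context) {
	select {
	case <-time.After(200 * time.Millisecond): // pretend to warm a cache
	case <-ctx.Done(): // model came back early; stop wasting work
	}
}

// step is one agent turn: launch the speculative work on its own goroutine,
// wait on the model call, cancel everything else when it returns.
func step(parent context.Context, prompt string) string {
	ctx, cancel := context.WithTimeout(parent, 30*time.Second)
	defer cancel() // cancels the speculative goroutine when we return

	go prewarmTool(ctx)              // one goroutine per in-flight side task
	answer := callModel(ctx, prompt) // the call we actually wait on
	return answer
}

func main() {
	fmt.Println(step(context.Background(), "hello"))
}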

The standard library already has the daemon parts. net/http is a production HTTP server and client in the stdlib. encoding/json is in the stdlib. os/exec for spawning the tool subprocesses agents love is in the stdlib. context for deadline and cancellation propagation is in the stdlib. The dependency footprint of a competent MCP server in Go can be zero third-party packages, which means the supply-chain surface of the thing you're injecting into a user's machine is the Go team plus you. Compare that to the transitive dependency tree of an equivalent Python or Node tool. For software whose threat model includes "runs with access to a developer's filesystem and shell," a near-empty go.mod is not austerity; it's a security property.
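
To make that concrete, here's roughly what a zero-dependency endpoint looks like when it leans on those four packages — the /run route and the git command are purely illustrative:

package main

import (
	"context"
	"encoding/json"
	"net/http"
	"os/exec"
	"time"
)

func main() {
	http.HandleFunc("/run", func(w http.ResponseWriter, r *http.Request) {
		// Deadline and cancellation ride along on the request context.
		ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second)
		defer cancel()

		// os/exec for the subprocess agents love to spawn.
		out, err := exec.CommandContext(ctx, "git", "status", "--short").Output()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		json.NewEncoder(w).Encode(map[string]string{"output": string(out)})
	})
	http.ListenAndServe("127.0.0.1:8080", nil) // production code would check this error
}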

Cold start and steady-state are both cheap. A statically linked Go binary has no interpreter to boot and no JIT to warm. It starts, it serves, its memory is roughly what its working set is. This matters for the specific way agent tools get used: spawned per-session, sometimes per-invocation, by a host process that may start and stop them constantly. A tool that takes 800ms to import its dependency tree before it can answer the first request is a tool that makes the whole agent feel slow. (Treat the comparison as directional, not a benchmark — "no interpreter boot, no JIT warmup" is an architectural fact; the millisecond figure depends entirely on your dependency tree.)

None of these four is an AI capability. That's the point. Go didn't win agent infrastructure by being good at the AI part. It won by being unreasonably good at the boring 90% of an agent tool that isn't the AI part.

The MCP angle: proof in ~50 lines

The cleanest demonstration is the Model Context Protocol, because an MCP server is the agent-tool shape distilled to its essence: read newline-delimited JSON-RPC requests on stdin, do something, write JSON-RPC responses on stdout, live as a subprocess of the host. Here is a working newline-delimited MCP-style stdio server in Go that exposes one tool, with zero imports outside the standard library. It compiles to one binary.

package main

import (
	"bufio"
	"encoding/json"
	"os"
	"strings"
)

type rpc struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id,omitempty"`
	Method  string          `json:"method,omitempty"`
	Params  json.RawMessage `json:"params,omitempty"`
	Result  any             `json:"result,omitempty"`
}

func main() {
	in := bufio.NewScanner(os.Stdin)
	in.Buffer(make([]byte, 1<<20), 1<<20) // raise Scanner's 64 KiB default so large tool payloads fit
	out := json.NewEncoder(os.Stdout)     // Encode appends a newline, matching the newline-delimited transport

	for in.Scan() {
		var req rpc
		if json.Unmarshal(in.Bytes(), &req) != nil {
			continue
		}
		resp := rpc{JSONRPC: "2.0", ID: req.ID}
		switch req.Method {
		case "initialize":
			resp.Result = map[string]any{
				"protocolVersion": "2025-06-18",
				"capabilities":    map[string]any{"tools": map[string]any{}},
				"serverInfo":      map[string]any{"name": "echo", "version": "0.1.0"},
			}
		case "tools/list":
			resp.Result = map[string]any{"tools": []any{map[string]any{
				"name":        "shout",
				"description": "Uppercases its input.",
				"inputSchema": map[string]any{"type": "object"},
			}}}
		case "tools/call":
			var p struct{ Arguments struct{ Text string } }
			_ = json.Unmarshal(req.Params, &p)
			resp.Result = map[string]any{"content": []any{map[string]any{
				"type": "text", "text": strings.ToUpper(p.Arguments.Text),
			}}}
		default:
			continue // notifications and unknown methods: no reply
		}
		_ = out.Encode(&resp)
	}
}

I am being deliberately honest about what this is: a shape demonstration, not a spec-complete server — a production MCP server adds the full initialization handshake, proper JSON-RPC error objects, input-schema validation, and the rest of the lifecycle, which is precisely what an SDK like mark3labs/mcp-go or the official modelcontextprotocol/go-sdk gives you so you don't write it by hand. But look at what these fifty-odd lines already are: a long-lived process, reading newline-delimited requests, dispatching concurrently if you wanted (go handle(req)), zero dependencies, one go build away from a binary you hand someone. That is the entire job. The reason every MCP SDK has a first-class Go implementation is that this is the language where that job is a Tuesday. The same daemon written to be spec-complete in Python is more code and a dependency tree and a runtime your user has to already have.
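
And if you did want that concurrent dispatch, the change is small. Assuming the method switch above is factored into a handle(req rpc) (rpc, bool) helper and "sync" joins the imports, the only real care needed is serializing writes to stdout:

// Sketch of the concurrent variant. The goroutines are free; the only
// shared thing is stdout, so guard it with a mutex.
var outMu sync.Mutex

func dispatch(req rpc, out *json.Encoder) {
	go func() {
		resp, ok := handle(req) // (response, false) means "don't reply"
		if !ok {
			return
		}
		outMu.Lock()
		defer outMu.Unlock()
		_ = out.Encode(&resp)
	}()
}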

If you want to see what this looks like when it's not a toy — many tools, real lifecycle, orchestrated — that's essentially the architecture I pulled apart in the ruflo / claude-flow multi-agent deep-dive: a swarm of these daemons is still, underneath, this loop.

What it costs you

Here is the part the title doesn't say and most "Go for AI" posts skip, because it's the part that bites you three weeks in.

The LLM-orchestration layer is verbose and joyless in Go. The work of an agent's brain — assemble a prompt from fragments, call a model, parse a structured response, branch on it, maybe retry with a tweaked prompt, thread some state through — is exactly the kind of code Go is worst at. It's data-shuffling glue, and Go's error-handling model means every one of those steps is three lines (x, err := ...; if err != nil { return ... }) where Python is one. A prompt pipeline that is fifteen readable lines of Python becomes sixty lines of Go where the logic is buried under ceremony. None of those lines are wrong. They're just noise drowning the part you actually want to iterate on.
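
To see the tax, here's one hop of such a pipeline in Go — the Client interface and Label type are hypothetical, but the err-check rhythm is not:

package pipeline

import (
	"context"
	"encoding/json"
	"fmt"
)

// Hypothetical plumbing — just enough for the shape to compile.
type Client interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

type Label struct {
	Name string `json:"name"`
}

func parseLabel(raw string) (Label, error) {
	var l Label
	err := json.Unmarshal([]byte(raw), &l)
	return l, err
}

// One hop of a prompt pipeline: build the prompt, call the model, parse the
// reply, retry once on a parse failure. Every step pays the
// x, err := ...; if err != nil toll that a Python version skips.
func classify(ctx context.Context, llm Client, doc string) (Label, error) {
	prompt := "Classify this document. Reply as JSON {\"name\": \"...\"}:\n\n" + doc
	raw, err := llm.Complete(ctx, prompt)
	if err != nil {
		return Label{}, fmt.Errorf("model call: %w", err)
	}
	label, err := parseLabel(raw)
	if err != nil {
		raw, err = llm.Complete(ctx, prompt+"\nReturn only valid JSON.")
		if err != nil {
			return Label{}, fmt.Errorf("retry: %w", err)
		}
		if label, err = parseLabel(raw); err != nil {
			return Label{}, fmt.Errorf("parse after retry: %w", err)
		}
	}
	return label, nil
}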

Generics help, but the ergonomics for prompt pipelines are still weak. Go only got generics in 1.18 — the announcement calls it "our biggest change ever to the language," released 15 March 2022 (go.dev/blog/go1.18). They're real and they help, but they're constrained by design: no sum types, no rich pattern matching, no ergonomic "this is one of these five structured outputs" the way you'd model an LLM's response variants in a language with proper ADTs. Modeling "the model returned either a tool call, or text, or a refusal, or a malformed blob" is clean in TypeScript's discriminated unions and clumsy in Go's type switches and interface assertions. The reasoning layer is full of that shape of problem.
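
For a feel of what that clumsiness looks like, here's the Go stand-in for "the model returned one of these variants" — a marker interface, one struct per variant, and a type switch the compiler won't check for exhaustiveness (the variant types are illustrative):

package variants

import "encoding/json"

// A marker interface plus one struct per variant: the Go approximation
// of a discriminated union.
type ModelOutput interface{ isModelOutput() }

type ToolCall struct {
	Name string
	Args json.RawMessage
}
type Text struct{ Content string }
type Refusal struct{ Reason string }

func (ToolCall) isModelOutput() {}
func (Text) isModelOutput()     {}
func (Refusal) isModelOutput()  {}

func route(out ModelOutput) string {
	switch v := out.(type) {
	case ToolCall:
		return "call tool " + v.Name
	case Text:
		return v.Content
	case Refusal:
		return "refused: " + v.Reason
	default:
		return "malformed output" // the compiler won't tell you a case is missing
	}
}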

JSON ⇄ struct friction is constant, and LLM output is the worst case for it. Go's encoding/json wants to marshal into known, typed structs. LLM output is semi-structured, frequently almost-but-not-quite the schema, and routinely needs "parse what you can, tolerate the rest." Go's typed unmarshalling fights you here: you end up reaching for map[string]any and type-asserting your way through a blob, which is exactly the dynamically-typed code Go is trying to prevent you from writing — except now it's verbose dynamically-typed code. Python's "it's a dict, deal with it" is genuinely better for the messy boundary where model output meets program.
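
In practice the fallback looks something like this — the field names are made up, but the assert-and-check dance is what you end up writing:

package tolerant

import "encoding/json"

// Pull what we can out of an almost-schema model reply without failing on
// the parts that don't match: the map[string]any + type-assertion dance.
func extractSteps(raw []byte) []string {
	var blob map[string]any
	if err := json.Unmarshal(raw, &blob); err != nil {
		return nil
	}
	items, ok := blob["steps"].([]any) // sometimes present, sometimes not
	if !ok {
		return nil
	}
	var steps []string
	for _, it := range items {
		m, ok := it.(map[string]any)
		if !ok {
			continue // tolerate junk entries
		}
		if s, ok := m["text"].(string); ok {
			steps = append(steps, s)
		}
	}
	return steps
}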

There's no official REPL, so prompt iteration has no inner loop. Iterating on a prompt is inherently interactive: tweak wording, run it, look at the output, tweak again, ten times in two minutes. Python and a notebook are built for that loop. Go's edit-compile-run cycle is fast by compiled-language standards but it is not a REPL, and there is no official one — your prompt-tuning loop is "edit file, go run, read stdout, repeat," which is enough slower per iteration that you simply iterate less, which makes your prompts worse. This is a real product cost hiding inside a developer-experience complaint.

Notice these four costs all live in the same place: the reasoning layer, not the daemon. Go's weaknesses are precisely Python's strengths and vice versa, which is why mature teams stop trying to pick one.

The split that's actually emerging

The interesting thing isn't "Go won" or "Python won." It's that the production answer is increasingly both, with a wire between them, and the seam is falling in a consistent place.

The pattern: Go owns the daemon, Python/TS owns the reasoning. The long-lived process — the MCP server, the orchestrator, the thing holding connections and fanning out concurrent tool calls and shipping as one binary — is Go, for every reason in the "why Go fits" section. The model-facing logic — prompt construction, the eval harness, anything you want to iterate on interactively, anything that benefits from the ML ecosystem — is Python or TypeScript, called across a boundary: a subprocess the Go daemon spawns, a local HTTP call, an MCP tool that is itself implemented in Python. The Go process doesn't do the thinking; it's the supervisor and the I/O multiplexer for the things that do. You can see this directly in the ecosystem: the infrastructure repos are Go, and they call out to model code rather than embedding it.
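
The Go side of that seam can be as small as this sketch — the local port and /plan route are invented for illustration; the point is that the daemon asks the reasoning service for the next step rather than building prompts itself:

package seam

import (
	"bytes"
	"context"
	"encoding/json"
	"net/http"
	"time"
)

// nextAction hands an observation to the Python/TS reasoning service over a
// local HTTP boundary and gets back the action to execute.
func nextAction(ctx context.Context, observation string) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, 60*time.Second)
	defer cancel()

	body, _ := json.Marshal(map[string]string{"observation": observation})
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		"http://127.0.0.1:8900/plan", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var out struct {
		Action string `json:"action"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Action, nil
}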

This is a better factoring than "rewrite the prompt logic in Go for consistency," which teams try once and regret. The boundary is load-bearing: it's the line between the part that has to be operationally boring (one binary, cheap concurrency, tiny dependency surface) and the part that has to be iteration-friendly (interactive, dynamically typed, ML-ecosystem-adjacent). Fighting that boundary by forcing one language across it costs you either the daemon's deployability or the reasoning layer's iteration speed.

Verdict: when to write your agent tool in Go

The honest decision rule, not the language-war version:

| Your situation | Write it in | Why |
| --- | --- | --- |
| MCP server, CLI tool, orchestrator, anything users install | Go | One binary, near-zero deps, cheap concurrency, ops-boring — its home turf |
| Prompt logic, eval harness, anything you iterate on interactively | Python / TS | REPL/notebook loop, ergonomic semi-structured data, ML ecosystem |
| A tool that is mostly daemon with a thin model call | Go | The model call is one HTTP request; don't move the daemon for it |
| A tool that is mostly reasoning with a thin I/O wrapper | Python / TS | Don't pay Go's orchestration tax to save a few MB of binary |
| A system that is meaningfully both | Both, split at the daemon/reasoning seam | Go supervises and multiplexes; Python/TS thinks; talk over a wire |
| "We want one language for consistency" | Pick by the dominant axis above | Consistency is not a strong enough reason to eat the wrong side's tax |

The reason every AI agent framework is written in Go is that "agent framework" is a misleading name. The framework part — the part that's actually a framework — is a concurrent network daemon, and Go is the best mainstream language for concurrent network daemons by a wide margin. The AI part isn't in the framework. It's in the model, and in the prompt logic you should keep in a language built for iterating on it.

Use Go for the thing Go is for. Ship the binary. Just don't let "all the infra is Go" talk you into writing your prompt pipeline there too — that's how you end up with sixty joyless lines doing fifteen lines of thinking, and a prompt you tuned three times instead of thirty because the loop was too slow to bother.