Build an AI Agent Loop in 50 Lines of Elixir
Every AI agent framework runs the same loop: observe, decide, act, repeat. Here it is in 50 lines of Elixir — no framework, just a GenServer.
TL;DR: Claude Code, Cursor, Devin, Codex — they all run the same pattern under the hood. The LLM proposes an action. Your code executes it. You feed the result back. The LLM proposes the next action. Repeat until done. That’s it. In Elixir, this is a GenServer with a recursive message loop. Fifty lines. No framework, no SDK beyond an HTTP client. Once you see the pattern, you can’t unsee it — and you’ll never need to treat agent frameworks as black boxes again.
What every agent framework is hiding from you
Look inside any AI agent framework (LangChain, CrewAI, Anthropic’s own Claude Code) and strip away the plugin registries, the YAML configs, the abstract base classes, the middleware stacks. What’s left is always the same thing:
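# Elixir has no while/break: the loop is a function that calls itself.
# call_llm/1 and run_tool/2 are placeholders for the real API call and tool dispatch.
def agent_loop(messages) do
  response = call_llm(messages)
  if response.wants_to_use_a_tool? do
    result = run_tool(response.tool_name, response.tool_args)
    agent_loop(messages ++ [response, result])
  else
    response.final_answer
  end
end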
That’s the agent loop. The entire intellectual content of “agentic AI” is a while loop with an LLM call inside it.
The frameworks add real value on top — tool registries, memory management, multi-agent routing, streaming, token tracking — but none of those are the core. The core is this loop. And if you don’t understand the loop, you can’t debug the framework. You can’t reason about costs. You can’t explain why your agent got stuck in an infinite cycle or burned $40 on a task that should’ve cost $0.50.
So let’s build it.
The pattern: observe → decide → act → loop
Before we write code, here’s the pattern with proper names. It’s called a ReAct loop in the literature, but you don’t need the paper. It’s four steps:
- Observe — gather the current state. On the first turn, this is the user’s goal. On subsequent turns, it’s the result of the last tool call.
- Decide — send the accumulated context to the LLM. It either proposes a tool call or returns a final answer.
- Act — if it proposed a tool call, execute it. If it returned a final answer, you’re done.
- Loop — append the tool result to the conversation and go back to step 2.
That’s it. The LLM is the decision engine. Your code is the executor. The conversation history is the state.
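To see the four steps without an API key or a process, here is a minimal sketch where plain functions stand in for the LLM and the tools. `ReActSketch`, `decide`, and `act` are hypothetical names, and the messages use simple tuples rather than the Messages API format.

```elixir
defmodule ReActSketch do
  # `decide` stands in for the LLM: given the history it returns
  # {:tool_use, name, input} or {:done, answer}.
  # `act` stands in for tool dispatch: it turns a tool call into a result string.
  def run(goal, decide, act) do
    # Observe: the first observation is the user's goal.
    loop([%{role: "user", content: goal}], decide, act)
  end

  defp loop(messages, decide, act) do
    # Decide: hand the whole history to the decision function.
    case decide.(messages) do
      {:tool_use, name, input} ->
        # Act: execute the requested tool.
        result = act.(name, input)
        call = %{role: "assistant", content: {:tool_use, name, input}}
        observation = %{role: "user", content: {:tool_result, result}}
        # Loop: append the call and its result, then decide again.
        loop(messages ++ [call, observation], decide, act)

      {:done, answer} ->
        {answer, messages}
    end
  end
end

# A scripted "model": read a file once, then answer from the tool result.
decide = fn messages ->
  case List.last(messages) do
    %{content: {:tool_result, contents}} -> {:done, "The file says: " <> contents}
    _ -> {:tool_use, "read_file", %{"path" => "notes.txt"}}
  end
end

act = fn "read_file", %{"path" => path} -> "hello from #{path}" end

{answer, history} = ReActSketch.run("What does notes.txt say?", decide, act)
IO.puts(answer)
```

The history ends up with three messages: the goal, the tool call, and its result. Swapping `decide` for a real API call and `act` for real tools is all the GenServer version does differently.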
In Elixir, this maps perfectly to a GenServer. The conversation history is the process state. Each iteration is a handle_info callback that sends itself the next :step message. Supervision, crash isolation, and process monitoring come free from OTP: the things agent frameworks in other languages spend thousands of lines reimplementing.
The 50-line agent
Here it is. The full module — GenServer, API call, tool dispatch, loop. Count the lines yourself.
defmodule Agent.Loop do
  use GenServer
  def run(goal, tools \\ []), do: GenServer.start_link(__MODULE__, {goal, tools})
  @impl true
  def init({goal, tools}) do
    send(self(), :step)
    {:ok, %{messages: [%{role: "user", content: goal}], tools: tools}}
  end
  @impl true
  def handle_info(:step, state) do
    case call_llm(state.messages, state.tools) do
      {:tool_use, name, input, assistant_msg, tool_use_id} ->
        result = dispatch_tool(name, input)
        tool_msg = %{
          role: "user",
          content: [%{type: "tool_result", tool_use_id: tool_use_id, content: result}]
        }
        send(self(), :step)
        {:noreply, %{state | messages: state.messages ++ [assistant_msg, tool_msg]}}
      {:done, answer} ->
        IO.puts("\n✅ Agent finished: #{answer}")
        {:stop, :normal, state}
    end
  end
  defp call_llm(messages, tools) do
    body = %{model: "claude-sonnet-4-20250514", max_tokens: 4096, messages: messages}
    # At most one tool call per turn, matching the one tool_result the loop sends back.
    body = if tools == [], do: body, else: Map.merge(body, %{tools: tools, tool_choice: %{type: "auto", disable_parallel_tool_use: true}})
    headers = [{"x-api-key", System.get_env("ANTHROPIC_API_KEY")}, {"anthropic-version", "2023-06-01"}]
    {:ok, %{status: 200, body: resp}} = Req.post("https://api.anthropic.com/v1/messages", json: body, headers: headers)
    case resp["stop_reason"] do
      "tool_use" ->
        tool = Enum.find(resp["content"], &(&1["type"] == "tool_use"))
        assistant_msg = %{role: "assistant", content: resp["content"]}
        {:tool_use, tool["name"], tool["input"], assistant_msg, tool["id"]}
      _ ->
        text = resp["content"] |> Enum.find(%{}, &(&1["type"] == "text")) |> Map.get("text", "")
        {:done, text}
    end
  end
  defp dispatch_tool(name, input) do
    IO.puts("🔧 #{name}(#{inspect(input)})")
    case name do
      "read_file" -> input["path"] |> File.read() |> then(fn {:ok, c} -> c; {:error, r} -> "Error: #{r}" end)
      "list_files" -> (input["path"] || ".") |> File.ls() |> then(fn {:ok, fs} -> Enum.join(fs, "\n"); {:error, r} -> "Error: #{r}" end)
      "write_file" -> File.write!(input["path"], input["content"]); "ok"
      _ -> "Unknown tool: #{name}"
    end
  end
end
That’s it. No framework. No agent SDK. The init callback sets up the conversation with the user’s goal and sends the first :step message. Each :step calls the LLM, checks whether it wants to use a tool, and either dispatches the tool and loops or prints the final answer and stops. The tool_choice setting limits the model to one tool call per turn: the loop sends back exactly one tool_result per step, and the API rejects a conversation that leaves a tool_use block without a matching tool_result. Tool errors come back to the model as "Error: ..." strings instead of crashing the process, so it can try something else.
Two things to notice about the message format. The assistant message (%{role: "assistant", content: resp["content"]}) carries the raw content blocks from the API response, including both text and tool_use blocks. The tool_use_id is passed separately and only used to build the tool_result message back. This keeps the messages clean: everything sent back to the API on the next turn matches the Messages API schema exactly.
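To make the two message shapes concrete, this sketch builds them from a hand-written response body in the shape the Messages API returns for a tool call (string keys, as Req decodes JSON). `MessageShapes.next_messages/2` is a hypothetical helper and the `toolu_example` id is invented; the logic mirrors the tool_use branch of call_llm and handle_info in the module above.

```elixir
defmodule MessageShapes do
  # Build the two messages the loop appends after a tool call.
  def next_messages(resp, result) do
    tool = Enum.find(resp["content"], &(&1["type"] == "tool_use"))

    # The assistant turn is echoed back verbatim: text and tool_use blocks alike.
    assistant_msg = %{role: "assistant", content: resp["content"]}

    # The tool result travels in a *user* turn and points back at the tool_use id.
    tool_msg = %{
      role: "user",
      content: [%{type: "tool_result", tool_use_id: tool["id"], content: result}]
    }

    [assistant_msg, tool_msg]
  end
end

# A decoded response body for a tool call; the id value is made up.
resp = %{
  "stop_reason" => "tool_use",
  "content" => [
    %{"type" => "text", "text" => "I'll read the file."},
    %{
      "type" => "tool_use",
      "id" => "toolu_example",
      "name" => "read_file",
      "input" => %{"path" => "mix.exs"}
    }
  ]
}

[assistant_msg, tool_msg] = MessageShapes.next_messages(resp, "defmodule MyApp.MixProject do ...")
IO.inspect(tool_msg)
```

Mixing atom keys (the maps built locally) and string keys (the echoed API blocks) is fine here, because the JSON encoder serializes both the same way.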
Running it
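To see this work, make sure Req is available (in a standalone script, Mix.install([{:req, "~> 0.5"}]) does it) and ANTHROPIC_API_KEY is set. Then define your tools using Anthropic’s tool use schema and call run/2: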
tools = [
  %{
    name: "read_file",
    description: "Read the contents of a file at the given path.",
    input_schema: %{
      type: "object",
      properties: %{
        path: %{type: "string", description: "Absolute or relative file path"}
      },
      required: ["path"]
    }
  },
  %{
    name: "list_files",
    description: "List all files in a directory.",
    input_schema: %{
      type: "object",
      properties: %{
        path: %{type: "string", description: "Directory path. Defaults to current directory."}
      },
      required: []
    }
  }
]
Agent.Loop.run("Read my mix.exs and tell me what Elixir version this project uses.", tools)
You’ll see output like:
🔧 read_file(%{"path" => "mix.exs"})
✅ Agent finished: Based on your mix.exs, this project uses Elixir ~> 1.18.
The agent read the file, interpreted it, and answered: two turns, one tool call, done. For a more complex goal, it’ll chain multiple tool calls automatically. Ask it to “find all test files and summarize what they test” and you’ll watch it call list_files, then read_file on each result, then synthesize. The loop just keeps going until the model is satisfied.
What you get for free from OTP
If you’ve built agent loops in Python or TypeScript, you’ve probably added code for:
- Crash isolation — if a tool call throws, the agent shouldn’t crash your whole application.
- Supervision — if the agent process dies, something should notice and optionally restart it.
- Concurrency — running multiple agents simultaneously without blocking.
- Process monitoring — knowing when an agent finishes or dies.
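In Elixir, you have all of these before you write a single line. The GenServer is a process. If a tool call raises, only that process dies, plus anything linked to it: run/2 uses GenServer.start_link/2, which links the agent to whoever called it, so in a real application you’d start agents under a supervisor rather than from a process you care about. Wrap it in a Supervisor and it restarts automatically; give it restart: :transient, because under the default :permanent a run that finishes normally is restarted too and re-runs its goal. Start ten agents and they run concurrently on the BEAM schedulers, with no threads, async/await, or event loop for you to manage. Call Process.monitor/1 on the agent’s PID to get a {:DOWN, ref, :process, pid, reason} message when it finishes or crashes.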
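Here is a small sketch of that supervision and monitoring, runnable without an API key. `FakeAgent` is a hypothetical stand-in for Agent.Loop that either exits normally or raises when it takes a step; both copies run under a DynamicSupervisor, and the script monitors them.

```elixir
defmodule FakeAgent do
  # Stand-in for Agent.Loop: no LLM work, it just finishes or crashes when it
  # receives :step. restart: :temporary means the supervisor never restarts it.
  use GenServer, restart: :temporary

  def start_link(outcome), do: GenServer.start_link(__MODULE__, outcome)

  @impl true
  def init(outcome), do: {:ok, outcome}

  @impl true
  def handle_info(:step, :finish), do: {:stop, :normal, :finish}
  def handle_info(:step, :crash), do: raise("tool blew up")
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, ok_pid} = DynamicSupervisor.start_child(sup, {FakeAgent, :finish})
{:ok, bad_pid} = DynamicSupervisor.start_child(sup, {FakeAgent, :crash})

# Monitor first, then trigger the step, so no :DOWN message can be missed.
ok_ref = Process.monitor(ok_pid)
bad_ref = Process.monitor(bad_pid)
send(ok_pid, :step)
send(bad_pid, :step)

finish_reason =
  receive do
    {:DOWN, ^ok_ref, :process, _pid, reason} -> reason
  after
    1_000 -> :no_down
  end

crash_reason =
  receive do
    {:DOWN, ^bad_ref, :process, _pid, reason} -> reason
  after
    1_000 -> :no_down
  end

IO.inspect({finish_reason, Process.alive?(sup)})
```

The crashing agent logs an error and takes nothing else down: the supervisor and the calling script both survive, and the monitor delivers each exit reason. To supervise Agent.Loop the same way, it would also need a start_link/1, since the child spec calls start_link rather than run/2.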
This is what I mean when I say the BEAM is the runtime AI agents want. The primitives that agent frameworks bolt on top of Python (Celery workers for background execution, threading for concurrency, signal handlers for cleanup) are the default in OTP. You start with them.
The three guardrails that separate a toy from a tool
The 50-line version works, but it’ll happily loop forever if the LLM gets confused, burn through your API budget on a runaway task, or hang indefinitely on a slow model call. Here are the three guardrails that make it safe for real use.
1. Max iterations
The simplest protection: a counter.
def init({goal, tools}) do
  send(self(), :step)
  {:ok, %{
    messages: [%{role: "user", content: goal}],
    tools: tools,
    step: 0,
    max_steps: 30
  }}
end
def handle_info(:step, %{step: step, max_steps: max} = state) when step >= max do
  IO.puts("⛔ Agent hit step limit (#{max}). Stopping.")
  {:stop, :normal, state}
end
def handle_info(:step, state) do
  # ... same loop logic as before, producing updated_messages, plus:
  {:noreply, %{state | step: state.step + 1, messages: updated_messages}}
end
Thirty steps is generous for most tasks. A runaway agent hitting this limit is a signal that the goal was too vague or the tools are insufficient — both things you want to know about rather than papering over with more iterations.
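A quick way to convince yourself the guard works is to point it at a model that never finishes. This sketch (hypothetical `StepLimitSketch`, with a plain function standing in for the LLM) keeps the same two-clause shape as the handle_info version: a guarded clause that stops at the limit, and a normal step that increments the counter.

```elixir
defmodule StepLimitSketch do
  def run(decide, max_steps), do: loop([], decide, 0, max_steps)

  # Guard clause: checked before every decision, like handle_info's `when step >= max`.
  defp loop(_messages, _decide, step, max) when step >= max, do: {:step_limit, step}

  defp loop(messages, decide, step, max) do
    case decide.(messages) do
      {:tool_use, name} -> loop(messages ++ [name], decide, step + 1, max)
      {:done, answer} -> {:done, answer, step}
    end
  end
end

# A confused "model" that asks for the same tool forever.
runaway = fn _messages -> {:tool_use, "list_files"} end
IO.inspect(StepLimitSketch.run(runaway, 30))
```

Without the guard clause this call would never return; with it, the runaway stops after exactly 30 tool calls.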
2. Token budget
Model calls cost money. Track cumulative usage and kill the run when it crosses a threshold.
defp call_llm(messages, tools) do
  # ... same Req.post call ...
  usage = %{
    input: resp["usage"]["input_tokens"],
    output: resp["usage"]["output_tokens"]
  }
  case resp["stop_reason"] do
    "tool_use" ->
      tool = Enum.find(resp["content"], &(&1["type"] == "tool_use"))
      assistant_msg = %{role: "assistant", content: resp["content"]}
      {:tool_use, tool["name"], tool["input"], assistant_msg, tool["id"], usage}
    _ ->
      text = resp["content"] |> Enum.find(%{}, &(&1["type"] == "text")) |> Map.get("text", "")
      {:done, text, usage}
  end
end
Then in handle_info:
def handle_info(:step, state) do
  # Assumes init/1 also puts tokens_used: 0 and token_budget: 100_000 in the state.
  case call_llm(state.messages, state.tools) do
    {:tool_use, name, input, assistant_msg, tool_use_id, usage} ->
      total = state.tokens_used + usage.input + usage.output
      if total > state.token_budget do
        IO.puts("💸 Token budget exhausted (#{total} / #{state.token_budget})")
        {:stop, :normal, state}
      else
        result = dispatch_tool(name, input)
        tool_msg = %{role: "user", content: [%{type: "tool_result", tool_use_id: tool_use_id, content: result}]}
        send(self(), :step)
        {:noreply, %{state | tokens_used: total, messages: state.messages ++ [assistant_msg, tool_msg]}}
      end
    {:done, answer, _usage} ->
      IO.puts("\n✅ Agent finished: #{answer}")
      {:stop, :normal, state}
  end
end
A sensible default for Sonnet is 100,000 tokens per run. At Claude Sonnet 4’s list prices of $3 per million input tokens and $15 per million output tokens, that caps a run at $0.30 if every token were input and $1.50 if every token were output. Agent runs usually land closer to the low end, because every call resends the whole conversation as input. That’s enough for a substantial task, and cheap enough that a runaway won’t surprise you on the bill.
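To make the budget arithmetic concrete, this sketch prices token counts and simulates how input usage compounds when each call resends the growing history. The per-token prices are assumed Sonnet 4 list prices (check current pricing), and `RunCost` and the growth numbers are illustrative, not measured.

```elixir
defmodule RunCost do
  # Assumed list prices in US dollars per million tokens.
  @input_per_mtok 3.0
  @output_per_mtok 15.0

  def dollars(input_tokens, output_tokens) do
    (input_tokens * @input_per_mtok + output_tokens * @output_per_mtok) / 1_000_000
  end

  # Simulate `steps` calls. Call k (0-based) sends `base` tokens plus k * `per_step`
  # tokens of accumulated history (assistant replies and tool results), and each
  # call produces a `reply`-token response. Returns {total_input, total_output}.
  def simulate(base, per_step, reply, steps) do
    Enum.reduce(0..(steps - 1)//1, {0, 0}, fn step, {inp, out} ->
      {inp + base + step * per_step, out + reply}
    end)
  end
end

{input, output} = RunCost.simulate(2_000, 1_500, 300, 10)
IO.inspect({input, output, RunCost.dollars(input, output)})
```

With a 2,000-token starting prompt, 1,500 tokens of history added per step, and 300-token replies, ten steps use 90,500 tokens, 87,500 of them input, for about $0.31. Most of that is re-sent history, which is why cumulative input dominates the budget.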
3. Timeout
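LLM API calls hang sometimes. The tempting fix is Process.send_after(self(), :timeout, 60_000), but the GenServer can’t handle that message while it is blocked inside a synchronous HTTP call. The message just sits in the mailbox until Req.post returns, which defeats the point.
The real fix is simpler: tell Req itself to enforce the deadline. Its receive_timeout option caps how long it waits for data from the server. It already defaults to 15 seconds, which a long generation can exceed, so set it explicitly to the deadline you actually want.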
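defp call_llm(messages, tools) do
  body = %{model: "claude-sonnet-4-20250514", max_tokens: 4096, messages: messages}
  # At most one tool call per turn, matching the one tool_result the loop sends back.
  body = if tools == [], do: body, else: Map.merge(body, %{tools: tools, tool_choice: %{type: "auto", disable_parallel_tool_use: true}})
  case Req.post("https://api.anthropic.com/v1/messages",
         json: body,
         headers: [
           {"x-api-key", System.get_env("ANTHROPIC_API_KEY")},
           {"anthropic-version", "2023-06-01"}
         ],
         receive_timeout: 60_000
       ) do
    {:ok, %{status: 200, body: resp}} ->
      # ... parse response as before ...
    {:ok, %{status: status, body: error_body}} ->
      # Non-200 responses such as 429 (rate limited) or 529 (overloaded).
      {:error, {:http, status, error_body}}
    {:error, %Req.TransportError{reason: :timeout}} ->
      {:error, :timeout}
    {:error, reason} ->
      {:error, reason}
  end
end
Then handle it in the loop: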
{:error, :timeout} ->
  IO.puts("⏱️ LLM call timed out after 60s. Stopping.")
  {:stop, :normal, state}
{:error, reason} ->
  IO.puts("❌ LLM call failed: #{inspect(reason)}. Stopping.")
  {:stop, :normal, state}
Sixty seconds is conservative. Most Sonnet calls return in 3–15 seconds. If you’re waiting sixty, the API is having a bad day and burning more retries won’t help.
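The mailbox problem is easy to reproduce. This sketch (hypothetical `DeadlineDemo`, with Process.sleep standing in for a slow call) first shows a Process.send_after timer that only gets noticed after the blocking call returns. It then swaps in a different technique, Task.yield/2 followed by Task.shutdown/2, which does enforce a deadline on arbitrary blocking code such as a slow tool. For the HTTP call itself, Req’s receive_timeout remains the simpler option.

```elixir
defmodule DeadlineDemo do
  # Stand-in for a blocking call (a slow tool or HTTP request).
  defp slow_call(ms) do
    Process.sleep(ms)
    :finished
  end

  # The tempting approach: schedule :timeout, then block. The timer message
  # arrives on time, but nobody reads it until slow_call/1 has already returned.
  def with_send_after(call_ms, timeout_ms) do
    Process.send_after(self(), :timeout, timeout_ms)
    result = slow_call(call_ms)

    timer_fired? =
      receive do
        :timeout -> true
      after
        0 -> false
      end

    {result, timer_fired?}
  end

  # A deadline that works: run the call in a Task and stop waiting after
  # timeout_ms, killing the task if it has not replied by then.
  def with_task_deadline(call_ms, timeout_ms) do
    task = Task.async(fn -> slow_call(call_ms) end)

    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> result
      {:exit, reason} -> {:error, reason}
      nil -> {:error, :timeout}
    end
  end
end

IO.inspect(DeadlineDemo.with_send_after(200, 50))
IO.inspect(DeadlineDemo.with_task_deadline(200, 50))
```

The first call still waits the full 200 ms and returns `:finished`; the timeout message was sitting in the mailbox the whole time. The second gives up after 50 ms.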
Where to go from here
You now own the primitive. Every agent framework is this loop with more stuff on top. Here’s what that “more stuff” looks like when you’re ready:
Multi-tool registries. The dispatch_tool function in the example is a hard-coded case statement. For a real system, you’d define a behaviour (@callback execute(map()) :: String.t()) and register modules dynamically. Each tool becomes its own module with its own tests.
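A minimal sketch of that behaviour-based registry, assuming a hypothetical `Agent.Tool` behaviour with `name/0`, `spec/0`, and `execute/1` callbacks. The paragraph above only specifies `execute/1`; the other two are added here so each tool can also supply its own schema. The registry is a plain map from tool name to module.

```elixir
defmodule Agent.Tool do
  # Hypothetical behaviour: each tool names itself, describes its schema, and runs.
  @callback name() :: String.t()
  @callback spec() :: map()
  @callback execute(map()) :: String.t()
end

defmodule Agent.Tools.ListFiles do
  @behaviour Agent.Tool

  @impl true
  def name, do: "list_files"

  @impl true
  def spec do
    %{
      name: name(),
      description: "List all files in a directory.",
      input_schema: %{type: "object", properties: %{path: %{type: "string"}}, required: []}
    }
  end

  @impl true
  def execute(input) do
    case File.ls(input["path"] || ".") do
      {:ok, files} -> files |> Enum.sort() |> Enum.join("\n")
      {:error, reason} -> "Error: #{reason}"
    end
  end
end

defmodule Agent.Tools.ReadFile do
  @behaviour Agent.Tool

  @impl true
  def name, do: "read_file"

  @impl true
  def spec do
    %{
      name: name(),
      description: "Read the contents of a file at the given path.",
      input_schema: %{type: "object", properties: %{path: %{type: "string"}}, required: ["path"]}
    }
  end

  @impl true
  def execute(%{"path" => path}) do
    case File.read(path) do
      {:ok, contents} -> contents
      {:error, reason} -> "Error: #{reason}"
    end
  end

  def execute(_input), do: "Error: missing path"
end

defmodule Agent.ToolRegistry do
  # name => module, built once from a list of tool modules.
  def new(modules), do: Map.new(modules, &{&1.name(), &1})

  # The `tools` list to send to the API.
  def specs(registry), do: Enum.map(registry, fn {_name, mod} -> mod.spec() end)

  def dispatch(registry, name, input) do
    case Map.fetch(registry, name) do
      {:ok, mod} -> mod.execute(input)
      :error -> "Unknown tool: #{name}"
    end
  end
end

registry = Agent.ToolRegistry.new([Agent.Tools.ListFiles, Agent.Tools.ReadFile])
IO.inspect(Map.keys(registry))
```

With this in place, dispatch_tool/2 in the loop shrinks to a call to Agent.ToolRegistry.dispatch/3, and the tools list passed to run/2 comes from Agent.ToolRegistry.specs/1.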
Durable runs that survive deploys. The GenServer version loses everything if the node restarts. For long-running agents (tens of minutes, dozens of steps), you want the conversation state in Postgres, not in process memory. I wrote a full walkthrough of this pattern: Oban as a Durable AI Agent Runtime in Elixir.
Streaming. The example waits for the full response before acting. For interactive use — showing the user what the agent is thinking in real-time — you’d stream tokens back through a LiveView socket. The pattern is covered in Streaming LLM Tokens in LiveView, the 2026 Way.
Memory and context management. After enough tool calls, the conversation history exceeds the model’s context window. Production agents prune old tool results, summarize prior steps, or use a sliding window. This is where the frameworks genuinely earn their weight.
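One simple version of the sliding window, as a sketch: keep the original goal plus the most recent exchanges. Because the loop appends messages in [assistant_msg, tool_msg] pairs, dropping whole pairs keeps each tool_result right after the tool_use it answers (which the API requires) and keeps user and assistant turns alternating. `ContextWindow.prune/2` is a hypothetical helper; it simply forgets older steps, whereas summarizing them is the gentler alternative.

```elixir
defmodule ContextWindow do
  # Keep the goal (first message) and the last `keep_pairs` exchanges.
  # Assumes everything after the goal was appended as [assistant, tool_result] pairs.
  def prune([goal | rest], keep_pairs) when keep_pairs >= 0 do
    kept =
      rest
      |> Enum.chunk_every(2)
      |> Enum.take(-keep_pairs)
      |> Enum.concat()

    [goal | kept]
  end
end

goal = %{role: "user", content: "goal"}

exchanges =
  for i <- 1..5 do
    [%{role: "assistant", content: "call #{i}"}, %{role: "user", content: "result #{i}"}]
  end

messages = [goal | Enum.concat(exchanges)]
pruned = ContextWindow.prune(messages, 2)
IO.inspect(Enum.map(pruned, & &1.content))
```

In the loop, this would run on state.messages before each call_llm, for example once the previous call’s input_tokens crosses a threshold you choose.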
But you don’t need any of that to start. You need the loop, three guardrails, and a goal. Fifty lines. One GenServer. The rest is iteration.