Build an AI Agent Loop in 50 Lines of Elixir

Jared Smith · June 21, 2026 · 10 min read · AI agents, Elixir

Every AI agent framework runs the same loop: observe, decide, act, repeat. Here it is in 50 lines of Elixir — no framework, just a GenServer.

TL;DR: Claude Code, Cursor, Devin, Codex — they all run the same pattern under the hood. The LLM proposes an action. Your code executes it. You feed the result back. The LLM proposes the next action. Repeat until done. That’s it. In Elixir, this is a GenServer with a recursive message loop. Fifty lines. No framework, no SDK beyond an HTTP client. Once you see the pattern, you can’t unsee it — and you’ll never need to treat agent frameworks as black boxes again.

What every agent framework is hiding from you

Open the source of any AI agent framework — LangChain, CrewAI, Anthropic’s own Claude Code — and strip away the plugin registries, the YAML configs, the abstract base classes, the middleware stacks. What’s left is always the same thing:

loop do
  response = call_llm(messages)

  if response.wants_to_use_a_tool?
    result = run_tool(response.tool_name, response.tool_args)
    messages = messages ++ [response, result]
  else
    break response.final_answer
  end
end

That’s the agent loop. The entire intellectual content of “agentic AI” is a while loop with an LLM call inside it.
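The pseudocode above translates to Elixir as a recursive function rather than a while loop. Here is a minimal sketch with no network calls: `LoopSketch`, `fake_llm`, and the tuple shapes are made up for illustration, and the "model" is just a function that asks for one tool call and then answers.

```elixir
defmodule LoopSketch do
  # `llm` is any 1-arity function that takes the message list and returns
  # {:tool_use, name, args} or {:done, answer}. It stands in for the real API call.
  # `tools` maps tool names to 1-arity functions.
  def run(messages, llm, tools) do
    case llm.(messages) do
      {:tool_use, name, args} ->
        result = tools[name].(args)
        # Append the call and its result, then recurse: this is the "loop".
        run(messages ++ [{:assistant, name, args}, {:tool, result}], llm, tools)

      {:done, answer} ->
        answer
    end
  end
end

# A scripted fake model: requests one tool call, then answers using its result.
fake_llm = fn messages ->
  case List.last(messages) do
    {:tool, result} -> {:done, "The file says: " <> result}
    _ -> {:tool_use, "read_file", %{"path" => "notes.txt"}}
  end
end

tools = %{"read_file" => fn %{"path" => path} -> "contents of " <> path end}

answer = LoopSketch.run([{:user, "What is in notes.txt?"}], fake_llm, tools)
```

Swapping `fake_llm` for a function that calls a real model is the only change needed to turn this into a working agent; everything else in this post is about doing that safely.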

The frameworks add real value on top — tool registries, memory management, multi-agent routing, streaming, token tracking — but none of those are the core. The core is this loop. And if you don’t understand the loop, you can’t debug the framework. You can’t reason about costs. You can’t explain why your agent got stuck in an infinite cycle or burned $40 on a task that should’ve cost $0.50.

So let’s build it.

The pattern: observe → decide → act → loop

Before we write code, here’s the pattern with proper names. It’s called a ReAct loop in the literature, but you don’t need the paper. It’s four steps:

1. Observe: gather the current state. On the first turn, this is the user’s goal. On subsequent turns, it’s the result of the last tool call.
2. Decide: send the accumulated context to the LLM. It either proposes a tool call or returns a final answer.
3. Act: if it proposed a tool call, execute it. If it returned a final answer, you’re done.
4. Loop: append the tool result to the conversation and go back to step 2.

That’s it. The LLM is the decision engine. Your code is the executor. The conversation history is the state.

In Elixir, this maps perfectly to a GenServer. The conversation history is the process state. Each iteration is one handle_info callback, which ends by sending the process its next :step message. Supervision, crash isolation, and process monitoring come free from OTP; these are the things agent frameworks in other languages spend thousands of lines reimplementing.

The 50-line agent

Here it is. The full module — GenServer, API call, tool dispatch, loop. Count the lines yourself.

defmodule Agent.Loop do
  use GenServer

  def run(goal, tools \\ []), do: GenServer.start_link(__MODULE__, {goal, tools})

  @impl true
  def init({goal, tools}) do
    send(self(), :step)
    {:ok, %{messages: [%{role: "user", content: goal}], tools: tools}}
  end

  @impl true
  def handle_info(:step, state) do
    case call_llm(state.messages, state.tools) do
      {:tool_use, name, input, assistant_msg, tool_use_id} ->
        result = dispatch_tool(name, input)

        tool_msg = %{
          role: "user",
          content: [%{type: "tool_result", tool_use_id: tool_use_id, content: result}]
        }

        send(self(), :step)
        {:noreply, %{state | messages: state.messages ++ [assistant_msg, tool_msg]}}

      {:done, answer} ->
        IO.puts("\n✅ Agent finished: #{answer}")
        {:stop, :normal, state}
    end
  end

  defp call_llm(messages, tools) do
    body = %{model: "claude-sonnet-4-20250514", max_tokens: 4096, messages: messages}
    body = if tools == [], do: body, else: Map.put(body, :tools, tools)

    headers = [{"x-api-key", System.get_env("ANTHROPIC_API_KEY")}, {"anthropic-version", "2023-06-01"}]
    {:ok, %{status: 200, body: resp}} = Req.post("https://api.anthropic.com/v1/messages", json: body, headers: headers)

    case resp["stop_reason"] do
      "tool_use" ->
        tool = Enum.find(resp["content"], &(&1["type"] == "tool_use"))
        assistant_msg = %{role: "assistant", content: resp["content"]}
        {:tool_use, tool["name"], tool["input"], assistant_msg, tool["id"]}

      _ ->
        text = resp["content"] |> Enum.find(%{}, &(&1["type"] == "text")) |> Map.get("text", "")
        {:done, text}
    end
  end

  defp dispatch_tool(name, input) do
    IO.puts("🔧 #{name}(#{inspect(input)})")

    case name do
      "read_file" -> input["path"] |> File.read() |> then(fn {:ok, c} -> c; {:error, r} -> "Error: #{r}" end)
      "list_files" -> (input["path"] || ".") |> File.ls() |> then(fn {:ok, fs} -> Enum.join(fs, "\n"); {:error, r} -> "Error: #{r}" end)
      "write_file" -> input["path"] |> File.write(input["content"]) |> then(fn :ok -> "ok"; {:error, r} -> "Error: #{r}" end)
      _ -> "Unknown tool: #{name}"
    end
  end
end

That’s it. No framework. No agent SDK. The init callback sets up the conversation with the user’s goal and sends the first :step message. Each :step calls the LLM, checks whether it wants to use a tool, and either dispatches the tool and loops or prints the final answer and stops.

Two things to notice about the message format. The assistant message (%{role: "assistant", content: resp["content"]}) carries the raw content blocks from the API response — including both text and tool_use blocks. The tool_use_id is passed separately and only used to build the tool_result message back. This keeps the messages clean: everything sent back to the API on the next turn matches the Messages API schema exactly.
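One limitation worth knowing: the module picks out only the first tool_use block with Enum.find. A model can return several tool_use blocks in one turn, and the Messages API expects a tool_result for each of them in the next user message. The sketch below shows one way to build that message; `ToolResults` and the `dispatch` argument are hypothetical names standing in for the module's own dispatch_tool/2.

```elixir
defmodule ToolResults do
  # Builds the single "user" message that answers every tool_use block in an
  # assistant response. `dispatch` is a hypothetical 2-arity function
  # (name, input) -> string, standing in for dispatch_tool/2.
  def build(content_blocks, dispatch) do
    results =
      # The pattern in the generator skips text blocks and anything else
      # that is not a tool_use block.
      for %{"type" => "tool_use", "id" => id, "name" => name, "input" => input} <- content_blocks do
        %{type: "tool_result", tool_use_id: id, content: dispatch.(name, input)}
      end

    %{role: "user", content: results}
  end
end

# Example content list shaped like an API response with two tool calls.
content = [
  %{"type" => "text", "text" => "I'll check both files."},
  %{"type" => "tool_use", "id" => "toolu_01", "name" => "read_file", "input" => %{"path" => "a.txt"}},
  %{"type" => "tool_use", "id" => "toolu_02", "name" => "read_file", "input" => %{"path" => "b.txt"}}
]

tool_msg = ToolResults.build(content, fn name, input -> "#{name} on #{input["path"]}" end)
```

The resulting message carries two tool_result entries, one per tool_use id, in the same order the model requested them.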

Running it

To see this work, define your tools using Anthropic’s tool use schema and call run/2:

tools = [
  %{
    name: "read_file",
    description: "Read the contents of a file at the given path.",
    input_schema: %{
      type: "object",
      properties: %{
        path: %{type: "string", description: "Absolute or relative file path"}
      },
      required: ["path"]
    }
  },
  %{
    name: "list_files",
    description: "List all files in a directory.",
    input_schema: %{
      type: "object",
      properties: %{
        path: %{type: "string", description: "Directory path. Defaults to current directory."}
      },
      required: []
    }
  }
]

Agent.Loop.run("Read my mix.exs and tell me what Elixir version this project uses.", tools)

You’ll see output like:

🔧 read_file(%{"path" => "mix.exs"})
✅ Agent finished: Based on your mix.exs, this project uses Elixir ~> 1.18.

The agent read the file, interpreted it, and answered — two turns, one tool call, done. For a more complex goal, it’ll chain multiple tool calls automatically. Ask it to “find all test files and summarize what they test” and you’ll watch it call list_files, then read_file on each result, then synthesize. The loop just keeps going until the model is satisfied.

What you get for free from OTP

If you’ve built agent loops in Python or TypeScript, you’ve probably added code for:

Crash isolation — if a tool call throws, the agent shouldn’t crash your whole application.
Supervision — if the agent process dies, something should notice and optionally restart it.
Concurrency — running multiple agents simultaneously without blocking.
Process monitoring — knowing when an agent finishes or dies.

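In Elixir, you have all of these before you write a single line. The GenServer is a process. If it crashes, only that process and anything linked to it are affected, so for real use start it under a supervisor instead of linking it to the caller the way run/2 does. A Supervisor can restart it automatically; for one-shot agents, pick a :transient or :temporary restart so a run that finished normally isn’t started again. Start ten agents and they run concurrently on the BEAM scheduler, with no threads, no async/await, and no event loop. Call Process.monitor/1 on the agent’s PID to get a message when it finishes.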

This is what I mean when I say the BEAM is the runtime AI agents want. The primitives that agent frameworks bolt on top of Python — Celery for durability, threading for concurrency, signal handlers for cleanup — are the default in OTP. You start with them.
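To make the supervision and monitoring points concrete, here is a small sketch using a DynamicSupervisor and Process.monitor/1. `StubAgent` is a stand-in I made up so the example runs without an API key; it waits 50 ms, then stops normally, the way Agent.Loop does when it reaches a final answer.

```elixir
defmodule StubAgent do
  # Stand-in for Agent.Loop: takes a goal, does one delayed "step", then stops.
  use GenServer

  def run(goal), do: GenServer.start_link(__MODULE__, goal)

  @impl true
  def init(goal) do
    # Short delay so the caller has time to attach a monitor before we exit.
    Process.send_after(self(), :step, 50)
    {:ok, goal}
  end

  @impl true
  def handle_info(:step, goal), do: {:stop, :normal, goal}
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# restart: :temporary means a finished (or crashed) agent is not restarted.
# With the default :permanent, the supervisor would rerun a completed agent.
child_spec = %{
  id: StubAgent,
  start: {StubAgent, :run, ["summarize the README"]},
  restart: :temporary
}

{:ok, pid} = DynamicSupervisor.start_child(sup, child_spec)
ref = Process.monitor(pid)

# Wait for the agent to exit; the reason is :normal on a clean finish.
exit_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :no_exit_seen
  end
```

The explicit child spec is needed because the module exposes run/2 rather than the start_link/1 that the default child_spec expects. Because the agent is linked to the supervisor and not to your calling process, a crash inside a tool call reaches you only as a :DOWN message.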

The three guardrails that separate a toy from a tool

The 50-line version works, but it’ll happily loop forever if the LLM gets confused, burn through your API budget on a runaway task, or hang indefinitely on a slow model call. Here are the three guardrails that make it safe for real use.

1. Max iterations

The simplest protection: a counter.

def init({goal, tools}) do
  send(self(), :step)

  {:ok, %{
    messages: [%{role: "user", content: goal}],
    tools: tools,
    step: 0,
    max_steps: 30
  }}
end

def handle_info(:step, %{step: step, max_steps: max} = state) when step >= max do
  IO.puts("⛔ Agent hit step limit (#{max}). Stopping.")
  {:stop, :normal, state}
end

def handle_info(:step, state) do
  # ... same loop logic, plus:
  {:noreply, %{state | step: state.step + 1, messages: updated_messages}}
end

Thirty steps is generous for most tasks. A runaway agent hitting this limit is a signal that the goal was too vague or the tools are insufficient — both things you want to know about rather than papering over with more iterations.

2. Token budget

Model calls cost money. Track cumulative usage and kill the run when it crosses a threshold.

defp call_llm(messages, tools) do
  # ... same Req.post call ...

  usage = %{
    input: resp["usage"]["input_tokens"],
    output: resp["usage"]["output_tokens"]
  }

  case resp["stop_reason"] do
    "tool_use" ->
      tool = Enum.find(resp["content"], &(&1["type"] == "tool_use"))
      assistant_msg = %{role: "assistant", content: resp["content"]}
      {:tool_use, tool["name"], tool["input"], assistant_msg, tool["id"], usage}

    _ ->
      text = resp["content"] |> Enum.find(%{}, &(&1["type"] == "text")) |> Map.get("text", "")
      {:done, text, usage}
  end
end

Then in handle_info, with tokens_used: 0 and token_budget: 100_000 added to the initial state in init:

def handle_info(:step, state) do
  case call_llm(state.messages, state.tools) do
    {:tool_use, name, input, assistant_msg, tool_use_id, usage} ->
      total = state.tokens_used + usage.input + usage.output

      if total > state.token_budget do
        IO.puts("💸 Token budget exhausted (#{total} / #{state.token_budget})")
        {:stop, :normal, state}
      else
        result = dispatch_tool(name, input)
        tool_msg = %{role: "user", content: [%{type: "tool_result", tool_use_id: tool_use_id, content: result}]}
        send(self(), :step)
        {:noreply, %{state | tokens_used: total, messages: state.messages ++ [assistant_msg, tool_msg]}}
      end

    {:done, answer, _usage} ->
      IO.puts("\n✅ Agent finished: #{answer}")
      {:stop, :normal, state}
  end
end

A sensible default for Sonnet is 100,000 tokens per run. That’s roughly $0.80 — enough for a substantial task, cheap enough that a runaway won’t surprise you on the bill.
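Input and output tokens are priced differently, so what a token budget costs in dollars depends on the mix. Here is a small estimator to make that arithmetic explicit. The default prices ($3 and $15 per million tokens) are assumptions for illustration, and `Cost` is a hypothetical helper; plug in the current prices for whichever model you use.

```elixir
defmodule Cost do
  # Prices are in USD per million tokens. The defaults are assumptions for
  # illustration; check the current price sheet for the model you use.
  def estimate(input_tokens, output_tokens, input_price \\ 3.0, output_price \\ 15.0) do
    (input_tokens * input_price + output_tokens * output_price) / 1_000_000
  end
end

# The same 100,000-token budget under two different input/output mixes.
input_heavy = Cost.estimate(90_000, 10_000)
output_heavy = Cost.estimate(60_000, 40_000)
```

Under those assumed prices the two mixes come to about $0.42 and $0.78. If you care about dollars rather than tokens, track input and output separately in state and budget on the estimate instead of the raw sum.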

3. Timeout

LLM API calls hang sometimes. The tempting fix is Process.send_after(self(), :timeout, 60_000) — but that won’t fire while the process is blocked inside a synchronous HTTP call. The message just sits in the mailbox until Req.post returns, which defeats the point.

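The real fix is simpler: tell Req itself to enforce the deadline. Req already has a receive_timeout option (15 seconds by default); set it explicitly and handle the error instead of letting the pattern match in the 50-line version crash.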

defp call_llm(messages, tools) do
  body = %{model: "claude-sonnet-4-20250514", max_tokens: 4096, messages: messages}
  body = if tools == [], do: body, else: Map.put(body, :tools, tools)

  case Req.post("https://api.anthropic.com/v1/messages",
         json: body,
         headers: [
           {"x-api-key", System.get_env("ANTHROPIC_API_KEY")},
           {"anthropic-version", "2023-06-01"}
         ],
         receive_timeout: 60_000
       ) do
    {:ok, %{status: 200, body: resp}} ->
      # ... parse response as before ...

    {:error, %Req.TransportError{reason: :timeout}} ->
      {:error, :timeout}

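    # Any non-200 response (rate limit, overload, bad request) becomes an error
    # instead of an unmatched case clause.
    {:ok, %{status: status}} ->
      {:error, {:http_status, status}}

    {:error, reason} ->
      {:error, reason}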
  end
end

Then handle it in the loop:

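{:error, :timeout} ->
  IO.puts("⏱️  LLM call timed out after 60s. Stopping.")
  {:stop, :normal, state}

{:error, reason} ->
  IO.puts("❌ LLM call failed: #{inspect(reason)}. Stopping.")
  {:stop, :normal, state}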

Sixty seconds is conservative. Most Sonnet calls return in 3–15 seconds. If you’re waiting sixty, the API is having a bad day and burning more retries won’t help.
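As I understand it, receive_timeout bounds how long Req waits on the socket for data, not the total wall-clock time of the call. If you want a hard limit on the whole call, one option is to run it in a Task and give up when the deadline passes. This is a sketch, and `Deadline` is a hypothetical helper name; the Task functions are standard library.

```elixir
defmodule Deadline do
  # Runs `fun` in a separate process and gives up after `timeout_ms`.
  # Returns {:ok, result}, {:error, :timeout}, or {:error, exit_reason}.
  def run(fun, timeout_ms) do
    task = Task.async(fun)

    # Task.yield returns nil on timeout; Task.shutdown then kills the task
    # and returns nil too, unless a reply arrived in the meantime.
    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      nil -> {:error, :timeout}
      {:exit, reason} -> {:error, reason}
    end
  end
end

fast = Deadline.run(fn -> :answer end, 100)
slow = Deadline.run(fn -> Process.sleep(500); :too_late end, 50)
```

Inside the agent you would wrap the call_llm/2 invocation in something like this. Note that Task.async links the task to the caller, so an exception inside the function still takes the caller down; Task.Supervisor.async_nolink is the usual choice when you want that isolated as well.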

Where to go from here

You now own the primitive. Every agent framework is this loop with more stuff on top. Here’s what that “more stuff” looks like when you’re ready:

Multi-tool registries. The dispatch_tool function in the example is a hard-coded case statement. For a real system, you’d define a behaviour — @callback execute(map()) :: String.t() — and register modules dynamically. Each tool becomes its own module with its own tests.
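Here is roughly what that behaviour could look like. The `Tool`, `Tools.Echo`, `Tools.Upcase`, and `ToolRegistry` names are invented for the example, and the registry is a compile-time map rather than anything dynamic, to keep the sketch short.

```elixir
defmodule Tool do
  # The behaviour from the text: a tool takes the input map and returns a string.
  @callback execute(map()) :: String.t()
end

defmodule Tools.Echo do
  @behaviour Tool

  @impl true
  def execute(%{"text" => text}), do: "echo: " <> text
end

defmodule Tools.Upcase do
  @behaviour Tool

  @impl true
  def execute(%{"text" => text}), do: String.upcase(text)
end

defmodule ToolRegistry do
  # Hypothetical registry: a plain map from tool name to implementing module.
  @tools %{"echo" => Tools.Echo, "upcase" => Tools.Upcase}

  def dispatch(name, input) do
    case Map.fetch(@tools, name) do
      {:ok, module} -> module.execute(input)
      :error -> "Unknown tool: #{name}"
    end
  end
end

echoed = ToolRegistry.dispatch("echo", %{"text" => "hi"})
missing = ToolRegistry.dispatch("rm_rf", %{})
```

ToolRegistry.dispatch/2 has the same shape as dispatch_tool/2, so it can replace the case statement without touching the loop, and each tool module can be unit tested on its own.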

Durable runs that survive deploys. The GenServer version loses everything if the node restarts. For long-running agents (tens of minutes, dozens of steps), you want the conversation state in Postgres, not in process memory. I wrote a full walkthrough of this pattern: Oban as a Durable AI Agent Runtime in Elixir.

Streaming. The example waits for the full response before acting. For interactive use — showing the user what the agent is thinking in real-time — you’d stream tokens back through a LiveView socket. The pattern is covered in Streaming LLM Tokens in LiveView, the 2026 Way.

Memory and context management. After enough tool calls, the conversation history exceeds the model’s context window. Production agents prune old tool results, summarize prior steps, or use a sliding window. This is where the frameworks genuinely earn their weight.
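As a taste of the simplest version, here is a sketch that keeps the most recent tool results intact and blanks out the content of older ones. `ContextPruner` and its placeholder text are hypothetical; the message shapes follow the tool_result messages built in the loop above.

```elixir
defmodule ContextPruner do
  @placeholder "[older tool result removed to save context]"

  # Keeps the most recent `keep` tool_result messages intact and replaces the
  # content of older ones with a short placeholder. Message order and every
  # tool_use_id are preserved, so tool_use/tool_result pairs still line up.
  def prune(messages, keep) do
    total = Enum.count(messages, &tool_result?/1)

    {pruned, _seen} =
      Enum.map_reduce(messages, 0, fn msg, seen ->
        cond do
          not tool_result?(msg) -> {msg, seen}
          seen < total - keep -> {shrink(msg), seen + 1}
          true -> {msg, seen + 1}
        end
      end)

    pruned
  end

  defp tool_result?(%{role: "user", content: [%{type: "tool_result"} | _]}), do: true
  defp tool_result?(_), do: false

  defp shrink(%{content: blocks} = msg) do
    %{msg | content: Enum.map(blocks, fn block -> %{block | content: @placeholder} end)}
  end
end

messages = [
  %{role: "user", content: "Summarize the tests."},
  %{role: "assistant", content: [%{"type" => "tool_use", "id" => "t1"}]},
  %{role: "user", content: [%{type: "tool_result", tool_use_id: "t1", content: "long output 1"}]},
  %{role: "assistant", content: [%{"type" => "tool_use", "id" => "t2"}]},
  %{role: "user", content: [%{type: "tool_result", tool_use_id: "t2", content: "long output 2"}]}
]

pruned = ContextPruner.prune(messages, 1)
```

Messages are never dropped, only shortened, because removing a tool_result while leaving its tool_use in the history would break the pairing the API expects.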

But you don’t need any of that to start. You need the loop, three guardrails, and a goal. Fifty lines. One GenServer. The rest is iteration.
