Build an LLM Wiki Your Coding Agent Actually Reads

A markdown knowledge base your AI agent reads and maintains — the three-layer architecture, the ingest/query/lint loop, and the rules that stop it rotting.

TL;DR: Andrej Karpathy’s “LLM wiki” idea is just a knowledge base written for a reader who is a model, not a person. You don’t need a vector database to start — you need a directory of markdown with three layers (immutable sources, model-written synthesis, and an index), three operations (ingest, query, lint), and two rules that stop it from rotting. Here’s the architecture I run, generically, and the discipline that keeps it useful past the first week.

Andrej Karpathy described keeping a personal “LLM wiki” — a knowledge base he writes and curates so that a model can read it. The framing stuck with a lot of people because it inverts the usual question. We spend enormous effort making models read our documents. The wiki asks the opposite: what does a document look like when its primary reader is an agent?

I’ve argued before that your CLAUDE.md is the onboarding doc you never wrote — the file the agent reads at the start of every session. That post is about the front door. This one is about the house behind it: the larger, structured knowledge base the agent reaches into when the answer isn’t on the welcome mat. If CLAUDE.md is “here’s how we work,” the wiki is “here’s everything we’ve learned, indexed so you can find it.”

You can build the whole thing with a text editor. No infrastructure required. The interesting part isn’t the tooling — it’s the shape.

The shape: three layers, not one folder

The failure mode for a knowledge base is a single flat folder of notes that slowly turns into a junk drawer. The fix is to separate things by who owns them and whether they’re allowed to change. Three layers:

wiki/
  raw/          # sources, captured verbatim — immutable
  synthesis/    # notes the model writes — concepts, comparisons, overviews
  maps/         # the index and the maps of content — read first

raw/ is what you captured. An article you clipped, a transcript, a paper, a config dump, a thread you didn’t want to lose. It goes in verbatim and it does not get edited. This is your ground truth. When a synthesized note and a source disagree, the source wins, and you can always tell which is which because they live in different folders.

synthesis/ is what the model made of it. This is where the agent earns its keep: a concept note that distills three sources into one explanation, a comparison note (“X versus Y, when to reach for each”), an overview that stitches a topic together. These are derived artifacts. They’re allowed to be wrong, allowed to be overwritten, allowed to go stale — because they’re cheap to regenerate from raw/.

maps/ is how anything gets found. At minimum, an index. Ideally a few “maps of content” — short hub notes that link out to everything on a topic. This is the layer the agent reads first, every time, before it opens anything else.

The point of the split is that it makes the dangerous operation — editing — safe. The model can churn the synthesis layer freely. It can never touch the sources.
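One way to make that line enforceable rather than aspirational is a small write guard that any agent tooling calls before touching a file. This is a minimal sketch, not a prescribed implementation: the function names and the choice to let the agent write to maps/ (so it can update the index and log) are my assumptions.

```python
from pathlib import Path

# Layer names come from the layout above; treating maps/ as agent-writable
# is an assumption so the agent can maintain the index and the log.
AGENT_WRITABLE = ("synthesis", "maps")


def scaffold(root: Path) -> None:
    """Create the three layers if they do not exist yet."""
    for layer in ("raw", "synthesis", "maps"):
        (root / layer).mkdir(parents=True, exist_ok=True)


def agent_may_write(root: Path, target: Path) -> bool:
    """Return True only when target resolves inside an agent-writable layer.

    Resolving first means a path like synthesis/../raw/x.md is judged by
    where it really lands, not by how it was spelled.
    """
    resolved = target.resolve()
    return any(
        resolved.is_relative_to((root / layer).resolve())
        for layer in AGENT_WRITABLE
    )
```

The guard is an allowlist on purpose: anything outside synthesis/ and maps/ is refused, so raw/ stays protected even if a new folder is added later.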

Two files carry most of the weight

Inside maps/, two plain files do more than any fancy graph view ever will.

index.md — one line per page. A catalog the agent reads to decide what to open. Not the content, just the pointers:

# Index

- [retry-on-429](../synthesis/retry-on-429.md) — backoff + jitter for rate-limited API calls
- [streaming-vs-batch](../synthesis/streaming-vs-batch.md) — when to stream tokens, when to wait
- [auth-token-lifecycle](../synthesis/auth-token-lifecycle.md) — where tokens are minted, stored, rotated

This is the trick that lets you skip embeddings for a long time. The agent reads about one screen of index, picks the two or three notes that matter, and opens only those. You’re doing retrieval with a table of contents instead of cosine similarity, and for a few hundred notes a table of contents is more precise, not less: it hands back the exact file you pointed at rather than the nearest neighbor.
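Because every index line has the same shape, tooling can read it as data too. Here is a rough sketch of a parser for the one-line-per-page format shown above; `IndexEntry` and `parse_index` are names I made up for illustration, and the pattern assumes a dash (either the long one used in the sample or a plain hyphen) between the link and the summary.

```python
import re
from typing import NamedTuple


class IndexEntry(NamedTuple):
    title: str
    path: str
    summary: str


# "- [title](path) <dash> summary"; \u2014 is the long dash used in the sample.
ENTRY = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<path>[^)]+)\)\s+[\u2014-]\s+(?P<summary>.+)$"
)


def parse_index(text: str) -> list[IndexEntry]:
    """Return one IndexEntry per well-formed pointer line, skipping everything else."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append(
                IndexEntry(match["title"], match["path"], match["summary"].strip())
            )
    return entries
```

Lines that don't match (the heading, blank lines, stray prose) are ignored rather than treated as errors, which keeps the index forgiving to hand edits.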

log.md — append-only history. A dated record of what changed and why:

# Log

- [2026-06-28] Added comparison note on streaming vs batch; superseded the old "always stream" claim.
- [2026-06-22] Captured the rate-limit thread into raw/; wrote retry-on-429 synthesis.
- [2026-06-15] Marked auth-token-lifecycle 🟡 — provider changed rotation window, note may be stale.

The log is the thing people skip, and it’s the thing that makes the wiki trustworthy six months in. It’s the difference between a note you believe and a note you have to re-verify. Append only — you never rewrite history, you only add to it.
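Append-only is easy to state and easy to break, so it helps to have both a writer that can only add and a check that nothing earlier was rewritten. The sketch below assumes new entries go at the bottom of the file (the sample above lists newest first, which would need a different check); `append_log` and `history_preserved` are hypothetical names.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def append_log(log_path: Path, message: str, day: Optional[date] = None) -> str:
    """Add one dated entry to the end of the log and return the line written."""
    day = day or date.today()
    entry = f"- [{day.isoformat()}] {message.strip()}\n"
    existing = log_path.read_text(encoding="utf-8") if log_path.exists() else ""
    # Start on a fresh line if the file does not already end with a newline.
    separator = "" if not existing or existing.endswith("\n") else "\n"
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(separator + entry)
    return entry


def history_preserved(before: str, after: str) -> bool:
    """True when the new log text only adds to the old text, never edits it."""
    return after.startswith(before)
```

If the wiki lives in version control, `history_preserved` can compare the committed log against the working copy before a commit goes through.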

Three operations, not a hundred

A wiki you have to remember how to use is a wiki you stop using. Collapse everything into three verbs and wire each one to a command or skill so the agent runs it the same way every time.

Ingest. Pull a source into raw/ verbatim → write or update a synthesis note → append a line to log.md → add the pointer to index.md. Four steps, one command. The discipline is that ingestion is never “just save the link” — it always produces a synthesis note, because an unread source is indistinguishable from a source you don’t have.
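The four ingest steps can be sketched as a single function. Treat this as an illustration under assumptions: the `ingest` signature, the log wording, and the file names are mine, and the synthesis text is passed in as a plain string because producing it is the model's job, not this function's.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def ingest(root: Path, source: Path, slug: str, synthesis_text: str,
           summary: str, today: Optional[date] = None) -> Path:
    """Run capture, synthesis, log, and index as one operation."""
    today = today or date.today()
    for layer in ("raw", "synthesis", "maps"):
        (root / layer).mkdir(parents=True, exist_ok=True)

    # 1. Capture verbatim. Refuse to overwrite, since raw/ is immutable.
    raw_copy = root / "raw" / source.name
    if raw_copy.exists():
        raise FileExistsError(f"{raw_copy} was already captured")
    shutil.copyfile(source, raw_copy)

    # 2. Write or overwrite the synthesis note (text supplied by the model).
    note = root / "synthesis" / f"{slug}.md"
    note.write_text(synthesis_text, encoding="utf-8")

    # 3. Append one dated line to the log.
    with (root / "maps" / "log.md").open("a", encoding="utf-8") as fh:
        fh.write(f"- [{today.isoformat()}] Captured {source.name} into raw/; "
                 f"wrote {slug} synthesis.\n")

    # 4. Add the pointer to the index unless it is already listed.
    index = root / "maps" / "index.md"
    existing = index.read_text(encoding="utf-8") if index.exists() else "# Index\n\n"
    target = f"(../synthesis/{slug}.md)"
    if target not in existing:
        if not existing.endswith("\n"):
            existing += "\n"
        index.write_text(existing + f"- [{slug}]{target} \u2014 {summary}\n",
                         encoding="utf-8")
    return note
```

Raising on a duplicate capture is a deliberate choice in this sketch: a second version of a source should arrive under a new file name rather than replace the first.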

Query. Read index.md → open the two or three relevant notes → answer, citing them. The agent should be able to say which notes it used. If it can’t, the index isn’t doing its job and you’ve got orphan notes nothing links to.
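In practice the agent does the picking by reading the index itself. To make the flow concrete, here is a crude stand-in that scores index lines by word overlap with the question and returns the paths it would open, which are also the paths it should cite. The scoring rule and the name `pick_notes` are assumptions for illustration, not how any particular agent chooses.

```python
import re

LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_notes(index_text: str, question: str, limit: int = 3) -> list[str]:
    """Return up to `limit` note paths whose index line shares words with the question."""
    wanted = tokens(question)
    scored = []
    for line in index_text.splitlines():
        match = LINK.search(line)
        if not match:
            continue  # headings and blank lines carry no pointer
        overlap = len(wanted & tokens(line))
        if overlap:
            scored.append((overlap, match.group(2)))
    # Stable sort: ties keep their order from the index.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [path for _, path in scored[:limit]]
```

An empty result is informative in its own right: either the wiki has nothing on the topic, or the index summaries are too thin to find it.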

Lint. This is the operation almost nobody builds, and it’s why most knowledge bases rot. A health-check pass over the whole wiki that flags:

  • Notes whose updated: date is older than the capture date of a source they cite (stale).
  • Two synthesis notes that contradict each other.
  • Orphans — files in synthesis/ that nothing in index.md points to.
  • Any sign a source in raw/ was edited (a cardinal sin — see below).

Give each note a status the linter can set: 🟢 current, 🟡 suspect, 🔴 contradicted. Run it on a schedule, or before any session where you’re going to lean on the wiki for something that matters. A knowledge base without a linter is a knowledge base that’s quietly lying to you by month three.
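Of the checks in that list, the orphan check is the simplest to sketch, so it makes a reasonable first linter rule. The code below assumes the layout described earlier (maps/index.md linking to files under synthesis/ with relative paths); `find_orphans` is a hypothetical name.

```python
import re
from pathlib import Path

LINK_TARGET = re.compile(r"\]\(([^)]+)\)")


def find_orphans(root: Path) -> list[str]:
    """Names of synthesis notes that no link in maps/index.md points to."""
    maps_dir = root / "maps"
    index_text = (maps_dir / "index.md").read_text(encoding="utf-8")
    # Links are relative to maps/, so resolve them from there.
    linked = {(maps_dir / target).resolve()
              for target in LINK_TARGET.findall(index_text)}
    return sorted(
        note.name
        for note in (root / "synthesis").glob("*.md")
        if note.resolve() not in linked
    )
```

A non-empty result is something to act on in the same pass: either add the missing pointer or delete the note.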

The two rules that stop the rot

Everything above is structure. These two rules are what keep the structure honest.

1. Sources are immutable. raw/ is read-only to the model. The moment you let an agent “tidy up” a source, you’ve destroyed the one thing that made it ground truth — you can no longer tell what was captured from what was inferred. This sounds obvious and is violated constantly, because asking a model to summarize-in-place feels efficient. It isn’t. Summarize into a new file in synthesis/. Leave the original alone.
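Immutability is only a rule if something notices when it is broken. One hedged way to check is to record a SHA-256 digest for each file in raw/ at capture time and compare later. Where the manifest is stored is left open here, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def snapshot(raw_dir: Path) -> dict[str, str]:
    """Map each file under raw_dir (by relative path) to its SHA-256 digest."""
    return {
        path.relative_to(raw_dir).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(raw_dir.rglob("*"))
        if path.is_file()
    }


def tampered(manifest: dict[str, str], raw_dir: Path) -> list[str]:
    """Sources from the manifest that were edited or deleted since the snapshot.

    Files that are new since the snapshot are not flagged: adding a capture
    is allowed, changing one is not.
    """
    current = snapshot(raw_dir)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

Digests catch edits that timestamps can miss, since a file can be rewritten and have its modification time restored.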

2. The synthesis layer is the model’s to own. The flip side. Don’t hand-curate the synthesis notes into precious artifacts you’re afraid to lose — they’re regenerable from raw/. Let the agent overwrite them, merge them, split them. The value is in the sources and the index; the synthesis is a cache. Treating it as disposable is what lets you ingest aggressively without fear.

Together these two rules draw a clean line: humans (or capture tools) own raw/, the model owns synthesis/, and maps/ is the contract between them.

Conventions that make it machine-readable

Small things, but they’re what let the linter and the agent reason about the wiki instead of just reading it:

---
updated: 2026-06-28
status: 🟢
sources:
  - ../raw/rate-limit-thread.md
---

# Retry on 429
...

Frontmatter on every synthesis note: when it was last touched, its status, and which sources it derives from. That sources list is what lets the linter catch a stale note automatically: if any listed source was captured after updated:, flag it. You get a self-checking knowledge base out of three fields of YAML.
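Here is a rough sketch of that staleness check. Two assumptions are baked in and worth stating: the frontmatter is read with a tiny hand-rolled parser that only understands the three fields shown above (a real YAML library would be more robust), and a source's capture date is approximated by its file modification time, which is reasonable only because raw/ files are never edited.

```python
from datetime import date
from pathlib import Path


def read_frontmatter(text: str) -> dict:
    """Parse the simple frontmatter shown above: scalar fields plus a sources list."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict = {"sources": []}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line.lstrip().startswith("- "):
            # Assumption: sources is the only list in the frontmatter.
            meta["sources"].append(line.strip()[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            if value.strip():
                meta[key.strip()] = value.strip()
    return meta


def is_stale(note: Path) -> bool:
    """True when any cited source is dated after the note's updated: field."""
    meta = read_frontmatter(note.read_text(encoding="utf-8"))
    updated = date.fromisoformat(meta["updated"])
    for relative in meta["sources"]:
        source = note.parent / relative  # paths are relative to the note
        if date.fromtimestamp(source.stat().st_mtime) > updated:
            return True
    return False
```

A linter built on this would set the note's status to 🟡 when `is_stale` returns True, then leave the rewrite to the next ingest or a human.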

Partition by domain so contexts don’t bleed

Once a wiki covers more than one area, give each domain its own index. A working setup might keep entirely separate trees for, say, infrastructure notes, a research reading pile, and a side project — each with its own index.md and log.md, each ingested and queried independently.

The reason is practical, not tidy: when the agent loads an index, you want it loading the relevant index, not a merged megafile where infrastructure trivia dilutes the research notes. Partitioning keeps each query’s context small and on-topic. One wiki, several front doors.
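As a sketch of what "several front doors" could look like on disk, the code below assumes each domain is a sibling directory with its own maps/index.md. That layout and the names `list_domains` and `load_index` are my assumptions; the only requirement from the text is one index per domain.

```python
from pathlib import Path


def list_domains(root: Path) -> list[str]:
    """Names of the domain trees under root that have their own index."""
    return sorted(index.parent.parent.name for index in root.glob("*/maps/index.md"))


def load_index(root: Path, domain: str) -> str:
    """Read exactly one domain's index, never a merged view."""
    index = root / domain / "maps" / "index.md"
    if not index.is_file():
        raise KeyError(f"no index for domain {domain!r}")
    return index.read_text(encoding="utf-8")
```

The agent is told which domain a session is about, loads that one index, and never sees the others unless asked.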

When to reach for a real database

This is a directory of markdown read through file globs or an MCP server. That’s the right tool from your first note to somewhere in the hundreds of pages. It’s deterministic — the agent opens the exact file you indexed, not the nearest vector — and it’s debuggable, because you can read the whole thing yourself.

Past that scale, the index stops fitting in a sensible context window and retrieval-by-table-of-contents breaks down. That’s the signal to add embeddings — and only then. I’ve written about building deterministic RAG in Phoenix with pgvector for when you get there, and about when a markdown file beats your vector database for the decision itself. The mistake is starting with the database. Start with the directory; graduate to the database when the directory tells you to.
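If you want a number instead of a feeling for when the index has outgrown the context window, a rough size estimate is enough. The sketch below leans on the common rule of thumb of about four characters per token for English text, which is an approximation and varies by tokenizer; the default budget is a placeholder I picked, not a recommendation.

```python
import math


def estimated_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; swap in a real tokenizer for accuracy."""
    return math.ceil(len(text) / chars_per_token)


def index_fits(index_text: str, budget_tokens: int = 4000) -> bool:
    """True while the index is within the share of context reserved for it."""
    return estimated_tokens(index_text) <= budget_tokens
```

Running this in the linter gives an early warning: when `index_fits` starts returning False for a domain, that is the moment to split the domain further or start thinking about embeddings.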

The wiki reads itself

Karpathy’s framing — a knowledge base written for a model — lands because it changes what “good documentation” means. A good human doc is narrative and persuasive. A good agent doc is indexed, dated, sourced, and linted. It’s less a book and more a small codebase: the index is the entry point, the synthesis notes are the modules, the sources are the vendored dependencies you don’t edit, and the linter is CI.

You don’t need infrastructure to start. You need a folder, three layers, three operations, and the discipline to never edit a source. Wire the operations to commands, run the linter on a cadence, and the thing you build in an afternoon will still be telling you the truth a year from now — which is more than most knowledge bases can say. The same shift I described for CLAUDE.md as a load-bearing artifact applies here: the wiki is only worth keeping if you treat it as version-controlled, executable, and maintained — not as a place notes go to die.