Nadella Is Right About AI and the Firm. Mostly.

Jared Smith · June 15, 2026 · 13 min read · AI engineering leadership founders

Satya Nadella’s “token capital” framing is right that AI amplifies human judgment, but it’s enterprise advice that skips the small teams who feel it first.

TL;DR: Satya Nadella just published the clearest case I’ve seen from a major platform CEO that AI amplifies human judgment instead of replacing it, and it confirms what I’ve argued on this site for a while. His “human capital plus token capital” framing is the right map. But it’s drawn from the 30,000-foot enterprise chair, and it glosses over three things that matter most to the people doing the work and to the small teams who feel this shift first. The learning loop doesn’t translate cleanly to a four-person team, “amplified judgment” reads very differently from the employee’s seat, and “token capital” is an idea that happens to be extremely good for the company selling the tokens. None of that makes him wrong. It just makes the essay incomplete for the readers I write for.

In a June 2026 post on X, Satya Nadella, Microsoft’s CEO, argued that we’re in the middle of a genuinely new kind of platform shift. Not another tool to bolt onto the org chart, but something stranger: “the first time we can create a real cognitive loop between people and digital systems.” He’s right, and I want to start there, because I think the essay is mostly correct and worth taking seriously rather than dunking on.

He’s confirming the thing I’ve been saying: the human is the point

The center of Nadella’s argument is a distinction. Every company, he writes, “is going to have to build what I think of as human capital and token capital.” Human capital is “the knowledge, judgment, relationships, ingenuity, and pattern recognition of its people.” Token capital is “the firm’s AI capability it builds and owns.”

And then the line that matters: “human capital does not become less valuable as token capital grows. It only becomes more valuable!”

If you’ve read much of what I write here, that should sound familiar. The entire premise of “AI won’t shrink your team” is that the binding constraint on a software business was never typing speed; it was judgment, taste, and the ability to decide what’s worth building. Make the doing cheaper and you don’t need fewer people exercising judgment. You need more surface area for that judgment to act on. Nadella’s version, “Without human direction, you have compute running in circles,” is the same claim in a sharper sentence. I’ll happily steal it.

There’s a second place where the essay lands almost exactly on an argument I’ve made. Nadella says the real work is building “a learning loop on top of models,” and that “private evals should capture whether a model is actually improving against outcomes that matter to the business (not just external benchmarks!).”

That parenthetical is the whole game. I already made this case in “your AI product needs telemetry before a better model”: the benchmark leaderboard is not your business, and chasing the next model release before you can measure whether the current one helps your users is backwards. You cannot improve against “outcomes that matter” if you never instrumented the outcomes. Nadella, from the top of the company that sells the models, is telling you the model is not the differentiator. Believe him on that one. It’s true, and it costs him something to say it.

And the best line in the whole essay is one I wish I’d written: “You can offload a task, or even a job, but you can never offload your learning.” That is durable, it’s scale-free, and I’ll come back to it at the end because it’s the part worth keeping.

So: credit where it’s due. As a piece of strategy for a large enterprise, this is a good essay. Now let me sharpen it, because there are three things the enterprise view misses, and they’re the things my readers feel first.

What the enterprise view misses

1. “Private RL environments and a hill-climbing machine” is a Fortune-500 program, not a startup plan

Here’s where the altitude shows. Nadella describes the learning loop concretely, and the details are large-company details: “Private reinforcement learning environments should let models grow stronger on real traces from inside the organization.” There’s a “knowledge base” that “makes institutional memory queryable.” He calls the whole thing “a hill climbing machine” that “compounds.”

That is a program. It assumes a platform team, a data org, an ML function, and enough proprietary trace volume that reinforcement learning on internal data is even a coherent idea. A four-person pre-Series-A team has none of that, and shouldn’t. If you’re seed-stage and you spin up a private RL environment, you have made a serious mistake about what your company is for.

So the honest question — the one the essay doesn’t answer because it isn’t written for you — is: what does the learning loop actually look like at startup scale? It’s a real question, and it has a real answer. It just isn’t infrastructure.

At small-team scale, the learning loop is mostly discipline and encoded convention:

Your “private RL environment” is a CLAUDE.md-style conventions file (or a .cursorrules file, or an AGENTS.md, whatever your tools read). It’s the place where you write down, in plain language, how this codebase does things: the patterns you’ve settled on, the mistakes you keep correcting, the boundaries the agent keeps crossing. Every time you fix the same thing twice, it goes in the file. That is the hill-climbing machine for a small team: a flat file that gets a little smarter every week.
Your “private evals” are: does it pass review, and does it pass the tests? You don’t need a benchmark harness. You need the existing gate (code review, CI, a human who knows the product) to be the eval. The signal Nadella wants (“outcomes that matter to the business”) is already sitting in your PR queue. The discipline is treating it as signal: noticing which kinds of work the agent gets right unsupervised and which it doesn’t, and routing accordingly. I wrote the long version of this in “when to trust an agent and when to step in.”
Your “queryable institutional memory” is docs, decision records, and good prompts. The judgment that lives in your founding engineer’s head is the asset. Capturing even a fraction of it — why you chose this database, why you rejected that abstraction, what “done” means here — into text the agent can read is the entire move. It’s unglamorous. It’s also tractable on a Tuesday afternoon, which the RL environment is not.
And the part everyone skips: actually measure the outcome. “Prove the return” is a whole essay on this, so I’ll be brief: the loop only compounds if you close it. Did the thing you shipped move the number you cared about? If you never check, you don’t have a learning loop. You have a faster way to ship things you can’t evaluate.
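To make “the review gate is the eval” concrete, here is a minimal Python sketch, not a prescribed tool. It assumes you can tag each agent-authored PR with a task category and record two facts about it: whether CI passed and whether review approved it without requested changes. The `ReviewOutcome` record, the `route_by_review` helper, and the default bar (a 90% clean-pass rate over at least 10 PRs) are all hypothetical choices for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ReviewOutcome:
    # Hypothetical record of one agent-authored PR and how it fared at the gate.
    category: str         # your own task label, e.g. "test-writing"
    passed_ci: bool       # did the test suite pass?
    approved_as_is: bool  # approved without requested changes?


def route_by_review(outcomes, threshold=0.9, min_samples=10):
    """Map each task category to "unsupervised" or "supervised".

    A category earns "unsupervised" only when it has at least `min_samples`
    PRs and its clean-pass rate (CI green AND approved as-is) meets `threshold`.
    """
    totals = defaultdict(int)
    clean = defaultdict(int)
    for o in outcomes:
        totals[o.category] += 1
        if o.passed_ci and o.approved_as_is:
            clean[o.category] += 1

    routes = {}
    for category, n in totals.items():
        rate = clean[category] / n
        trusted = n >= min_samples and rate >= threshold
        routes[category] = "unsupervised" if trusted else "supervised"
    return routes


# Made-up history: 19 of 20 test-writing PRs were clean; 9 of 12 migrations were.
history = (
    [ReviewOutcome("test-writing", True, True)] * 19
    + [ReviewOutcome("test-writing", True, False)]
    + [ReviewOutcome("schema-migration", True, False)] * 3
    + [ReviewOutcome("schema-migration", True, True)] * 9
)
print(route_by_review(history))
# → {'test-writing': 'unsupervised', 'schema-migration': 'supervised'}
```

The minimum sample count matters as much as the threshold: three clean PRs in a new category is an anecdote, not a track record, so the category stays supervised until the history exists.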

This is the constructive heart of the critique. Nadella’s loop is correct in shape and wrong in scale for most of the people reading this. The scaled-down version isn’t a watered-down RL pipeline. It’s a different and frankly more achievable thing: encoded conventions, the review gate as your eval, captured judgment in docs and prompts, and the habit of measuring results. If you run a small team, that’s your token capital. Start there. The same logic applies whether you’re four people or fourteen; see “managing a four-person engineering team” for how thin the process layer can be and still work.

2. From the worker’s chair, “your expertise becomes replicable in systems” is a different sentence

Read the second-to-last paragraph closely. Nadella writes that “Employees will see their expertise amplified and their judgment become part of systems that make it replicable and scalable.”

From the firm’s chair, that’s a promise: your people get amplified. From the employee’s chair, “my judgment becomes part of a system that makes it replicable” can read as: I am encoding myself into the thing that makes me optional. Those are the same sentence said from two chairs, and the essay only sits in one of them.

I want to be careful here, because the doomer version of this point is lazy and I don’t believe it. My consistent position on this site is that AI changes the job and expands capacity — it doesn’t, on net, delete the worker. I still think that’s right. The founding engineer who encodes her judgment into a CLAUDE.md and a set of evals doesn’t make herself redundant; she makes herself the person who owns the loop, which is more leverage, not less. Nadella’s “amplified” is genuinely available.

But “stable equilibrium” — his closing words — is doing a lot of work to paper over a transition cost, and somebody pays it. The senior engineer whose tacit knowledge gets captured this year is in a strong position. The mid-level engineer whose job was executing well-specified tasks — the work that’s now most automatable — is the one absorbing the change, and “your expertise will be amplified” is cold comfort if your expertise was mostly execution. The honest framing isn’t “everyone wins in equilibrium.” It’s: the value of judgment goes up, the value of pure execution goes down, and the transition between those two states is not free, and it’s not evenly distributed.

For a founder, this isn’t a reason for guilt; it’s a reason for clarity. If you’re building the loop Nadella describes, even the small-team version, you are changing what your people’s jobs are. Say so. Tell your team the work is shifting from doing to directing and verifying, and that you’re going to invest in getting them to the judgment side of that line rather than leaving them on the execution side as it erodes. That’s the difference between amplification and quietly training your own replacement, and it’s a choice the firm makes, not a law of physics. I’ve written more on what that workflow shift actually feels like in “AI-assisted engineering is a new workflow.”

3. Follow the incentives: “token capital” is a conveniently platform-serving idea

Now the uncomfortable one. “Token capital” is a good frame. It’s also an idea that is spectacularly convenient for a company whose business is renting you tokens.

Walk the logic. If every firm in every sector becomes convinced it must build proprietary “token capital” — private evals, private RL environments, queryable knowledge bases, a compounding learning loop — then every firm in every sector becomes a heavier, stickier, longer-term consumer of model inference and cloud infrastructure. Whose cloud? Well. The essay is, among other things, a beautifully argued reason to spend more on the platform that published it.

I want to be precise, because the cynical version of this point is as lazy as the doomer version of the last one. The argument can be simultaneously true and self-serving. Those aren’t in tension. It is genuinely good advice for a firm to own its learning loop rather than cede all its value to a handful of foundation models — Nadella even makes the macro case himself, warning against “a world where every company across every sector is ceding value to a few models that eat everything they see,” and invoking the way “entire industrial economies were hollowed out by outsourcing.” I think he means it. I also think the prescription that follows from it (“build token capital, on a frontier ecosystem”) routes an enormous amount of spend toward the ecosystem he’s selling. Both things are real.

The practical takeaway for a founder isn’t “ignore him.” It’s “separate the diagnosis from the prescription, and price the prescription yourself.” The diagnosis — don’t let the model commoditize your knowledge — is sound and free to act on. The prescription — build heavy proprietary AI infrastructure — has a vendor’s thumb on it, and you should adopt only the slice that survives your own cost-benefit math. For a four-person team, that slice is the cheap, durable stuff in critique #1, not a six-figure cloud commitment.

There’s an adjacent, larger critique floating around right now about the “loopification” of AI financing — the circular arrangements where model providers, cloud providers, and chipmakers fund and buy from each other in ways that can make demand look more organic than it is. I’m not going to claim Nadella’s essay is about that; it isn’t, and stretching it that far would be the kind of overreach I’m criticizing. But it’s worth keeping in your peripheral vision: when the entire supply chain has a stake in convincing you that you must accumulate token capital, treat “you must accumulate token capital” as a claim to verify, not a given.

The version worth keeping

Strip the enterprise scaffolding away and Nadella’s best line stands on its own at any scale: “You can offload a task, or even a job, but you can never offload your learning.”

That’s the whole thing. The model is rented and commoditizing by the month. What compounds — what is actually yours — is the judgment your team accumulates about your problem, your users, your codebase, and the discipline of encoding that judgment somewhere the agents can use it. The job shifts from doing to directing and verifying. The moat is compounding judgment, not the model. Everything else in the essay is implementation detail, and most of the implementation detail is sized for a company a thousand times larger than yours.

What a founder or small team can do on Monday

You don’t need a platform team to start the loop. You need four habits:

Write the conventions down. Start a CLAUDE.md (or equivalent) today. Every time you correct the agent on the same thing twice, the correction goes in the file. This is your scaled-down “hill-climbing machine,” and it costs nothing.
Make your review gate the eval. Stop reaching for the next model release. Treat your existing PR review and tests as the measure of whether the AI is actually helping, and pay attention to which tasks it gets right unsupervised. Route work accordingly.
Capture judgment, not just code. Spend an hour a week turning what’s in your senior people’s heads — why this choice, what “done” means, which mistakes recur — into text. That’s your queryable institutional memory, no vector database required.
Close the loop by measuring the outcome. Pick the one number a change was supposed to move, and check it. A learning loop you never measure is just a faster way to ship things you can’t evaluate.
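The first habit (a correction made twice goes in the file) is easy to let slip. If you keep even a plain list of the corrections you give the agent, a few lines of Python can surface the repeats that aren’t in the conventions file yet. This is a minimal sketch under loud assumptions: the correction log is a list of strings you maintain yourself, matching is exact after trimming and lowercasing, “already covered” is a naive substring check, and `corrections_to_encode` is a hypothetical helper, not a feature of any agent tool.

```python
from collections import Counter


def corrections_to_encode(correction_log, conventions_text, repeat_threshold=2):
    """Return corrections seen at least `repeat_threshold` times that the
    conventions file does not already mention (case-insensitive substring check)."""
    normalized = [c.strip().lower() for c in correction_log if c.strip()]
    counts = Counter(normalized)
    existing = conventions_text.lower()
    return sorted(
        c for c, n in counts.items()
        if n >= repeat_threshold and c not in existing
    )


# Made-up log and conventions file contents.
log = [
    "Use the shared db session helper",
    "Don't add new dependencies without asking",
    "use the shared DB session helper ",
    "Prefer dataclasses over dicts",
    "Prefer dataclasses over dicts",
]
conventions = "- Prefer dataclasses over dicts\n"
print(corrections_to_encode(log, conventions))
# → ['use the shared db session helper']
```

Exact matching will miss corrections phrased two different ways, which is fine for a sketch: the point is the habit of reviewing repeats weekly, not the string matching.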

And one thing to do with your team, not your tooling: say out loud that the work is moving from execution to judgment, and that you’re going to help everyone get to the judgment side of that line. Nadella calls the end state a “stable equilibrium.” Maybe — but equilibrium is something a firm builds deliberately, by deciding who it carries through the transition. That’s a leadership choice, not an emergent property of the technology. Make it on purpose.

Nadella’s right about the map. Just remember it was drawn from an altitude where you can’t see the four-person teams who feel the weather first.

Postscript — July 23, 2026

Five weeks after this essay, Nadella published a follow-up on X that makes the routing part of this argument explicit and grounds it in Microsoft’s own first-party models. Microsoft is now sending traffic across GitHub Copilot, Excel, and Outlook to its own MAI models wherever they match or beat the frontier alternatives, keeping OpenAI and Anthropic models in the orchestration for the frontier work. And he names the design goal outright: your evals “should continue to hill climb even when any given model has been removed,” with the harness, memory, context, and skills deliberately kept outside the model.

That is habit 2 above, stated as strategy by the largest software company on earth. The small-team version still needs none of the enterprise machinery: the conventions file is your externalized harness, the test suite is your eval, and a one-page task-to-model table is your router. The one thing worth adding since June is the motive inversion. For Microsoft, model independence is offense — route the traffic, keep the margin. For you it’s defense: when a model you depend on gets deprecated, repriced, or quietly degraded, everything you encoded outside it is what keeps compounding.
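The “one-page task-to-model table” can be about this small. In the Python sketch below, the model names are placeholders rather than real model identifiers, and `ROUTES` and `pick_model` are hypothetical. What it illustrates is only the structure: the routing preference lives outside any single model, so when one is removed, traffic shifts to the next entry in the list instead of the workflow breaking.

```python
# Placeholder model names; substitute whatever your tools actually expose.
ROUTES = {
    "boilerplate": ["small-model", "frontier-model-a"],
    "refactor": ["frontier-model-a", "frontier-model-b"],
    "architecture": ["frontier-model-b", "frontier-model-a"],
}


def pick_model(task_type, available, routes=ROUTES):
    """Return the first preferred model for `task_type` that is still available.

    Raises LookupError when the task type is unknown or every preferred
    model for it has been removed.
    """
    for model in routes.get(task_type, []):
        if model in available:
            return model
    raise LookupError(f"no available model for task type {task_type!r}")


# Suppose "small-model" has just been deprecated.
available = {"frontier-model-a", "frontier-model-b"}
print(pick_model("boilerplate", available))  # → frontier-model-a
```

Because the test suite doesn’t care which model wrote the code, it keeps working as the eval when an entry drops out of the table, which is the small-team version of evals that “hill climb even when any given model has been removed.”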
