Prompt Skills, Not Claude: Four Rules from Anthropic's Engineers

Jared Smith · May 26, 2026 · 10 min read · Tags: AI agents, AI tools

Most engineers prompt Claude one sentence at a time. Anthropic's own engineers don't — they prompt skills. Four rules from their recent talks, with the operator nuance the talks left out.

I sat through the most recent Anthropic engineering talks last week and walked away with an uncomfortable observation. Most of the engineers I work with are still prompting Claude one sentence at a time. The people who actually built Claude Code aren't. They're prompting skills — and the difference between those two patterns is the difference between a useful tool and a compounding one.

Four rules emerged from listening to Barry, Eric, and the rest of the Anthropic engineering team talk about how they actually work. None of them are clever. All of them are the kind of operational discipline that separates engineers who get real leverage from agents from engineers who burn tokens generating the same script every Tuesday morning.

Here's the breakdown — and where my own practice has confirmed each rule, plus the one or two places it needs more nuance than the talks gave it.

Why prompt-engineering moved from the chat to the folder

The first rule is also the mental shift everything else hangs on: prompt skills, not Claude. When you write a prompt in the chat, it vanishes the moment you close the tab. When you write a skill, it persists — and the next time you face a task that fits it, Claude reaches for it automatically.

This sounds obvious until you watch your own behavior. I caught myself last month rewriting the same "draft this in my voice, here's the lede pattern I use, here's the close pattern, here's how 'Read this next' works" preamble three times in one week before I admitted I was being lazy and codified it as a skill. Now I just type /blog-post draft 1,800 words on AI agent supervision and the skill loads with everything already in place: no preamble, no re-explaining the voice constraints, no token spend on the warm-up.

The framing the Anthropic team uses is a three-layer stack: models at the bottom (the AI itself), agents and prompts in the middle (how most people interact today), and skills at the top (the application layer where leverage actually lives). The takeaway: stop writing custom one-shot prompts every time. Identify the pattern, name it, and turn it into a skill.

Skills are a contract, not a folder

The Anthropic team describes skills as "folders that package composable procedural knowledge for agents." That's technically correct and also a meaningful undersell. Calling a skill a folder is like calling a function a file. Sure — but it misses the point of why it works.

A skill is a contract. You're declaring: this task is stable enough that I'm willing to codify how I want it done. The folder is just the storage mechanism. The contract has three parts, and most engineers write only the first two:

Description. What Claude reads to decide whether to invoke the skill at all. If your description is vague — "writing helper" — Claude won't reliably reach for it. If it's specific — "draft a code review comment in my voice, focused on security and naming, never approve without reading every line" — Claude will invoke it on the right tasks without you having to call it explicitly.
Instructions. The step-by-step process Claude follows once the skill is invoked. This is the playbook. Most people stop here.
Tools. The scripts, API calls, and reference files the skill can use. This is the layer where leverage lives, and it's the layer most people skip entirely.
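
One way to picture the three-part contract is as a folder scaffold. The sketch below is a minimal illustration, assuming the layout Claude Code documents for skills: a SKILL.md file whose YAML frontmatter carries name and description, with supporting files alongside it. The scaffold_skill helper, the scripts/ subfolder name, and the stub tool are hypothetical names chosen for the example, not something from the talks.

```python
from pathlib import Path
import tempfile


def scaffold_skill(root: Path, name: str, description: str,
                   steps: list[str], tool_names: list[str]) -> Path:
    """Create <root>/<name>/ holding a SKILL.md and stub tool scripts.

    Hypothetical helper: description, steps, and tool_names map to the
    three parts of the contract (description, instructions, tools).
    """
    skill_dir = root / name
    scripts_dir = skill_dir / "scripts"
    scripts_dir.mkdir(parents=True, exist_ok=True)

    # Part 1: the description goes in the frontmatter, where the model
    # reads it to decide whether the skill applies. Keep it free of
    # ": " sequences, since this sketch does no YAML quoting.
    lines = ["---", f"name: {name}", f"description: {description}", "---", ""]

    # Part 2: the instructions, written as a numbered playbook.
    lines.append("## Instructions")
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]

    # Part 3: the tools, one stub script per name, listed for reference.
    lines += ["", "## Tools"]
    for tool in tool_names:
        (scripts_dir / tool).write_text(f'"""TODO: document {tool}."""\n')
        lines.append(f"- scripts/{tool}")

    (skill_dir / "SKILL.md").write_text("\n".join(lines) + "\n")
    return skill_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = scaffold_skill(
            Path(tmp),
            "code-review",
            "Draft a code review comment in my voice, focused on security and naming.",
            ["Read every changed line.", "Flag security and naming issues."],
            ["collect_diff.py"],
        )
        print((path / "SKILL.md").read_text())
```

The point of generating all three parts at once is that the tools section cannot be silently skipped: an empty tool list shows up as an empty heading in the file.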

Eric from the Anthropic team made the funniest point in the talks: people put enormous effort into the prompt portion and then write the tools as if it were 1998 — bare, undocumented, parameters named a and b, no examples. They obsess over the conversation Claude has with itself and ignore the actual hands they're giving it.

My own version of this lesson came when I built /seo-audit — the skill that runs a full-site SEO audit, the same one I ran on this site recently. The instructions weren't the hard part; "score the site against E-E-A-T, validate schema, find over-length meta descriptions" is a few paragraphs. The hard part was the tools the skill calls. One that hits the live URL and pulls every <script type="application/ld+json"> block. One that validates each block against the Schema.org spec. One that diffs the current sitemap against an earlier baseline. One that runs a Playwright capture to surface above-the-fold rendering issues. Without those tools, the skill is a checklist that says "look for problems." With them, it's a system that returns a 0–100 health score per category, a per-page issue list, and a prioritized action plan in under ninety seconds.
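
To make the first of those tools concrete, here is a rough sketch of a JSON-LD extractor using only Python's standard library. It is not the author's actual tool. It assumes the page HTML has already been fetched and is passed in as a string, and it only checks that each block is well-formed JSON; validating against Schema.org would be a separate step. The names JsonLdExtractor and extract_json_ld are made up for the example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text inside every <script type="application/ld+json"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._buffer: list[str] = []
        self.raw_blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # A valueless attribute arrives as None, hence the `or ""`.
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.raw_blocks.append("".join(self._buffer))
            self._in_json_ld = False


def extract_json_ld(html: str) -> tuple[list, list[str]]:
    """Return (parsed_blocks, errors) for every JSON-LD block in `html`."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()

    parsed, errors = [], []
    for index, raw in enumerate(parser.raw_blocks):
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            # Report the problem instead of raising, so one bad block
            # does not hide the others.
            errors.append(f"block {index}: {exc.msg} (line {exc.lineno})")
    return parsed, errors
```

Returning errors as data rather than raising lets the skill fold malformed blocks into its per-page issue list.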

Composability is the move — but you're now maintaining a library

The third rule from Anthropic: build composable skills, not custom skills. Small, focused, reusable units that Claude can chain together as needed, rather than one monster skill that does everything.

I learned this the expensive way. My first attempt at codifying my content and growth work was a single sprawling "do all my marketing" skill — blog post drafting, LinkedIn post drafting, SEO audits, competitor research, email sequencing, all inside one folder. One skill, a million possibilities. The moment I tried to update how the skill handled lede formatting for long-form posts, I broke how it handled the LinkedIn opening hooks, and I didn't notice for a week. The Anthropic team's framing is right: composability gives you three things.

Failures are scoped. When a focused skill breaks, you know exactly which one. When a giant skill misbehaves, you don't even know which of its ten jobs went sideways.
Improvements compound. Sharpening one small skill upgrades every workflow that calls it. Sharpening a giant skill upgrades nothing reliably, because the change might fix one of its jobs and silently degrade another.
Reuse becomes free. A focused /product-research can plug into a blog-post workflow today (gather competitor positioning before I draft the essay) and a LinkedIn workflow tomorrow (find the three contrarian threads on this topic) without modification.

That last point is the unlock. My current library is four focused skills (/blog-post, /linkedin-post, /seo-audit, and /product-research), and they call each other constantly. /blog-post invokes /product-research before it drafts. /seo-audit invokes /product-research when it surfaces a missing competitor mention. /linkedin-post pulls from the most recent /blog-post draft when I'm threading. None of those workflows existed when everything lived inside one monster skill.

But here's the honest tradeoff the talks didn't quite name. Once you commit to composability, you're not maintaining a prompt. You're maintaining a library. Libraries have versioning concerns, dependency concerns, naming conventions that have to stay coherent across the team, and a curation cost that grows with size. I don't see this as a reason not to do it — composability is clearly the right call past about five skills — but I do see it as the moment to stop treating skill creation as a casual side activity and start treating it like a small codebase. Which it is.
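
Treating the library like a small codebase can start with something as simple as writing the dependencies down. The sketch below is a minimal illustration: the edges are my reading of the four-skill library described in this section, and the affected_by helper is hypothetical. It answers one question before you edit a skill: which other skills could this change break?

```python
from collections import deque

# "skill -> skills it relies on". These edges are an interpretation of
# the library described in the article, not an exported config.
DEPENDS_ON = {
    "blog-post": ["product-research"],
    "linkedin-post": ["blog-post", "product-research"],
    "seo-audit": ["product-research"],
    "product-research": [],
}


def affected_by(changed: str, depends_on: dict[str, list[str]]) -> list[str]:
    """Return every skill that directly or transitively relies on `changed`."""
    # Invert the graph so we can walk from a skill to its dependents.
    dependents: dict[str, list[str]] = {name: [] for name in depends_on}
    for skill, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(skill)

    # Breadth-first walk; `seen` also guards against dependency cycles.
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


if __name__ == "__main__":
    print(affected_by("product-research", DEPENDS_ON))
    print(affected_by("blog-post", DEPENDS_ON))
```

With these edges, a change to product-research touches all three other skills, while a change to blog-post touches only linkedin-post. That asymmetry is a reasonable guide to which skills deserve the most careful review.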

The two flags most engineers don't know about

Buried inside the skill spec are two configuration flags that Anthropic shipped specifically because senior teams asked for them. Almost nobody I work with knows they exist. They're the right tool when you need to control who can invoke what.

user-invocable: false hides the skill from the slash menu, meaning you can't invoke it directly; only Claude can. This sounds backwards until you realize it's exactly what you want for utility skills that exist to support other skills, not to be called from the chat. /product-research in my library is the canonical example. I never want to invoke it by hand; when I have a research question I just ask the question directly. It only earns its keep as a subroutine that /blog-post and /linkedin-post reach for automatically before they start drafting. Setting user-invocable: false removes it from my slash menu so I don't accidentally summon it standalone.

disable-model-invocation: true does the opposite. Only the human can run the skill; the model can't reach for it. This is the rule for skills with real-world consequences, but also for voice-sensitive ones. Both /linkedin-post and /blog-post in my library have this flag set. The model is excellent at drafting in my voice when I ask. It's also occasionally over-eager to volunteer a draft mid-conversation when I haven't asked for one. The flag closes that loop: drafts only happen when I explicitly invoke the skill. Voice integrity stays intact, which matters when the byline on the output is mine, not Claude's.
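
A quick way to audit a library for these two flags is to read each skill's frontmatter and summarize who can trigger it. The sketch below is a simplified illustration, not a real YAML parser: it handles flat key: value pairs only. As far as I know, Claude Code's documentation spells the keys with hyphens (user-invocable, disable-model-invocation), so the sketch normalizes underscores to hyphens and reads either spelling the same way. The defaults it assumes (user-invocable on, model invocation allowed) and both function names are assumptions for the example.

```python
def parse_frontmatter(skill_md: str) -> dict[str, str]:
    """Read flat `key: value` pairs between the leading '---' fences.

    Deliberately simplified: no nested YAML, lists, or multi-line values.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            # Normalize underscores so both spellings map to one key.
            fields[key.strip().replace("_", "-")] = value.strip()
    return fields


def invocation_mode(fields: dict[str, str]) -> str:
    """Summarize who can trigger the skill, based on the two flags."""
    # Assumed defaults: users can invoke, and the model can invoke.
    user_ok = fields.get("user-invocable", "true").lower() != "false"
    model_ok = fields.get("disable-model-invocation", "false").lower() != "true"
    if user_ok and model_ok:
        return "user and model"
    if user_ok:
        return "user only"
    if model_ok:
        return "model only"
    return "nobody (probably a mistake)"


if __name__ == "__main__":
    research = "---\nname: product-research\nuser-invocable: false\n---\n"
    drafting = "---\nname: blog-post\ndisable-model-invocation: true\n---\n"
    print(invocation_mode(parse_frontmatter(research)))  # model only
    print(invocation_mode(parse_frontmatter(drafting)))  # user only
```

The last branch is the useful one in practice: a skill with both flags set cannot be triggered by anyone, which is almost certainly a configuration error.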

And the meta-pattern under both flags: save scripts inside skills. When Claude writes a piece of code that you'd otherwise re-derive every session — a JSON-LD validator, a domain availability check, a CSV-to-Markdown transformer, anything — drop it in the skill folder and reference it as a tool. You're trading nondeterministic AI tokens for deterministic code compute. The tradeoff is essentially always worth it: code is cheaper, faster, and reproducible. Tokens are none of those things.
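
As an example of the kind of script worth saving, here is a small CSV-to-Markdown transformer written against Python's standard library. It is a sketch under simple assumptions (the first row is the header, and short rows are padded to the header width), and the function name is hypothetical. Once a file like this sits in the skill folder, the same input produces the same table every time, with no tokens spent regenerating it.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Convert CSV text (first row = header) into a Markdown table."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return ""
    width = len(rows[0])

    def render(row: list[str]) -> str:
        # Pad or trim to the header width, and escape pipes so a cell
        # value cannot break the table layout.
        cells = (row + [""] * width)[:width]
        return "| " + " | ".join(c.strip().replace("|", "\\|") for c in cells) + " |"

    header, *body = rows
    divider = "| " + " | ".join("---" for _ in range(width)) + " |"
    return "\n".join([render(header), divider, *map(render, body)])


if __name__ == "__main__":
    print(csv_to_markdown("page,issue\n/about,missing title\n"))
```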

Skills only compound if you actually update them

The fourth rule is the one the Anthropic team is most evangelical about: skills get smarter every session. The framing is right and I want to push it harder, because in practice this is where almost every skill library I've audited goes wrong.

The pitch is simple: every time you run a skill and the output isn't quite what you wanted, you have two choices. You can edit the output for this one task and move on. Or you can ask why the output was off and update the skill so the same drift doesn't happen next time. Eric's framing: "Claude on day 30 of working with you is going to be a lot better than Claude on day one."

The catch, and the rule I want to add to the list, is that this compounding effect only happens if you actually do the update step. Most teams don't. They run the skill, fix the output by hand, ship the thing, and move on. The skill stays the same shape it had on day one, the same drift happens next time, and the supposedly compounding system is just an expensive prompt with persistence.

The discipline I've landed on: when I close out a session that involved meaningful editing of a skill's output, I ask one question before I move on. Was this a one-time fix or should it live in the skill? If it's one-time, fine, ship it. If it's a pattern — a tone correction, an edge case I keep forgetting, a constraint I never wrote down — I update the skill before I close the tab. It takes about ninety seconds. It's the difference between a skill library that compounds and one that rots.

What to do this week

If you've been treating Claude as a chat interface, three concrete steps move you up the stack fast:

Audit your last ten chats. What did you ask Claude to do that you'd want done the same way next time? Anything that repeats is a candidate for a skill. Anything that doesn't probably isn't.
Pick one and write the contract. Description first (the harder part), then instructions, then identify which tools the skill needs. Don't skip the tools — that's where the leverage is.
Run it three times, edit the skill twice. The first run will surface the obvious gaps; the second will surface the subtle ones; by the third run the skill should mostly stay out of your way.

If you're scoping AI engineering practice for a pre-Series-A team and want a second pair of eyes on the skill library, the autonomy boundaries, or the supervisory layer that turns skills into something you can hand to junior engineers without losing your weekends — that's the kind of work I do as a fractional engineering lead. The discipline above isn't optional once your agent surface area gets past a handful of tasks; it's the difference between an AI-augmented team and one that just spends more on tokens.