How I Prompt Claude as a Staff Engineer (50 Prompts I Actually Use)
Fifty prompts I use to ship production AI features, debug distributed systems, and write docs that don't rot. Code review, debugging, refactoring, system design, and PR-quality writing — with five full examples.
I've used Claude every day for three years. I've shipped production AI features for 100K+ users at Lavender. I run code review and design at Staff level. The single biggest accelerant in my work — bigger than Copilot, bigger than Cursor's autocomplete, bigger than any cohort course — is a small library of prompts I've refined into something I trust.
Most AI prompts read like junior devs wrote them. They ask for "best practices." They get back generic answers. They produce code that compiles, tests that pass, and reviews that miss what matters.
These don't.
This post is the long version of why I built the Prompt Vault. The Vault is 50 of these prompts plus drop-in .cursorrules and CLAUDE.md starters. This post walks through the shape of the prompts and includes five of them in full, so you can decide whether the full pack is worth $39.
The pattern
Every prompt I trust has the same shape:
- Tell the model who it is. "Act as a Staff engineer reviewing this for blast radius." Not "review this code."
- Tell it what to focus on. A walkthrough or a checklist. Not "anything you notice."
- Tell it what to skip. "Skip generic advice. Skip style nits unless they obscure intent."
- Tell it the output format. A template, a verdict, a structured list.
The wrappers matter as much as the ask. They're what stops the model from giving you a confident, wrong, generic answer.
When I show people this pattern they sometimes push back: "isn't this just longer prompting?" Yes. And longer prompting is the difference between a tool and a team member.
Where I actually use them
The 50 prompts split into five categories:
- Code Review. Blast radius, security, concurrency, error handling, API contracts, test gaps, naming, dependency boundaries, migration safety, PR descriptions.
- Debugging. Hypothesis-driven, distributed traces, memory leaks, race conditions, slow queries, flaky tests, incident triage, stack traces, config drift, time zones.
- Refactoring. Extracting domain models, reducing complexity, untangling god classes, value objects, making illegal states unrepresentable, eliminating flag arguments, strangler-fig migrations, decomposing long functions, async modernization, feature-flag cleanup.
- System Design. Capacity planning, queue selection, multi-region, idempotency, rate limiting, distributed transactions, cache invalidation, schema migrations, webhook delivery, risky rollouts.
- Docs & PRs. PR descriptions, ADRs, runbooks, API references, changelogs, postmortems, READMEs, onboarding docs, tech specs, migration notices.
The categories aren't arbitrary. They're the five surfaces where I do most of my Staff-level work and where AI assistance actually pays for itself.
Five prompts, in full
Here are five I use almost daily. One per category. Copy them. Try them. If they're useful, the Vault has 45 more plus the bonuses.
1. Blast radius analysis (code review)
Act as a Staff engineer reviewing this diff for blast radius before merge.
For each changed file, answer:
1. What systems, services, or callers depend on this code path?
2. What's the worst realistic failure mode if this ships broken?
3. Is the change reversible without a backfill, replay, or schema rollback?
4. What monitoring would I check in the first 30 minutes after deploy?
End with a single sentence verdict: SAFE TO MERGE, MERGE WITH MITIGATION,
or HOLD. If MERGE WITH MITIGATION, list the mitigations in priority order.
Diff:
<paste diff or attach file>
The single-sentence verdict is the trick. It forces the model to commit instead of giving you a wishy-washy "it depends" that wastes your review time. The four-question structure stops the model from nitpicking syntax when there's a real systems-level problem to flag.
2. Hypothesis-driven debugging
I'm debugging a problem. Reset me. Walk me through this rigorously.
Symptom (what I observe):
<describe>
What should happen:
<describe>
What I've already ruled out:
<list>
Now:
1. Generate 5 hypotheses ordered by likelihood given typical failure modes
in this stack.
2. For each hypothesis, write the cheapest test that would falsify it.
3. Identify which test gives me the most information per minute spent.
4. If two hypotheses are entangled, propose the ordering that disentangles
them fastest.
Don't suggest fixes yet. We're isolating, not fixing.
Most debugging time is wasted on the first plausible explanation. Forcing five hypotheses, each with its cheapest falsifying test, rebuilds the scientific method when fatigue has eroded it. The "don't suggest fixes yet" line is the hardest one for the model to obey and the most important.
3. Make illegal states unrepresentable (refactoring)
This data model allows states that should never exist. Help me redesign so
the type system rejects them.
Type:
<paste>
Do this:
1. Enumerate the legal states (combinations of fields).
2. Enumerate the illegal states the current shape allows.
3. Propose a redesign — usually a sum type / discriminated union /
tagged enum — that makes illegal states unrepresentable.
4. Show how callers change. Some will get simpler (no nil checks).
Some will need to pattern match.
Language: <Go / Rust / TypeScript / Elixir / Ruby>
Pick the idiomatic encoding for that language.
This is the single highest-leverage refactor in statically typed code. The model knows the patterns for each language, and the prompt forces enumeration. I've used this to kill entire bug classes in payments code and identity flows.
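To make step 3 concrete, here's a minimal TypeScript sketch with a hypothetical payment type (my illustration, not output from the prompt):

```typescript
// Before: every field optional, so a "paid" payment with no capturedAt
// is representable (hypothetical shape, for illustration).
interface LoosePayment {
  status: "pending" | "paid" | "refunded";
  capturedAt?: Date;
  refundReason?: string;
}

// After: a discriminated union. Each status carries exactly the fields
// that are legal for it, so illegal combinations won't type-check.
type Payment =
  | { status: "pending" }
  | { status: "paid"; capturedAt: Date }
  | { status: "refunded"; capturedAt: Date; refundReason: string };

function describe(p: Payment): string {
  switch (p.status) {
    case "pending":
      return "awaiting capture";
    case "paid":
      return `captured at ${p.capturedAt.toISOString()}`; // no nil check needed
    case "refunded":
      return `refunded: ${p.refundReason}`;
  }
}
```

The "callers get simpler" claim from step 4 shows up in `describe`: once the compiler knows `capturedAt` exists on the `paid` branch, the defensive checks disappear.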
4. Pick the right queue (system design)
Help me choose a queue/streaming technology for this workload.
Workload:
<describe: throughput, message size, latency, ordering needs, retention,
consumers>
Operational context:
<cloud, team experience, existing infra, budget>
Compare these options for THIS workload (not in the abstract):
- Kafka (managed: MSK, Confluent, Aiven)
- Cloud-native pub/sub (SQS, GCP Pub/Sub, Azure Service Bus)
- Broker (RabbitMQ, NATS)
- DB-backed (Postgres SKIP LOCKED, River, Oban) or Redis-backed (Sidekiq)
For each: fit (1-10), the one thing that makes it right or wrong here,
monthly ops cost.
End with a single recommendation and what would make you change your mind.
Queue choices get litigated forever. Forcing a per-option fit score and a single recommendation cuts the bikeshedding. The "what would change my mind" line is what separates a useful answer from a confident one.
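When the DB-backed option wins, the core of it is a single query. Here's a minimal sketch using node-postgres, with a hypothetical jobs table (table and column names are my assumptions):

```typescript
import { Pool } from "pg";

const pool = new Pool(); // reads connection config from PG* env vars

// Claim the oldest queued job. SKIP LOCKED lets concurrent workers
// step over rows another worker has already locked, so there's no
// head-of-line blocking and no double-claiming.
async function claimJob() {
  const { rows } = await pool.query(
    `UPDATE jobs SET status = 'running', started_at = now()
     WHERE id = (
       SELECT id FROM jobs
       WHERE status = 'queued'
       ORDER BY created_at
       FOR UPDATE SKIP LOCKED
       LIMIT 1
     )
     RETURNING id, payload`
  );
  return rows[0] ?? null; // null when the queue is empty
}
```

This is the pattern libraries like River and Oban build on.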
5. Architecture Decision Record (docs)
Write an Architecture Decision Record for this decision.
Context I have:
<describe the situation, the options considered, the choice made>
Use this structure:
# ADR-NNN: <one-line decision>
## Status
Proposed | Accepted | Superseded by ADR-XXX
## Context
The forces in play. Constraints. What's NOT being decided.
## Decision
The actual choice. One paragraph. Active voice ("we will").
## Consequences
Positive. Negative. Neutral. All three sections, even if one is short.
## Alternatives Considered
Each alternative + the specific reason it lost.
Tone: matter-of-fact, no salesmanship. Future engineers are the reader.
ADRs are great when written, never written when needed. A constrained template makes them quick. The "Alternatives Considered" with reasons stops the "obviously the only choice" framing that erases context. Future you will thank present you.
What I learned writing 50 of these
A few patterns kept showing up while I built the pack.
The output format does as much work as the prompt. "Give me a verdict" beats "give me your thoughts." A template beats freeform. A scored comparison beats a paragraph.
"Skip" instructions are as important as "do" instructions. Telling the model what not to do — generic advice, style nits, hedging — protects you from the wasted tokens that hide the real signal.
One prompt per surface. Prompts that try to do ten things perform worse than five sharp prompts that each do one. The Vault is structured this way.
Stack-specific is better than language-agnostic. "Postgres on a 50M row table under live write load" produces better answers than "the database." Constraints make the model smarter.
The model is fast at things engineers are slow at, and slow at things engineers are fast at. It's faster than me at enumerating boundary conditions. I'm faster than it at intuiting which path is real. The prompts are designed to lean on its strengths.
What ships in the Vault
Beyond the 50 prompts, the Vault includes:
- Five .cursorrules files — one per category. Drop into any project root. They encode the constraints from the prompts as project-level defaults so you stop pasting and start operating (see the sketch after this list).
- Three CLAUDE.md starters — for Go HTTP services, Elixir umbrella apps, and AI agent projects. The conventions and "what I do/don't want from Claude" are pre-wired.
- A Notion blueprint — the database structure I use to track which prompts I actually rely on and which ones I've adapted.
- Lifetime updates — every future revision is free.
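To make the .cursorrules idea concrete, a file in this shape encodes review defaults (an illustrative sketch of my own, not the Vault file itself):

```
# Code review defaults (illustrative sketch)
- Before proposing a change, assess blast radius: callers, worst realistic
  failure mode, reversibility, and what to monitor after deploy.
- Skip generic advice. Skip style nits unless they obscure intent.
- End every review with a verdict: SAFE TO MERGE, MERGE WITH MITIGATION, or HOLD.
```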
Pricing
$39 for the Founders Edition (first 50 buyers). $49 after that. No subscription, no app, no expiry. PDF + CSV + plain text files.
If you've already built your own prompt library you trust, this isn't for you. If you're tired of trial-and-error and want to skip to what works, it is.
Questions? Email me at jared@sublimecoding.com. I read every note.