
My Daily Agentic AI Workflow

A walkthrough of how I run 4–7 agent sessions in parallel through a normal engineering day. Morning background tasks, mid-morning pair programming, afternoon reviews, end-of-day ops. The interaction modes that work, the handoff protocol, and the trap that makes most agent workflows produce slop.

I run four to seven agent sessions in parallel through a normal engineering day. Here's what they do, what they don't, and how I keep the work coherent.

The defining shift in engineering work over the past two years isn't that AI writes code faster. It's that you can have multiple agents working on multiple things at the same time, and your job moves from "writing code" to "directing work." This is qualitatively different from autocomplete, copilot-style assistance, or any prior way of using AI in development.

What follows is a walkthrough of a typical engineering day for me as of late 2025, running on Claude Code as the primary tool, OpenAI Codex for shell-shaped tasks, and a few custom agents wired into Slack and the command line. The point isn't the tools — pick whichever you like. The point is the patterns.

The core idea

Agents take work off your plate but still need you in the loop. The trap most teams fall into: treating agents as fire-and-forget background workers. The result is generated code that compiles, looks reasonable, and is subtly wrong in ways the agent has no way to detect on its own.

The mental model I use: an agent is a fast, talented junior engineer with infinite patience and zero context outside what you give them. Your job is to give them the context, scope the work tightly, and review the output before it goes anywhere production-shaped.

Walkthrough of a typical day

Morning: two background agents on long tasks

Before I sit down at my desk, I usually have two agent sessions running. The pattern: tasks that take a long time, don't need real-time feedback, and have well-defined success criteria.

Examples from a recent week:

  • "Audit our codebase for places we're calling external APIs without retry logic, and produce a markdown report with recommendations."
  • "Read the last fifty PRs merged to main, identify recurring code-review feedback themes, and write an internal style-guide draft based on them."
  • "Generate test cases for the authentication module covering all edge cases listed in the threat model document at docs/auth-threats.md."

I kick these off, walk away, and come back to a draft in 30–60 minutes. Critically, the output is always a draft. I read it, I edit it, I push the parts I trust into the codebase. The agent never commits unsupervised.

Mid-morning: foreground pair-programming with one agent

This is the bulk of my actual coding. I open a fresh agent session for the hardest problem of the day and we work it together — me driving, the agent acting as a peer. The interaction is a conversation, not a delegation.

Concretely: I describe the problem, the agent asks clarifying questions, we sketch an approach, I write some code, the agent reviews, I push back on suggestions I don't like, we iterate. By the time the function is committed, both of us have looked at every line.

The mistake to avoid in this mode: letting the agent write the code while you watch. That's still autocomplete, just slightly fancier. The point of pair-programming with an agent is that you are still doing the engineering — the agent is checking your work, raising things you might miss, and accelerating the parts that don't require taste.

Afternoon: review-mode agents on PRs

By afternoon I've usually written or merged some code, and I have PRs to review (mine and the team's). I run a review-mode agent on each PR before I read it myself.

The instruction is consistent: "Review this diff like an adversarial senior engineer who hates my work. Find bugs, race conditions, security issues, and unclear naming. Don't be polite." The agent produces a list. I read the list, dismiss the noise (typically 60–70%), and the rest becomes my review comments — credited to me, of course, but with the agent doing the first pass.

The leverage here is significant. Roughly 30% of what the agent flags is a real bug or smell; the other 70% is dismissable noise of the kind I'd have generated in my own head anyway. Net: I write better PR reviews in less time, and my human reviewers catch things they otherwise wouldn't have.

End-of-day: ops agents on deploy + summary

Late in the day, two more agents come into play.

The deploy agent: a custom Slack bot wired to our deployment pipeline. I tell it "deploy main to staging" or "deploy 4f3a2b1 to production behind feature flag ai_v2" and it executes the relevant commands, watches the deploy, and reports back. It does not, ever, have permission to do production deploys without a confirmation. But staging deploys, log queries, and rollback prep — yes, autonomously.
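To make the permission model concrete, here's a rough sketch of the gate. It's illustrative only: the module name, the command-map shape, and the return values are assumptions, not the bot's actual code.

```elixir
defmodule DeployBot.Policy do
  # Hypothetical sketch of the bot's permission gate. Commands are assumed to
  # arrive already parsed into a map like
  # %{action: :deploy, env: :production, ref: "4f3a2b1", confirmed: false}.

  # Routine operations run autonomously.
  def authorize(%{action: :deploy, env: :staging}), do: :run
  def authorize(%{action: :query_logs}), do: :run
  def authorize(%{action: :prepare_rollback}), do: :run

  # Production deploys only run after an explicit human confirmation.
  def authorize(%{action: :deploy, env: :production, confirmed: true}), do: :run

  def authorize(%{action: :deploy, env: :production}),
    do: {:confirm, "Reply `confirm` to ship this to production."}

  # Anything unrecognized is rejected rather than guessed at.
  def authorize(_command), do: :deny
end
```

The point of the sketch is the ordering: the autonomous paths are enumerated explicitly, and the default is deny.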

The summary agent: at end-of-day it reads the day's commits, the day's PR comments, the day's Slack threads in our team channel, and produces a one-paragraph "what happened today" summary. Useful for me; useful for async teammates; surprisingly useful when I come back on Monday to remember what we were working on Friday.
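The collection side is mundane. As a rough sketch, the commit half is a single git log call like the one below; the PR comments and Slack threads come from their respective APIs, and the paragraph itself is one model call over the combined text. Module and function names here are illustrative.

```elixir
defmodule DaySummary do
  # Gathers today's commits from the current checkout, one line per commit.
  # The PR-comment and Slack inputs, and the model call that writes the
  # summary paragraph, are deliberately left out of this sketch.
  def todays_commits do
    {log, 0} = System.cmd("git", ["log", "--since=6am", "--pretty=format:%h %s (%an)"])
    String.split(log, "\n", trim: true)
  end
end
```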

The three interaction modes

Boil all of the above down and there are really three modes I use agents in. Each one has a different signature.

  • Delegate. Long-running task, well-defined output, light supervision. Background mode. The success criterion is whether the deliverable is useful when I come back to it.
  • Collaborate. Real-time pair programming. The success criterion is whether the code I commit at the end is meaningfully better than what I'd have written alone.
  • Verify. Adversarial review of my work or the team's. The success criterion is whether real bugs get caught before they ship.

The mistake teams most often make is using the wrong mode. Trying to delegate something that actually needs collaboration produces unusable code. Trying to collaborate when verify-mode is what's needed produces echo-chamber agreement instead of real review. Pick the mode deliberately.

The handoff protocol

The single discipline that keeps multi-agent work coherent: explicit handoff context between sessions. When one agent's work feeds into another's, you don't trust them to figure it out. You write a one-paragraph context dump and paste it into the next session.

For example: morning audit agent produces a list of 14 places where retry logic is missing. I review the list, decide which 6 are worth fixing, and write a paragraph: "We're going to add retry logic to these 6 functions: [list]. Use exponential backoff with jitter, max 3 retries, log each retry at warn level. Match the pattern in lib/external/retry.ex." That paragraph goes into a fresh agent session for the implementation work.

The agent doesn't see the original audit. It sees the curated context. This is the core of working with agents at scale: you are the context router, deciding what each session needs to know.
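For concreteness, here's a minimal sketch of the kind of helper that handoff paragraph describes: exponential backoff with jitter, a cap of three retries, a warn-level log per retry. It's illustrative only, not the actual contents of lib/external/retry.ex.

```elixir
defmodule ExternalRetry do
  # Illustrative sketch of the pattern named in the handoff paragraph, not the
  # real lib/external/retry.ex.
  require Logger

  @max_retries 3
  @base_delay_ms 200

  # `fun` is expected to return {:ok, result} or {:error, reason}.
  def with_retry(fun, attempt \\ 1) do
    case fun.() do
      {:ok, _} = ok ->
        ok

      {:error, reason} when attempt <= @max_retries ->
        # Exponential backoff (200ms, 400ms, 800ms, ...) times a jitter factor in [1, 2).
        delay = trunc(@base_delay_ms * :math.pow(2, attempt - 1) * (1 + :rand.uniform()))
        Logger.warning("retry #{attempt}/#{@max_retries} in #{delay}ms: #{inspect(reason)}")
        Process.sleep(delay)
        with_retry(fun, attempt + 1)

      {:error, _} = error ->
        error
    end
  end
end
```

With a helper like this in place, wrapping each of the six call sites is a mechanical change the implementation agent can make from the handoff paragraph alone.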

The trap that produces slop

Most teams who report disappointing results from agentic AI are running into the same failure mode: agents that look productive but produce slop. The signature: lots of code is committed, the team feels productive, and three weeks later the production codebase is a mess of subtly broken patterns nobody can fully explain.

The cause is almost always one of three things:

  1. No review discipline. Agent-generated code is going into the repo without a human pass.
  2. Mode mixing. Delegate-mode work being treated as collaborate-mode by the team, so nobody is closely engaged with the output.
  3. Context starvation. Agents being asked to do work without enough context to do it well, producing plausible-but-wrong code.

All three are solvable. None of them are solved by "use a better model." They're solved by team-level discipline about how AI work enters the codebase. Without that discipline, more agents just means more slop. With it, the throughput gain is real and durable.

The takeaway

Agentic AI is a force multiplier in engineering when treated as a workflow change rather than a tool swap. Four to seven sessions in parallel sounds like a lot until you recognize that most of them are running asynchronously while you do other work — and that your role across all of them is the same: provide context, scope tightly, review carefully.

The teams shipping 40–55% faster aren't typing more. They're directing more. That's the new bar. Most engineers will get there in 2026. The ones who get there first will have a meaningful, compounding advantage for the next two or three years before everyone catches up.

Tooling and cost

The economics of running 4–7 agent sessions a day are easy to get wrong. A few practical notes:

  • Pay for the paid tier. Free-tier rate limits will interrupt your flow several times a day. The $200–600 per month for heavy Claude Code, ChatGPT Pro, and Codex usage is the highest-ROI line item on your engineering bill at this stage.
  • Don't run the same task in two tools "for comparison." Pick one. Comparison runs sound disciplined, but in practice they add cognitive overhead that erodes the throughput gain.
  • Track cost per outcome, not cost per session. A $4 agent session that produces a working feature is cheaper than a $0.40 session that produces noise. Most teams track the wrong number.
  • Context, not tokens, is the expensive resource. Spend the time writing good prompts and feeding the agent the right files. Don't optimize for shorter prompts; optimize for clearer ones.

One closing note on team adoption. The workflow above is what I run as an individual. Scaling it to a team adds a coordination problem: multiple engineers spawning agent sessions in a shared codebase will produce conflicting changes if nobody is watching for them. The pattern that has worked for us: agents work on isolated branches, never on main, and a human engineer does the merging. Treat agent sessions like junior engineers' branches, with the same review discipline.