Ruflo (formerly Claude Flow): An Honest Deep Dive on the Multi-Agent Orchestration Platform
TL;DR: Ruflo (formerly Claude Flow) is a hive-mind orchestration layer for Claude Code and friends. 45,000+ GitHub stars, 700,000+ npm downloads, three queen-types coordinating eight worker-types via Raft, Byzantine, and Gossip consensus, cross-session memory backed by HNSW vector search, multi-provider routing across Claude, GPT, Gemini, Cohere, and Ollama, and Ed25519-signed release manifests verified by
ruflo verify. It rewards teams hitting the single-agent ceiling. It punishes anyone who installs it on a Tuesday afternoon expecting a drop-in upgrade.
I dispatched a Ruflo swarm earlier today against a Phoenix codebase I work on. Four agents, in parallel: one auditing security, one chasing performance regressions, one finding test gaps, one walking SEO and accessibility. While they worked, I made coffee. By the time I sat back down, the security agent had flagged a CSP report endpoint I'd wired up just the previous evening (it wanted me to harden a specific path), the performance agent had found a query that was hot in :observer but that I'd been ignoring, and the test-gap agent had drafted half a dozen ExUnit cases against the LiveView I shipped last week.
That's the pitch for multi-agent. Not "agents are smarter." Agents in parallel are a team. Ruflo is the most serious attempt I've used at making that team coordinate instead of just run concurrently.
This post is the honest version: what's real, what's marketing, what hurt to set up, and whether you should bother. If you want the broader landscape (skills, MCP servers, the rest of the agent-tooling ecosystem), that lives in our Claude Code resource roundup. This post stays focused on Ruflo.
The single-agent ceiling
Vanilla Claude Code is exceptional at one thing at a time. The ceiling shows up the moment your task is genuinely fan-out shaped.
Three flavors of "I've outgrown this":
- Context exhaustion on real codebases. A 200-file refactor doesn't fit. You end up summarizing, re-summarizing, and re-feeding the agent the parts of the codebase it forgot. Each round drops fidelity.
- Sequential bottlenecks. Frontend changes, backend changes, migration scripts, and tests are independent in the dependency graph but the agent does them serially because that's what a single conversation is.
- No cross-session memory. Every new session starts from zero. The agent re-learns your project's conventions, your naming, your weird in-house DSL, every single time.
The git-worktrees-plus-tmux pattern (and tools like Claude Squad) solves the parallelization problem. Three terminals, three independent agents, three branches. Beautiful for trivially parallel work. But those agents don't talk. They can't reconcile when their changes overlap. They have no shared notion of "what's the architectural decision we're collectively making here."
That's the orchestration gap. Parallelization is N agents going faster. Coordination is N agents converging on a shared answer. Ruflo is built for the second problem.
What Ruflo actually is
Ruflo is the rebranded V3 of Claude Flow, the project Reuven Cohen (rUv) and the Agentics Foundation started in mid-2025. The repo at github.com/ruvnet/ruflo was created on June 2, 2025, and the first public npm release of claude-flow@1.0.0 shipped June 10, 2025. The rebrand to Ruflo was announced January 15, 2026 (issue #945) and the user-facing rename shipped under the Ruflo name in v3.5.x in March 2026.
Why rebrand? The honest reason, from rUv's own announcement and the dev.to coverage, is that the project had outgrown its name. "Claude Flow" implied Claude-only. Ruflo routes across Claude, GPT, Gemini, Cohere, and Ollama. The branding was actively misleading. There's a community theory that Anthropic trademark friction forced the move; I haven't seen primary-source evidence for that, so I'm setting it aside.
What you actually install is a TypeScript CLI on top of a Rust engine. The WASM kernel and the ruvector HNSW store are Rust. The agent coordination, the MCP server, and the plugin loader are Node. Three install paths exist, all live, all publishing in lockstep at v3.7.0-alpha.11 with v3.6.30 as the current stable. The packages are claude-flow, ruflo, and @claude-flow/cli, and they all resolve.
Numbers as of today (May 7, 2026):
- 45,956 GitHub stars on ruvnet/ruflo, with 5,083 forks.
- 710,854 lifetime npm downloads of the claude-flow package, with 58,803 in the last 30 days.
- Latest stable: v3.6.30 (May 5, 2026). The 3.7 alpha line ships almost daily; today's latest dist-tag is 3.7.0-alpha.11.
Stars and downloads aren't quality signals on their own, but the trajectory matters here. The project has roughly doubled its stargazer count since the rebrand and is shipping faster than most projects in the agent-tooling ecosystem.
The hive-mind architecture
This is where Ruflo earns its keep, and where the design choices look distributed-systems-shaped rather than influencer-shaped.
There are three queen types in the hive-mind skill (.claude/skills/hive-mind-advanced/SKILL.md):
- Strategic queens plan high-level objectives, choose topology, and assign roles.
- Tactical queens drive execution, resolve conflicts between agents, and manage dependencies between tasks.
- Adaptive queens monitor the swarm in flight and reconfigure topology if work isn't converging.
Underneath the queens, eight worker types do the actual work: Researcher, Coder, Analyst, Tester, Architect, Reviewer, Optimizer, Documenter. Note that there is no "security" worker type; security gets its own dedicated plugins (ruflo-security-audit and ruflo-aidefence), which is the right call since security review has very different prompt and tool surface than coding.
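The taxonomy is easier to hold in your head as types. A hypothetical sketch follows: the role names are Ruflo's, but the TypeScript shapes and the topology names are my own for illustration.

```typescript
// Hypothetical type sketch of the hive-mind role taxonomy.
// Role names come from Ruflo's docs; the shapes are mine.
type QueenType = "strategic" | "tactical" | "adaptive";

type WorkerType =
  | "researcher" | "coder" | "analyst" | "tester"
  | "architect" | "reviewer" | "optimizer" | "documenter";

interface SwarmSpec {
  queen: QueenType;
  workers: WorkerType[];               // drawn from the eight worker types
  topology: "mesh" | "star" | "ring";  // topology names are illustrative
}

// A small audit swarm: a strategic queen planning, four workers executing.
const auditSwarm: SwarmSpec = {
  queen: "strategic",
  workers: ["researcher", "tester", "reviewer", "documenter"],
  topology: "mesh",
};
```

Notice that "security" doesn't appear in the worker union, which matches the plugin split described above.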
When agents disagree (and they will), Ruflo runs consensus. The repo documents five consensus modes (Raft, Byzantine fault-tolerant, Gossip, CRDT, and Quorum) for different swarm sizes and trust models. Raft has a real implementation at v3/@claude-flow/swarm/src/consensus/raft.ts (not a wrapper around someone else's library). The point isn't that you ever read the consensus log. The point is that a five-agent swarm can vote on which approach to a refactor to take and converge, instead of one agent silently winning because it spoke last.
If you've spent any time on real distributed systems, this is recognizable. If you haven't, the load-bearing claim is: there's no single point of hallucination. A hallucinating agent gets outvoted.
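To make the outvoting claim concrete, here's a toy majority vote. It's a stand-in for the Raft/Byzantine machinery, not Ruflo's implementation; real consensus also handles leader election, terms, and partial failure.

```typescript
// Toy majority vote: each agent proposes an answer; the swarm adopts the
// proposal with the most votes. A stand-in for real consensus, not Ruflo's.
function majorityVote(proposals: string[]): string | null {
  const tally = new Map<string, number>();
  for (const p of proposals) tally.set(p, (tally.get(p) ?? 0) + 1);
  let winner: string | null = null;
  let best = 0;
  for (const [proposal, votes] of tally) {
    if (votes > best) { best = votes; winner = proposal; }
  }
  // Require a strict majority so a lone hallucinating agent can't win.
  return best > proposals.length / 2 ? winner : null;
}

// Four agents agree, one hallucinates: the outlier is outvoted.
majorityVote(["extract-module", "extract-module", "extract-module",
              "extract-module", "rewrite-in-rust"]); // → "extract-module"
```

The strict-majority check is the load-bearing line: with five agents, a single hallucinated proposal can never reach three votes.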
Here's the request flow at a level where it's actually useful:
+----------------------------------+
| User                             |
| (claude code / cli / ide / cron) |
+----------------+-----------------+
                 |
                 v
+----------------------------------+
| Ruflo Core                       |
| policy / topology / wizard       |
+----------------+-----------------+
                 |
                 v
+----------------------------------+
| Provider Router                  |
| Claude / GPT / Gemini / Ollama   |
| cheap-tier first, escalate up    |
+----------------+-----------------+
                 |
                 v
+----------------------------------+
| Hive-mind Swarm                  |
| queens (strategic / tactical /   |
|         adaptive)                |
| workers (8 types)                |
| consensus (raft/byzantine/...)   |
+--------+----------------+--------+
         |                |
         v                v
+-------------------+  +---------------------+
| Memory Layer      |  | LLM call            |
| HNSW vectors      |  | (cached, signed)    |
| knowledge graph   |  +---------------------+
| ReasoningBank     |
+-------------------+
The two things that matter in that diagram are the router (cheap-first routing is where the cost story lives) and the memory layer (cross-session is where the productivity story lives).
Memory and learning
This is the part most agent frameworks skip, and it's the reason I keep coming back.
Ruflo's memory layer is HNSW-backed vector storage with sub-millisecond retrieval in the project's own micro-benchmarks (the ~61µs figure from the RuVector/Postgres bridge in issue #963 is the cleanest number; it's a synthetic bench, not an end-to-end SLO). Treat it as "fast enough that retrieval isn't the bottleneck," not as a guarantee.
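As a mental model of what retrieval does (not how HNSW does it), brute-force cosine similarity over stored snippets returns the same neighbors that HNSW approximates in sub-linear time. A minimal sketch, with my own made-up data shape:

```typescript
// Conceptual model of vector retrieval: exact cosine similarity.
// HNSW (what ruvector uses) approximates this in sub-linear time;
// the exact version is fine for illustrating what "retrieval" returns.
interface MemoryEntry { text: string; vec: number[]; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k stored texts whose vectors are closest to the query.
function nearest(query: number[], memory: MemoryEntry[], k = 3): string[] {
  return [...memory]
    .sort((x, y) => cosine(query, y.vec) - cosine(query, x.vec))
    .slice(0, k)
    .map((m) => m.text);
}
```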
Layered on top:
- Knowledge graph. Entities and relationships extracted from your codebase as agents work. Useful when an agent asks "what touches this module" and the answer involves traversal, not search.
- ReasoningBank. Successful solution patterns get stored. The next time a similar task shows up, the strategic queen can prime workers with the prior pattern instead of starting blank.
- SONA learning loop. Performance signals route future tasks to the worker types and provider tiers that have done well on similar work. It's not magical. It's a multi-armed bandit with a memory.
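"A multi-armed bandit with a memory" can be sketched in a few lines. This epsilon-greedy router is my illustration of the shape of the loop, not SONA's actual algorithm; arm names and the reward scheme are made up.

```typescript
// Epsilon-greedy bandit: mostly route to the arm (worker type / provider
// tier) with the best observed reward, occasionally explore. Illustrative
// only; not SONA's implementation.
class BanditRouter {
  private pulls = new Map<string, number>();
  private value = new Map<string, number>(); // running mean reward per arm

  constructor(private arms: string[], private epsilon = 0.1) {
    for (const a of arms) { this.pulls.set(a, 0); this.value.set(a, 0); }
  }

  pick(): string {
    if (Math.random() < this.epsilon) {
      // Explore: random arm, so new options still get sampled.
      return this.arms[Math.floor(Math.random() * this.arms.length)];
    }
    // Exploit: arm with the best observed mean reward.
    return this.arms.reduce((best, a) =>
      this.value.get(a)! > this.value.get(best)! ? a : best);
  }

  record(arm: string, reward: number): void {
    const n = this.pulls.get(arm)! + 1;
    this.pulls.set(arm, n);
    // Incremental mean update: v += (reward - v) / n
    this.value.set(arm, this.value.get(arm)! + (reward - this.value.get(arm)!) / n);
  }
}
```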
The practical outcome is the agents stop re-learning your project every session. If you've named your background-job module the way you've named it across the codebase, the swarm picks that up after one or two sessions and stops asking. If your test pyramid is shaped a certain way, the testgen worker stops generating top-of-pyramid garbage and starts generating the kind of test you actually merge.
This is the single biggest operational win compared to vanilla Claude Code. Single-agent Claude Code with a good CLAUDE.md gets you partway. Cross-session memory gets you the rest.
Multi-provider routing and the cost story
Ruflo routes across Claude (Sonnet 4.6, Haiku 4.5), GPT, Gemini 2.5 Pro and Flash, Cohere, and Ollama. The router is what makes the cost story possible: simple edits go to a cheap tier, complex reasoning escalates to a flagship model, and a WASM-based local kernel handles trivial transformations without a network round-trip at all. Background workers using local retrieval don't burn your Claude subscription on lookups.
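The routing policy is easy to sketch. The tier names and task taxonomy below are mine, not Ruflo's config schema; the point is the cheap-first-then-escalate shape.

```typescript
// Hypothetical sketch of cheap-first routing with escalation.
// Tier names and thresholds are illustrative, not Ruflo's actual policy.
type Tier = "local-wasm" | "cheap" | "flagship";

interface Task { kind: "trivial-transform" | "edit" | "reasoning"; }

function routeTier(task: Task): Tier {
  switch (task.kind) {
    case "trivial-transform": return "local-wasm"; // no network round-trip
    case "edit":              return "cheap";      // small, cheap model
    case "reasoning":         return "flagship";   // escalate up front
  }
}

// If a cheap attempt fails validation, retry one tier up.
function escalate(tier: Tier): Tier {
  return tier === "local-wasm" ? "cheap" : "flagship";
}
```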
Now the caveat, which the project does not say loudly enough.
The project markets up to 75% API cost savings versus single-agent flagship-model usage. The mechanism is real. Cheap-tier handling for the long tail of small tasks plus WASM-local for trivial work plus local retrieval for memory is exactly how you'd architect a cost-efficient agent system. But:
- The 75% number is not benchmarked publicly. There is no methodology document, no A/B comparison, no per-task cost report I could find.
- Multi-agent orchestration adds per-agent context and coordination tokens. Naive multi-agent setups can spend more than single-agent Claude Code on the same task, because every agent is paying context overhead that a single agent paid once.
- Whether Ruflo nets out positive on cost depends entirely on your workload. A lot of small parallelizable tasks: yes, you'll save money. A small number of long context-heavy tasks: maybe not.
So: the savings are plausible, the architecture is consistent with real savings, and I would not quote 75% to my finance team. I'd run a one-week measurement on my actual workload and decide from there. The project's framing as "up to 75%" is fair. Anyone repeating it as "75%" without context is selling something.
Security model
Ruflo takes supply chain seriously, which is unusual in this corner of the ecosystem.
The flagship feature is Ed25519-signed release manifests verified by ruflo verify. Implementation lives at v3/@claude-flow/cli/src/commands/verify.ts. The CLI fetches the verification.md.json witness manifest, re-derives the Ed25519 public key from the manifest's git commit, and verifies the signature. If you've ever wondered whether the npm package you just installed corresponds to the git commit it claims to, this answers the question. Most agent frameworks ship nothing comparable.
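The primitive underneath is standard Ed25519. Here's a minimal Node round-trip using node:crypto; this is the crypto the command relies on, not verify.ts itself, which additionally fetches the witness manifest and re-derives the key from the git commit.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Minimal Ed25519 sign/verify round-trip: the primitive underneath
// `ruflo verify`, not the command's manifest handling.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const manifest = Buffer.from(
  JSON.stringify({ pkg: "ruflo", version: "3.6.30" })
);

// For Ed25519, Node takes `null` as the digest: the scheme hashes internally.
const signature = sign(null, manifest, privateKey);

const ok = verify(null, manifest, publicKey, signature);
const tampered = verify(null, Buffer.from("evil"), publicKey, signature);
// ok is true; tampered is false — any byte change invalidates the signature.
```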
Beyond signing, the security plugins (ruflo-security-audit, ruflo-aidefence) cover:
- Prompt injection detection on inbound content (think: README files in dependencies, pasted issue text).
- Path traversal blocking when an agent tries to read or write outside the repo root.
- Command injection guards on shell-using workers.
- PII detection for credential and secret handling.
- Anti-drift swarm config to keep long-running unattended jobs from quietly redefining the goal.
The "anti-drift" piece matters more than it sounds. If you run an overnight swarm and one agent's interpretation of the task slowly mutates over twelve hours of context turnover, you wake up to a different project than the one you went to bed on. The anti-drift config pins the original spec and surfaces deviation as a vote, not a fait accompli.
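You can picture the anti-drift check as: pin the spec, score later goal statements against it, and escalate divergence instead of letting it pass. This token-overlap version is deliberately crude and entirely my own; assume the real thing compares embeddings.

```typescript
// Illustrative drift check: Jaccard overlap between the pinned spec and the
// swarm's current goal statement. Crude by design; threshold is made up.
function jaccard(a: string, b: string): number {
  const A = new Set(a.toLowerCase().split(/\s+/));
  const B = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...A].filter((t) => B.has(t)).length;
  return inter / (A.size + B.size - inter);
}

function checkDrift(pinnedSpec: string, currentGoal: string, threshold = 0.3) {
  const overlap = jaccard(pinnedSpec, currentGoal);
  return overlap >= threshold
    ? { drifted: false as const }
    // Deviation becomes a vote, not a silent redefinition of the task.
    : { drifted: true as const, action: "surface-for-vote" as const };
}
```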
Install and first run
Three install paths. All three publish at v3.7.0-alpha.11. All three resolve as written.
# Option 1: Claude Code plugin (recommended)
/plugin marketplace add ruvnet/ruflo
/plugin install ruflo-core@ruflo
# Option 2: MCP server
claude mcp add ruflo -- npx -y @claude-flow/cli@latest
# Option 3: CLI direct
npx ruflo@latest init --wizard
I went with Option 1 today because I was already inside Claude Code and the plugin install gives you the slash commands and skills without any extra wiring. The wizard in Option 3 is the most opinionated path; it asks you about provider keys, swarm size, and persistence settings up front, and it writes a sensible ruflo.toml. Option 2 is the right answer if you already have an MCP-aware host and want Ruflo as one tool among many.
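For orientation, here's the shape of a ruflo.toml as I'd describe it from the wizard's questions. The key names are my paraphrase of what it asks about (providers, swarm size, persistence), not a verbatim schema; check the file the wizard generates rather than copying this.

```toml
# Illustrative ruflo.toml — key names are my paraphrase, not the real schema.
[providers]
default = "claude"
fallback = ["gemini", "ollama"]

[swarm]
workers = 4
topology = "mesh"

[memory]
persist = true
path = ".ruflo/agentdb"

[workflow]
template = "sparc"   # pin one workflow for the whole team
```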
A first swarm looks like this. From inside Claude Code with the plugin installed:
/ruflo swarm-init --workers 4 --topology mesh --task "audit the auth module for security, performance, missing tests, and docs gaps"
Then watch it run with:
ruflo watch
Or, if you prefer the slash form inside Claude Code, /watch streams swarm events as they happen. It's the closest thing to htop for an agent fleet I've used.
When something goes sideways, and it will eventually, run:
ruflo doctor --fix
The doctor checks MCP connectivity, AgentDB integrity, plugin manifest signatures, and provider key availability. It mostly does the right thing. The first time I ran it, it caught a stale ~/.ruflo directory from an earlier Claude Flow install that I'd forgotten about and offered to migrate it.
Real-world use cases
Five shapes where I've seen this earn its setup overhead:
Full-stack feature build. Frontend, backend, migration, and tests in parallel, with a strategic queen reconciling the contract between them. The win isn't speed. It's that the API shape, the TypeScript types, the migration column names, and the test fixtures all stay in sync because there's a coordinator.
Large-codebase refactor with persistent memory. Renaming a primitive that touches 400 files isn't hard for any agent in isolation. It's hard because halfway through, the agent forgets why some occurrences were intentional exceptions. With cross-session memory and a knowledge graph, those exceptions get recorded the first time and respected on every subsequent pass.
Overnight batch jobs in daemon mode. This is what I dispatched today. Audit, optimize, testgen, and SEO/a11y agents running on a schedule against a Phoenix codebase. The cron-like scheduler in ruflo-loop-workers is real and works.
Bug investigation with competing-hypothesis agents. Spawn three workers each with a different theory of what's wrong. Vote on evidence. Drop the losing hypotheses. This is what consensus is actually for, and it's surprisingly satisfying when one agent's "it's a race condition in the cache" loses to another's "no, it's a clock skew on the staging worker" because the second agent produced a reproducer.
Standardizing TDD across a team via SPARC workflow. SPARC (Specification, Pseudocode, Architecture, Refinement, Completion) is a built-in workflow template. Pin it in ruflo.toml and every junior on the team gets the same structured approach without you policing it.
When to use Ruflo vs the alternatives
Honest table. No faux balance, no "everything is great in its own way."
| Dimension | Ruflo | Claude Squad | Vanilla Claude Code |
|---|---|---|---|
| Setup overhead | High (wizard, keys, AgentDB, plugin manifest) | Low (tmux + git worktrees) | Zero |
| Cross-session memory | Yes (HNSW + knowledge graph + ReasoningBank) | No | Per-session only (with CLAUDE.md priming) |
| Coordination model | Hive-mind with consensus across queens and workers | Independent agents on independent worktrees | Single agent, sequential |
| Cost story | Multi-provider routing, cheap-tier first, WASM-local for trivial work; savings unbenchmarked | One subscription per agent, no sharing | One subscription, predictable spend |
| When to reach for it | Coordinated multi-agent work, overnight pipelines, large refactors with memory | Trivially parallel tasks across independent branches | Single-file work, fast iteration, prototyping |
Reach for Ruflo when:
- You need agent coordination, not just parallelization.
- Cross-session memory is the bottleneck, not raw speed.
- You're running unattended or overnight workflows and need anti-drift.
- Cost optimization across providers genuinely matters and you're willing to measure it.
- You want a team-wide standard workflow (SPARC, TDD, ADR-driven).
Skip Ruflo and use Claude Squad when:
- You just need three to five agents on independent tasks.
- You want zero config.
- Shared memory isn't the constraint.
Skip both and use vanilla Claude Code when:
- You're working on a single file or a single sequential refactor.
- The setup overhead won't pay off in the time you have.
- You're prototyping and want feedback loop tightness over coordination.
The downsides
Real downsides, not faux balance. I hit all of these today.
The 3.7 line is alpha and ships daily. On the one hand, this is what active development looks like. On the other, latest on npm is 3.7.0-alpha.11 as I write this, and that means if you npx ruflo@latest you're getting alpha. Pin to v3.6.30 if you need stable. Most install instructions in the wild don't tell you to do this.
Documentation is fragmented. README, wiki, plugin READMEs, GitHub issues, dev.to posts, and the .claude/skills/*/SKILL.md files. Each one is internally accurate. Across them, the project's "current state" is a moving target. I had to read three sources to figure out the current canonical install command (it's the marketplace plugin) versus the older paths.
The learning curve is real. Queens, workers, consensus, topologies, AgentDB, ReasoningBank, SPARC, hive-mind versus swarm versus loop-workers. None of it is hard once you have the model in your head, but the model isn't a fifteen-minute read. Plan for a couple of hours of orientation before you ship anything load-bearing.
Setup overhead doesn't pay off on small projects. If your codebase is 5K lines and you're alone, the wizard is going to ask you questions you don't have answers to. Use vanilla Claude Code or Claude Squad. Come back to Ruflo when you have a team or a 50K-line codebase.
Community is active but smaller than vanilla Claude Code. When you Google an error, you'll find recent answers, but you'll find five of them, not five hundred. The Discord is helpful. rUv himself is unusually responsive in GitHub issues. But the long tail of "someone has hit this exact thing before" is shallower.
The verdict
Sublime rating: 9 out of 10.
Not 10, because the alpha cadence and the doc fragmentation make first-day setup harder than it needs to be. The friction is real and you should know that before you start.
But the wins are also real and I want to be specific about them. Today's swarm dispatched four agents against a Phoenix codebase, and the security agent caught something that I would not have caught on my own pass through the auth module today, because I would have been doing one thing at a time and security would have been on tomorrow's list. That's the actual value. Not "AI is faster," but "a coordinated team finds things that a sequential pass doesn't."
Install today if:
- You're a team scaling agent workflows past a single dev.
- You run overnight or unattended pipelines.
- You've hit the single-agent ceiling on context, on parallelism, or on memory.
- Multi-provider cost optimization is something you'd actually measure.
Wait if:
- You're solo on a small project where setup overhead won't pay off.
- You're allergic to alpha software in your toolchain.
- You haven't yet exhausted the vanilla Claude Code workflow.
Closing
Ruflo is the most serious open-source attempt I've used at making multi-agent coding actually coordinate instead of merely parallelize. The architecture is distributed-systems-shaped, the memory story is real, the security posture is unusually mature for this ecosystem, and the project ships fast. The cost-savings claim deserves skepticism until benchmarked on your workload. The alpha-cadence and doc fragmentation deserve patience.
If you want the broader ecosystem (skills, MCP servers, terminal multiplexers, the rest of the agent-tooling map), the Resource Bible covers it. If you want to follow Ruflo specifically, the repo lives at github.com/ruvnet/ruflo and rUv announces most things in GitHub issues before he announces them anywhere else.
Are you running multi-agent workflows yet? What's working, what isn't? Reply to the newsletter or open a discussion. I'm collecting real-world swarm patterns and the friction points are at least as interesting as the wins.