Where single-agent bots stop
Most chatbots of the last decade were a long prompt with FAQs and a sprinkle of retrieval on top. That makes a friendly search engine, not a product. The moment a conversation branches (booking, complaint, escalation, a follow-up a week later), the model falls over. A single agent ends up trying to enforce every rule at once.
Everyone knows the symptom: a 4,000-word system prompt that still leaks at every edge. The platform question is not “how do we make the prompt better?” but “what is the right unit of decision?”
What a graph actually buys you
A multi-agent graph splits the problem into named roles. A triage agent reads intent. A support agent has a clear vocabulary and a small set of tools. A booking agent is the only one allowed to write to the calendar. Handoffs aren't magic — they're just role changes made visible.
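To make "role changes made visible" concrete, here is a minimal Python sketch. `Agent`, `Conversation`, and the `calendar_write` tool are hypothetical names invented for illustration, not ChatFlow's API. The point is only that each role carries its own tool allowlist, permission checks live in code rather than in the prompt, and every handoff is logged.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Agent:
    """A named role with its own instructions and an explicit tool allowlist."""
    name: str
    instructions: str
    tools: frozenset[str] = frozenset()


@dataclass
class Conversation:
    active: Agent
    # Every role change is recorded as (from, to, reason), so handoffs stay visible.
    handoffs: list[tuple[str, str, str]] = field(default_factory=list)

    def handoff(self, to: Agent, reason: str) -> None:
        self.handoffs.append((self.active.name, to.name, reason))
        self.active = to

    def call_tool(self, tool: str) -> str:
        # The permission check is enforced here, not requested in the prompt.
        if tool not in self.active.tools:
            raise PermissionError(f"{self.active.name} may not call {tool}")
        return f"{self.active.name} called {tool}"


# Hypothetical roles mirroring the text: only booking may write to the calendar.
triage = Agent("triage", "Classify the guest's intent.")
support = Agent("support", "Answer questions.", frozenset({"search_kb"}))
booking = Agent("booking", "Manage reservations.", frozenset({"calendar_write"}))
```

Starting a `Conversation(active=triage)` and calling `call_tool("calendar_write")` raises `PermissionError`; after `handoff(booking, "reserve table")` the same call succeeds and the handoff log shows who passed control to whom and why.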
Handoffs, tools, and the boring parts
It gets interesting when agents get tools. A webhook tool calls a real API. An MCP server brings a whole toolbox. A knowledge-base tool searches the company's own documents, not the internet. In our experience these three categories cover about 90% of real needs, as long as the framework treats them as first-class citizens rather than bolted-on plugins.
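A sketch of what "first-class" could mean in practice: all three categories share one declared contract (name, kind, typed parameters) in a registry that lives outside the prompt. `ToolKind`, `ToolSpec`, and `ToolRegistry` are hypothetical names. The handlers are local stubs; a real webhook tool would issue an HTTP request and an MCP tool would forward the call to an MCP server, and neither is shown here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class ToolKind(Enum):
    WEBHOOK = "webhook"      # calls a real HTTP API
    MCP = "mcp"              # a tool exposed by an MCP server
    KNOWLEDGE_BASE = "kb"    # searches the company's own documents


@dataclass(frozen=True)
class ToolSpec:
    name: str
    kind: ToolKind
    description: str
    parameters: dict[str, type]          # declared argument names and types
    handler: Callable[..., object]


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        if spec.name in self._tools:
            raise ValueError(f"duplicate tool: {spec.name}")
        self._tools[spec.name] = spec

    def call(self, name: str, **kwargs: object) -> object:
        spec = self._tools[name]
        # Validate arguments against the declared contract before running anything.
        missing = set(spec.parameters) - set(kwargs)
        unknown = set(kwargs) - set(spec.parameters)
        if missing or unknown:
            raise TypeError(f"{name}: missing={sorted(missing)} unknown={sorted(unknown)}")
        for key, expected in spec.parameters.items():
            if not isinstance(kwargs[key], expected):
                raise TypeError(f"{name}: {key} must be {expected.__name__}")
        return spec.handler(**kwargs)

    def by_kind(self, kind: ToolKind) -> list[str]:
        return sorted(n for n, s in self._tools.items() if s.kind is kind)
```

The design choice is that a malformed call fails in the registry with a readable error instead of silently reaching an external system, and the same validation path applies regardless of which of the three categories a tool belongs to.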
Concrete: hotel concierge at 11pm
A guest writes: “we won't arrive until 23:30, can we still get dinner?” A classic bot quotes opening hours. A graph does this: triage catches “late arrival + food”. The concierge pulls the late-night policy from the knowledge base. Booking uses a tool to check whether the kitchen is still open and reserves a cold platter. The guest gets a confirmation, not a link.
The difference isn't better English. The difference is that somebody — or something — on the other end actually did something.
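The concierge scenario can be simulated end to end in a few lines. This is a toy under loud assumptions: keyword rules stand in for an LLM triage agent, a dict stands in for the knowledge base, `kitchen_open_at` and `reserve` are stubs for real tools, and the 23:45 last-order time is invented for the example.

```python
import re

# Hypothetical stand-ins for the knowledge base and the kitchen system.
KNOWLEDGE_BASE = {
    "late_night_food": "After 22:00 the kitchen serves cold platters; last order 23:45.",
}
KITCHEN_LAST_ORDER = "23:45"


def to_minutes(hhmm: str) -> int:
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def triage(message: str) -> set[str]:
    """Keyword rules standing in for an LLM intent classifier."""
    text = message.lower()
    intents = set()
    if "arrive" in text or "late" in text:
        intents.add("late_arrival")
    if "dinner" in text or "food" in text:
        intents.add("food")
    return intents


def kitchen_open_at(hhmm: str) -> bool:
    """Tool stub: a real deployment would call the kitchen system's API."""
    return to_minutes(hhmm) <= to_minutes(KITCHEN_LAST_ORDER)


def reserve(item: str, hhmm: str) -> str:
    """Tool stub: a real deployment would write a reservation; here it builds an ID."""
    return f"{item}@{hhmm}"


def handle(message: str) -> tuple[str, list[str]]:
    trace = []
    intents = triage(message)
    trace.append(f"triage: {sorted(intents)}")
    match = re.search(r"\b\d{1,2}:\d{2}\b", message)
    if {"late_arrival", "food"} <= intents and match:
        policy = KNOWLEDGE_BASE["late_night_food"]        # concierge step
        trace.append("concierge: late-night policy")
        arrival = match.group()
        if kitchen_open_at(arrival):                       # booking step
            booking_id = reserve("cold platter", arrival)
            trace.append(f"booking: reserved {booking_id}")
            return f"Confirmed: cold platter at {arrival}. {policy}", trace
        trace.append(f"booking: kitchen closed at {arrival}")
    return "Let me connect you with the front desk.", trace
```

For the guest's message, the trace reads triage, then concierge, then booking, and the reply is a confirmation. An arrival at 23:55 ends in a handover to the front desk instead of a made-up promise.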
When you don't need it
Multi-agent is not a religion. If your conversation is a single uniform task — rewriting a text, explaining a table, answering one step in a wizard — a single agent is more honest and cheaper. Graphs earn their keep where roles differ, tools differ, or responsibility needs to differ.
A graph pays off when:
- Roles feel noticeably different (triage vs. domain)
- Tools differ sharply from role to role
- Responsibility needs to stay traceable
A single agent is enough when:
- The conversation is a linear question-answer loop
- No external systems are in the loop
What it means for how you build
If you take multi-agent seriously as a platform capability, the system needs three things: a graph description humans can read; a tool registry that doesn't hide in the prompt; and an execution layer that knows which agent is speaking right now. Everything else is cosmetics — important, but replaceable.
ChatFlow is built on exactly that. Not because “multi-agent” is a nice phrase, but because it is the only shape in which agent products age well in production without turning into prompt graveyards.
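The three requirements from the previous paragraph can be sketched together: a graph description plain enough to review (a dict here; YAML would read the same way), a validator that checks it against a tool registry, and an executor that always knows the current speaker and refuses handoffs the graph doesn't declare. The graph shape, `validate`, and `Executor` are assumptions for illustration, not ChatFlow's format.

```python
# A human-readable graph description: who exists, what they may use, whom they may call.
GRAPH = {
    "entry": "triage",
    "agents": {
        "triage":  {"tools": [],                 "handoffs": ["support", "booking"]},
        "support": {"tools": ["search_kb"],      "handoffs": ["booking"]},
        "booking": {"tools": ["calendar_write"], "handoffs": []},
    },
}
TOOL_REGISTRY = {"search_kb", "calendar_write"}


def validate(graph: dict, tools: set[str]) -> list[str]:
    """Return every problem found; an empty list means the graph is consistent."""
    errors = []
    agents = graph["agents"]
    if graph["entry"] not in agents:
        errors.append(f"entry {graph['entry']!r} is not an agent")
    for name, spec in agents.items():
        for tool in spec["tools"]:
            if tool not in tools:
                errors.append(f"{name}: unknown tool {tool!r}")
        for target in spec["handoffs"]:
            if target not in agents:
                errors.append(f"{name}: handoff to unknown agent {target!r}")
    return errors


class Executor:
    """Tracks which agent is speaking and enforces the graph's handoff edges."""

    def __init__(self, graph: dict) -> None:
        self.graph = graph
        self.speaker = graph["entry"]
        self.transcript: list[tuple[str, str]] = []

    def say(self, text: str) -> None:
        self.transcript.append((self.speaker, text))

    def handoff(self, target: str) -> None:
        allowed = self.graph["agents"][self.speaker]["handoffs"]
        if target not in allowed:
            raise ValueError(f"{self.speaker} cannot hand off to {target}")
        self.transcript.append((self.speaker, f"-> {target}"))
        self.speaker = target
```

Because every transcript line carries its speaker, "who said this, and who was allowed to?" stays answerable after the fact, which is the traceability argument from the checklist earlier.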
Appendix: what a tool actually looks like
One last look at the surface. A tool is not a prompt incantation; it's a contract. Here is how one is declared in ChatFlow:
```python
@register_builtin("search_kb")
class SearchKnowledgeBase(FunctionTool):
    """Search the tenant knowledge base with contextual retrieval."""

    async def __call__(self, query: str, top_k: int = 8) -> list[Chunk]:
        # `ctx` is assumed to be the per-request context carrying the tenant;
        # it is not defined in this excerpt.
        results = await retrieval_service.search(
            tenant_id=ctx.tenant_id,
            query=query,
            top_k=top_k,
        )
        return results
```
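This declaration leans on ChatFlow internals the excerpt doesn't show: `register_builtin`, `FunctionTool`, `Chunk`, `retrieval_service`, and `ctx`. To watch the contract run in isolation, here is a standalone sketch with hypothetical stand-ins for each. Passing the tenant through a `contextvars.ContextVar` is one assumption about where `ctx` could come from, not a description of how ChatFlow does it, and naive keyword matching replaces contextual retrieval.

```python
import asyncio
import contextvars
from dataclasses import dataclass

# Hypothetical stand-ins: these are NOT ChatFlow's real implementations.
BUILTINS: dict[str, type] = {}


def register_builtin(name: str):
    """Record a tool class under a stable name (stand-in decorator)."""
    def decorator(cls: type) -> type:
        BUILTINS[name] = cls
        return cls
    return decorator


class FunctionTool:
    """Stand-in base class; a real framework might derive a schema from __call__."""


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str


@dataclass(frozen=True)
class RequestContext:
    tenant_id: str


# Set per request by a (hypothetical) runtime; values set inside one asyncio task
# are not visible to other tasks.
current_ctx = contextvars.ContextVar("current_ctx")


class InMemoryRetrieval:
    """Naive keyword matching in place of contextual retrieval."""

    def __init__(self, chunks: list[Chunk]) -> None:
        self.chunks = chunks

    async def search(self, tenant_id: str, query: str, top_k: int) -> list[Chunk]:
        words = query.lower().split()
        hits = [
            c for c in self.chunks
            if c.tenant_id == tenant_id and any(w in c.text.lower() for w in words)
        ]
        return hits[:top_k]


retrieval_service = InMemoryRetrieval([
    Chunk("hotel-a", "Cold platters are served until 23:45."),
    Chunk("hotel-b", "Room service closes at 22:00."),
])


@register_builtin("search_kb")
class SearchKnowledgeBase(FunctionTool):
    """Search the tenant knowledge base (keyword stand-in for contextual retrieval)."""

    async def __call__(self, query: str, top_k: int = 8) -> list[Chunk]:
        ctx = current_ctx.get()  # the tenant comes from context, never from the model
        return await retrieval_service.search(
            tenant_id=ctx.tenant_id, query=query, top_k=top_k
        )


async def run_tool(tenant_id: str, query: str) -> list[Chunk]:
    current_ctx.set(RequestContext(tenant_id=tenant_id))
    tool = BUILTINS["search_kb"]()
    return await tool(query)
```

Running `asyncio.run(run_tool("hotel-a", "cold platters"))` returns only hotel-a's chunk; the same query for `"hotel-b"` returns nothing. The model can choose `query` and `top_k`, but it has no parameter through which to pick a tenant, which is what makes the tool a contract rather than an incantation.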