01

Where single-agent bots stop

Most chatbots of the last decade were a long prompt with FAQs and a sprinkle of retrieval on top. That makes a friendly search engine, not a product. The moment conversations branch (booking, complaint, escalation, a follow-up a week later), the model tips over. A single agent ends up trying to embody every rule at once, and burns out like a sparkler.

Everyone knows the symptom: a 4,000-word system prompt that still leaks at every edge. The platform question is not “how do we make the prompt better”, it's “what is the right unit of decision”.

02

What a graph actually buys you

A multi-agent graph splits the problem into named roles. A triage agent reads intent. A support agent has a clear vocabulary and a small set of tools. A booking agent is the only one allowed to write to the calendar. Handoffs aren't magic — they're just role changes made visible.

[Figure: agent graph with the nodes User, Triage, Support, Booking, KB, and API. Caption: Roles, tools, and responsibility are pulled apart.]

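To make "named roles" concrete, here is a minimal sketch of a graph written as plain data. Everything in it is an assumption for illustration: the `Agent` class, the tool names such as `calendar.write`, and the handoff edges are hypothetical and not taken from any particular framework. The only rule borrowed from the text is that the booking agent alone may write to the calendar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    role: str                                # one-line job description
    tools: frozenset[str] = frozenset()      # tools this agent may call
    handoffs: frozenset[str] = frozenset()   # agents it may hand off to


def validate(agents: list[Agent]) -> list[str]:
    """Return human-readable problems; an empty list means the graph is consistent."""
    names = {a.name for a in agents}
    problems = []
    for agent in agents:
        # Every handoff target must be a declared agent.
        for target in sorted(agent.handoffs - names):
            problems.append(f"{agent.name} hands off to unknown agent {target!r}")
        # Rule from the text: only the booking agent may write to the calendar.
        if "calendar.write" in agent.tools and agent.name != "booking":
            problems.append(f"{agent.name} must not hold 'calendar.write'")
    return problems


# Hypothetical graph: the edges are assumed, not recovered from the diagram.
GRAPH = [
    Agent("triage", "read intent and route",
          handoffs=frozenset({"support", "booking"})),
    Agent("support", "answer product questions",
          tools=frozenset({"search_kb"}),
          handoffs=frozenset({"triage"})),
    Agent("booking", "create and change reservations",
          tools=frozenset({"calendar.read", "calendar.write"}),
          handoffs=frozenset({"triage"})),
]

if __name__ == "__main__":
    print(validate(GRAPH))
```

Because the graph is data, a rule like "only booking writes to the calendar" becomes a check that can run in CI instead of a sentence buried in a prompt. For the graph above, `validate(GRAPH)` returns an empty list.
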
03

Handoffs, tools, and the boring parts

It gets interesting when agents get tools. A webhook tool calls a real API. An MCP (Model Context Protocol) server brings a whole toolbox. A knowledge-base tool searches the company's own content, not the internet. These three categories cover roughly 90% of real needs, as long as the framework treats them as first-class citizens rather than bolted-on plugins.
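One way to read "first-class" is that every tool has a typed entry in a registry rather than a paragraph in a prompt. The sketch below assumes exactly that. `ToolKind`, `ToolSpec`, `ToolRegistry`, and the three example tools are hypothetical names, and the handlers are stubs that do no network I/O.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class ToolKind(Enum):
    WEBHOOK = "webhook"        # calls a real HTTP API
    MCP = "mcp"                # tool exposed by an MCP server
    KNOWLEDGE_BASE = "kb"      # searches company content


@dataclass(frozen=True)
class ToolSpec:
    name: str
    kind: ToolKind
    description: str
    handler: Callable[..., object]


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        # Names are unique so that agents can refer to tools unambiguously.
        if spec.name in self._tools:
            raise ValueError(f"tool {spec.name!r} is already registered")
        self._tools[spec.name] = spec

    def get(self, name: str) -> ToolSpec:
        return self._tools[name]

    def by_kind(self, kind: ToolKind) -> list[str]:
        return sorted(s.name for s in self._tools.values() if s.kind is kind)


registry = ToolRegistry()
# Stub handlers: a real webhook or MCP tool would do network I/O here.
registry.register(ToolSpec("kitchen_status", ToolKind.WEBHOOK,
                           "Is the kitchen open at a given HH:MM time?",
                           lambda time: {"open": time < "22:00"}))
registry.register(ToolSpec("search_kb", ToolKind.KNOWLEDGE_BASE,
                           "Search the tenant knowledge base.",
                           lambda query: [f"stub result for {query!r}"]))
registry.register(ToolSpec("crm.lookup_guest", ToolKind.MCP,
                           "Look up a guest via an MCP server.",
                           lambda guest_id: {"guest_id": guest_id}))
```

The point of the design is that the three categories share one shape. An agent only needs a tool's name, and the registry decides how the call is carried out.
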

04

Concrete: hotel concierge at 11pm

A guest writes “we won't arrive until 23:30, can we still get dinner?”. A classic bot quotes opening hours. A graph does this: triage catches “late arrival + food”. Concierge pulls the late-night policy from the knowledge base. Booking checks via a tool whether the kitchen is still open and reserves a cold platter. The guest gets a confirmation, not a link.

The difference isn't better English. The difference is that somebody — or something — on the other end actually did something.
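The concierge flow above can be sketched as three small functions that each return the name of the next agent. This is an illustration under loud assumptions: triage is plain keyword matching standing in for a model call, the tools `search_kb`, `kitchen_open`, and `reserve` are stubs, the 22:00 closing time is invented, and the arrival time is hard-coded.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    message: str
    trace: list[str] = field(default_factory=list)           # which agent did what
    facts: dict[str, object] = field(default_factory=dict)   # results collected so far


# --- stub tools (assumptions, no real systems behind them) ---
def search_kb(query: str) -> str:
    return "After 22:00 the kitchen serves cold platters on request."


def kitchen_open(time: str) -> bool:
    return time < "22:00"   # zero-padded HH:MM strings compare correctly as text


def reserve(item: str, time: str) -> str:
    return f"RES-{item.replace(' ', '-')}-{time.replace(':', '')}"


# --- agents: each one does its job and names the next agent ---
def triage(turn: Turn) -> str:
    # Stand-in for a model call: keyword matching instead of intent detection.
    text = turn.message.lower()
    late = "23:" in text or "late" in text
    food = "dinner" in text or "food" in text
    turn.trace.append(f"triage: late={late} food={food}")
    return "concierge" if late and food else "support"


def concierge(turn: Turn) -> str:
    turn.facts["policy"] = search_kb("late-night dining policy")
    turn.trace.append("concierge: fetched late-night policy")
    return "booking"


def booking(turn: Turn) -> str:
    arrival = "23:30"   # hard-coded for the sketch; a real agent would extract it
    item = "hot dinner" if kitchen_open(arrival) else "cold platter"
    turn.facts["confirmation"] = reserve(item, arrival)
    turn.trace.append(f"booking: reserved {item}")
    return "done"


AGENTS = {"triage": triage, "concierge": concierge, "booking": booking}


def run(message: str) -> Turn:
    turn = Turn(message)
    current = "triage"
    # Stop as soon as an agent names something that is not a registered agent.
    while current in AGENTS:
        current = AGENTS[current](turn)
    return turn
```

Calling `run("we won't arrive until 23:30, can we still get dinner?")` leaves a confirmation code in `turn.facts["confirmation"]` and a three-step trace in `turn.trace`, which is the "somebody actually did something" part in miniature.
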

05

When you don't need it

Multi-agent is not a religion. If your conversation is a single uniform task — rewriting a text, explaining a table, answering one step in a wizard — a single agent is more honest and cheaper. Graphs earn their keep where roles differ, tools differ, or responsibility needs to differ.

A graph pays off when:

  • Roles feel noticeably different (triage vs. domain)
  • Tools are very different per role
  • Responsibility needs to stay traceable

A single agent is enough when:

  • The conversation is a linear question-answer loop
  • No external systems are in the loop

06

What it means for how you build

If you take multi-agent seriously as a platform capability, the system needs three things: a graph description humans can read; a tool registry that doesn't hide in the prompt; and an execution layer that knows which agent is speaking right now. Everything else is cosmetics — important, but replaceable.
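The third requirement, an execution layer that knows which agent is speaking, can be sketched as a small session object. The `Session` class, `HandoffError`, and the permission tables below are hypothetical and only show the idea: handoffs and tool calls are checked against the active agent, and every step is logged.

```python
class HandoffError(Exception):
    """Raised when the active agent tries a handoff the graph does not allow."""


class Session:
    """Tracks the active agent and enforces per-agent permissions."""

    def __init__(self, handoffs: dict[str, set[str]],
                 tools: dict[str, set[str]], start: str) -> None:
        self._handoffs = handoffs                 # agent -> allowed handoff targets
        self._tools = tools                       # agent -> allowed tool names
        self.active = start                       # the agent speaking right now
        self.log: list[tuple[str, str]] = []      # (agent, event) pairs for traceability

    def handoff(self, target: str) -> None:
        if target not in self._handoffs.get(self.active, set()):
            raise HandoffError(f"{self.active} may not hand off to {target}")
        self.log.append((self.active, f"handoff -> {target}"))
        self.active = target

    def call_tool(self, name: str) -> None:
        # The check is against the active agent, not against the conversation.
        if name not in self._tools.get(self.active, set()):
            raise PermissionError(f"{self.active} may not call {name}")
        self.log.append((self.active, f"tool {name}"))


if __name__ == "__main__":
    session = Session(
        handoffs={"triage": {"booking"}},
        tools={"booking": {"calendar.write"}},
        start="triage",
    )
    session.handoff("booking")
    session.call_tool("calendar.write")
    print(session.log)
```

With this shape, a triage agent that tries to call `calendar.write` fails with a `PermissionError` before anything reaches the calendar, and the log records which agent was responsible for each step.
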

ChatFlow is built on exactly that. Not because “multi-agent” is a nice phrase, but because it is the only shape in which agent products age in production without turning into prompt graveyards.

07

Appendix: what a tool actually looks like

One last look at the surface. A tool is not a prompt incantation, it's a contract. Here is how one is declared in ChatFlow:

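```python
# Excerpt: register_builtin, FunctionTool, Chunk, retrieval_service, and ctx
# are assumed to come from the ChatFlow runtime; their imports are omitted here.
@register_builtin("search_kb")
class SearchKnowledgeBase(FunctionTool):
    """Search the tenant knowledge base with contextual retrieval."""

    async def __call__(self, query: str, top_k: int = 8) -> list[Chunk]:
        # Scope the search to the current tenant and cap the number of results.
        results = await retrieval_service.search(
            tenant_id=ctx.tenant_id,
            query=query,
            top_k=top_k,
        )
        return results
```
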
A tool is a contract, not a prompt trick.