ChatFlow Admin

01

The checkbox trap

When a product has been built exclusively against the OpenAI API, “self-hosting” is a euphemism for “we'll park the binary somewhere else”. The intelligent layer, the part that does the actual work, still sits in the cloud. The customer ends up with an expensive container and no real sovereignty. That isn't self-hosting, that's theatre.

Real self-hosting means: every model, every component that sees data, can be swapped for a locally running alternative without the product falling over. That's an architectural decision made up-front, not a feature retrofitted later.

02

What actually has to move

Five components see sensitive data while ChatFlow runs: the embedding model, the reranker, the main LLM, ASR (voice in), and TTS (voice out). We run a self-hosted option for each of them in production.

Embeddings: open-weight model, its own micro-service, CPU-capable, GPU recommended
Reranker: open-weight model — lean in development, larger in production
LLM: anything that exposes an OpenAI-compatible API, whether a common inference server or an EU managed offering
Speech recognition: open-weight model, streamed, with voice-activity detection in front
Speech output: sentence-wise streaming — the caller hears sentence one while sentence two is produced
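
One way to picture "swappable" is a per-component endpoint table that can be audited for anything pointing outside the local network. The sketch below is illustrative only: the `Component` dataclass, the host names, the `.internal` suffix, and the `leaves_premises` helper are all assumptions, not ChatFlow's actual configuration format.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Component:
    name: str      # which of the five data-touching components this is
    base_url: str  # where requests for that component are sent


# Hypothetical deployment: four components local, the LLM on a managed EU endpoint.
DEPLOYMENT = [
    Component("embeddings", "http://embeddings.internal:8080"),
    Component("reranker", "http://reranker.internal:8081"),
    Component("llm", "https://llm.example-eu-provider.test/v1"),
    Component("asr", "http://asr.internal:8082"),
    Component("tts", "http://tts.internal:8083"),
]


def leaves_premises(components, local_suffix=".internal"):
    """Return the names of components whose host is not on the local network."""
    external = []
    for component in components:
        host = urlparse(component.base_url).hostname or ""
        if not host.endswith(local_suffix):
            external.append(component.name)
    return external


print(leaves_premises(DEPLOYMENT))
```

With the hypothetical deployment above, only `llm` is reported. Pointing its `base_url` at a local OpenAI-compatible inference server would empty the list without touching the other four entries.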

03

What's cheap, what's expensive

Embeddings and rerankers are affordable to self-host: a single mid-range GPU handles thousands of requests per day. The cost curve bends at the main LLM: a large model wants high-end hardware — and then your hosting price competes with the big providers' list prices, where the same capability often ships for less. That's the uncomfortable truth of self-hosting foundation models.

So the pragmatic recommendation is: run embeddings, the reranker, and the small models (context prefix, classification, triage) locally. Route the large main model, as long as that is defensible for your data, through Azure OpenAI EU or an equivalent managed offering with EU data residency. That's not a compromise, that's economic honesty.
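
The split can be expressed as a small routing rule. This is a minimal sketch under assumptions: the task names, the backend labels `"local"` and `"managed_eu"`, and the `pick_backend` function are hypothetical and only illustrate the recommendation, not how ChatFlow routes requests internally.

```python
# Hypothetical task names for the work that small local models can handle.
LOCAL_TASKS = {"embedding", "rerank", "context_prefix", "classification", "triage"}


def pick_backend(task: str) -> str:
    """Send small-model tasks to local inference, everything else to the managed LLM."""
    return "local" if task in LOCAL_TASKS else "managed_eu"


for task in ("triage", "rerank", "answer_generation"):
    print(task, "->", pick_backend(task))
```

Keeping the rule in one place means the economics can be revisited later: moving the main model in house becomes a one-line change rather than a refactor.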

1×: GPU for embeddings + reranker (mid-range is enough)
high-end: for the large LLM (where economics flip)
hybrid: the pragmatic path (small local, large managed)
~EU: data residency (stays in house)

04

The hybrid that works in practice

For most mid-market customers the sweet spot is a hybrid: every datastore and inference layer that sees customer data lives locally — databases, search index, embeddings, reranker, optional voice. Calls to a large foundation model leave the building — but only the question plus the selected chunks. No raw documents, no full conversations, no tenant context beyond what's needed.
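
"Only the question plus the selected chunks" is easiest to enforce with an allow-list at the point where the outbound request is assembled. The sketch below assumes a hypothetical request state held as a dictionary; the field names and the `outbound_payload` function are illustrative, and this version drops tenant context entirely rather than passing a minimal subset.

```python
# Only these fields may cross the boundary to the external model.
ALLOWED_KEYS = ("question", "chunks")


def outbound_payload(state: dict) -> dict:
    """Copy the allow-listed fields and nothing else from the local request state."""
    return {key: state[key] for key in ALLOWED_KEYS}


# Hypothetical local state: everything the on-prem side knows about the request.
state = {
    "tenant_id": "tenant-123",
    "history": ["earlier message 1", "earlier message 2"],
    "raw_documents": ["full text of a document"],
    "question": "What is the notice period?",
    "chunks": ["selected excerpt A", "selected excerpt B"],
}

payload = outbound_payload(state)
print(sorted(payload))
```

An allow-list fails safe: a field added to the local state later stays local unless someone deliberately adds it to `ALLOWED_KEYS`.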

Self-hosting isn't “everything at our place”. Self-hosting is “what has to stay with us, stays — and we know exactly what leaves”.

05

The pricing conversation

Self-hosting forces a different pricing conversation. Cloud SaaS is priced “per conversation” or “per token”. On-prem is usually a licence plus an operational fee. Neither is hardwired into ChatFlow; the platform supports both models. That's intentional: if your customers need sovereignty, you cannot insist that the invoice look like a volume SaaS invoice.

But it also means: the self-hosted option cannot be a surcharge on top of the cloud version; it has to be the actual substance. Self-hosted customers get less convenient updates and more control. Cloud customers get more convenience and accept documented cross-border transfers. Both contracts are honest.

Self-hosting is not a checkbox feature.
