Why State Machines Are the Future of Voice AI

LLMs are great at language but terrible at following scripts. Here's why we built Prepatu around a deterministic state machine architecture.


The Problem with LLM-Driven Flows

If you’ve ever tried to build a multi-step voice agent using just an LLM, you know the pain. The model will:

  • Skip steps when it thinks it knows the answer
  • Invent transitions that don’t exist in your flow
  • Forget where it is mid-conversation
  • Hallucinate confirmations the user never gave

This isn’t a model quality issue — it’s an architectural one. LLMs are probabilistic by nature. They’re optimized for plausible next tokens, not correct control flow.

Separation of Concerns

The solution is surprisingly simple: don’t let the LLM control the flow.

At BusyTaal, we built Prepatu around a clean separation:

  1. YAML state machine — defines the flow declaratively
  2. LLM — operates inside each state for NLU, generation, and tool calls
  3. Engine — enforces transitions, blocks invalid moves, and manages state

The LLM becomes a guest in each state. It can understand language, extract entities, and call tools — but it cannot decide where to go next. That’s the engine’s job.
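This division of labor can be sketched in a few lines. The names below are illustrative, not Prepatu's actual API: the engine owns the transition table, and the LLM layer can only propose events against it.

```python
# Sketch of an engine that owns control flow. The NLU layer (the LLM)
# proposes an event; the engine accepts or rejects it. Names here are
# hypothetical, not Prepatu's real interface.

class FlowEngine:
    def __init__(self, transitions, initial):
        # transitions: {state: {event: next_state}}
        self.transitions = transitions
        self.state = initial

    def handle(self, event):
        """Apply an event proposed by the NLU layer.

        Invalid moves raise instead of silently inventing a transition,
        which is exactly the failure mode of LLM-driven flows.
        """
        allowed = self.transitions.get(self.state, {})
        if event not in allowed:
            raise ValueError(f"invalid transition {event!r} from {self.state!r}")
        self.state = allowed[event]
        return self.state
```

However confident the model is, it can only ever emit an event; the engine decides whether that event is legal from the current state.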

Why YAML?

We chose YAML for flow definitions because it’s:

  • Human-readable — product managers can review flows
  • Version-controllable — diffs are meaningful
  • Declarative — you describe what, not how
  • Validated at startup — the engine rejects invalid flows before they run
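To make the points above concrete, here is a minimal flow definition. The field names are illustrative only, not the actual Prepatu schema:

```yaml
# Hypothetical flow definition — field names are illustrative,
# not the real Prepatu schema.
flow: booking
initial: collect_date
states:
  collect_date:
    prompt: "What date would you like to book?"
    transitions:
      date_provided: collect_party_size
  collect_party_size:
    prompt: "How many people?"
    transitions:
      size_provided: confirm
  confirm:
    prompt: "Shall I confirm your booking?"
    transitions:
      confirmed: done
      rejected: collect_date
  done:
    terminal: true
```

A reviewer can see the entire conversation shape at a glance, a diff shows exactly which transition changed, and a startup validator can reject a flow that references a state that does not exist.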

What This Means in Practice

A booking agent defined in Prepatu will always collect the required information before confirming. A support agent will always verify the customer’s identity before accessing account data. The engine guarantees it.
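One way such a guarantee can be encoded (a sketch, not Prepatu's actual implementation) is to gate the confirmation transition on every required slot being filled, so the engine structurally cannot confirm early:

```python
# Illustrative slot-gating sketch: the engine refuses to enter the
# confirmation state until all required information is collected.
# Slot names are hypothetical booking fields.

REQUIRED_SLOTS = ("date", "time", "party_size")

def can_confirm(slots: dict) -> bool:
    """True only when every required slot has a non-empty value."""
    return all(slots.get(name) for name in REQUIRED_SLOTS)

def next_state(current: str, slots: dict) -> str:
    """Transition function: stay in 'collecting' until slots are complete."""
    if current == "collecting":
        return "confirm" if can_confirm(slots) else "collecting"
    return current
```

Because the check lives in the transition function rather than in a prompt, no amount of model confidence can skip it.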

This is the difference between a demo that works 80% of the time and a production system you can trust.


Interested in building voice agents with state machines? Check out Prepatu on GitHub or read the documentation.