How It Works
Voiceblox converts a visual node graph into a running voice agent through a series of well-defined transformation steps.

Data flow
Key components
Graph serialization (steps 2–3)
serializeGraph() converts React Flow’s node objects into compact SimpleNode[] — stripping all UI metadata (position, label, icon) and keeping only the user-configured parameters. This format is optimized for AI editing and JSON storage.
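The stripping step can be sketched as follows. The exact `FlowNode` and `SimpleNode` shapes below are illustrative assumptions, not the project's actual type definitions:

```typescript
// Hypothetical shapes: a React Flow node carries UI metadata that
// serializeGraph() discards, keeping only id, type, and user params.
interface FlowNode {
  id: string;
  type: string;
  position: { x: number; y: number }; // UI-only, dropped
  data: { label?: string; icon?: string; params?: Record<string, unknown> };
}

interface SimpleNode {
  id: string;
  type: string;
  params: Record<string, unknown>;
}

function serializeGraph(nodes: FlowNode[]): SimpleNode[] {
  return nodes.map((n) => ({
    id: n.id,
    type: n.type,
    params: n.data.params ?? {},
  }));
}

const simple = serializeGraph([
  {
    id: "say-1",
    type: "say",
    position: { x: 120, y: 80 },
    data: { label: "Greeting", icon: "chat", params: { text: "Hi!" } },
  },
]);
```

Because position, label, and icon never reach the serialized form, the JSON stays small and an AI edit to the graph cannot accidentally touch layout.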
Config conversion (steps 4–5)
graphToConfig() traverses the graph starting from the Framework node:
- Finds connected components (Persona → instructions, LLM → config, TTS → config, STT → config)
- Performs a DFS from the Start node to collect all conversation steps
- Builds nextStepIds maps for branching nodes (If/Else, Categorize)
- Returns a complete AgentConfig
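The DFS step-collection above can be sketched like this. The `Step`, `Edge`, and `collectSteps` names are assumptions for illustration, not the actual graphToConfig() internals:

```typescript
// Hypothetical sketch: walk the graph from the Start node, emitting one
// Step per reachable node and recording its outgoing edges as nextStepIds.
interface SimpleNode { id: string; type: string; params: Record<string, unknown>; }
interface Edge { source: string; target: string; }
interface Step { id: string; type: string; nextStepIds: string[]; }

function collectSteps(nodes: SimpleNode[], edges: Edge[], startId: string): Step[] {
  const byId = new Map<string, SimpleNode>(
    nodes.map((n): [string, SimpleNode] => [n.id, n]),
  );
  // Adjacency map: source id -> target ids.
  const out = new Map<string, string[]>();
  for (const e of edges) {
    const targets = out.get(e.source) ?? [];
    targets.push(e.target);
    out.set(e.source, targets);
  }
  const steps: Step[] = [];
  const seen = new Set<string>();
  const stack = [startId];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (seen.has(id)) continue;
    seen.add(id);
    const node = byId.get(id);
    if (!node) continue;
    const next = out.get(id) ?? [];
    steps.push({ id, type: node.type, nextStepIds: next });
    stack.push(...next);
  }
  return steps;
}

// Branching example: an If/Else node ends up with two nextStepIds.
const steps = collectSteps(
  [
    { id: "start", type: "start", params: {} },
    { id: "ask", type: "say", params: { text: "Cats or dogs?" } },
    { id: "branch", type: "ifelse", params: { condition: "user likes cats" } },
    { id: "cats", type: "say", params: { text: "Meow" } },
    { id: "dogs", type: "say", params: { text: "Woof" } },
  ],
  [
    { source: "start", target: "ask" },
    { source: "ask", target: "branch" },
    { source: "branch", target: "cats" },
    { source: "branch", target: "dogs" },
  ],
  "start",
);
```

A linear chain yields one nextStepId per step; only branching nodes fan out, which is what the agent later uses to decide where an evaluate decision can land.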
Agent runtime (step 6)
The LiveKit agent (VoicebloxAgent) receives the AgentConfig and:
- Instantiates LLM, TTS, STT from provider configs
- Wires up MCP tools and Exa search
- Creates a StepWatcher with all conversation steps
- Calls StepWatcher.begin() to start the conversation
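The provider wiring can be sketched as a pure mapping from config to runtime components. The config shape and provider names below are illustrative assumptions; in the real agent the strings would be LiveKit plugin instances:

```typescript
// Hypothetical config shapes for the LLM/TTS/STT wiring step.
interface ProviderConfig { provider: string; model?: string; voice?: string; }
interface AgentConfig {
  instructions: string;
  llm: ProviderConfig;
  tts: ProviderConfig;
  stt: ProviderConfig;
}

interface Runtime { instructions: string; llm: string; tts: string; stt: string; }

function buildRuntime(config: AgentConfig): Runtime {
  // Strings stand in for plugin instances so the sketch stays self-contained.
  return {
    instructions: config.instructions,
    llm: `${config.llm.provider}:${config.llm.model ?? "default"}`,
    tts: `${config.tts.provider}:${config.tts.voice ?? "default"}`,
    stt: config.stt.provider,
  };
}

const runtime = buildRuntime({
  instructions: "You are a friendly receptionist.",
  llm: { provider: "openai", model: "gpt-4o-mini" },
  tts: { provider: "cartesia", voice: "warm" },
  stt: { provider: "deepgram" },
});
```

Keeping this mapping in one place means swapping a provider on the canvas changes only one field of the config, not the agent code.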
Step management (step 7)
StepWatcher is called on every user turn (onUserTurn()). It returns a StepDecision:
- continue — let the LLM generate a natural reply
- advance — move to the next step in the chain
- end — terminate the conversation
- evaluate — run an LLM evaluation for If/Else or Categorize
- webhook — fire an HTTP POST
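A decision like this maps naturally onto a discriminated union, which the caller can handle exhaustively. The union's field names are assumptions for illustration, not the actual StepDecision definition:

```typescript
// Hypothetical StepDecision union mirroring the five outcomes above.
type StepDecision =
  | { kind: "continue" }
  | { kind: "advance"; nextStepId: string }
  | { kind: "end" }
  | { kind: "evaluate"; prompt: string }
  | { kind: "webhook"; url: string; payload: unknown };

// Exhaustive handling: the compiler flags any missing case.
function describeDecision(d: StepDecision): string {
  switch (d.kind) {
    case "continue":
      return "let the LLM generate a natural reply";
    case "advance":
      return `move to ${d.nextStepId}`;
    case "end":
      return "terminate the conversation";
    case "evaluate":
      return `run an LLM evaluation: ${d.prompt}`;
    case "webhook":
      return `fire an HTTP POST to ${d.url}`;
  }
}

const msg = describeDecision({ kind: "advance", nextStepId: "step-2" });
```

The payoff of the tagged-union shape is that adding a sixth decision later turns every unhandled switch into a compile error rather than a silent runtime gap.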
Two deployment modes
Worker mode (production): A standalone process connects to LiveKit, reads AgentConfig from room metadata, and handles multiple rooms concurrently.
Local mode (playground): The Next.js server starts an in-process agent when you click Start Session, using the current canvas state directly.
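The worker-mode handoff can be sketched as parsing the config out of the room's metadata string, with local mode passing the same object in-process. The function name and error handling here are illustrative assumptions:

```typescript
// Hypothetical sketch: recover AgentConfig from LiveKit room metadata.
interface AgentConfig {
  instructions: string;
  steps: unknown[];
}

function configFromRoomMetadata(metadata: string | undefined): AgentConfig | null {
  if (!metadata) return null; // room created without a config
  try {
    return JSON.parse(metadata) as AgentConfig;
  } catch {
    return null; // malformed metadata: refuse to start the agent
  }
}

const cfg = configFromRoomMetadata('{"instructions":"Greet the caller","steps":[]}');
const bad = configFromRoomMetadata("not json");
```

Because both modes consume the same AgentConfig, an agent tested in the playground behaves identically when a worker picks it up from room metadata in production.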