How It Works

Voiceblox converts a visual node graph into a running voice agent through a series of well-defined transformation steps.

Data flow

1. User builds graph on React Flow canvas

2. serializeGraph(nodes, edges)          [lib/graph-serializer.ts]

3. SimpleNode[] + SimpleEdge[]

4. graphToConfig(nodes, edges, apiKeys)  [lib/agent/graph-to-config.ts]

5. AgentConfig                           [lib/agent/models.ts]

6. LiveKit: agent/livekit.ts builds LLM/TTS/STT and starts VoicebloxAgent

7. StepWatcher manages step transitions  [lib/agent/step-watcher.ts]

8. Live voice conversation

Key components

Graph serialization (steps 2–3)

serializeGraph() converts React Flow’s node and edge objects into compact SimpleNode[] and SimpleEdge[] arrays. It strips all UI metadata (position, label, icon) and keeps only the user-configured parameters. This format is optimized for AI editing and JSON storage.
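As a rough illustration of the stripping described above, here is a minimal sketch. The CanvasNode, CanvasEdge, SimpleNode, and SimpleEdge shapes and the serializeGraphSketch name are assumptions made for this example; the real types in lib/graph-serializer.ts may differ.

```typescript
// Hypothetical shapes: the real React Flow types and the Voiceblox
// SimpleNode/SimpleEdge definitions may not match these.
interface CanvasNode {
  id: string;
  type: string;
  position: { x: number; y: number }; // UI metadata
  data: { label?: string; icon?: string; params?: Record<string, unknown> };
}

interface CanvasEdge {
  id: string;
  source: string;
  target: string;
  sourceHandle?: string | null;
}

interface SimpleNode {
  id: string;
  type: string;
  params: Record<string, unknown>;
}

interface SimpleEdge {
  from: string;
  to: string;
  handle?: string;
}

function serializeGraphSketch(
  nodes: CanvasNode[],
  edges: CanvasEdge[],
): { nodes: SimpleNode[]; edges: SimpleEdge[] } {
  return {
    // Keep identity, type, and user parameters; drop position, label, icon.
    nodes: nodes.map((n) => ({
      id: n.id,
      type: n.type,
      params: { ...(n.data.params ?? {}) },
    })),
    // Keep only the connection itself, plus the branch handle when present.
    edges: edges.map((e) => {
      const edge: SimpleEdge = { from: e.source, to: e.target };
      if (e.sourceHandle) edge.handle = e.sourceHandle;
      return edge;
    }),
  };
}

// Usage: two canvas nodes and one edge, with made-up parameters.
const serializedExample = serializeGraphSketch(
  [
    { id: "start", type: "start", position: { x: 0, y: 0 }, data: { label: "Start", icon: "play" } },
    {
      id: "ask",
      type: "say",
      position: { x: 200, y: 40 },
      data: { label: "Ask name", params: { text: "What is your name?" } },
    },
  ],
  [{ id: "e1", source: "start", target: "ask", sourceHandle: null }],
);
```

The result contains no position, label, or icon fields, which keeps the stored JSON small and easy to diff.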

Config conversion (steps 4–5)

graphToConfig() traverses the graph starting from the Framework node:
  • Finds connected components (Persona → instructions, LLM → config, TTS → config, STT → config)
  • Performs a DFS from the Start node to collect all conversation steps
  • Builds nextStepIds maps for branching nodes (If/Else, Categorize)
  • Returns a complete AgentConfig
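The traversal above can be sketched as an iterative depth-first search. This is an illustration only: the GraphNode, GraphEdge, and StepSketch shapes, the node type strings ("start", "ifElse", "categorize"), and the choice to treat Start as an entry marker rather than a step are all assumptions, not the actual graphToConfig() implementation.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface GraphNode { id: string; type: string; params: Record<string, unknown> }
interface GraphEdge { from: string; to: string; handle?: string }
interface StepSketch {
  id: string;
  type: string;
  params: Record<string, unknown>;
  nextStepIds: Record<string, string>; // branch label -> target step id
}

// Assumed type names for the branching nodes (If/Else, Categorize).
const BRANCHING_TYPES = new Set(["ifElse", "categorize"]);

function collectSteps(nodes: GraphNode[], edges: GraphEdge[]): StepSketch[] {
  const byId = new Map<string, GraphNode>();
  for (const n of nodes) byId.set(n.id, n);

  const start = nodes.find((n) => n.type === "start");
  if (!start) throw new Error("graph has no Start node");

  const steps: StepSketch[] = [];
  const visited = new Set<string>();
  const stack: string[] = [start.id];

  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (visited.has(id)) continue; // also guards against cycles
    visited.add(id);

    const node = byId.get(id);
    if (!node) continue; // edge pointing at a missing node

    const outgoing = edges.filter((e) => e.from === id);
    const nextStepIds: Record<string, string> = {};
    outgoing.forEach((e, i) => {
      // Branching nodes key each target by its handle; others use "next".
      const key = BRANCHING_TYPES.has(node.type) ? (e.handle ?? `branch${i}`) : "next";
      nextStepIds[key] = e.to;
    });

    // Assumption: the Start node is an entry marker, not a conversation step.
    if (node.type !== "start") {
      steps.push({ id: node.id, type: node.type, params: node.params, nextStepIds });
    }

    // Push in reverse so the first outgoing edge is explored first.
    for (const e of [...outgoing].reverse()) stack.push(e.to);
  }
  return steps;
}

// Usage: Start -> greet -> check (If/Else) -> book | bye
const collectedSteps = collectSteps(
  [
    { id: "start", type: "start", params: {} },
    { id: "greet", type: "say", params: {} },
    { id: "check", type: "ifElse", params: {} },
    { id: "book", type: "say", params: {} },
    { id: "bye", type: "end", params: {} },
  ],
  [
    { from: "start", to: "greet" },
    { from: "greet", to: "check" },
    { from: "check", to: "book", handle: "true" },
    { from: "check", to: "bye", handle: "false" },
  ],
);
```

Nodes that are not reachable from Start never enter the stack, so they are left out of the collected steps.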

Agent runtime (step 6)

The LiveKit agent (VoicebloxAgent) receives the AgentConfig and:
  • Instantiates LLM, TTS, STT from provider configs
  • Wires up MCP tools and Exa search
  • Creates a StepWatcher with all conversation steps
  • Calls StepWatcher.begin() to start the conversation

Step management (step 7)

StepWatcher.onUserTurn() is called on every user turn and returns a StepDecision, which is one of:
  • continue — let the LLM generate a natural reply
  • advance — move to the next step in the chain
  • end — terminate the conversation
  • evaluate — run an LLM evaluation for If/Else or Categorize
  • webhook — fire an HTTP POST
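One natural way to model these five outcomes in TypeScript is a discriminated union, which lets the compiler check that every decision is handled. The sketch below is an assumption about shape only: the source names the five decisions but not their fields, so kind, nextStepId, stepId, and url are hypothetical.

```typescript
// Hypothetical shape: only the five decision names come from the docs.
type StepDecisionSketch =
  | { kind: "continue" }
  | { kind: "advance"; nextStepId: string }
  | { kind: "end" }
  | { kind: "evaluate"; stepId: string }
  | { kind: "webhook"; url: string };

// Turns a decision into a human-readable description of the action to take.
function describeDecision(d: StepDecisionSketch): string {
  switch (d.kind) {
    case "continue":
      return "let the LLM reply";
    case "advance":
      return `move to step ${d.nextStepId}`;
    case "end":
      return "terminate the conversation";
    case "evaluate":
      return `run an LLM evaluation for step ${d.stepId}`;
    case "webhook":
      return `POST to ${d.url}`;
    default: {
      // Compile-time exhaustiveness check: adding a new kind without
      // handling it above makes this assignment a type error.
      const unreachable: never = d;
      return unreachable;
    }
  }
}

const advanceDescription = describeDecision({ kind: "advance", nextStepId: "book" });
```

A real handler would perform the action instead of describing it, but the exhaustive switch would look the same.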

Two deployment modes

Worker mode (production): A standalone process connects to LiveKit, reads AgentConfig from room metadata, and handles multiple rooms concurrently.

Local mode (playground): The Next.js server starts an in-process agent when you click Start Session, using the current canvas state directly.
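In worker mode the config arrives as room metadata rather than from the canvas, so it is worth validating before use. The sketch below assumes the metadata is a JSON string and that AgentConfig has at least an instructions string and a steps array; both the encoding and the AgentConfigSketch fields are assumptions, and the real definition lives in lib/agent/models.ts.

```typescript
// Hypothetical minimal subset of AgentConfig, for illustration only.
interface AgentConfigSketch {
  instructions: string;
  steps: { id: string }[];
}

// Returns a validated config, or null if the metadata is missing or malformed.
function parseRoomMetadata(metadata: string | undefined): AgentConfigSketch | null {
  if (!metadata) return null;

  let raw: unknown;
  try {
    raw = JSON.parse(metadata);
  } catch {
    return null; // not valid JSON
  }
  if (typeof raw !== "object" || raw === null) return null;

  const candidate = raw as { instructions?: unknown; steps?: unknown };
  if (typeof candidate.instructions !== "string" || !Array.isArray(candidate.steps)) {
    return null;
  }

  const steps: { id: string }[] = [];
  for (const s of candidate.steps) {
    if (typeof s !== "object" || s === null || typeof (s as { id?: unknown }).id !== "string") {
      return null; // every step needs a string id
    }
    steps.push({ id: (s as { id: string }).id });
  }
  return { instructions: candidate.instructions, steps };
}

const parsedConfig = parseRoomMetadata('{"instructions":"Be brief","steps":[{"id":"greet"}]}');
```

Returning null instead of throwing lets a worker that serves many rooms skip one bad room without affecting the others.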