How It Works
Voiceblox converts a visual node graph into a running voice agent through a series of well-defined transformation steps.

Data flow
Key components
Graph serialization (steps 2–3)
serializeGraph() converts React Flow’s node objects into compact SimpleNode[] — stripping all UI metadata (position, label, icon) and keeping only the user-configured parameters. This format is optimized for AI editing and JSON storage.
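The stripping step can be sketched as follows. The exact `FlowNode` and `SimpleNode` shapes below are illustrative assumptions, not the project's actual type definitions:

```typescript
// Hypothetical shapes: a React Flow node carries UI metadata that
// serializeGraph() discards, keeping only id, type, and user params.
interface FlowNode {
  id: string;
  type: string;
  position: { x: number; y: number }; // UI-only, dropped
  data: { label?: string; icon?: string; params?: Record<string, unknown> };
}

interface SimpleNode {
  id: string;
  type: string;
  params: Record<string, unknown>;
}

function serializeGraph(nodes: FlowNode[]): SimpleNode[] {
  return nodes.map((n) => ({
    id: n.id,
    type: n.type,
    params: n.data.params ?? {},
  }));
}

const simple = serializeGraph([
  {
    id: "say-1",
    type: "say",
    position: { x: 120, y: 80 },
    data: { label: "Greeting", icon: "chat", params: { text: "Hi!" } },
  },
]);
```

Because position, label, and icon never reach the serialized form, the JSON stays small and an AI edit to the graph cannot accidentally touch layout.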
Config conversion (steps 4–5)
graphToConfig() traverses the graph starting from the Framework node:
- Finds connected components (Persona → instructions, LLM → config, TTS → config, STT → config)
- Performs a DFS from the Start node to collect all conversation steps
- Builds nextStepIds maps for branching nodes (If/Else, Categorize)
- Returns a complete AgentConfig
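The DFS step-collection above can be sketched like this. The `Step`, `Edge`, and `collectSteps` names are assumptions for illustration, not the actual graphToConfig() internals:

```typescript
// Hypothetical sketch: walk the graph from the Start node, emitting one
// Step per reachable node and recording its outgoing edges as nextStepIds.
interface SimpleNode { id: string; type: string; params: Record<string, unknown>; }
interface Edge { source: string; target: string; }
interface Step { id: string; type: string; nextStepIds: string[]; }

function collectSteps(nodes: SimpleNode[], edges: Edge[], startId: string): Step[] {
  const byId = new Map<string, SimpleNode>(
    nodes.map((n): [string, SimpleNode] => [n.id, n]),
  );
  // Adjacency map: source id -> target ids.
  const out = new Map<string, string[]>();
  for (const e of edges) {
    const targets = out.get(e.source) ?? [];
    targets.push(e.target);
    out.set(e.source, targets);
  }
  const steps: Step[] = [];
  const seen = new Set<string>();
  const stack = [startId];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (seen.has(id)) continue;
    seen.add(id);
    const node = byId.get(id);
    if (!node) continue;
    const next = out.get(id) ?? [];
    steps.push({ id, type: node.type, nextStepIds: next });
    stack.push(...next);
  }
  return steps;
}

// Branching example: an If/Else node ends up with two nextStepIds.
const steps = collectSteps(
  [
    { id: "start", type: "start", params: {} },
    { id: "ask", type: "say", params: { text: "Cats or dogs?" } },
    { id: "branch", type: "ifelse", params: { condition: "user likes cats" } },
    { id: "cats", type: "say", params: { text: "Meow" } },
    { id: "dogs", type: "say", params: { text: "Woof" } },
  ],
  [
    { source: "start", target: "ask" },
    { source: "ask", target: "branch" },
    { source: "branch", target: "cats" },
    { source: "branch", target: "dogs" },
  ],
  "start",
);
```

A linear chain yields one nextStepId per step; only branching nodes fan out, which is what the agent later uses to decide where an evaluate decision can land.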
Agent runtime (step 6)
The LiveKit agent (VoicebloxAgent) receives the AgentConfig and:
- Instantiates LLM, TTS, STT from provider configs
- Wires up MCP tools and Exa search
- Creates a StepWatcher with all conversation steps
- Calls StepWatcher.begin() to start the conversation
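The provider wiring can be sketched as a pure mapping from config to runtime components. The config shape and provider names below are illustrative assumptions; in the real agent the strings would be LiveKit plugin instances:

```typescript
// Hypothetical config shapes for the LLM/TTS/STT wiring step.
interface ProviderConfig { provider: string; model?: string; voice?: string; }
interface AgentConfig {
  instructions: string;
  llm: ProviderConfig;
  tts: ProviderConfig;
  stt: ProviderConfig;
}

interface Runtime { instructions: string; llm: string; tts: string; stt: string; }

function buildRuntime(config: AgentConfig): Runtime {
  // Strings stand in for plugin instances so the sketch stays self-contained.
  return {
    instructions: config.instructions,
    llm: `${config.llm.provider}:${config.llm.model ?? "default"}`,
    tts: `${config.tts.provider}:${config.tts.voice ?? "default"}`,
    stt: config.stt.provider,
  };
}

const runtime = buildRuntime({
  instructions: "You are a friendly receptionist.",
  llm: { provider: "openai", model: "gpt-4o-mini" },
  tts: { provider: "cartesia", voice: "warm" },
  stt: { provider: "deepgram" },
});
```

Keeping this mapping in one place means swapping a provider on the canvas changes only one field of the config, not the agent code.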
Step management (step 7)
StepWatcher is called on every user turn (onUserTurn()). It returns a StepDecision:
- continue — let the LLM generate a natural reply
- advance — move to the next step in the chain
- end — terminate the conversation
- evaluate — run an LLM evaluation for If/Else or Categorize
- webhook — fire an HTTP POST
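A decision like this maps naturally onto a discriminated union, which the caller can handle exhaustively. The union's field names are assumptions for illustration, not the actual StepDecision definition:

```typescript
// Hypothetical StepDecision union mirroring the five outcomes above.
type StepDecision =
  | { kind: "continue" }
  | { kind: "advance"; nextStepId: string }
  | { kind: "end" }
  | { kind: "evaluate"; prompt: string }
  | { kind: "webhook"; url: string; payload: unknown };

// Exhaustive handling: the compiler flags any missing case.
function describeDecision(d: StepDecision): string {
  switch (d.kind) {
    case "continue":
      return "let the LLM generate a natural reply";
    case "advance":
      return `move to ${d.nextStepId}`;
    case "end":
      return "terminate the conversation";
    case "evaluate":
      return `run an LLM evaluation: ${d.prompt}`;
    case "webhook":
      return `fire an HTTP POST to ${d.url}`;
  }
}

const msg = describeDecision({ kind: "advance", nextStepId: "step-2" });
```

The payoff of the tagged-union shape is that adding a sixth decision later turns every unhandled switch into a compile error rather than a silent runtime gap.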
Two deployment modes
Worker mode (production): A standalone process connects to LiveKit, reads AgentConfig from room metadata, and handles multiple rooms concurrently.
Local mode (playground): The Next.js server starts an in-process agent when you click Start Session, using the current canvas state directly.
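The worker-mode handoff can be sketched as parsing the config out of the room's metadata string, with local mode passing the same object in-process. The function name and error handling here are illustrative assumptions:

```typescript
// Hypothetical sketch: recover AgentConfig from LiveKit room metadata.
interface AgentConfig {
  instructions: string;
  steps: unknown[];
}

function configFromRoomMetadata(metadata: string | undefined): AgentConfig | null {
  if (!metadata) return null; // room created without a config
  try {
    return JSON.parse(metadata) as AgentConfig;
  } catch {
    return null; // malformed metadata: refuse to start the agent
  }
}

const cfg = configFromRoomMetadata('{"instructions":"Greet the caller","steps":[]}');
const bad = configFromRoomMetadata("not json");
```

Because both modes consume the same AgentConfig, an agent tested in the playground behaves identically when a worker picks it up from room metadata in production.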