Voice Agent

About

The fastest voice agent in the world. TTFA (time to first audio) as low as 150ms. The figure is the 150ms median time-to-first-audio, measured from end-of-user-turn to first audio byte at the client.
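As a rough illustration of how a number like this could be computed, the sketch below takes per-turn timestamps and reports the median time to first audio. The `TurnTiming` shape and the function names are assumptions made for this example, not taken from the project.

```typescript
// Hypothetical sample shape: both timestamps are milliseconds on the client clock.
interface TurnTiming {
  userTurnEndMs: number;    // when the user's turn was judged to have ended
  firstAudioByteMs: number; // when the first audio byte of the reply arrived
}

// Time to first audio for one turn.
function ttfaMs(turn: TurnTiming): number {
  return turn.firstAudioByteMs - turn.userTurnEndMs;
}

// Median TTFA across turns; for an even count, the mean of the two middle values.
function medianTtfaMs(turns: TurnTiming[]): number {
  if (turns.length === 0) {
    throw new Error("no turns recorded");
  }
  const sorted = turns.map(ttfaMs).sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```

The median is less sensitive than the mean to an occasional slow turn, which is one reason a latency figure might be quoted this way.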

Runs on the edge using Cloudflare Workers
Speech-to-text, text generation, and text-to-speech are pipelined and streamed, so the agent begins responding before you notice the silence
Processing starts speculatively: text generation begins when the user pauses, and the result is discarded if they continue speaking
Tool calls are orchestrated in parallel by a separate agent, so the user never waits for a reply
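
The speculative step in the list above can be sketched roughly as follows, assuming generation is cancellable through a standard `AbortController`. `SpeculativeTurn`, its method names, and the `start` callback are hypothetical names for this example; the project itself may structure this differently.

```typescript
// Hypothetical sketch: none of these names come from the project's source.
type StartGeneration<T> = (transcript: string, signal: AbortSignal) => T;

class SpeculativeTurn<T> {
  private start: StartGeneration<T>;
  private controller: AbortController | null = null;
  private result: T | null = null;

  // `start` kicks off text generation and is expected to stop when the signal aborts.
  constructor(start: StartGeneration<T>) {
    this.start = start;
  }

  // The user paused: begin generating a reply without waiting for end-of-turn.
  onPause(transcript: string): void {
    this.discard();
    this.controller = new AbortController();
    this.result = this.start(transcript, this.controller.signal);
  }

  // The user kept talking: the speculative reply is stale, so cancel it.
  onSpeechResumed(): void {
    this.discard();
  }

  // The pause was the end of the turn: hand over the reply already in flight.
  onTurnEnd(): T | null {
    const result = this.result;
    this.controller = null;
    this.result = null;
    return result;
  }

  private discard(): void {
    if (this.controller !== null) {
      this.controller.abort();
    }
    this.controller = null;
    this.result = null;
  }
}
```

In a real pipeline `T` would likely be a stream or promise of generated text, and `start` would pass the signal on to the model request so that a discarded generation stops consuming resources.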

Built by Ahmad Adebowale

GitHub