How Voxa works

Voxa has three parts: your phone, your laptop, and Claude Code. The laptop does all the real work; the phone is a thin client that captures your voice, plays audio back, and rings like an incoming call.

The pieces

  • Phone client: a native iPhone app (or a browser page) that captures your voice and plays audio back.
  • Laptop server: a small FastAPI server that bridges audio between the phone and a Gemini Live voice operator.
  • Voice operator: Gemini Live listens, talks, and decides which actions to run.
  • Claude Code controller: the operator drives Claude Code on your laptop to actually do the work.
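
To make the hand-offs between the pieces concrete, here is a minimal sketch in plain Python. It is not Voxa's code: every name in it (Action, voice_operator, claude_code_controller, laptop_server) is a hypothetical stand-in, and it passes text where the real system passes audio.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """Hypothetical action chosen by the voice operator."""
    name: str      # e.g. "run_instruction"
    argument: str  # the spoken instruction, as text


def voice_operator(transcript: str) -> Action:
    """Stand-in for Gemini Live: turn what was heard into an action."""
    return Action(name="run_instruction", argument=transcript.strip())


def claude_code_controller(action: Action) -> str:
    """Stand-in for the controller that drives Claude Code on the laptop."""
    return f"done: {action.argument}"


def laptop_server(
    transcript: str,
    operator: Callable[[str], Action] = voice_operator,
    controller: Callable[[Action], str] = claude_code_controller,
) -> str:
    """Stand-in for the bridge: phone input in, reply for the phone out."""
    action = operator(transcript)   # operator decides which action to run
    return controller(action)       # controller does the actual work


reply = laptop_server("  run the tests ")
print(reply)
```

The operator and controller are passed in as plain callables so that either one can be swapped for a fake when trying the flow without a phone or a live model.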

The call flow

  1. You start a task by voice.
  2. When Claude Code finishes or needs input, the laptop sends a push notification to your phone.
  3. Your phone rings with a native call screen, even when locked.
  4. Answer to hear the update and reply by voice. If you decline, the phone rings again at the next update.
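
The steps above can be read as a small state machine. The sketch below is illustrative only: the state names and the CallFlow class are assumptions, and what happens when an update arrives during a call is a guess, since the flow above does not say.

```python
from dataclasses import dataclass, field
from enum import Enum


class CallState(Enum):
    IDLE = "idle"
    RINGING = "ringing"
    IN_CALL = "in_call"


@dataclass
class CallFlow:
    """Hypothetical model of the ring / answer / decline cycle."""
    state: CallState = CallState.IDLE
    pushes: list[str] = field(default_factory=list)

    def on_update(self, summary: str) -> None:
        """Claude Code finished or needs input: record the push and ring."""
        self.pushes.append(summary)
        # Assumption: an update during a call does not ring a second time.
        if self.state is not CallState.IN_CALL:
            self.state = CallState.RINGING

    def answer(self) -> None:
        """Answering a ringing phone starts the voice call."""
        if self.state is CallState.RINGING:
            self.state = CallState.IN_CALL

    def decline(self) -> None:
        """Declining goes back to idle; the next update rings again."""
        if self.state is CallState.RINGING:
            self.state = CallState.IDLE

    def hang_up(self) -> None:
        """Ending a call returns to idle."""
        if self.state is CallState.IN_CALL:
            self.state = CallState.IDLE


flow = CallFlow()
flow.on_update("tests passed")   # phone rings
flow.decline()                   # back to idle
flow.on_update("needs input")    # rings again on the next update
flow.answer()                    # now in a call
```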

What is supported today

The first version focuses on drive mode: pick a working directory, send spoken instructions, and hear the result read back. Voice folder browsing and barge-in interruption are on the roadmap.
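
As a rough illustration of drive mode, the sketch below models a session as a working directory plus a function that runs one instruction and returns text to read back. DriveSession and echo_runner are hypothetical names, and the runner is a placeholder that only echoes its input; how Voxa actually invokes Claude Code is not shown here.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


def echo_runner(working_dir: Path, instruction: str) -> str:
    """Placeholder for the real Claude Code call; it only echoes its input."""
    return f"[{working_dir.name}] would run: {instruction}"


@dataclass
class DriveSession:
    """Hypothetical drive-mode session: one directory, many instructions."""
    working_dir: Path
    runner: Callable[[Path, str], str] = echo_runner

    def __post_init__(self) -> None:
        # Resolve and validate the directory once, when the session starts.
        self.working_dir = Path(self.working_dir).resolve()
        if not self.working_dir.is_dir():
            raise NotADirectoryError(str(self.working_dir))

    def send(self, instruction: str) -> str:
        """Run one spoken instruction (as text) and return the result."""
        text = instruction.strip()
        if not text:
            raise ValueError("empty instruction")
        return self.runner(self.working_dir, text)


session = DriveSession(Path.cwd())
result = session.send("summarise the README")
print(result)
```

Validating the directory up front means a mistyped or misheard path fails before any instruction is sent, rather than partway through a task.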

Ready to try it yourself? See the setup guide.