Full duplex · Open source · Self-hosted · MIT

Your Claude Code session, on a thumb drive.

seevie is a voice remote for your agent, the size of a USB stick — 48 × 24 mm, lighter than your house key. Hold the button and dictate a prompt into Claude Code from anywhere in the room. Walk away — the tiny screen pulses with every tool call. When the turn finishes, seevie chirps. Tap once, and it reads you the answer.

Two hooks and you're live Watch a session

Overhead to start: 2 words
Hear the answer: 1 tap
Long replies: ≤35 words
Subscription: $0

dictation

Hold the button to dictate · tap when it chirps. Same 350 ms rule as the firmware — all 48 × 24 mm of it.

Header feed

* Bash·* Read live-session.ts·* Edit AppController.cpp·* Bash pio run·♪ done — tap to hear·"All 42 tests pass."·"Great, push it. Send it."· * Bash·* Read live-session.ts·* Edit AppController.cpp·* Bash pio run·♪ done — tap to hear·"All 42 tests pass."·"Great, push it. Send it."·

01 — The loop

Prompt. Pace. Chirp. Tap.

Agents work in minutes-long silences. The old way: sit there watching a spinner, or walk away and lose the thread. The stick closes the loop — your prompt goes in by voice, and the session's whole lifecycle comes back to your hand.

1 hold A

Dictate the prompt

"Run the integration suite and fix whatever breaks. Send it." — Whisper types it into Claude Code, Return pressed.

2 walk away

Watch it work

Every tool call pulses the header: * Bash, * Edit main.cpp. No more guessing whether it's still going.

3 ♪

Chirp when done

The turn finishes → one short chirp, and the full reply lands on the screen. It never talks over you uninvited.

4 tap A

Tap to hear it

Buffered audio plays instantly — long replies pre-summarized to ≤35 spoken words, full text stays on screen. Tap again to replay.

5 hold A

Answer back

"Great — commit it and open a PR. Send it." The conversation continues. You never sat down.

02 — Live session

Both ends of the room, at once.

Left: Claude Code at your desk. Right: the stick in your hand. Watch one real turn travel the full circuit — this is the actual wire traffic, replayed.

cmux — claude code · ~/dev/api idle

♪ chirp tap A

You're across the room. The header shows every move.

Pulses are silent header repaints — no TTS, no logging, dropped server-side unless the stick is in dictation mode.

03 — Notifications

It announces. You decide.

A response is a notification, not an interruption: it renders to speech immediately and waits, buffered, while a chirp and the on-screen text tell you it's there. Glance at it, play it, or ignore it — seevie never talks unless you asked it to.

≤250 chars
Spoken verbatim — the TTS call carries a read-exactly directive, because Gemini's promptable TTS will happily ad-lib bare text.
>250 chars
Pre-summarized to ≤35 spoken words by a one-shot flash call. Ears get the verdict; the screen keeps every word, paged with B.
style
Delivery is promptable: "at a fast, brisk pace", "calm and quiet" — pace and tone shift, content never does.
voice
Per-request from 31 voices. Give Claude Code a different voice than chat — or one per workspace — and know who's talking by ear.

Latest wins: a newer response replaces the buffer and chirps again. "exit", a reconnect, or the dashboard's Pause clears everything — the stick never talks from a drawer.

One response, three surfaces

CLAUDE CODE — final message · 487 chars

"All 42 tests pass. The flakiness was a missing backoff cap in retry.ts — clamped it, added a regression test, and committed. Lint and typecheck are clean; the branch is ready to push…"

🔈 spoken

"All 42 tests pass — fixed the flaky retry backoff, added a regression test, and committed." — 92 chars, Charon, brisk

▤ screen

Full 612 characters, word for word, paged with B — glance beats listen when you're close.

✓ audit

{"chars":487,"voice":"Charon","summarized":true,"spoken_chars":92} — byte counts in the tool log, never content.

04 — One button

The whole grammar fits on a thumb.

No modes to remember, no chords. Duration is the language — and the firmware makes the risky case impossible.

A hold ≥350 ms

Talk

Push-to-talk, mic hot from the first millisecond of the press. Release to send. Recording always outranks playback — holding mid-speech barges in.

A tap <350 ms

Play / replay

With a response pending, a tap plays it instantly from the server-side buffer. The subtle part: recording starts on press-down, so the tap's audio sliver is cancelled server-side — never transcribed, never pasted, no error. Your words are never lost to a tap, and a tap never leaks into your terminal.

B click / hold

Page / leave

Click pages the full response text on screen. Hold exits dictation — or just say "exit", which the server intercepts so it's never typed anywhere.

05 — Wiring

Two hooks. That's the integration.

Claude Code already announces everything this needs: a Stop hook fires when a turn finishes, PostToolUse fires after every tool call. Two small shell scripts forward those to your worker. Claude Code is never modified — and if the network dies, the hooks fail open and your session never notices.

your mac

Claude Code hooks

▸Stop → claude-speak.sh — jq's the final message from the transcript (settle-loop beats the flush race)
▸PostToolUse → claude-work-status.sh — tool name + file basename, fire-and-forget
▸both async, both exit 0 always, config in one speak.env

↓

POST /speak ▸

your cloudflare worker

LiveSession DO

▸gates: dictating? paused? mid-utterance? — else 204, silent
▸summarize if long (flash) → Gemini TTS, voice + style per request
▸buffers the PCM; chirps the device; streams on tap
▸status pulses skip all of that — pure header repaint

↓

chirp · pulses · PCM ▸ ◂ tap · dictation

your pocket

M5StickS3

▸header pulses while Claude works
▸chirp + full text when it's done
▸tap plays the buffered answer, instantly
▸hold answers back — 24/7 push-to-talk

One deliberately weird hop: plain-HTTP requests must never touch this Durable Object's WebSockets (it kills the DO), so /speak parks the work in a slot and the DO's alarm does the TTS and the sends. Boring on the outside, correct on the inside.

Anything can chirp you

$ curl -X POST https://your-worker.workers.dev/speak/m5s3-live \
    -H "X-History-Token: $TOKEN" \
    -d '{"text":"Deploy finished — all checks green.",
         "voice":"Puck", "style":"at a fast pace"}'
→ 200 accepted · chirp · tap to hear
→ 204 when you're not dictating — silence, by design

The server doesn't know what Claude Code is. CI, cron jobs, a long build, your smoke tests — any script with the token becomes a voice in the room.

Per-workspace voices claude-speak.sh

# which session finished? your ears already know
case "$(basename "$cwd")" in
  api)      SPEAK_VOICE=Puck ;;      # upbeat
  firmware) SPEAK_VOICE=Charon ;;    # informative
  infra)    SPEAK_VOICE=Algenib ;;   # gravelly
esac

Voice is a per-request parameter — no sessions, no reconnects. Run three Claude Code workspaces and tell them apart without looking. The toggle ships commented-out in the hook.

06 — Trust

A microphone you can reason about.

A device that types into your terminal and speaks your agent's output earns its place by being inspectable. There is no vendor cloud — just your Cloudflare account, your API key, and code you can read.

Deaf by default

Push-to-talk is hardware truth: the mic captures only while your thumb holds A. No wake word means nothing is ever listening for one.

Silent unless dictating

Every speak and pulse is gated on the stick being in dictation mode — requests at any other time return 204 and vanish. Buffers die with the mode.

One tap to mute it all

Screen share starting? The web dashboard's Pause gates transcripts, app switching, speech, and pulses server-side — the stick's header flips so you know.

Logs without content

The audit trail records byte counts, voices, durations — never message text. Status pulses aren't logged at all.

One token, one blast radius

A single bearer token gates speak, dictation delivery, and monitoring. Rotate it with one wrangler secret put if it ever leaks.

MIT, three codebases

Firmware (C++), worker (TypeScript), hooks (bash) in one repo, documented feature by feature. The whole speak-back protocol is five message types.

07 — Install

One icon in the menu bar.

Everything on the Mac side — the companion, the config, the Claude Code hooks — is seevie.app. Drag it in, pair once, click once. No terminal required. (The terminal remains available for those who enjoy it.)

◔ 82% Sat 9:41 AM

seevie Connected · m5s3-live

Stick Dictation · 82%
Companion Delivering ✓
Claude Code 2 hooks installed ✓
Voice Charon · brisk ▾
Last spoken "All 42 tests pass…" ▸
Pause delivery screen share

Wire up Claude Code ✓ Settings… · Quit

Pause flips the stick's header glyph the moment you toggle it — same server-side gate the dashboard uses.

1

Drag it in

seevie.app lands in the menu bar and starts the companion at login. The one macOS Accessibility prompt is explained before it's asked — that's what lets your words type.

2

Pair once

Paste your worker URL and token — seevie verifies the pipe end-to-end and the stick chirps hello when the pairing lands. That chirp is the whole test suite.

3

One click for Claude Code

"Wire up Claude Code" writes the config, registers both hooks (async, fail-open), and picks a voice that isn't your chat voice. Undo is the same click. Then say "dictate into cmux" — you're in the loop.

Download seevie.app macOS 14+

Underneath, it's the same two auditable shell scripts — right-click any setting to reveal what it wrote.

08 — FAQ

Asked while pacing.

Will it interrupt me while I'm thinking? +

No — that's the core design decision. Responses never auto-play: you get one short chirp and the text on screen, and the audio waits in a buffer until you tap. If you're mid-sentence when a response lands, even the chirp holds until you finish talking.

Claude's replies are long. Do I have to listen to all of it? +

Anything over 250 characters is pre-summarized to at most 35 spoken words — the verdict, not the essay. The full text is always on the stick's screen (page with B), and the terminal still has every word. Short replies are spoken verbatim.

What if I run several Claude Code sessions at once? +

Give each workspace its own voice — a five-line case block in the hook maps directory → voice, so the api repo sounds like Puck and firmware sounds like Charon. One pending slot means latest-wins: the newest finished session is the one a tap plays.

Does this change Claude Code's behavior? +

Zero. Both hooks run async and exit 0 unconditionally — a dead network, a missing config, or a down worker are all invisible to the session. Claude Code doesn't know the stick exists; it just finishes turns and calls tools like always.

Is it still the general voice assistant too? +

Yes — chat mode is untouched: speech-to-speech Gemini with 30+ tools, web search, notes, Discord routing, scheduled tasks, image generation. The Claude Code loop lives in dictation mode; "exit" drops you back to the assistant.

What does it cost to run? +

No subscription — there's no one to subscribe to. Cloudflare's free tiers carry a single device comfortably; TTS and summarization bill per-use on your own Gemini key, pennies at personal scale. Status pulses cost nothing at all — no model calls involved.

Stop watching
the spinner.

Your agent works in silences. Put the silence in your pocket — and let it chirp when there's something worth hearing.

Wire it up See the loop again