Your Claude Code session, on a thumb drive.
seevie is a voice remote for your agent, the size of a USB stick — 48 × 24 mm, lighter than your house key. Hold the button and dictate a prompt into Claude Code from anywhere in the room. Walk away — the tiny screen pulses with every tool call. When the turn finishes, seevie chirps. Tap once, and it reads you the answer.
- Overhead to start
- 2 words
- Hear the answer
- 1 tap
- Long replies
- ≤35 words
- Subscription
- $0
Hold the button to dictate · tap when it chirps. Same 350 ms rule as the firmware — all 48 × 24 mm of it.
01 — The loop
Prompt. Pace. Chirp. Tap.
Agents work in minutes-long silences. The old way: sit there watching a spinner, or walk away and lose the thread. The stick closes the loop — your prompt goes in by voice, and the session's whole lifecycle comes back to your hand.
Dictate the prompt
"Run the integration suite and fix whatever breaks. Send it." — Whisper types it into Claude Code, Return pressed.
Watch it work
Every tool call pulses the header: * Bash, * Edit main.cpp. No more guessing whether it's still going.
Chirp when done
The turn finishes → one short chirp, and the full reply lands on the screen. It never talks over you uninvited.
Tap to hear it
Buffered audio plays instantly — long replies pre-summarized to ≤35 spoken words, full text stays on screen. Tap again to replay.
Answer back
"Great — commit it and open a PR. Send it." The conversation continues. You never sat down.
02 — Live session
Both ends of the room, at once.
Left: Claude Code at your desk. Right: the stick in your hand. Watch one real turn travel the full circuit — this is the actual wire traffic, replayed.
You're across the room. The header shows every move.
Pulses are silent header repaints — no TTS, no logging, dropped server-side unless the stick is in dictation mode.
03 — Notifications
It announces. You decide.
A response is a notification, not an interruption: it renders to speech immediately and waits, buffered, while a chirp and the on-screen text tell you it's there. Glance at it, play it, or ignore it — seevie never talks unless you asked it to.
-
≤250 chars
Spoken verbatim — the TTS call carries a read-exactly directive, because Gemini's promptable TTS will happily ad-lib bare text.
-
>250 chars
Pre-summarized to ≤35 spoken words by a one-shot flash call. Ears get the verdict; the screen keeps every word, paged with B.
-
style
Delivery is promptable: "at a fast, brisk pace", "calm and quiet" — pace and tone shift, content never does.
-
voice
Per-request from 31 voices. Give Claude Code a different voice than chat — or one per workspace — and know who's talking by ear.
Latest wins: a newer response replaces the buffer and chirps again. "exit", a reconnect, or the dashboard's Pause clears everything — the stick never talks from a drawer.
One response, three surfaces
CLAUDE CODE — final message · 487 chars
"All 42 tests pass. The flakiness was a missing backoff cap in retry.ts — clamped it, added a regression test, and committed. Lint and typecheck are clean; the branch is ready to push…"
"All 42 tests pass — fixed the flaky retry backoff, added a regression test, and committed." — 92 chars, Charon, brisk
Full 612 characters, word for word, paged with B — glance beats listen when you're close.
{"chars":487,"voice":"Charon","summarized":true,"spoken_chars":92} — byte counts in the tool log, never content.
05 — Wiring
Two hooks. That's the integration.
Claude Code already announces everything this needs: a Stop hook fires when a turn finishes, PostToolUse fires after every tool call. Two small shell scripts forward those to your worker. Claude Code is never modified — and if the network dies, the hooks fail open and your session never notices.
Claude Code hooks
- ▸Stop → claude-speak.sh — jq's the final message from the transcript (settle-loop beats the flush race)
- ▸PostToolUse → claude-work-status.sh — tool name + file basename, fire-and-forget
- ▸both async, both exit 0 always, config in one speak.env
LiveSession DO
- ▸gates: dictating? paused? mid-utterance? — else 204, silent
- ▸summarize if long (flash) → Gemini TTS, voice + style per request
- ▸buffers the PCM; chirps the device; streams on tap
- ▸status pulses skip all of that — pure header repaint
M5StickS3
- ▸header pulses while Claude works
- ▸chirp + full text when it's done
- ▸tap plays the buffered answer, instantly
- ▸hold answers back — 24/7 push-to-talk
One deliberately weird hop: plain-HTTP requests must never touch this Durable Object's WebSockets (it kills the DO), so /speak parks the work in a slot and the DO's alarm does the TTS and the sends. Boring on the outside, correct on the inside.
$ curl -X POST https://your-worker.workers.dev/speak/m5s3-live \
-H "X-History-Token: $TOKEN" \
-d '{"text":"Deploy finished — all checks green.",
"voice":"Puck", "style":"at a fast pace"}'
→ 200 accepted · chirp · tap to hear
→ 204 when you're not dictating — silence, by design
The server doesn't know what Claude Code is. CI, cron jobs, a long build, your smoke tests — any script with the token becomes a voice in the room.
# which session finished? your ears already know
case "$(basename "$cwd")" in
api) SPEAK_VOICE=Puck ;; # upbeat
firmware) SPEAK_VOICE=Charon ;; # informative
infra) SPEAK_VOICE=Algenib ;; # gravelly
esac
Voice is a per-request parameter — no sessions, no reconnects. Run three Claude Code workspaces and tell them apart without looking. The toggle ships commented-out in the hook.
06 — Trust
A microphone you can reason about.
A device that types into your terminal and speaks your agent's output earns its place by being inspectable. There is no vendor cloud — just your Cloudflare account, your API key, and code you can read.
Deaf by default
Push-to-talk is hardware truth: the mic captures only while your thumb holds A. No wake word means nothing is ever listening for one.
Silent unless dictating
Every speak and pulse is gated on the stick being in dictation mode — requests at any other time return 204 and vanish. Buffers die with the mode.
One tap to mute it all
Screen share starting? The web dashboard's Pause gates transcripts, app switching, speech, and pulses server-side — the stick's header flips so you know.
Logs without content
The audit trail records byte counts, voices, durations — never message text. Status pulses aren't logged at all.
One token, one blast radius
A single bearer token gates speak, dictation delivery, and monitoring. Rotate it with one wrangler secret put if it ever leaks.
MIT, three codebases
Firmware (C++), worker (TypeScript), hooks (bash) in one repo, documented feature by feature. The whole speak-back protocol is five message types.
07 — Install
One icon in the menu bar.
Everything on the Mac side — the companion, the config, the Claude Code hooks — is seevie.app. Drag it in, pair once, click once. No terminal required. (The terminal remains available for those who enjoy it.)
- Stick Dictation · 82%
- Companion Delivering ✓
- Claude Code 2 hooks installed ✓
- Voice Charon · brisk ▾
- Last spoken "All 42 tests pass…" ▸
- Pause delivery screen share
Pause flips the stick's header glyph the moment you toggle it — same server-side gate the dashboard uses.
Drag it in
seevie.app lands in the menu bar and starts the companion at login. The one macOS Accessibility prompt is explained before it's asked — that's what lets your words type.
Pair once
Paste your worker URL and token — seevie verifies the pipe end-to-end and the stick chirps hello when the pairing lands. That chirp is the whole test suite.
One click for Claude Code
"Wire up Claude Code" writes the config, registers both hooks (async, fail-open), and picks a voice that isn't your chat voice. Undo is the same click. Then say "dictate into cmux" — you're in the loop.
Underneath, it's the same two auditable shell scripts — right-click any setting to reveal what it wrote.
08 — FAQ
Asked while pacing.
Will it interrupt me while I'm thinking? +
No — that's the core design decision. Responses never auto-play: you get one short chirp and the text on screen, and the audio waits in a buffer until you tap. If you're mid-sentence when a response lands, even the chirp holds until you finish talking.
Claude's replies are long. Do I have to listen to all of it? +
Anything over 250 characters is pre-summarized to at most 35 spoken words — the verdict, not the essay. The full text is always on the stick's screen (page with B), and the terminal still has every word. Short replies are spoken verbatim.
What if I run several Claude Code sessions at once? +
Give each workspace its own voice — a five-line case block in the hook maps directory → voice, so the api repo sounds like Puck and firmware sounds like Charon. One pending slot means latest-wins: the newest finished session is the one a tap plays.
Does this change Claude Code's behavior? +
Zero. Both hooks run async and exit 0 unconditionally — a dead network, a missing config, or a down worker are all invisible to the session. Claude Code doesn't know the stick exists; it just finishes turns and calls tools like always.
Is it still the general voice assistant too? +
Yes — chat mode is untouched: speech-to-speech Gemini with 30+ tools, web search, notes, Discord routing, scheduled tasks, image generation. The Claude Code loop lives in dictation mode; "exit" drops you back to the assistant.
What does it cost to run? +
No subscription — there's no one to subscribe to. Cloudflare's free tiers carry a single device comfortably; TTS and summarization bill per-use on your own Gemini key, pennies at personal scale. Status pulses cost nothing at all — no model calls involved.
Stop watching
the spinner.
Your agent works in silences. Put the silence in your pocket — and let it chirp when there's something worth hearing.