I've been using Claude Code's Telegram plugin a lot recently. It's a neat bridge — you message Claude via Telegram and it responds as if you were typing in the terminal. Great for when you're away from your desk.
But typing on a phone is slow. I kept wanting to just talk to it. So I built voice transcription into the plugin.
How it works
The flow is simple:
- You send a voice note (or audio file) via Telegram
- The plugin downloads the file from Telegram's servers
- It sends the audio to a Whisper-compatible transcription API
- The transcribed text gets passed to Claude as if you'd typed it
Claude also gets a voice_path attribute pointing to the original audio file, so it can reference it if needed.
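The download step can be sketched against the raw Telegram Bot API. This is a minimal illustration, not the plugin's actual code: the helper names (`getFileUrl`, `downloadUrl`, `downloadVoiceNote`) are hypothetical, and the plugin may well use a bot library instead of calling `fetch` directly. The two endpoints themselves are the standard `getFile` method and the file download URL.

```typescript
import { writeFileSync } from 'node:fs'

// Only the fields of Telegram's getFile response that this sketch needs.
interface GetFileResponse {
  ok: boolean
  result?: { file_id: string; file_path?: string }
}

const TELEGRAM_API = 'https://api.telegram.org'

// getFile resolves a file_id (from the incoming voice message) to a temporary file_path.
function getFileUrl(token: string, fileId: string): string {
  return `${TELEGRAM_API}/bot${token}/getFile?file_id=${encodeURIComponent(fileId)}`
}

// The file bytes are served from a separate /file/ URL using that file_path.
function downloadUrl(token: string, filePath: string): string {
  return `${TELEGRAM_API}/file/bot${token}/${filePath}`
}

// Hypothetical helper: download a voice note to destPath, or return null on failure.
async function downloadVoiceNote(
  token: string,
  fileId: string,
  destPath: string,
): Promise<string | null> {
  const metaRes = await fetch(getFileUrl(token, fileId))
  const meta = (await metaRes.json()) as GetFileResponse
  const filePath = meta.result?.file_path
  if (!meta.ok || !filePath) return null

  const fileRes = await fetch(downloadUrl(token, filePath))
  if (!fileRes.ok) return null

  writeFileSync(destPath, new Uint8Array(await fileRes.arrayBuffer()))
  return destPath
}
```

The path returned by a helper like this is what would then be handed to the transcription step and exposed to Claude as voice_path.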
The implementation
The core of it is a transcribeAudio function that hits a Whisper endpoint:
    import { readFileSync } from 'node:fs'

    // WHISPER_API_KEY, WHISPER_ENDPOINT and WHISPER_MODEL are constants
    // defined elsewhere in the plugin from the environment.
    async function transcribeAudio(filePath: string): Promise<string | null> {
      // No key configured: skip transcription and let the caller fall back.
      if (!WHISPER_API_KEY) return null
      try {
        // Send the audio as a multipart form upload.
        const fileData = readFileSync(filePath)
        const form = new FormData()
        form.append('file', new Blob([fileData], { type: 'audio/ogg' }), 'voice.ogg')
        form.append('model', WHISPER_MODEL)
        const res = await fetch(WHISPER_ENDPOINT, {
          method: 'POST',
          headers: { 'Authorization': `Bearer ${WHISPER_API_KEY}` },
          body: form,
        })
        if (!res.ok) {
          process.stderr.write(`whisper API error ${res.status}: ${await res.text()}\n`)
          return null
        }
        // An empty or missing transcript is treated as a failure.
        const json = await res.json() as { text?: string }
        return json.text?.trim() || null
      } catch (err) {
        process.stderr.write(`transcription failed: ${err}\n`)
        return null
      }
    }
It supports two backends out of the box:
- Groq, using whisper-large-v3, which is fast and free-tier friendly
- OpenAI, using whisper-1
Configuration is just environment variables. Set GROQ_API_KEY or OPENAI_API_KEY in your .env file and you're done. If neither is set, voice messages still come through — you just get a fallback message instead of a transcription.
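Here is one way the backend selection and fallback could look. Treat it as a sketch under assumptions: `resolveWhisperConfig` and `messageText` are hypothetical names, the fallback wording is made up, and preferring Groq when both keys are set is a guess rather than the plugin's documented behavior. The endpoint URLs are the providers' public OpenAI-compatible transcription endpoints.

```typescript
interface WhisperConfig {
  endpoint: string
  model: string
  apiKey: string
}

// Pick a backend from environment variables.
// Assumption: Groq wins if both keys are present.
function resolveWhisperConfig(env: Record<string, string | undefined>): WhisperConfig | null {
  const groqKey = env['GROQ_API_KEY']
  if (groqKey) {
    return {
      endpoint: 'https://api.groq.com/openai/v1/audio/transcriptions',
      model: 'whisper-large-v3',
      apiKey: groqKey,
    }
  }
  const openaiKey = env['OPENAI_API_KEY']
  if (openaiKey) {
    return {
      endpoint: 'https://api.openai.com/v1/audio/transcriptions',
      model: 'whisper-1',
      apiKey: openaiKey,
    }
  }
  // Neither key set: the caller should use a fallback message instead.
  return null
}

// Hypothetical fallback: use the transcript if there is one, otherwise a placeholder.
function messageText(transcript: string | null): string {
  return transcript ?? '(voice message: transcription unavailable)'
}
```

In a Node process this would be called as `resolveWhisperConfig(process.env)`, with the result feeding the endpoint, model and key used by the transcription function.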
Setting it up
If you want to try it yourself:
- Clone the fork:
    git clone https://github.com/edbyford/telegram-voice-fork.git
  Then launch Claude Code with the plugin loaded directly:
    claude --plugin-dir /path/to/telegram-voice-fork
  Alternatively, if the PR has been merged, you can just install the official plugin as normal:
    /plugin install telegram@claude-plugins-official
- Follow the standard Telegram plugin setup (create a bot via @BotFather, configure the token with /telegram:configure, pair your account)
- Add a Groq or OpenAI API key to your .env file:
    GROQ_API_KEY=gsk_your_key_here
- That's it: send a voice note and it should transcribe automatically
One gotcha
Whisper can occasionally hallucinate the language. I sent a perfectly clear English voice note and got back Welsh. Something to be aware of if your transcriptions come back looking... unexpected.
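One possible mitigation, which is not part of the plugin as described here, is to pass a language hint. Both the OpenAI and Groq transcription endpoints accept an optional `language` form field (an ISO-639-1 code such as `en`). The sketch below shows the form being built with that field; `buildTranscriptionForm` is a hypothetical helper, not a function from the fork.

```typescript
// Build the multipart form for a Whisper-compatible transcription request.
// The optional language hint (e.g. 'en') is an addition for illustration only.
function buildTranscriptionForm(audio: Blob, model: string, language?: string): FormData {
  const form = new FormData()
  form.append('file', audio, 'voice.ogg')
  form.append('model', model)
  // Only send the hint when one is provided; otherwise the API auto-detects.
  if (language) form.append('language', language)
  return form
}
```

The trade-off is that a fixed hint is only appropriate if you always speak the same language; leaving it out keeps auto-detection, along with the occasional surprise.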
What's next
I've submitted a pull request to the official plugin repo. In the meantime, the fork is available at https://github.com/edbyford/telegram-voice-fork if you want to grab it now.
The whole thing was built in a single session, which says something about how productive you can be when you're pairing with Claude Code itself. Meta, I know.