I've been using Claude Code's Telegram plugin a lot recently. It's a neat bridge — you message Claude via Telegram and it responds as if you were typing in the terminal. Great for when you're away from your desk.

But typing on a phone is slow. I kept wanting to just talk to it. So I built voice transcription into the plugin.

How it works

The flow is simple:

  1. You send a voice note (or audio file) via Telegram
  2. The plugin downloads the file from Telegram's servers
  3. It sends the audio to a Whisper-compatible transcription API
  4. The transcribed text gets passed to Claude as if you'd typed it
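Step 2 goes through the Telegram Bot API: `getFile` resolves a message's `file_id` to a server-side `file_path`, and the bytes are then fetched from `https://api.telegram.org/file/bot<token>/<file_path>`. Here is a minimal sketch using plain `fetch`. The helper names (`getFileUrl`, `fileDownloadUrl`, `downloadVoiceNote`) are hypothetical, and the plugin itself may use a Telegram client library instead.

```typescript
import { writeFileSync } from 'node:fs'

const TELEGRAM_API = 'https://api.telegram.org'

// Subset of the Bot API getFile response that this sketch reads
interface GetFileResponse {
  ok: boolean
  result?: { file_id: string; file_path?: string }
  description?: string
}

// Hypothetical helper: URL for getFile, which maps a file_id to a file_path
function getFileUrl(token: string, fileId: string): string {
  return `${TELEGRAM_API}/bot${token}/getFile?file_id=${encodeURIComponent(fileId)}`
}

// Hypothetical helper: URL for downloading the file bytes once file_path is known
function fileDownloadUrl(token: string, filePath: string): string {
  return `${TELEGRAM_API}/file/bot${token}/${filePath}`
}

// Hypothetical helper: resolve the voice note's file_id and save the audio locally
async function downloadVoiceNote(token: string, fileId: string, destPath: string): Promise<string> {
  const meta = (await (await fetch(getFileUrl(token, fileId))).json()) as GetFileResponse
  const filePath = meta.result?.file_path
  if (!meta.ok || !filePath) {
    throw new Error(`getFile failed: ${meta.description ?? 'no file_path returned'}`)
  }
  const res = await fetch(fileDownloadUrl(token, filePath))
  if (!res.ok) throw new Error(`download failed: HTTP ${res.status}`)
  writeFileSync(destPath, Buffer.from(await res.arrayBuffer()))
  return destPath
}
```

The local path returned here is the kind of value that can be handed to the transcription step and exposed to Claude afterwards.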

Claude also gets a voice_path attribute pointing to the original audio file, so it can reference it if needed.
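Steps 3 and 4, plus the `voice_path` attribute, might be packaged roughly like this. The post doesn't show the plugin's actual message format, so `InboundMessage`, `buildVoiceMessage` and the fallback wording below are all hypothetical.

```typescript
// Hypothetical shape of what gets handed to Claude; the real plugin's format may differ
interface InboundMessage {
  content: string
  attributes: Record<string, string>
}

// Hypothetical fallback text for when no transcription is available
const NO_TRANSCRIPT_FALLBACK = '(voice message received; no transcription available)'

// Use the transcript as the message text (as if typed) and attach the audio's local path
function buildVoiceMessage(transcript: string | null, voicePath: string): InboundMessage {
  return {
    content: transcript ?? NO_TRANSCRIPT_FALLBACK,
    attributes: { voice_path: voicePath },
  }
}
```

Keeping the path as a separate attribute rather than inlining it in the text means Claude sees clean transcribed input while still being able to open the original audio if needed.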

The implementation

The core of it is a transcribeAudio function that hits a Whisper endpoint:

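import { readFileSync } from 'node:fs'

// WHISPER_API_KEY, WHISPER_MODEL and WHISPER_ENDPOINT are defined elsewhere in the plugin,
// derived from the environment variables described below.
// Returns the transcript, or null if no key is configured or the request fails.
async function transcribeAudio(filePath: string): Promise<string | null> {
  // No API key configured: skip transcription and let the caller fall back
  if (!WHISPER_API_KEY) return null
  try {
    const fileData = readFileSync(filePath)
    // Whisper-compatible endpoints take multipart/form-data with the audio file and a model name
    const form = new FormData()
    form.append('file', new Blob([fileData], { type: 'audio/ogg' }), 'voice.ogg')
    form.append('model', WHISPER_MODEL)
    const res = await fetch(WHISPER_ENDPOINT, {
      method: 'POST',
      headers: { 'Authorization': `Bearer ${WHISPER_API_KEY}` },
      body: form,
    })
    if (!res.ok) {
      process.stderr.write(`whisper API error ${res.status}: ${await res.text()}\n`)
      return null
    }
    // The response JSON carries the transcript in a `text` field; treat empty text as no result
    const json = await res.json() as { text?: string }
    return json.text?.trim() || null
  } catch (err) {
    // File read or network failure: log and fall back rather than crash the plugin
    process.stderr.write(`transcription failed: ${err}\n`)
    return null
  }
}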
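`transcribeAudio` labels every upload as `voice.ogg` / `audio/ogg`. That matches Telegram voice notes (OGG/Opus), but step 1 also accepts audio files, which may be MP3, M4A and so on, and Whisper-style APIs appear to rely on the filename extension to pick a decoder. A hypothetical helper, `audioUploadInfo`, could choose the name and type from the extension instead. The extension table is an assumption and only covers common formats.

```typescript
import { extname } from 'node:path'

// Hypothetical mapping from local extension to the upload extension and MIME type.
// Telegram voice notes commonly arrive as .oga, which is uploaded as .ogg here.
const AUDIO_TYPES: Record<string, { ext: string; type: string }> = {
  '.ogg': { ext: '.ogg', type: 'audio/ogg' },
  '.oga': { ext: '.ogg', type: 'audio/ogg' },
  '.mp3': { ext: '.mp3', type: 'audio/mpeg' },
  '.m4a': { ext: '.m4a', type: 'audio/mp4' },
  '.wav': { ext: '.wav', type: 'audio/wav' },
  '.webm': { ext: '.webm', type: 'audio/webm' },
  '.flac': { ext: '.flac', type: 'audio/flac' },
}

function audioUploadInfo(filePath: string): { filename: string; type: string } {
  const entry = AUDIO_TYPES[extname(filePath).toLowerCase()]
  // Unknown extensions keep the original OGG assumption
  return entry
    ? { filename: `audio${entry.ext}`, type: entry.type }
    : { filename: 'voice.ogg', type: 'audio/ogg' }
}
```

Inside `transcribeAudio`, the `form.append('file', ...)` line would then become `const { filename, type } = audioUploadInfo(filePath)` followed by `form.append('file', new Blob([fileData], { type }), filename)`.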

It supports two backends out of the box:

  • Groq — using whisper-large-v3, which is fast and free-tier friendly
  • OpenAI — using whisper-1

Configuration is just environment variables. Set GROQ_API_KEY or OPENAI_API_KEY in your .env file and you're done. If neither is set, voice messages still come through — you just get a fallback message instead of a transcription.
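A minimal sketch of how the two keys could map onto an endpoint and model. The endpoint URLs are the providers' public OpenAI-compatible transcription endpoints rather than values confirmed from the plugin source, and the precedence (Groq wins if both keys are set) is an assumption. `resolveWhisperBackend` and `WhisperBackend` are hypothetical names.

```typescript
// Hypothetical description of a transcription backend
interface WhisperBackend {
  name: 'groq' | 'openai'
  endpoint: string
  model: string
  apiKey: string
}

// Assumed precedence: Groq first, then OpenAI; null means "no transcription, use the fallback"
function resolveWhisperBackend(env: Record<string, string | undefined>): WhisperBackend | null {
  const groqKey = env.GROQ_API_KEY
  if (groqKey) {
    return {
      name: 'groq',
      endpoint: 'https://api.groq.com/openai/v1/audio/transcriptions',
      model: 'whisper-large-v3',
      apiKey: groqKey,
    }
  }
  const openaiKey = env.OPENAI_API_KEY
  if (openaiKey) {
    return {
      name: 'openai',
      endpoint: 'https://api.openai.com/v1/audio/transcriptions',
      model: 'whisper-1',
      apiKey: openaiKey,
    }
  }
  return null
}
```

Called as `resolveWhisperBackend(process.env)`, a null result lines up with the early `return null` in `transcribeAudio`, which is what lets voice messages still arrive with the fallback text.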

Setting it up

If you want to try it yourself:

  1. Clone the fork:
    git clone https://github.com/edbyford/telegram-voice-fork.git
    
    Then launch Claude Code with the plugin loaded directly:
    claude --plugin-dir /path/to/telegram-voice-fork
    
    Alternatively, if the PR has been merged, you can just install the official plugin as normal:
    /plugin install telegram@claude-plugins-official
    
  2. Follow the standard Telegram plugin setup (create a bot via @BotFather, configure the token with /telegram:configure, pair your account)
  3. Add a Groq or OpenAI API key to your .env file:
    GROQ_API_KEY=gsk_your_key_here
    
  4. That's it — send a voice note and it should transcribe automatically

One gotcha

Whisper's automatic language detection occasionally gets it wrong. I sent a perfectly clear English voice note and got back Welsh. Something to be aware of if your transcriptions come back looking... unexpected.
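One possible mitigation, which the plugin as described does not do, is to pin the language. OpenAI's transcription endpoint accepts an optional `language` form field (an ISO-639-1 code such as `en`), and Groq's OpenAI-compatible endpoint accepts the same field. The sketch below assumes a hypothetical `WHISPER_LANGUAGE` environment variable and a hypothetical `applyLanguageHint` helper.

```typescript
// Hypothetical helper: add a language hint to the multipart body if a valid code is given.
// Without the field, Whisper auto-detects the language (which is where the Welsh came from).
function applyLanguageHint(form: FormData, language: string | undefined): FormData {
  // Only accept two-letter ISO-639-1 codes like 'en' or 'cy'
  if (language && /^[a-z]{2}$/.test(language)) {
    form.append('language', language)
  }
  return form
}
```

In `transcribeAudio` this would sit right after `form.append('model', WHISPER_MODEL)`, as `applyLanguageHint(form, process.env.WHISPER_LANGUAGE)`. Leaving the variable unset keeps auto-detection, which matters if you send voice notes in more than one language.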

What's next

I've submitted a pull request to the official plugin repo. In the meantime, the fork is available at https://github.com/edbyford/telegram-voice-fork if you want to grab it now.

The whole thing was built in a single session, which says something about how productive you can be when you're pairing with Claude Code itself. Meta, I know.