Skill

Owner: Jarvis · Team: Jarvis · Source: ~/.openclaw/workspace/skills/tg-voice-whisper/SKILL.md

Transcribe Telegram voice messages (.ogg) to text using local OpenAI Whisper. No API keys, fully offline. Use when the user sends a voice recording via Telegram.


Playbook (mirrored from disk)

TG Voice Whisper

Transcribes Telegram voice messages to text using local Whisper. The small model is the default and gives good Hebrew accuracy; tiny is faster, and large-v3-turbo is the most accurate. All three are cached locally. Offline, private, no API keys.

Requirements

  • ffmpeg — audio conversion
  • whisper — transcription (openai-whisper pip package)

Installation

sudo apt-get install -y ffmpeg
pip3 install openai-whisper --break-system-packages
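
A quick way to confirm both requirements are installed is to check that the commands resolve on PATH. The sketch below is only an illustration: missing_tools is a hypothetical helper name, not part of this skill.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named command that is NOT on PATH, one per line.
# Prints nothing when every command is available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, running missing_tools ffmpeg whisper should print nothing once the installation steps above have succeeded.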

Usage

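When a voice message arrives, it lands in ~/.openclaw/media/inbound/ as an .ogg file. Transcribe it without --language so that Whisper auto-detects the language, then print the transcript, which is named after the audio file:

whisper /path/to/file.ogg --model small --output_format txt --output_dir /tmp/whisper
cat /tmp/whisper/file.txt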

Then reply with the transcribed text.
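
The steps above can be sketched as a small script that picks the newest inbound recording and prints its transcript. This is a sketch under assumptions, not the skill's actual implementation: the function names and the INBOUND_DIR / OUT_DIR variables are hypothetical, and the transcript file name assumes a recent openai-whisper release that writes <audio name without extension>.txt. The --language flag is omitted so Whisper falls back to its built-in language detection.

```shell
#!/usr/bin/env bash
# Defaults mirror the paths used in this playbook; override via environment.
INBOUND_DIR="${INBOUND_DIR:-$HOME/.openclaw/media/inbound}"
OUT_DIR="${OUT_DIR:-/tmp/whisper}"

# Print the most recently modified .ogg file in a directory (nothing if none).
latest_voice_note() {
  ls -t "$1"/*.ogg 2>/dev/null | head -n 1
}

# Print the transcript path expected for an audio file and output directory.
# Assumption: Whisper names the file <audio basename without extension>.txt
transcript_path() {
  local base
  base="$(basename "$1")"
  printf '%s/%s.txt\n' "$2" "${base%.*}"
}

# Transcribe the newest inbound voice note and print only its text.
transcribe_latest() {
  local audio
  audio="$(latest_voice_note "$INBOUND_DIR")"
  if [ -z "$audio" ]; then
    echo "no .ogg files in $INBOUND_DIR" >&2
    return 1
  fi
  mkdir -p "$OUT_DIR"
  # Whisper also echoes segments to stdout; discard that and read the file.
  whisper "$audio" --model small --output_format txt --output_dir "$OUT_DIR" >/dev/null || return 1
  cat "$(transcript_path "$audio" "$OUT_DIR")"
}
```

After sourcing the script, calling transcribe_latest prints the text to reply with. Reading one named transcript, rather than every .txt file in the output directory, avoids mixing in results from earlier messages.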

Notes

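  • First run: the model is downloaded once. The ~15s / ~72MB figure matches the tiny model; the small model used above is several times larger (roughly 460MB), so expect a longer first download
  • After cache: <1s on 1 vCPU (this figure also appears to describe tiny; small is slower)
  • Language auto-detection works well for Hebrew + English
  • For more speed: use --model tiny. For maximum accuracy: use --model large-v3-turbo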
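
The model trade-off and the caching note can be sketched as two small helpers. Both names are hypothetical, and the cache location is an assumption: openai-whisper normally stores checkpoints as <model>.pt under ~/.cache/whisper (or under $XDG_CACHE_HOME/whisper when that variable is set), but verify this on your machine.

```shell
#!/usr/bin/env bash
# Map a goal to one of the three models named in this playbook.
model_for() {
  case "$1" in
    speed)    echo tiny ;;
    accuracy) echo large-v3-turbo ;;
    *)        echo small ;;  # default: balanced, good Hebrew accuracy
  esac
}

# Succeed if the checkpoint for model $1 already exists in the cache directory.
# Optional $2 overrides the assumed default cache location.
is_model_cached() {
  local cache_dir="${2:-${XDG_CACHE_HOME:-$HOME/.cache}/whisper}"
  [ -f "$cache_dir/$1.pt" ]
}
```

For example, is_model_cached "$(model_for speed)" || echo "first run will download the model" warns before a slow first transcription.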