Generate speech audio using Deepdub and attach it as a MEDIA file (Telegram-compatible).
Text-to-Speech service using Doubao (Volcano Engine) API with 200+ voices, interactive voice selection, and multilingual support
High-quality voice synthesis with 18 personas, 32 languages, sound effects, batch processing, and voice design using ElevenLabs API.
Review and categorize uncategorized bank transactions, match them with invoices, and verify bookkeeping entries. Use when the user wants to review transactions, categorize expenses, do bookkeeping, or reconcile their bank account.
Transcribe audio/video with AssemblyAI (local upload or URL), plus subtitles + paragraph/sentence exports.
Fetch info, stats, transcript, and media from Feishu Minutes.
Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. SRT/VTT/TTML/CSV subtitles, speaker diarization, URL/YouTube input, batch processing with ETA, transcript search, chapter detection, per-file language map.
Encrypted Clawbot-to-Clawbot messaging. Send messages to friends' Clawbots with end-to-end encryption.
Floating avatar widget for AI agents showing emotions, actions, and visual effects. Give your OpenClaw a face! Use when the user wants visual feedback, a floating status window, or to see agent emotions while it works. Triggers on "show avatar", "uruchom avatara", "pokaż avatara", "agent face", "visual feedback".