Article TTS Skill
name: article-tts
by 54meteor · published 2026-04-01
$ claw add gh:54meteor/54meteor-article-tts
---
name: article-tts
description: "Turn article photos (via OCR) or plain text into natural-sounding audio with Microsoft Edge TTS. Bilingual (EN/ZH), with configurable speed and voice and optional sentence-by-sentence splitting."
requires:
  binaries:
    - tesseract  # OCR engine
    - uvx        # uvx runner (ships with the uv package)
    - edge-tts   # Microsoft Edge TTS (run via uvx; no separate install needed)
  runtime:
    - python3 (with PIL/Pillow)
    - tessdata (language models)
install: |
  # Install Tesseract OCR + the Simplified Chinese language pack
  apt-get update && apt-get install -y tesseract-ocr tesseract-ocr-chi-sim
  # The English language pack is included by default
  # If needed: apt-get install tesseract-ocr-eng
  # Pillow for image preprocessing: apt-get install -y python3-pil (or pip install Pillow)
  # uvx (from uv) downloads edge-tts automatically on first run; no manual install
credentials:
  # OpenClaw handles channel authentication through its own plugin system.
  # The agent detects which channel is active and uses the credentials
  # already configured in OpenClaw.
  # No extra env vars are needed; the skill just calls message(...).
  #
  # Supported channels (via the OpenClaw message tool):
  #   feishu: Feishu bot (app_id/app_secret from Feishu Open Platform)
  #   telegram: Telegram bot (bot token from BotFather)
  #   discord: Discord bot (bot token + guild)
  #   whatsapp: WhatsApp Business API / linked device
  #   signal: Signal (phone number + signal-cli)
  #   imessage: iMessage (via macOS/iCloud)
  #   openclaw-weixin: WeChat Work / personal WeChat
  #
  # If the target channel is not configured, the skill saves files locally
  # and tells the user the output path.
---
# Article TTS Skill
## Default Configuration

| Parameter | Default | Description |
|------|--------|------|
| `lang` | `en` | Language: `en` or `zh` |
| `skipConfirmation` | `false` | Whether to skip the text confirmation step |
| `speed` | `90%` | TTS speaking rate (`--rate=-10%` = 90% of normal speed) |
| `voice` | `en-US-EmmaNeural` (English) / `zh-CN-XiaoxiaoNeural` (Chinese) | TTS voice |
| `splitSentences` | `false` | Whether to also generate one audio file per sentence |
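
The defaults above can be held in a small config object. A minimal sketch, assuming a Python dataclass named `TTSConfig` (hypothetical, not part of the skill; field names mirror the table keys) and assuming `speed` is a percentage of normal speed, so `90%` becomes the edge-tts offset `--rate=-10%`:

```python
from dataclasses import dataclass
from typing import Optional

# Default voice per language, taken from the table above
DEFAULT_VOICES = {"en": "en-US-EmmaNeural", "zh": "zh-CN-XiaoxiaoNeural"}


@dataclass
class TTSConfig:
    """Hypothetical container for the skill's defaults."""
    lang: str = "en"
    skipConfirmation: bool = False
    speed: str = "90%"
    voice: Optional[str] = None  # None means "use the default for lang"
    splitSentences: bool = False

    def resolved_voice(self) -> str:
        # An explicit voice wins; otherwise fall back to the language default
        return self.voice or DEFAULT_VOICES[self.lang]

    def rate_flag(self) -> str:
        # "90%" of normal speed -> "-10%"; "120%" -> "+20%"
        percent = int(self.speed.rstrip("%"))
        return f"--rate={percent - 100:+d}%"
```

For example, `TTSConfig(lang="zh", speed="100%")` resolves to `zh-CN-XiaoxiaoNeural` and `--rate=+0%`.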

## Supported Languages

| Language | OCR language pack | TTS voice |
|------|-----------|-----------|
| `en` | `eng` (preinstalled) | `en-US-EmmaNeural` |
| `zh` | `chi_sim` (must be installed) | `zh-CN-XiaoxiaoNeural` |

> **Installing the Chinese OCR language pack:**
> - Linux (WSL/Debian/Ubuntu): `apt-get install tesseract-ocr-chi-sim`
> - macOS: `brew install tesseract-lang` (includes Chinese)
> - Windows: download `chi_sim.traineddata` and place it in the `tessdata` folder of the Tesseract installation directory
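
Before running OCR in `zh` mode, you can check whether `chi_sim` is actually installed. `tesseract --list-langs` prints a header line followed by one language code per line. The sketch below parses that output. `OCR_PACKS` and both helpers are hypothetical names. It reads stdout and falls back to stderr, on the assumption that some older Tesseract builds printed the list there.

```python
import subprocess

# OCR language pack needed for each skill language
OCR_PACKS = {"en": "eng", "zh": "chi_sim"}


def parse_list_langs(output: str) -> set:
    # Skip the header, e.g. 'List of available languages in "..." (3):'
    lines = output.strip().splitlines()[1:]
    return {line.strip() for line in lines if line.strip()}


def has_ocr_pack(lang: str) -> bool:
    proc = subprocess.run(["tesseract", "--list-langs"],
                          capture_output=True, text=True)
    # Prefer stdout; fall back to stderr for older builds
    installed = parse_list_langs(proc.stdout or proc.stderr)
    return OCR_PACKS[lang] in installed
```

`has_ocr_pack("zh")` returning `False` means the install steps above still need to be run.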
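
## Workflow

### Input Types

The skill accepts two input types: a photo of an article (its text is extracted with OCR) or plain text.
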
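### Standard Flow (default, requires confirmation)

```
Image → OCR text extraction → show text to user → user confirms → generate TTS → send
Text  → generate TTS directly → send
```

### Skip-Confirmation Flow ⚠️
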
When the user says "no confirmation needed" (不需要确认) or "generate directly" (直接生成), the confirmation step is skipped.

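> **⚠️ Security note**: `skipConfirmation` skips the text confirmation step, so OCR-extracted text (which may contain sensitive information) is converted to audio and sent without review. Use it only for trusted sources and low-sensitivity content, and keep it off by default (`skipConfirmation: false`).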
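
The two flows differ only in whether OCR output is shown to the user before synthesis. The sketch below models that gating. `ocr`, `confirm`, `synthesize`, and `send` are hypothetical callbacks that stand in for the OCR, confirmation, TTS, and message steps described in this document.

```python
from typing import Callable, Optional


def run_flow(source: str, is_image: bool, skip_confirmation: bool,
             ocr: Callable[[str], str],
             confirm: Callable[[str], bool],
             synthesize: Callable[[str], str],
             send: Callable[[str], None]) -> Optional[str]:
    """Return the path of the audio that was sent, or None if rejected."""
    if is_image:
        # Image input: extract the text first
        text = ocr(source)
        # Standard flow: the user must approve the OCR text before synthesis
        if not skip_confirmation and not confirm(text):
            return None
    else:
        # Text input goes straight to TTS with no confirmation step
        text = source
    audio_path = synthesize(text)
    send(audio_path)
    return audio_path
```

Keeping the confirmation check next to the OCR call means the skip flag can only bypass review of OCR output. It never affects plain-text input, which has no confirmation step in either flow.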

### OCR Step

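```python
# Image preprocessing: grayscale, auto-contrast, 4x upscale
from PIL import Image, ImageOps


def preprocess(image_path, out_path='/tmp/ocr_input.jpg'):
    img = Image.open(image_path)
    # Grayscale, then cut 10% off each end of the histogram to boost contrast
    img = ImageOps.autocontrast(img.convert('L'), cutoff=10)
    w, h = img.size
    # Upscale 4x so small print is large enough to recognize
    img = img.resize((w * 4, h * 4), Image.LANCZOS)
    img.save(out_path, quality=99)
    return out_path
```

```bash
# English (--psm 4: assume a single column of text of variable sizes)
tesseract /tmp/ocr_input.jpg stdout -l eng --psm 4
# Chinese (Simplified)
tesseract /tmp/ocr_input.jpg stdout -l chi_sim --psm 4
```

### TTS Step
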
Full-article audio:

```bash
# English, 90% speed (--rate=-10%)
uvx edge-tts \
  -t "FULL TEXT" \
  -v en-US-EmmaNeural \
  --rate=-10% \
  --write-media OUTPUT_DIR/full_article.mp3
# Chinese, 90% speed
uvx edge-tts \
  -t "CHINESE TEXT" \
  -v zh-CN-XiaoxiaoNeural \
  --rate=-10% \
  --write-media OUTPUT_DIR/full_article.mp3
```

Per-sentence split (only when `splitSentences=true`):

```python
import re
import subprocess
from pathlib import Path

VOICES = {'zh': 'zh-CN-XiaoxiaoNeural', 'en': 'en-US-EmmaNeural'}


def split_sentences(text, lang='en'):
    if lang == 'zh':
        # Chinese: split after full-width 。！？ (and ASCII !?)
        sentences = re.split(r'(?<=[。！？!?])\s*', text)
    else:
        # English: split after . ! ? followed by whitespace
        sentences = re.split(r'(?<=[.!?])\s+', text)
    return [s.strip() for s in sentences if s.strip()]


def synthesize_sentences(text, lang, output_dir):
    voice = VOICES[lang]
    sentences = split_sentences(text, lang=lang)
    for i, sentence in enumerate(sentences, 1):
        num = str(i).zfill(2)  # sentence_01.mp3, sentence_02.mp3, ...
        subprocess.run([
            "uvx", "edge-tts",
            "-t", sentence,
            "-v", voice,
            "--rate=-10%",
            "--write-media", str(Path(output_dir) / f"sentence_{num}.mp3"),
        ], check=True)
    return sentences
```

## Output Directory

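Each article gets its own dated folder named `YYYY-MM-DD-article-slug`. The skill does not say how the slug is derived. The sketch below creates the folder and saves `original_text.md` using a hypothetical `slugify` rule: lowercase, with each run of non-word characters collapsed to `-` and CJK characters kept. The base path is the one shown in the tree below.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

BASE_DIR = "/mnt/d/wslspace/workspace/articles"


def slugify(title: str, max_len: int = 50) -> str:
    # \w is Unicode-aware for str patterns, so Chinese characters are kept;
    # every run of other characters becomes a single "-"
    slug = re.sub(r"[^\w]+", "-", title.lower()).strip("-")
    return slug[:max_len].strip("-") or "article"


def output_dir_for(title: str, day: date, base: str = BASE_DIR) -> Path:
    # e.g. <base>/2026-04-01-hello-world
    return Path(base) / f"{day.isoformat()}-{slugify(title)}"


def prepare_output_dir(title: str, text: str, base: str = BASE_DIR,
                       day: Optional[date] = None) -> Path:
    out = output_dir_for(title, day or date.today(), base)
    out.mkdir(parents=True, exist_ok=True)
    # Save the source text alongside the audio files
    (out / "original_text.md").write_text(text, encoding="utf-8")
    return out
```
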
```
/mnt/d/wslspace/workspace/articles/YYYY-MM-DD-article-slug/
├── original_text.md
├── full_article.mp3
├── sentence_01.mp3   # only when splitSentences=true
└── ...
```

## Sending via Message Channel

The agent detects the active channel from the runtime context and calls `message(...)` accordingly. No hardcoded channel — the agent uses whichever channel the user is currently chatting through.
```python
# message(...) is the OpenClaw message tool called by the agent.
# active_channel is inferred from the runtime's inbound metadata:
# feishu / telegram / discord / whatsapp / signal / imessage / openclaw-weixin
# output_dir is the article folder described under Output Directory.

# Send the full-article audio
message(action="send", channel=active_channel,
        message="📄 Full-article audio",
        media=f"{output_dir}/full_article.mp3",
        filename="full_article.mp3")

# Send each sentence (only when splitSentences=true)
for i, sentence in enumerate(sentences, 1):
    num = str(i).zfill(2)
    message(action="send", channel=active_channel,
            message=f"📝 {num}: {sentence}",
            media=f"{output_dir}/sentence_{num}.mp3",
            filename=f"sentence_{num}.mp3")
```

### Channel Behavior Notes

| Channel | Audio support | Notes |
|---------|---------|------|
| Feishu | ✅ | Sends the mp3 directly |
| Telegram | ✅ | Sends the mp3 directly |
| Discord | ✅ | Sent as an attachment |
| WhatsApp | ✅ | Sends the mp3 directly |
| Signal | ⚠️ | Not guaranteed; audio may not be supported |
| iMessage | ⚠️ | Sent via macOS; limited mp3 compatibility |
| WeChat Work | ✅ | Same as Feishu |

If the channel does not support audio, the agent saves the file to `OUTPUT_DIR` and sends the file path as a text message instead.
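
The sketch below implements that fallback. It assumes hypothetical `send_media` and `send_text` callbacks that wrap the `message(...)` tool, and assumes `send_media` raises an exception when the channel rejects the attachment.

```python
from typing import Callable


def deliver_audio(audio_path: str, caption: str,
                  send_media: Callable[[str, str], None],
                  send_text: Callable[[str], None]) -> bool:
    """Send the audio if possible; otherwise send its local path as text.

    Returns True if the audio itself was delivered.
    """
    try:
        send_media(audio_path, caption)
        return True
    except Exception:
        # The channel could not take the mp3: the file stays in OUTPUT_DIR
        # and the user receives its path instead
        send_text(f"{caption}: audio saved at {audio_path}")
        return False
```

Catching a broad `Exception` keeps the sketch channel-agnostic. A real integration would catch only the error the message tool raises for unsupported media.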

## Available TTS Voices

### English

`en-US-EmmaNeural`, `en-US-BrianNeural`, `en-GB-LibbyNeural`, ...

### Chinese

`zh-CN-XiaoxiaoNeural` (female), `zh-CN-YunxiNeural` (male), `zh-CN-YunyangNeural` (male, news style), ...

To see the full list: `uvx edge-tts -l | grep "zh-CN"`
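
The same list is available from edge-tts's Python API. `edge_tts.list_voices()` is a coroutine that returns one dict per voice, with keys such as `ShortName`, `Locale`, and `Gender`. The sketch below filters that list by locale. `voices_for` is a hypothetical helper. `main()` needs edge-tts installed in the current Python environment and makes a network request.

```python
import asyncio


def voices_for(voices: list, locale_prefix: str) -> list:
    # Keep (ShortName, Gender) pairs whose locale starts with e.g. "zh-CN"
    return sorted((v["ShortName"], v["Gender"]) for v in voices
                  if v["Locale"].startswith(locale_prefix))


async def main() -> None:
    import edge_tts  # imported here so voices_for() works without edge-tts
    voices = await edge_tts.list_voices()
    for name, gender in voices_for(voices, "zh-CN"):
        print(name, gender)
```

Run it with `asyncio.run(main())`.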

## Notes