// Skill profile

Model Studio Qwen TTS

Author: cinience

name: alicloud-ai-audio-tts

by cinience · published 2026-03-22

API integration, automation tasks

Last updated

2026-03

// Install command

$ claw add gh:cinience/cinience-alicloud-ai-audio-tts

// Full documentation

---

name: alicloud-ai-audio-tts

description: Generate human-like speech audio with Model Studio DashScope Qwen TTS models (qwen3-tts-flash, qwen3-tts-instruct-flash). Use when converting text to speech, producing voice lines for short drama/news videos, or documenting TTS request/response fields for DashScope.

version: 1.0.0

---

Category: provider

# Model Studio Qwen TTS

Validation

mkdir -p output/alicloud-ai-audio-tts
python -m py_compile skills/ai/audio/alicloud-ai-audio-tts/scripts/generate_tts.py && echo "py_compile_ok" > output/alicloud-ai-audio-tts/validate.txt

Pass criteria: command exits 0 and `output/alicloud-ai-audio-tts/validate.txt` is generated.

Output And Evidence

Save generated audio links, sample audio files, and request payloads to `output/alicloud-ai-audio-tts/`.

Keep one validation log per execution.

Critical model names

Use one of the recommended models:

`qwen3-tts-flash`

`qwen3-tts-instruct-flash`

`qwen3-tts-instruct-flash-2026-01-26`

Prerequisites

Install the DashScope SDK, preferably inside a virtual environment so that pip is not blocked by PEP 668 (externally managed environment) restrictions:

python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope

Set `DASHSCOPE_API_KEY` in your environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`. If both are present, the environment variable takes precedence.
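The lookup order described above can be sketched as follows. This is a minimal illustration, not the skill's own loader: `resolve_api_key` is a hypothetical helper, and the INI layout (a `[default]` section holding `dashscope_api_key`) is assumed from the comment in the Quick start.

```python
import configparser
import os
from pathlib import Path
from typing import Optional

# Assumed file layout (hypothetical example values):
#   [default]
#   dashscope_api_key = sk-...
DEFAULT_CREDENTIALS_PATH = Path.home() / ".alibabacloud" / "credentials"


def resolve_api_key(credentials_path: Path = DEFAULT_CREDENTIALS_PATH) -> Optional[str]:
    """Return the key from the environment, else from the credentials file."""
    # The environment variable wins when it is set and non-empty.
    env_key = os.environ.get("DASHSCOPE_API_KEY", "").strip()
    if env_key:
        return env_key

    parser = configparser.ConfigParser()
    # ConfigParser.read() silently skips files that do not exist.
    parser.read(credentials_path, encoding="utf-8")
    if parser.has_section("default"):
        file_key = parser.get("default", "dashscope_api_key", fallback="").strip()
        return file_key or None
    return None
```

The result can be passed as `api_key=` to the SDK call; a return value of `None` means neither source provided a key.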

Normalized interface (tts.generate)

Request

`text` (string, required)

`voice` (string, required)

`language_type` (string, optional; default `Auto`)

`instruction` (string, optional; recommended for instruct models)

`stream` (bool, optional; default false)

Response

`audio_url` (string, when stream=false)

`audio_base64_pcm` (string, when stream=true)

`sample_rate` (int; 24000 Hz)

`format` (string; `wav` when stream=false, `pcm` when stream=true)
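As an illustration of the field list above, the normalized request and response could be modeled with dataclasses. The class names `TtsRequest` and `TtsResponse` and the `to_call_kwargs` helper are hypothetical and not part of the skill; the keyword names simply mirror the request fields listed here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TtsRequest:
    """Hypothetical container for the tts.generate request fields."""

    text: str
    voice: str
    language_type: str = "Auto"
    instruction: Optional[str] = None
    stream: bool = False

    def __post_init__(self) -> None:
        # text and voice are the two required fields.
        if not self.text.strip():
            raise ValueError("text is required")
        if not self.voice.strip():
            raise ValueError("voice is required")

    def to_call_kwargs(self, model: str) -> Dict[str, Any]:
        """Build keyword arguments, omitting instruction when it is unset."""
        kwargs: Dict[str, Any] = {
            "model": model,
            "text": self.text,
            "voice": self.voice,
            "language_type": self.language_type,
            "stream": self.stream,
        }
        if self.instruction:
            kwargs["instruction"] = self.instruction
        return kwargs


@dataclass(frozen=True)
class TtsResponse:
    """Hypothetical container for the tts.generate response fields."""

    audio_url: Optional[str] = None         # set when stream=false
    audio_base64_pcm: Optional[str] = None  # set when stream=true
    sample_rate: int = 24000
    format: str = "wav"
```

Keeping `instruction` out of the keyword arguments when it is empty means the same request type can serve both the plain and the instruct models.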

Quick start (Python + DashScope SDK)

import os
from http import HTTPStatus

import dashscope

# Prefer env var for auth: export DASHSCOPE_API_KEY=...
# Or use ~/.alibabacloud/credentials with dashscope_api_key under [default].
# Beijing region; for Singapore use: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

text = "Hello, this is a short voice line."
response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-instruct-flash",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English",
    instruction="Warm and calm tone, slightly slower pace.",
    stream=False,
)

# Fail with a readable error instead of an attribute error on an empty output.
if response.status_code != HTTPStatus.OK:
    raise RuntimeError(f"TTS request failed: {response.code} {response.message}")

# A non-streaming call returns a URL pointing to the generated audio file.
audio_url = response.output.audio.url
print(audio_url)

Streaming notes

`stream=True` returns Base64-encoded PCM chunks at a 24 kHz sample rate.

Decode each chunk, then either play it immediately or append it to a PCM buffer.

The stream has ended when a response carries `finish_reason == "stop"`.
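The decode-and-concatenate step can be sketched with the standard library alone. `pcm_chunks_to_wav` is a hypothetical helper; it assumes the chunks are 16-bit mono PCM, which the notes above do not state, so check the sample width and channel count against `references/api_reference.md` before relying on it.

```python
import base64
import wave
from typing import Iterable


def pcm_chunks_to_wav(
    chunks_b64: Iterable[str],
    wav_path: str,
    sample_rate: int = 24000,
    channels: int = 1,      # assumption: mono
    sample_width: int = 2,  # assumption: 16-bit samples
) -> int:
    """Decode Base64 PCM chunks, write them as one WAV file, return the byte count."""
    pcm = bytearray()
    for chunk in chunks_b64:
        if chunk:  # skip empty chunks, e.g. a final message without audio data
            pcm.extend(base64.b64decode(chunk))

    # wave.open() expects a string path or a file object, hence str().
    with wave.open(str(wav_path), "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))
    return len(pcm)
```

Feeding the helper the Base64 payload of each streamed response, in arrival order, yields a file that ordinary audio players can open.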

Operational guidance

Keep requests concise; split long text into multiple calls if you hit size or timeout errors.

Use `language_type` consistent with the text to improve pronunciation.

Use `instruction` only when you need explicit style/tone control.

Cache results keyed by `(text, voice, language_type)` so that repeated requests do not incur repeated costs.
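Two of the points above, splitting long text and caching by `(text, voice, language_type)`, can be sketched as below. Both helpers are hypothetical, and the 300-character default is an arbitrary placeholder rather than a documented service limit.

```python
import hashlib
import json
import re
from typing import List

# A run of text ending in sentence punctuation, or a trailing run without any.
_SENTENCE_RE = re.compile(r"[^.!?。！？]*[.!?。！？]+\s*|[^.!?。！？]+$")


def split_text(text: str, max_chars: int = 300) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for piece in _SENTENCE_RE.findall(text.strip()):
        # A single sentence longer than the limit is cut at fixed offsets.
        while len(piece) > max_chars:
            if current.strip():
                chunks.append(current.strip())
            current = ""
            head = piece[:max_chars].strip()
            if head:
                chunks.append(head)
            piece = piece[max_chars:]
        if current.strip() and len(current) + len(piece) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += piece
    if current.strip():
        chunks.append(current.strip())
    return chunks


def cache_key(text: str, voice: str, language_type: str = "Auto") -> str:
    """Stable hash of the cache tuple, usable as a file name or dict key."""
    payload = json.dumps([text, voice, language_type], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

If `instruction` or the model name changes the audio in your setup, it may be worth adding them to the key as well.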

Output location

Default output: `output/alicloud-ai-audio-tts/audio/`

Override base dir with `OUTPUT_DIR`.
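One way to resolve the output directory and store a non-streaming result is sketched below. The helper names are hypothetical, and the sketch assumes that `OUTPUT_DIR` replaces only the leading `output` directory while the `alicloud-ai-audio-tts/audio/` suffix is kept; the skill's own script may interpret the variable differently.

```python
import os
import shutil
import urllib.request
from pathlib import Path


def resolve_audio_dir() -> Path:
    """Return the audio output directory, creating it if needed."""
    # Assumption: OUTPUT_DIR swaps out the "output" base directory only.
    base = Path(os.environ.get("OUTPUT_DIR") or "output")
    audio_dir = base / "alicloud-ai-audio-tts" / "audio"
    audio_dir.mkdir(parents=True, exist_ok=True)
    return audio_dir


def save_audio(audio_url: str, filename: str) -> Path:
    """Download the audio behind audio_url into the audio directory."""
    target = resolve_audio_dir() / filename
    with urllib.request.urlopen(audio_url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

For example, `save_audio(audio_url, "line-001.wav")` would store the file returned by the Quick start call, where `line-001.wav` is just an illustrative name.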

Workflow

1) Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.

2) Run one minimal read-only query first to verify connectivity and permissions.

3) Execute the target operation with explicit parameters and bounded scope.

4) Verify results and save output/evidence files.

References

See `references/api_reference.md` for the parameter mapping and a streaming example.

Realtime mode is provided by `skills/ai/audio/alicloud-ai-audio-tts-realtime/`.

Voice cloning and voice design are provided by `skills/ai/audio/alicloud-ai-audio-tts-voice-clone/` and `skills/ai/audio/alicloud-ai-audio-tts-voice-design/`, respectively.

Source list: `references/sources.md`
