
// Skill profile

Model Studio Qwen ASR (Non-Realtime)

Author: cinience

name: alicloud-ai-audio-asr

by cinience · published 2026-03-22

Data processing · API integration


Last updated

2026-03

// Install command

$ claw add gh:cinience/cinience-alicloud-ai-audio-asr


// Full documentation

---

name: alicloud-ai-audio-asr

description: Transcribe non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.

version: 1.0.0

---

Category: provider

# Model Studio Qwen ASR (Non-Realtime)

Validation

mkdir -p output/alicloud-ai-audio-asr
python -m py_compile skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py && echo "py_compile_ok" > output/alicloud-ai-audio-asr/validate.txt

Pass criteria: command exits 0 and `output/alicloud-ai-audio-asr/validate.txt` is generated.

Output And Evidence

Store transcripts and API responses under `output/alicloud-ai-audio-asr/`.

Keep one command log or sample response per run.

Use Qwen ASR to transcribe recorded (non-realtime) audio: synchronous calls for short audio, asynchronous jobs for long audio.

Critical model names

Use one of these exact model strings:

`qwen3-asr-flash`

`qwen-audio-asr`

`qwen3-asr-flash-filetrans`

Selection guidance:

Use `qwen3-asr-flash` or `qwen-audio-asr` for short/normal recordings (sync).

Use `qwen3-asr-flash-filetrans` for long-file transcription (async task workflow).
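
The selection guidance above can be sketched as a small lookup. This is a minimal illustration, not part of the bundled script: the helper name `call_mode` is hypothetical, and only the three model strings come from this document.

```python
# Model strings copied from the list above; the helper itself is hypothetical.
SYNC_MODELS = {"qwen3-asr-flash", "qwen-audio-asr"}
ASYNC_MODELS = {"qwen3-asr-flash-filetrans"}


def call_mode(model: str) -> str:
    """Return "sync" or "async" for a known model string, else raise."""
    if model in SYNC_MODELS:
        return "sync"
    if model in ASYNC_MODELS:
        return "async"
    # Reject near-misses early instead of sending a typo to the API.
    raise ValueError(f"unknown model string: {model!r}")
```

Rejecting unknown strings up front keeps a misspelled model name from reaching the API.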

Prerequisites

Create and activate a virtual environment (the script uses only the Python standard library, so no SDK packages are required):

python3 -m venv .venv
. .venv/bin/activate

Set `DASHSCOPE_API_KEY` in the environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
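
One way the two credential sources could be resolved is sketched below. It assumes the credentials file is INI-style with at least one section header (for example `[default]`), which this document does not confirm. The function name `load_api_key` is hypothetical, and checking the environment first is also an assumption.

```python
import configparser
import os
from pathlib import Path


def load_api_key(credentials_path=None, env=None):
    """Return the API key from the environment, else from the credentials file."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY")
    if key:
        return key
    path = Path(credentials_path) if credentials_path else Path.home() / ".alibabacloud" / "credentials"
    if not path.is_file():
        return None
    # Assumption: INI-style file; interpolation is disabled so "%" in a key is safe.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(path, encoding="utf-8")
    except configparser.Error:
        return None  # not the format assumed here
    for section in parser.sections():
        value = parser.get(section, "dashscope_api_key", fallback=None)
        if value:
            return value
    return None
```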

Normalized interface (asr.transcribe)

Request

`audio` (string, required): public URL or local file path.

`model` (string, optional): default `qwen3-asr-flash`.

`language_hints` (array<string>, optional): e.g. `zh`, `en`.

`sample_rate` (number, optional)

`vocabulary_id` (string, optional)

`disfluency_removal_enabled` (bool, optional)

`timestamp_granularities` (array<string>, optional): e.g. `sentence`.

`async` (bool, optional): default false for sync models, true for `qwen3-asr-flash-filetrans`.
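
The request fields above could be modeled as a dataclass, as in the following sketch. The class name `TranscribeRequest` is hypothetical, and `async` is spelled `use_async` because `async` is a reserved word in Python. The defaults follow the field list above.

```python
from dataclasses import dataclass
from typing import List, Optional

ASYNC_MODEL = "qwen3-asr-flash-filetrans"


@dataclass
class TranscribeRequest:
    audio: str  # public URL or local file path
    model: str = "qwen3-asr-flash"
    language_hints: Optional[List[str]] = None
    sample_rate: Optional[int] = None
    vocabulary_id: Optional[str] = None
    disfluency_removal_enabled: Optional[bool] = None
    timestamp_granularities: Optional[List[str]] = None
    use_async: Optional[bool] = None  # stands in for the `async` field

    def __post_init__(self):
        if not self.audio:
            raise ValueError("audio is required")
        if self.use_async is None:
            # Default: true only for the long-file model, false otherwise.
            self.use_async = self.model == ASYNC_MODEL
```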

Response

`text` (string): normalized transcript text.

`task_id` (string, optional): present for async submission.

`status` (string): `SUCCEEDED` for a finished transcription; for an async submission, the task status returned by the API.

`raw` (object): original API response.
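
A sketch of mapping raw API responses onto these fields follows. The response shapes are assumptions and are not documented here: the sync branch assumes the OpenAI-compatible layout `choices[0].message.content`, and the async branch assumes `output.task_id` and `output.task_status`. Both function names are hypothetical.

```python
def normalize_sync(raw: dict) -> dict:
    """Map an (assumed) OpenAI-compatible response to the normalized fields."""
    text = ""
    choices = raw.get("choices") or []
    if choices:
        content = (choices[0].get("message") or {}).get("content")
        if isinstance(content, str):
            text = content
    return {"text": text, "status": "SUCCEEDED", "raw": raw}


def normalize_submission(raw: dict) -> dict:
    """Map an (assumed) async submission response to the normalized fields."""
    output = raw.get("output") or {}
    return {
        "text": "",  # no transcript yet at submission time
        "task_id": output.get("task_id"),
        "status": output.get("task_status", "UNKNOWN"),
        "raw": raw,
    }
```

Keeping `raw` untouched means the original payload can still be inspected if either assumed shape turns out to differ.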

Quick start (official HTTP API)

Sync transcription (OpenAI-compatible protocol):

curl -sS --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3-asr-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
            }
          }
        ]
      }
    ],
    "stream": false,
    "asr_options": {
      "enable_itn": false
    }
  }'
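
The same sync request can be built with the Python standard library alone. The sketch below mirrors the curl payload above; the function name `build_sync_request` is hypothetical. It only constructs the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

SYNC_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"


def build_sync_request(api_key, audio, model="qwen3-asr-flash"):
    """Build (but do not send) the sync transcription request shown above."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "input_audio", "input_audio": {"data": audio}}
                ],
            }
        ],
        "stream": False,
        "asr_options": {"enable_itn": False},
    }
    return urllib.request.Request(
        SYNC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, something like `with urllib.request.urlopen(req, timeout=60) as resp: raw = json.load(resp)` should work; the timeout value is an arbitrary choice.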

Async long-file transcription (DashScope protocol):

curl -sS --location 'https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'X-DashScope-Async: enable' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3-asr-flash-filetrans",
    "input": {
      "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
    }
  }'
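
A standard-library sketch of the same async submission is given below. It mirrors the curl call above, including the `X-DashScope-Async: enable` header; the function name `build_async_request` is hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request

ASYNC_URL = "https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription"


def build_async_request(api_key, file_url, model="qwen3-asr-flash-filetrans"):
    """Build (but do not send) the async long-file submission shown above."""
    payload = {"model": model, "input": {"file_url": file_url}}
    return urllib.request.Request(
        ASYNC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-DashScope-Async": "enable",  # asks the service to run this as a task
            "Content-Type": "application/json",
        },
        method="POST",
    )
```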

Poll task result:

curl -sS --location "https://dashscope.aliyuncs.com/api/v1/tasks/<task_id>" \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY"
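
Polling can be wrapped in a bounded loop, as in the sketch below. It assumes the task response carries its state in `output.task_status` and that `FAILED` and `CANCELED` are terminal alongside `SUCCEEDED`; only `SUCCEEDED` is named in this document. The function names and the default of 60 polls are hypothetical.

```python
import json
import time
import urllib.request

TASK_URL = "https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}"
TERMINAL = {"SUCCEEDED", "FAILED", "CANCELED"}  # assumption beyond SUCCEEDED


def fetch_task(api_key, task_id):
    """GET the task endpoint once and return the parsed JSON body."""
    req = urllib.request.Request(
        TASK_URL.format(task_id=task_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_task(fetch, interval_s=5.0, max_polls=60, sleep=time.sleep):
    """Call fetch() until the task reaches a terminal status or polls run out."""
    for attempt in range(max_polls):
        raw = fetch()
        status = (raw.get("output") or {}).get("task_status")
        if status in TERMINAL:
            return raw
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"task not finished after {max_polls} polls")
```

A call such as `wait_for_task(lambda: fetch_task(api_key, task_id), interval_s=10)` ties the two together. Passing `fetch` and `sleep` in as arguments keeps the loop testable without network access.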

Local helper script

Use the bundled script for URL/local-file input and optional async polling:

python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
  --audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
  --model qwen3-asr-flash \
  --language-hints zh,en \
  --print-response

Long-file mode:

python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
  --audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
  --model qwen3-asr-flash-filetrans \
  --async \
  --wait

Operational guidance

For local files with no public URL, pass the audio as a data URI in `input_audio.data`.

Keep `language_hints` minimal to reduce recognition ambiguity.

For async tasks, poll every 5-20 seconds and cap the number of retries.

Save normalized outputs under `output/alicloud-ai-audio-asr/transcripts/`.
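
The data-URI guidance above could look like the following sketch. The helper name `to_audio_data` is hypothetical, and the `data:<mime>;base64,<payload>` layout is the generic data-URI form. Whether the API accepts every guessed MIME type is not confirmed here.

```python
import base64
import mimetypes
from pathlib import Path


def to_audio_data(audio: str) -> str:
    """Return a value for `input_audio.data`: URLs pass through, files become data URIs."""
    if audio.startswith(("http://", "https://", "data:")):
        return audio
    path = Path(audio)
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Base64 inflates the payload by roughly a third, so a public URL is likely the better choice for large files.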

Output location

Default output: `output/alicloud-ai-audio-asr/transcripts/`

Override base dir with `OUTPUT_DIR`.
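
A minimal sketch of resolving the output location follows. It assumes `OUTPUT_DIR` replaces the leading `output` segment of the default path, which this document does not spell out. The function name `transcripts_dir` is hypothetical.

```python
import os
from pathlib import Path


def transcripts_dir(env=None) -> Path:
    """Return the transcripts directory, honoring OUTPUT_DIR when it is set."""
    env = os.environ if env is None else env
    # Assumption: OUTPUT_DIR stands in for the default "output" base directory.
    base = Path(env.get("OUTPUT_DIR") or "output")
    return base / "alicloud-ai-audio-asr" / "transcripts"
```

Calling `transcripts_dir().mkdir(parents=True, exist_ok=True)` before writing would create the directory if it is missing.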

Workflow

1) Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.

2) Run one minimal read-only query first to verify connectivity and permissions.

3) Execute the target operation with explicit parameters and bounded scope.

4) Verify results and save output/evidence files.

References

`references/api_reference.md`

`references/sources.md`

Realtime synthesis is provided by `skills/ai/audio/alicloud-ai-audio-tts-realtime/`.
