
Audio Transcription with Sber Salute Speech

name: salute-speech

by chorus12 · published 2026-03-22

Developer Tools, Data Processing
// Install command
$ claw add gh:chorus12/chorus12-salute-speech
// Full documentation

---
name: salute-speech
description: >
  Transcribe audio files using Sber Salute Speech async API.
  Russian-first STT with support for ru-RU, en-US, kk-KZ, ky-KG, uz-UZ.
metadata: { "openclaw": { "requires": { "bins": ["uv"], "env": ["SALUTE_AUTH_DATA"] }, "primaryEnv": "SALUTE_AUTH_DATA" } }
---

# Audio Transcription with Sber Salute Speech

Transcribe audio and video files to timestamped text via the Salute Speech async REST API.

## Requirements

  • **API key**: the environment variable `SALUTE_AUTH_DATA` must be set to the Base64-encoded `client_id:client_secret` pair or to the raw authorization key from https://developers.sber.ru/studio/.
  • **SSL note**: the script disables SSL verification by default (`verify_ssl=False`) because Sber's certificate chain is non-standard and fails validation against the usual CA bundles. This is expected behavior, not a misconfiguration.
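As a rough illustration of how the key could be turned into a token request, the sketch below only assembles the request pieces and sends nothing. The endpoint URL, the `RqUID` header, and the `SALUTE_SPEECH_PERS` scope are assumptions based on Sber's public OAuth flow, not details taken from `salute_transcribe.py`. The names `encode_credentials` and `build_token_request` are hypothetical.

```python
import base64
import uuid

# Assumed OAuth endpoint and scope; verify both against Sber's documentation.
OAUTH_URL = "https://ngw.devices.sberbank.ru:9443/api/v2/oauth"
DEFAULT_SCOPE = "SALUTE_SPEECH_PERS"


def encode_credentials(client_id: str, client_secret: str) -> str:
    """Base64-encode `client_id:client_secret`, one accepted form of SALUTE_AUTH_DATA."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def build_token_request(auth_data: str, scope: str = DEFAULT_SCOPE) -> dict:
    """Return the URL, headers, and form body for a token request (nothing is sent)."""
    return {
        "url": OAUTH_URL,
        "headers": {
            "Authorization": f"Basic {auth_data}",
            "RqUID": str(uuid.uuid4()),  # unique request id, assumed to be required
            "Content-Type": "application/x-www-form-urlencoded",
        },
        "data": {"scope": scope},
    }
```

With `requests`, the result could be sent as `requests.post(req["url"], headers=req["headers"], data=req["data"], verify=False)`, where `verify=False` corresponds to the SSL note above.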
## Supported formats & encodings

| Audio encoding | Content-Type | Typical extensions |
|---------------|-------------|--------------------|
| `MP3` | `audio/mpeg` | `.mp3` |
| `PCM_S16LE` | `audio/wav` | `.wav` |
| `OPUS` | `audio/ogg` | `.ogg`, `.opus` |
| `FLAC` | `audio/flac` | `.flac` |
| `ALAW` | `audio/alaw` | `.alaw` |
| `MULAW` | `audio/mulaw` | `.mulaw` |

## Supported languages

`ru-RU`, `en-US`, `kk-KZ` (Kazakh), `ky-KG` (Kyrgyz), `uz-UZ` (Uzbek).

## Workflow

1. **Identify input files** from the user's request.
2. **Read the API key** from the host environment (`SALUTE_AUTH_DATA`).
3. **Run transcription**: execute `salute_transcribe.py` with `uv` and the appropriate arguments.
4. **Deliver results**: present the human-readable transcript with timestamps to the user and give direct links to the output files.

## Usage

    uv run --with requests {baseDir}/salute_transcribe.py \
      --file /path/to/audio.mp3 \
      --output_dir ~/.openclaw/workspace/transcriptions \
      --lang ru-RU
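When the skill is driven from Python rather than a shell, the same invocation might be assembled as follows. This is a minimal sketch: `build_command` and `transcribe` are hypothetical helper names, and `base_dir` stands in for the `{baseDir}` placeholder.

```python
import os
import subprocess
from pathlib import Path

SUPPORTED_LANGS = {"ru-RU", "en-US", "kk-KZ", "ky-KG", "uz-UZ"}


def build_command(base_dir, audio_file, output_dir, lang="ru-RU"):
    """Assemble the argv list for the `uv run` invocation shown above."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language code: {lang}")
    script = Path(base_dir) / "salute_transcribe.py"
    return [
        "uv", "run", "--with", "requests", str(script),
        "--file", str(audio_file),
        # No shell is involved, so "~" has to be expanded explicitly.
        "--output_dir", os.path.expanduser(str(output_dir)),
        "--lang", lang,
    ]


def transcribe(base_dir, audio_file, output_dir, lang="ru-RU"):
    """Run the script; raises CalledProcessError on a non-zero exit code."""
    if "SALUTE_AUTH_DATA" not in os.environ:
        raise RuntimeError("SALUTE_AUTH_DATA is not set")
    subprocess.run(build_command(base_dir, audio_file, output_dir, lang), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.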

## Arguments

| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| `--file` | **Yes** | (none) | Path to audio/video file |
| `--output_dir` | No | `~/.openclaw/workspace/transcribations` | Output directory for results |
| `--lang` | No | `ru-RU` | Language code: `ru-RU`, `en-US`, `kk-KZ`, `ky-KG`, `uz-UZ` |
| `--audio-encoding` | No | `MP3` | Codec: `MP3`, `PCM_S16LE`, `OPUS`, `FLAC`, `ALAW`, `MULAW` |
| `--model` | No | `general` | Recognition model: `general` or `callcenter` |
| `--hyp-count` | No | `1` | Number of alternative hypotheses: `1` or `2` |
| `--max-wait-time` | No | `300` | Maximum time in seconds to wait for the async result |
| `--print` | No | off | Also print the transcription to stdout |

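## Content-Type mapping

The script sends `audio/mpeg` (MP3) as the Content-Type by default. For any other format, change `content_type` in the script, or add logic that picks it from the file extension, using the values in the formats table: `audio/wav` for `.wav` files, `audio/ogg` for `.ogg` files, and so on.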
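One way to add that logic is a lookup keyed on the file extension. The sketch below copies its values from the formats table. `FORMATS` and `guess_format` are hypothetical names, not part of `salute_transcribe.py`.

```python
from pathlib import Path

# Extension -> (audio encoding, Content-Type), copied from the formats table.
FORMATS = {
    ".mp3": ("MP3", "audio/mpeg"),
    ".wav": ("PCM_S16LE", "audio/wav"),
    ".ogg": ("OPUS", "audio/ogg"),
    ".opus": ("OPUS", "audio/ogg"),
    ".flac": ("FLAC", "audio/flac"),
    ".alaw": ("ALAW", "audio/alaw"),
    ".mulaw": ("MULAW", "audio/mulaw"),
}


def guess_format(path):
    """Return (audio_encoding, content_type) for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported audio extension: {suffix or '(none)'}") from None
```

The first element of the returned pair is also the value to pass as `--audio-encoding`.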

## Output files

For the input file `meetingABC.mp3`, the script produces:

| File | Description |
|------|-------------|
| `meetingABC_recognition_orig.json` | Raw API response (full JSON with all hypotheses, timing, confidence) |
| `meetingABC_pretty.txt` | Formatted human-readable transcript with timestamps |
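To locate the results afterwards, the naming pattern from the table can be reproduced in a few lines. This is a sketch that assumes both files land directly in the output directory. `output_paths` is a hypothetical helper.

```python
from pathlib import Path


def output_paths(audio_file, output_dir):
    """Return (raw_json_path, pretty_txt_path) expected for a given input file."""
    stem = Path(audio_file).stem  # "meetingABC.mp3" -> "meetingABC"
    out = Path(output_dir).expanduser()
    return out / f"{stem}_recognition_orig.json", out / f"{stem}_pretty.txt"
```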

## Output text format

    [00:01 - 00:20]:
    Ну, даже если сосредоточиться на идее узкой щели.
    
    [00:20 - 00:45]:
    Следующий фрагмент текста здесь.
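For downstream processing, the text format above can be read back into structured segments. The parser below is a minimal sketch that assumes every header has the `[MM:SS - MM:SS]:` shape shown in the sample. How the script formats timestamps past one hour is not documented here, so that case is not handled. `Segment` and `parse_pretty` are hypothetical names.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    start: int  # seconds from the beginning of the recording
    end: int
    text: str


# Matches a header line such as "[00:01 - 00:20]:".
HEADER = re.compile(r"^\[(\d+):(\d{2}) - (\d+):(\d{2})\]:$")


def parse_pretty(text: str) -> list[Segment]:
    """Split a `*_pretty.txt` transcript into timestamped segments."""
    parts = []  # each entry: [start, end, list_of_text_lines]
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            m1, s1, m2, s2 = map(int, match.groups())
            parts.append([m1 * 60 + s1, m2 * 60 + s2, []])
        elif line and parts:
            # Non-empty line after a header: part of that segment's text.
            parts[-1][2].append(line)
    return [Segment(start, end, " ".join(lines)) for start, end, lines in parts]
```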

## Notes

  • The access token is valid for about 30 minutes; the script fetches a new one on each run.
  • Large files (>1 hour) may need `--max-wait-time` increased beyond 300s.
  • The `callcenter` model is optimized for telephony audio (8kHz, mono).
  • Profanity filter is disabled by default (`enable_profanity_filter=False`).
  • The script uses **normalized text** by default (numbers as digits, abbreviations expanded). Raw text is also available in the JSON output.
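The role of `--max-wait-time` can be pictured as a polling loop with a deadline. The sketch below is not the script's actual code: the status names (`DONE`, `ERROR`, `CANCELED`) are assumptions about the async task API, and `wait_for_result` with its `get_status` callback is a hypothetical interface.

```python
import time


def wait_for_result(get_status, max_wait_time=300.0, poll_interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll `get_status()` until the task finishes or the time budget runs out."""
    deadline = clock() + max_wait_time
    while True:
        status = get_status()
        if status == "DONE":
            return status
        if status in ("ERROR", "CANCELED"):
            raise RuntimeError(f"recognition task ended with status {status}")
        if clock() >= deadline:
            raise TimeoutError(f"no result after {max_wait_time} seconds")
        sleep(poll_interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. A long recording simply needs a larger `max_wait_time`.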