
MiniMax Tools

by cytwyatt · published 2026-03-22

Developer tools · Image generation

Last updated: 2026-03

// Install command

$ claw add gh:cytwyatt/cytwyatt-minimax-tools-skill


// Full documentation

---

name: minimax-tools

description: "Direct MiniMax API integration for speech synthesis (TTS), voice cloning, image generation, video generation, and music generation using local Python scripts instead of MCP. Use when you want reliable script-based MiniMax workflows inside OpenClaw for: (1) text-to-speech with built-in Chinese/English defaults or explicit voice IDs, (2) voice cloning with upload + preview flows, (3) text-to-image or reference-image generation, (4) text-to-video, image-to-video, or first/last-frame video generation with async polling/download, and (5) music generation from prompts and lyrics."

metadata:

  {
    "openclaw":
      {
        "requires": { "bins": ["python3"], "env": ["MINIMAX_API_KEY"] },
        "primaryEnv": "MINIMAX_API_KEY",
        "homepage": "https://github.com/cytwyatt/minimax-tools-skill"
      }
  }

---

# MiniMax Tools

Use this skill to call MiniMax multimodal APIs directly through local Python wrappers instead of relying on an external MCP server.

## Overview

This skill currently supports:

- Speech synthesis (TTS)
- Voice cloning
- Image generation
- Video generation
- Music generation

All wrappers are exposed through a single entrypoint script:

python3 scripts/minimax.py <subcommand> ...

Read `references/api-notes.md` only when you need endpoint details or parameter reminders.

## Prerequisites

The scripts expect this environment variable to be set before they run:

- `MINIMAX_API_KEY`

Optional:

- `MINIMAX_BASE_URL`, if you need to override the default API host

Python dependency:

- `requests`
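The environment lookup described above can be sketched as follows. This is a minimal illustration, not the skill's actual code: the function name `load_minimax_config` and the `default_base_url` parameter are hypothetical, and the real default API host is deliberately left to the caller because it is not stated here.

```python
import os


def load_minimax_config(env=None, default_base_url=None):
    """Collect the settings the scripts expect (hypothetical helper)."""
    env = os.environ if env is None else env

    # MINIMAX_API_KEY is required; fail early with a clear message.
    api_key = env.get("MINIMAX_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("MINIMAX_API_KEY is not set")

    # MINIMAX_BASE_URL is optional; fall back to the caller-supplied default.
    base_url = env.get("MINIMAX_BASE_URL", "").strip() or default_base_url
    return {"api_key": api_key, "base_url": base_url}
```

Passing a plain dictionary as `env` keeps the helper easy to exercise without touching the real process environment.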

## Routing guide

- Use `tts` for speech synthesis
- Use `voice` for uploading clone inputs, creating cloned voices, and optionally downloading preview audio
- Use `image` for text-to-image or reference-image generation
- Use `video` for text-to-video, image-to-video, or first/last-frame video workflows
- Use `music` for song or instrumental generation
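One way to picture the routing is a lookup table from subcommand to wrapper script. The sketch below assumes that the unified entrypoint forwards to the per-feature wrapper scripts; that is a guess about the internal design, and the `route` function is hypothetical.

```python
# Assumed mapping from subcommand to wrapper script (not confirmed by the skill).
WRAPPERS = {
    "tts": "scripts/minimax_tts.py",
    "voice": "scripts/minimax_voice.py",
    "image": "scripts/minimax_image.py",
    "video": "scripts/minimax_video.py",
    "music": "scripts/minimax_music.py",
}


def route(argv):
    """Split argv into (wrapper script, remaining arguments)."""
    if not argv or argv[0] not in WRAPPERS:
        raise SystemExit("usage: minimax.py {" + ",".join(WRAPPERS) + "} ...")
    return WRAPPERS[argv[0]], argv[1:]
```

For example, `route(["video", "create"])` selects the video wrapper and leaves `["create"]` for that wrapper to interpret.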

## TTS defaults

- Default model: `speech-2.8-turbo`
- Default format: `mp3`
- Default sample rate: `32000`
- Default bitrate: `128000`
- Default Chinese voice: `Chinese (Mandarin)_Lyrical_Voice`
- Default English voice: `English_Graceful_Lady`
- If `--voice` is omitted, the script picks the default voice for `--voice-lang` (`zh` or `en`), which itself defaults to `zh`
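The voice fallback can be illustrated with a small helper. The default values come from the list above; the names `TTS_DEFAULTS`, `DEFAULT_VOICES`, and `resolve_voice` are hypothetical and only sketch how such a rule might be coded.

```python
TTS_DEFAULTS = {
    "model": "speech-2.8-turbo",
    "format": "mp3",
    "sample_rate": 32000,
    "bitrate": 128000,
}

DEFAULT_VOICES = {
    "zh": "Chinese (Mandarin)_Lyrical_Voice",
    "en": "English_Graceful_Lady",
}


def resolve_voice(voice=None, voice_lang="zh"):
    """Return the explicit voice ID if given, else the default for the language."""
    if voice:
        return voice
    if voice_lang not in DEFAULT_VOICES:
        raise ValueError(f"unsupported voice language: {voice_lang!r}")
    return DEFAULT_VOICES[voice_lang]
```

An explicit voice ID always wins, so a cloned `voice_id` can be passed through unchanged.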

## Voice cloning notes

Clone source audio constraints:

- `mp3`, `m4a`, or `wav`
- 10 seconds to 5 minutes
- <= 20 MB

Optional prompt audio constraints:

- `mp3`, `m4a`, or `wav`
- under 8 seconds
- <= 20 MB

If cloning succeeds, the returned `voice_id` can be used immediately in TTS.

According to the MiniMax documentation, a cloned voice is temporary unless it is used in a real TTS request within 7 days.
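A pre-upload check for these constraints might look like the sketch below. It is illustrative only: `check_clone_audio` is a hypothetical helper, the duration is supplied by the caller rather than measured from the file, and "20 MB" is read as 20 MiB, which is an assumption.

```python
import os

ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" interpreted as 20 MiB


def check_clone_audio(filename, size_bytes, duration_s, kind="source"):
    """Return a list of constraint violations (empty if the file looks acceptable)."""
    problems = []

    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'none'}")

    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 20 MB")

    if kind == "source":
        # Clone source audio: 10 seconds to 5 minutes.
        if not 10 <= duration_s <= 300:
            problems.append("source audio must be 10 seconds to 5 minutes long")
    elif kind == "prompt":
        # Optional prompt audio: strictly under 8 seconds.
        if duration_s >= 8:
            problems.append("prompt audio must be under 8 seconds")
    else:
        raise ValueError(f"unknown kind: {kind!r}")

    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem with a file at once.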

## Video support

Supported modes:

- text-to-video: `video create`
- image-to-video: `video i2v`
- first/last-frame video: `video fl2v`

Video creation is asynchronous. Use `video query`, `video wait`, and `video download` for task follow-up.
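Because video creation is asynchronous, a wait step typically polls at a fixed interval until the task finishes or a timeout expires. The sketch below shows that pattern in isolation; `wait_for_task`, `query`, and `is_finished` are hypothetical names, and the real status values returned by MiniMax are intentionally left to the caller's predicate.

```python
import time


def wait_for_task(query, is_finished, interval_s=10.0, timeout_s=600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll query() every interval_s seconds until is_finished(result) is true."""
    deadline = clock() + timeout_s
    while True:
        result = query()
        if is_finished(result):
            return result
        # Stop before sleeping if the next poll would land past the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError("video task did not finish before the timeout")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting, and a generous default interval avoids hammering the API.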

## File handling rules

- Prefer saving outputs locally and returning file paths
- Local image inputs for image/video wrappers can be converted to Data URLs automatically
- When MiniMax returns temporary files, prefer URL-based output and download the files immediately
- Avoid tight polling loops for async video jobs
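Converting a local image into a Data URL needs only the standard library. The following is a minimal sketch of the idea, not the wrappers' actual code; `to_image_ref` is a hypothetical name, and passing remote URLs and existing Data URLs through unchanged is an assumption about the desired behavior.

```python
import base64
import mimetypes


def to_image_ref(path_or_url):
    """Return a URL as-is, or encode a local image file as a Data URL."""
    if path_or_url.startswith(("http://", "https://", "data:")):
        return path_or_url

    # Guess the MIME type from the file extension and require an image type.
    mime, _ = mimetypes.guess_type(path_or_url)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image file: {path_or_url}")

    with open(path_or_url, "rb") as fh:
        payload = base64.b64encode(fh.read()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

Base64 inflates the payload by about a third, so this approach suits reasonably small reference images.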

## Resources

- `scripts/minimax.py`: unified CLI entrypoint
- `scripts/minimax_tts.py`: TTS wrapper
- `scripts/minimax_voice.py`: voice cloning wrapper
- `scripts/minimax_image.py`: image generation wrapper
- `scripts/minimax_video.py`: video generation wrapper
- `scripts/minimax_music.py`: music generation wrapper
- `references/api-notes.md`: focused API notes and constraints
