Audio Speaker Tools
name: audio-speaker-tools
by cmfinlan · published 2026-03-22

$ claw add gh:cmfinlan/cmfinlan-audio-speaker-tools

---
name: audio-speaker-tools
description: "Speaker separation, voice comparison, and audio processing tools. Use when working with multi-speaker audio, voice cloning, or speaker verification tasks including: (1) separating speakers from audio files via Demucs and pyannote diarization, (2) comparing voice samples for speaker verification or voice clone quality assessment using Resemblyzer, (3) extracting audio segments, (4) preparing samples for ElevenLabs voice cloning, or (5) validating speaker diarization results."
---
# Audio Speaker Tools
Tools for speaker separation, voice comparison, and audio processing using Demucs, pyannote, and Resemblyzer.
## Overview
This skill provides three main workflows:
1. **Speaker separation** - Extract per-speaker audio from multi-speaker recordings
2. **Voice comparison** - Measure speaker similarity between two audio files
3. **Audio processing** - Segment extraction and voice isolation
## Prerequisites

### Setup Virtual Environment

Run once to create the venv and install dependencies:

```bash
bash scripts/setup_venv.sh
```

Default venv location: `./.venv`
**Requirements:**
## Scripts

### 1. Speaker Separation: `diarize_and_slice_mps.py`
Separate speakers from multi-speaker audio:
```bash
# Basic usage
HF_TOKEN=<your-hf-token> \
/path/to/venv/bin/python scripts/diarize_and_slice_mps.py \
  --input audio.mp3 \
  --outdir /path/to/output \
  --prefix MyShow

# With speaker constraints
HF_TOKEN=$TOKEN python scripts/diarize_and_slice_mps.py \
  --input audio.mp3 \
  --outdir ./out \
  --min-speakers 2 \
  --max-speakers 5 \
  --pad-ms 100
```

**Process:**
1. Converts input to 16kHz mono WAV
2. Runs Demucs vocal/background separation (optional, for cleaner input)
3. Runs pyannote speaker diarization (MPS-accelerated)
4. Extracts concatenated per-speaker WAV files
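Step 4 can be pictured in pure Python: given diarization turns as `(start, end, speaker)` tuples, group segments per speaker and apply the `--pad-ms` padding. This is an illustrative sketch of the grouping logic only, not the script's actual internals; the function name and data shapes are assumptions.

```python
from collections import defaultdict

def group_turns(turns, pad_ms=100, clip_start=0.0):
    """Group diarization turns per speaker, padding each segment.

    turns: list of (start_sec, end_sec, speaker_label) tuples.
    Returns {speaker: [(padded_start_sec, padded_end_sec), ...]}.
    """
    pad = pad_ms / 1000.0
    by_speaker = defaultdict(list)
    for start, end, speaker in turns:
        # Pad each turn, clamping the start so it never goes before the clip
        by_speaker[speaker].append((max(clip_start, start - pad), end + pad))
    return dict(by_speaker)

# Toy diarization output: two speakers taking turns
turns = [(0.0, 2.5, "SPEAKER_00"), (2.6, 5.0, "SPEAKER_01"), (5.1, 7.0, "SPEAKER_00")]
segments = group_turns(turns, pad_ms=100)
```

The per-speaker segment lists would then be concatenated into the per-speaker WAV files the script writes out.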
**Output:**
**Important:**
### 2. Voice Comparison: `compare_voices.py`
Measure similarity between two voice samples using Resemblyzer:
```bash
# Basic comparison
python scripts/compare_voices.py \
  --audio1 sample1.wav \
  --audio2 sample2.wav

# JSON output
python scripts/compare_voices.py \
  --audio1 reference.wav \
  --audio2 clone.wav \
  --threshold 0.85 \
  --json

# Exit code = 0 if pass, 1 if fail
```

**Scores:**
**Use cases:**
**See:** `references/scoring-guide.md` for detailed interpretation
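Resemblyzer scores are the cosine similarity between two speaker embeddings. A minimal sketch of the comparison and pass/fail logic, using plain lists in place of real embeddings (the helper names are illustrative, not the script's actual API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def same_speaker(emb1, emb2, threshold=0.85):
    """Mirror of the script's exit-code check: pass if score >= threshold."""
    return cosine_similarity(emb1, emb2) >= threshold

# Identical-direction embeddings score 1.0; orthogonal ones score 0.0
assert abs(cosine_similarity([1.0, 0.0], [1.0, 0.0]) - 1.0) < 1e-9
```

In practice the embeddings come from Resemblyzer's `VoiceEncoder`; this sketch only shows how the score and threshold interact.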
### 3. Audio Trimming
Use `ffmpeg` directly for segment extraction:
```bash
# Extract 10-second segment starting at 5 seconds
ffmpeg -i input.mp3 -ss 5 -t 10 -c copy output.mp3

# Extract vocals only with Demucs (before diarization)
demucs --two-stems vocals --out ./separated input.mp3
```

## Workflows
### Workflow 1: Extract Clean Voice Sample for Cloning
**Goal:** Get a clean, single-speaker sample for ElevenLabs voice cloning
```bash
# 1. Separate speakers
HF_TOKEN=<your-hf-token> python scripts/diarize_and_slice_mps.py \
  --input podcast.mp3 --outdir ./out --prefix Podcast

# 2. Review speaker files (out/Podcast_speaker1.wav, etc.)

# 3. Select best sample (5-30s, clean speech)
ffmpeg -i out/Podcast_speaker2.wav -ss 10 -t 20 -c copy sample.wav

# 4. Upload to ElevenLabs as instant voice clone
```

**See:** `references/elevenlabs-cloning.md` for best practices
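When this workflow is scripted, the trim step reduces to building an `ffmpeg` argument list. A hedged sketch (the file paths are examples, and `ffmpeg` must be on PATH when the command is actually run):

```python
import subprocess

def build_trim_cmd(src, dst, start_s, dur_s):
    """ffmpeg args for a lossless segment copy (-c copy, no re-encode)."""
    return ["ffmpeg", "-y", "-i", src,
            "-ss", str(start_s), "-t", str(dur_s),
            "-c", "copy", dst]

cmd = build_trim_cmd("out/Podcast_speaker2.wav", "sample.wav", 10, 20)
# subprocess.run(cmd, check=True)  # uncomment to actually extract the segment
```

Using a list of arguments (rather than a shell string) avoids quoting problems with paths that contain spaces.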
### Workflow 2: Validate Voice Clone Quality
**Goal:** Measure how well a cloned voice matches the original
```bash
# 1. Generate test audio with ElevenLabs clone
# (done via ElevenLabs web UI or API)

# 2. Compare clone vs. reference
python scripts/compare_voices.py \
  --audio1 original_sample.wav \
  --audio2 elevenlabs_clone.wav \
  --threshold 0.85 \
  --json

# 3. Interpret score:
#    0.85+     = excellent, publish-ready
#    0.80-0.84 = acceptable, may need tweaking
#    < 0.80    = poor, try different sample or settings
```

**See:** `references/scoring-guide.md` for troubleshooting low scores
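The score bands in step 3 can be captured as a small helper. The boundaries come from the comments above; the verdict strings are illustrative:

```python
def interpret_score(score):
    """Map a Resemblyzer similarity score to a verdict per the bands above."""
    if score >= 0.85:
        return "excellent"   # publish-ready
    if score >= 0.80:
        return "acceptable"  # may need tweaking
    return "poor"            # try a different sample or settings

# Example: a 0.82 clone lands in the "acceptable" band
print(interpret_score(0.82))
```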
### Workflow 3: Multi-Speaker Conversation Analysis
**Goal:** Separate and identify speakers in a conversation
```bash
# 1. Run diarization
HF_TOKEN=$TOKEN python scripts/diarize_and_slice_mps.py \
  --input meeting.mp3 --outdir ./out --prefix Meeting

# 2. Check detected speakers (meta.json)
cat out/meta.json

# 3. Compare speaker pairs to confirm separation
python scripts/compare_voices.py \
  --audio1 out/Meeting_speaker1.wav \
  --audio2 out/Meeting_speaker2.wav

# Expected: < 0.75 if separation worked correctly
```

## Technical Notes
### Device Acceleration
To force CPU for diarization: `--device cpu`
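The device fallback presumably works as a simple preference order. This helper only mirrors that decision logic; the real script would query `torch.backends.mps` and `torch.cuda` itself, so the availability flags here are passed in as assumptions:

```python
def pick_device(force_cpu=False, mps_available=False, cuda_available=False):
    """Choose a device string: explicit --device cpu wins, else best accelerator."""
    if force_cpu:
        return "cpu"   # user passed --device cpu
    if mps_available:
        return "mps"   # Apple Silicon Metal acceleration
    if cuda_available:
        return "cuda"
    return "cpu"       # safe fallback
```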
### Audio Formats

### HuggingFace Token

### Sample Quality Tips

## References
## Common Issues

### "Missing HF token" error

### Low voice comparison scores for same speaker

### Diarization not detecting all speakers

### MPS/Metal acceleration not working