YouTube AnyCaption Summarizer
name: youtube-anycaption-summarizer
by arthurli202602-commits · published 2026-04-01
$ claw add gh:arthurli202602-commits/arthurli202602-commits-youtube-anycaption-summarizer
---
name: youtube-anycaption-summarizer
description: "Turn YouTube videos into dependable markdown transcripts and polished summaries — even when caption coverage is messy. This skill works with manual closed captions (CC), auto-generated subtitles, or no usable subtitles at all by using subtitle-first extraction with local Whisper fallback. Supports private/restricted videos via cookies, batch processing, transcript cleanup, language backfill, source-language or user-selected summary language, and end-to-end completion reporting. Ideal for YouTube research, technical walkthroughs, founder content, tutorials, private/internal uploads, and batch video summarization workflows."
metadata: {"openclaw":{"homepage":"https://github.com/arthurli202602-commits/youtube-anycaption-summarizer","requires":{"bins":["yt-dlp","ffmpeg","whisper-cli","python3"]},"install":[{"id":"brew-yt-dlp","kind":"brew","formula":"yt-dlp","bins":["yt-dlp"],"label":"Install yt-dlp (brew)"},{"id":"brew-ffmpeg","kind":"brew","formula":"ffmpeg","bins":["ffmpeg"],"label":"Install ffmpeg (brew)"},{"id":"brew-whisper-cpp","kind":"brew","formula":"whisper-cpp","bins":["whisper-cli"],"label":"Install whisper.cpp CLI (brew)"}]}}
---
# YouTube AnyCaption Summarizer
**The YouTube summarizer that still works when captions are broken, missing, or inconsistent.**
Outputs: raw markdown transcript + polished markdown summary + session-ready result block.
Unlike caption-only tools, this skill still works when subtitles are missing by falling back to local Whisper transcription.
Generate a raw transcript markdown file and a polished summary markdown file from one or more YouTube videos.
This skill is self-contained. It does not require any other YouTube summarizer skill or prior workflow context.
Best for
Why choose this over simpler transcript skills?
Install dependencies
For a fresh macOS setup, new users should be able to copy-paste the following exactly:
brew install yt-dlp ffmpeg whisper-cpp
MODELS_DIR="$HOME/.openclaw/workspace"
MODEL_PATH="$MODELS_DIR/ggml-medium.bin"
mkdir -p "$MODELS_DIR"
if [ ! -f "$MODEL_PATH" ]; then
curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin \
-o "$MODEL_PATH.part" && mv "$MODEL_PATH.part" "$MODEL_PATH"
else
echo "Model already exists at $MODEL_PATH — leaving it unchanged."
fi
command -v python3 yt-dlp ffmpeg whisper-cli
ls -lh "$MODEL_PATH"
What this does:
If you want to store models elsewhere, pass `--models-dir /path/to/models` when running the workflow.
Example requests
Quick start
Single video
python3 scripts/run_youtube_workflow.py "https://www.youtube.com/watch?v=VIDEO_ID"
This creates a dedicated per-video folder, writes the raw transcript markdown, creates the summary placeholder markdown, and prints JSON describing the outputs plus the exact follow-up commands/prompts needed to finish the summary step.
Important: the workflow script alone is not the finished deliverable. The current OpenClaw session must still:
1. infer/backfill the language if the workflow left it as `unknown`
2. overwrite the placeholder `Summary.md` with a real polished summary
3. run `scripts/complete_youtube_summary.py` to validate/finalize the result
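The hand-off from the workflow script to the session could be sketched as below. This is a minimal sketch, assuming the script prints a single JSON object on stdout. The helper name `run_and_parse` is hypothetical, and no JSON key names are assumed because the text above does not list them.

```python
import json
import subprocess
import sys


def run_and_parse(cmd):
    """Run a command and parse its stdout as one JSON document.

    Assumption: the command prints exactly one JSON document on stdout.
    A non-zero exit status raises subprocess.CalledProcessError.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


def workflow_command(url):
    """Build the single-video command shown in the quick start."""
    return [sys.executable, "scripts/run_youtube_workflow.py", url]
```

Calling `run_and_parse(workflow_command(url))` from the skill directory would return the parsed output, which the session can inspect before doing the three follow-up steps listed above.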
Force Simplified Chinese summary
python3 scripts/run_youtube_workflow.py "https://www.youtube.com/watch?v=VIDEO_ID" \
  --summary-language zh-CN
Restricted video with cookies
python3 scripts/run_youtube_workflow.py "https://www.youtube.com/watch?v=VIDEO_ID" \
  --cookies /path/to/cookies.txt
or
python3 scripts/run_youtube_workflow.py "https://www.youtube.com/watch?v=VIDEO_ID" \
  --cookies-from-browser chrome
Batch / queue mode
See `references/batch-input-format.md`.
python3 scripts/run_youtube_workflow.py --batch-file ./youtube-urls.txt
Why this skill stands out
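This skill is designed to keep working across the messy reality of YouTube: videos with manual closed captions, videos with only auto-generated subtitles, and videos with no usable subtitles at all.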
That makes it materially more reliable than caption-only workflows. It works well for caption-rich videos, caption-poor videos, and private/internal uploads where subtitle coverage is inconsistent.
Core capabilities:
What this skill produces
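For each video, create exactly one dedicated output folder containing these final deliverables: the raw transcript `SANITIZED_VIDEO_NAME_transcript_raw.md` and the polished summary `SANITIZED_VIDEO_NAME_Summary.md`.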
By default, delete only the known intermediate media, subtitle, and WAV files created by the workflow. Do not wipe unrelated files that may already exist in the per-video folder.
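One way to honor this cleanup rule is to delete only paths the workflow itself recorded, rather than globbing the folder. The sketch below is illustrative and is not taken from the bundled scripts. The function name `cleanup_intermediates` and its `recorded` and `keep` parameters are hypothetical, as are the demo file names.

```python
import tempfile
from pathlib import Path


def cleanup_intermediates(recorded, keep=()):
    """Delete only the files the workflow recorded as its own intermediates.

    `recorded` lists paths the workflow created; `keep` lists final
    deliverables that must never be removed. Both are hypothetical names.
    """
    protected = {Path(p).resolve() for p in keep}
    removed = []
    for path in map(Path, recorded):
        if path.resolve() in protected or not path.is_file():
            continue  # skip deliverables and anything already gone
        path.unlink()
        removed.append(path.name)
    return sorted(removed)


# Demo in a throwaway folder: the unrelated notes.txt must survive.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("audio.wav", "captions.vtt", "notes.txt", "Talk_Summary.md"):
        (folder / name).write_text("x")
    demo_removed = cleanup_intermediates(
        [folder / "audio.wav", folder / "captions.vtt"],
        keep=[folder / "Talk_Summary.md"],
    )
    demo_left = sorted(p.name for p in folder.iterdir())
```

Because nothing is matched by pattern, a file that was already in the per-video folder is never touched unless the workflow explicitly recorded it.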
Required local tools
Verify these tools exist before running the workflow: `yt-dlp`, `ffmpeg`, `whisper-cli`, and `python3`.
The workflow also requires a supported Whisper ggml model file in the configured models directory.
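A preflight check along these lines could confirm both requirements before a long run. This is a minimal sketch: the binary names come from the skill metadata and `ggml-medium.bin` from the install steps, while the function names `missing_tools` and `model_present` are hypothetical.

```python
import shutil
from pathlib import Path

# Executables listed in the skill metadata.
REQUIRED_BINS = ("yt-dlp", "ffmpeg", "whisper-cli", "python3")


def missing_tools(bins=REQUIRED_BINS, which=shutil.which):
    """Return the required executables that are not found on PATH."""
    return [name for name in bins if which(name) is None]


def model_present(models_dir, model_name="ggml-medium.bin"):
    """Check that the model file exists in the models directory."""
    return (Path(models_dir).expanduser() / model_name).is_file()
```

For the default setup, `missing_tools()` should return an empty list and `model_present("~/.openclaw/workspace")` should return `True`.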
Bundled scripts
Use these scripts directly:
Useful references:
Defaults
Public workflow overview
At a high level, the skill does this:
1. fetch metadata first and create safe output paths
2. try manual subtitles, then auto-captions, then local Whisper fallback
3. write `SANITIZED_VIDEO_NAME_transcript_raw.md`
4. create `SANITIZED_VIDEO_NAME_Summary.md` as a placeholder
5. have the current OpenClaw session overwrite the placeholder with a real summary
6. run `scripts/complete_youtube_summary.py` to validate completion, backfill language if needed, and emit a session-ready result block
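The subtitle-first order in step 2 can be pictured as an ordered fallback chain. The sketch below shows only the control flow. The fetchers are stand-in lambdas, not real `yt-dlp` or `whisper-cli` calls, and the name `first_usable_transcript` and the source labels are hypothetical.

```python
def first_usable_transcript(sources):
    """Try each (label, fetch) pair in order and return the first usable text.

    A source counts as unusable when it returns None or only whitespace.
    """
    for label, fetch in sources:
        text = fetch()
        if text and text.strip():
            return label, text
    raise RuntimeError("no transcript source produced usable text")


# Stand-in fetchers: no manual subtitles, empty auto-captions, Whisper works.
demo_sources = [
    ("manual-subtitles", lambda: None),
    ("auto-captions", lambda: "   "),
    ("whisper", lambda: "hello world"),
]
demo_label, demo_text = first_usable_transcript(demo_sources)
```

Keeping the sources in a list makes the priority explicit and means the expensive local transcription runs only when both subtitle attempts fail.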
What counts as completion
For a normal end-to-end request, completion means all of the following are true:
1. the workflow script succeeded
2. if language was initially `unknown`, the language was backfilled into both markdown files
3. the placeholder summary file was overwritten with a real summary
4. `scripts/complete_youtube_summary.py` was run successfully
5. the user received the resulting output paths and timing/result status
If the workflow script succeeded but the summary/completion step did not happen yet, describe the state as partial/in-progress rather than complete.
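The completion rule above can be expressed as a small status check. This is a sketch under assumptions: the `RunState` fields and `completion_status` are hypothetical names, and the `"failed"` label for an unsuccessful workflow run is an assumption, since the text only distinguishes complete from partial/in-progress.

```python
from dataclasses import dataclass


@dataclass
class RunState:
    workflow_succeeded: bool
    language_known: bool        # never `unknown`, or backfilled into both files
    summary_written: bool       # placeholder overwritten with a real summary
    completion_script_ok: bool  # scripts/complete_youtube_summary.py succeeded
    user_informed: bool         # output paths and timing/result status reported


def completion_status(state):
    """Map the five completion conditions to a status label."""
    if not state.workflow_succeeded:
        return "failed"  # assumption: the source does not name this state
    done = (
        state.language_known
        and state.summary_written
        and state.completion_script_ok
        and state.user_informed
    )
    return "complete" if done else "partial"
```

A run where only the workflow script has finished reports `partial`, which matches the instruction not to describe it as complete.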
When to read the deeper references
Read these as needed:
Practical public promise
This skill is optimized for dependable end-to-end output, not just quick transcript extraction.