
Scrapling Skill

name: scrapling

by damirikys · published 2026-03-22

Tags: Dev Tools · Data Processing
Total installs: 0 · Stars: ★ 0 · Last updated: 2026-03
Install command:

    $ claw add gh:damirikys/damirikys-scrapling-fetcher
Full documentation

---
name: scrapling
description: "Web scraping using Scrapling — a Python framework with anti-bot bypass (Cloudflare Turnstile, fingerprint spoofing), adaptive element tracking, stealth headless browser, and full CSS/XPath extraction. Use when web_fetch fails (Cloudflare, JS-rendered pages), or when extracting structured data from websites (prices, articles, lists). Supports HTTP, stealth, and full browser modes. Source: github.com/D4Vinci/Scrapling (PyPI: scrapling). Only use on sites you have permission to scrape."
license: MIT
metadata:
  source: https://github.com/D4Vinci/Scrapling
  pypi: https://pypi.org/project/scrapling/
---

# Scrapling Skill

**Source:** https://github.com/D4Vinci/Scrapling (open source, MIT-like license)

**PyPI:** `scrapling` — install before first use (see below)

> ⚠️ Only scrape sites you have permission to access. Respect `robots.txt` and Terms of Service. Do not use stealth modes to bypass paywalls or access restricted content without authorization.

## Installation (one-time, confirm with user before running)

    pip install scrapling[all]
    patchright install chromium  # required for stealth/dynamic modes

- `scrapling[all]` installs `patchright` (a stealth fork of Playwright, bundled as a PyPI package — not a typo), `curl_cffi`, MCP server deps, and an IPython shell.
- `patchright install chromium` downloads Chromium (~100 MB) via patchright's own installer (the same mechanism as `playwright install chromium`).
- Confirm with the user before running — this installs ~200 MB of dependencies and browser binaries.
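If you want a quick sanity check after installing, a bare import is enough. This assumes only that the package imports cleanly; `__version__` is a common Python convention and may be absent, so the check falls back gracefully:

    import scrapling

    # If this runs without ImportError, the core package is installed.
    print(getattr(scrapling, "__version__", "scrapling imported OK"))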
## Script

`scripts/scrape.py` — CLI wrapper for all three fetcher modes.

    # Basic fetch (text output)
    python3 ~/skills/scrapling/scripts/scrape.py <url> -q
    
    # CSS selector extraction
    python3 ~/skills/scrapling/scripts/scrape.py <url> --selector ".class" -q
    
    # Stealth mode (Cloudflare bypass) — only on sites you're authorized to access
    python3 ~/skills/scrapling/scripts/scrape.py <url> --mode stealth -q
    
    # JSON output
    python3 ~/skills/scrapling/scripts/scrape.py <url> --selector "h2" --json -q
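The actual wrapper ships with the skill; purely for orientation, here is a minimal sketch of what such a script could look like. The fetcher class names follow the Scrapling README (`DynamicFetcher` is the newer name for the full-browser fetcher; older releases call it `PlayWrightFetcher`), the flags mirror the examples above, and the rest is illustrative, not the skill's actual code:

    #!/usr/bin/env python3
    """Illustrative sketch only -- the skill ships its own scripts/scrape.py."""
    import argparse
    import json

    from scrapling.fetchers import Fetcher, StealthyFetcher, DynamicFetcher


    def fetch_page(url, mode):
        # Map the CLI modes onto Scrapling's three fetchers.
        if mode == "stealth":
            return StealthyFetcher.fetch(url, headless=True)  # anti-detect Chromium
        if mode == "dynamic":
            return DynamicFetcher.fetch(url)  # full browser, JS executed
        return Fetcher.get(url)  # plain HTTP with TLS impersonation


    def main():
        parser = argparse.ArgumentParser(description="Scrapling CLI wrapper (sketch)")
        parser.add_argument("url")
        parser.add_argument("--mode", choices=("http", "stealth", "dynamic"), default="http")
        parser.add_argument("--selector", help="CSS selector to extract")
        parser.add_argument("--json", action="store_true", dest="as_json")
        # Accepted for parity with the examples; this sketch prints no progress anyway.
        parser.add_argument("-q", "--quiet", action="store_true")
        args = parser.parse_args()

        page = fetch_page(args.url, args.mode)
        if args.selector:
            results = [el.text for el in page.css(args.selector)]
        else:
            results = [page.get_all_text(ignore_tags=("script", "style"))]
        print(json.dumps(results, ensure_ascii=False) if args.as_json else "\n".join(results))


    if __name__ == "__main__":
        main()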

## Fetcher Modes

- **http** (default) — Fast HTTP with browser TLS fingerprint spoofing. Most sites.
- **stealth** — Headless Chrome with anti-detect. For Cloudflare/anti-bot.
- **dynamic** — Full Playwright browser. For heavy JS SPAs.
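All three modes return the same page object, so extraction code is mode-independent. A minimal sketch (class names per the Scrapling README; `DynamicFetcher` may be `PlayWrightFetcher` on pre-0.3 releases):

    from scrapling.fetchers import Fetcher, StealthyFetcher, DynamicFetcher

    page = Fetcher.get("https://example.com")                             # http mode
    # page = StealthyFetcher.fetch("https://example.com", headless=True)  # stealth mode
    # page = DynamicFetcher.fetch("https://example.com")                  # dynamic mode

    print(page.status)                    # HTTP status code
    print(page.css_first("title::text"))  # same extraction API in every mode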
## When to Use Each Mode

- `web_fetch` returns 403/429/Cloudflare challenge → use `--mode stealth`
- Page content requires JS execution → use `--mode dynamic`
- Regular site, just need text/data → use `--mode http` (default)
## Python Inline Usage

For custom logic beyond the CLI, write inline Python (a short sketch follows the list). See `references/patterns.md` for:

- Adaptive scraping (`auto_save` / `adaptive` — saves element fingerprints locally)
- Session/cookie handling
- Async usage
- XPath, `find_similar`, attribute extraction
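As a taste of those patterns, a short extraction sketch. The URL and selectors are hypothetical; the `::text`/`::attr()` pseudo-elements, `xpath`, and `find_similar` are from the Scrapling docs:

    from scrapling.fetchers import Fetcher

    page = Fetcher.get("https://example.com/products")    # hypothetical URL

    names = page.css(".product .name::text")              # CSS with ::text pseudo-element
    links = page.css(".product a::attr(href)")            # attribute extraction
    prices = page.xpath("//span[@class='price']/text()")  # XPath on the same page object

    first = page.css_first(".product")
    if first is not None:
        similar = first.find_similar()                    # elements structurally like `first`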
## Notes

- **MCP server** (`scrapling mcp`): starts a local network service for AI-native scraping. Only start it if explicitly needed and trusted — it exposes a local HTTP server.
- **`auto_save=True`**: persists element fingerprints to disk for adaptive re-scraping (sketch below). Creates local state in the working directory.
- Stealth/dynamic modes use headless Chromium — no `xvfb-run` needed.
- For large-scale crawls, use the Spider API (see the Scrapling docs).
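For the adaptive flow, a hedged sketch using the `auto_save`/`adaptive` keyword names from this skill's description; exact names vary across Scrapling releases (older ones used `auto_match`):

    from scrapling.fetchers import Fetcher

    # First run: remember the element's fingerprint (writes local state).
    page = Fetcher.get("https://example.com/item")    # hypothetical URL
    price = page.css("#price", auto_save=True)

    # Later run, after the site's markup changed: relocate the element
    # from its saved fingerprint instead of the now-broken selector.
    page = Fetcher.get("https://example.com/item")
    price = page.css("#price", adaptive=True)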