Scrapling Web Scraping Skill
by cryptos3c · published 2026-03-22

```
$ claw add gh:cryptos3c/cryptos3c-openclaw-scrapling
```

---
name: scrapling
description: Advanced web scraping with anti-bot bypass, JavaScript support, and adaptive selectors. Use when scraping websites with Cloudflare protection, dynamic content, or frequent UI changes.
homepage: https://github.com/D4Vinci/Scrapling
version: 1.0.0
metadata:
  clawdbot:
    emoji: 🕷️
    requires:
      bins: [python3, pip]
      python_packages: [scrapling]
category: web-scraping
author: OpenClaw Community
---
# Scrapling Web Scraping Skill
Use Scrapling to scrape modern websites, including those with anti-bot protection, JavaScript-rendered content, and adaptive element tracking.
## When to Use This Skill

Reach for this skill when a target site involves:

- Cloudflare or other anti-bot protection
- JavaScript-rendered or dynamically loaded content
- Frequent UI changes that break fixed selectors
- Content behind a login (session support)

## Commands

All commands use the `scrape.py` script in this skill's directory.
### Basic HTTP Scraping (Fast)

```
python scrape.py \
  --url "https://example.com" \
  --selector ".product" \
  --output products.json
```

**Use when:** Static HTML, no JavaScript, no bot protection
### Stealth Mode (Bypass Anti-Bot)

```
python scrape.py \
  --url "https://nopecha.com/demo/cloudflare" \
  --stealth \
  --selector "#content" \
  --output data.json
```

**Use when:** Cloudflare protection, bot detection, fingerprinting
**Features:**
### Dynamic/JavaScript Content

```
python scrape.py \
  --url "https://spa-website.com" \
  --dynamic \
  --selector ".loaded-content" \
  --wait-for ".loaded-content" \
  --output data.json
```

**Use when:** React/Vue/Angular apps, lazy-loaded content, AJAX
**Features:**
### Adaptive Selectors (Survives Website Changes)

```
# First time - save the selector pattern
python scrape.py \
  --url "https://example.com" \
  --selector ".product-card" \
  --adaptive-save \
  --output products.json

# Later, if website structure changes
python scrape.py \
  --url "https://example.com" \
  --adaptive \
  --output products.json
```

**Use when:** Website frequently redesigns, need robust scraping
**How it works:**
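Scrapling's actual adaptive machinery is not shown here, but the general idea can be sketched: `--adaptive-save` records a fingerprint of the element that matched the selector, and `--adaptive` later relocates the element by similarity to that fingerprint rather than by the exact selector. A minimal, hypothetical illustration in plain Python (not Scrapling's API):

```python
# Hypothetical sketch of adaptive-selector matching -- NOT Scrapling's
# real implementation. The saved "fingerprint" records properties of the
# element that matched the original selector; later runs pick the
# candidate element most similar to that fingerprint.

def similarity(fingerprint: dict, candidate: dict) -> float:
    """Fraction of fingerprint properties the candidate still has."""
    if not fingerprint:
        return 0.0
    hits = sum(1 for k, v in fingerprint.items() if candidate.get(k) == v)
    return hits / len(fingerprint)

def relocate(fingerprint: dict, candidates: list[dict]) -> dict:
    """Return the candidate element that best matches the fingerprint."""
    return max(candidates, key=lambda c: similarity(fingerprint, c))

# Fingerprint saved while the site still used class="product-card"
saved = {"tag": "div", "class": "product-card", "parent": "main"}

# After a redesign the class was renamed, but tag and parent survive
candidates = [
    {"tag": "div", "class": "item-card", "parent": "main"},
    {"tag": "span", "class": "nav-link", "parent": "header"},
]
print(relocate(saved, candidates)["class"])  # item-card
```

A real implementation would fingerprint far more signals (attributes, text, position in the DOM), but the relocate-by-similarity shape is the same.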
### Session Management (Login Required)

```
# Login and save session
python scrape.py \
  --url "https://example.com/dashboard" \
  --stealth \
  --login \
  --username "user@example.com" \
  --password "password123" \
  --session-name "my-session" \
  --selector ".protected-data" \
  --output data.json

# Reuse saved session (no login needed)
python scrape.py \
  --url "https://example.com/another-page" \
  --stealth \
  --session-name "my-session" \
  --selector ".more-data" \
  --output more_data.json
```

**Use when:** Content requires authentication, multi-step scraping
### Extract Specific Data Types

**Text only:**

```
python scrape.py \
  --url "https://example.com" \
  --selector ".content" \
  --extract text \
  --output content.txt
```

**Markdown:**

```
python scrape.py \
  --url "https://docs.example.com" \
  --selector "article" \
  --extract markdown \
  --output article.md
```

**Attributes:**

```
# Extract href links
python scrape.py \
  --url "https://example.com" \
  --selector "a.product-link" \
  --extract attr:href \
  --output links.json
```

**Multiple fields:**
```
python scrape.py \
  --url "https://example.com/products" \
  --selector ".product" \
  --fields "title:.title::text,price:.price::text,link:a::attr(href)" \
  --output products.json
```
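The `--fields` value packs several name/selector pairs into one string. How `scrape.py` parses it is not shown in this skill, but a plausible parser for the format above would split entries on commas and each entry on its first colon only, since the selector itself contains `::` pseudo-elements (a hypothetical sketch, not the script's actual code):

```python
# Hypothetical parser for the --fields spec format used above, e.g.
# "title:.title::text,price:.price::text,link:a::attr(href)".
# Each comma-separated entry is "name:selector"; the selector may
# contain "::" pseudo-elements, so split on the FIRST colon only.

def parse_fields(spec: str) -> dict[str, str]:
    fields = {}
    for entry in spec.split(","):
        name, selector = entry.split(":", 1)
        fields[name.strip()] = selector.strip()
    return fields

spec = "title:.title::text,price:.price::text,link:a::attr(href)"
print(parse_fields(spec))
# {'title': '.title::text', 'price': '.price::text', 'link': 'a::attr(href)'}
```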
### Advanced Options

**Proxy support:**

```
python scrape.py \
  --url "https://example.com" \
  --proxy "http://user:pass@proxy.com:8080" \
  --selector ".content"
```

**Rate limiting:**
```
python scrape.py \
  --url "https://example.com" \
  --selector ".content" \
  --delay 2  # 2 seconds between requests
```
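A delay flag of this kind usually amounts to waiting between successive requests. A minimal sketch of such a rate limiter (a hypothetical helper, not taken from `scrape.py`), with the clock and sleep functions injectable so the behavior can be tested without real waiting:

```python
import time

# Hypothetical rate limiter of the kind a --delay flag implies:
# wait until at least `delay` seconds have passed since the last request.
class RateLimiter:
    def __init__(self, delay: float, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self.last = None  # timestamp of the previous request, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.delay - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()

limiter = RateLimiter(delay=2.0)
for url in ["https://example.com/1", "https://example.com/2"]:
    limiter.wait()   # sleeps ~2s before every request after the first
    # fetch(url) would go here
```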
**Custom headers:**

```
python scrape.py \
  --url "https://api.example.com" \
  --headers '{"Authorization": "Bearer token123"}' \
  --selector "body"
```

**Screenshot (for debugging):**

```
python scrape.py \
  --url "https://example.com" \
  --stealth \
  --screenshot debug.png
```

## Python API (For Custom Scripts)
You can also use Scrapling directly in Python scripts:

```python
from scrapling.fetchers import Fetcher, StealthyFetcher, DynamicFetcher

# Basic HTTP request
page = Fetcher.get('https://example.com')
products = page.css('.product')
for product in products:
    title = product.css('.title::text').get()
    price = product.css('.price::text').get()
    print(f"{title}: {price}")

# Stealth mode (bypass anti-bot)
page = StealthyFetcher.fetch('https://protected-site.com', headless=True)
data = page.css('.content').getall()

# Dynamic content (full browser)
page = DynamicFetcher.fetch('https://spa-app.com', network_idle=True)
items = page.css('.loaded-item').getall()

# Sessions (login)
from scrapling.fetchers import StealthySession

with StealthySession(headless=True) as session:
    # Login
    login_page = session.fetch('https://example.com/login')
    login_page.fill('#username', 'user@example.com')
    login_page.fill('#password', 'password123')
    login_page.click('#submit')

    # Access protected content
    protected_page = session.fetch('https://example.com/dashboard')
    data = protected_page.css('.private-data').getall()
```

## Output Formats
## Selector Types

Scrapling supports multiple selector formats:

**CSS selectors:**

```
--selector ".product"
--selector "div.container > p.text"
--selector "a[href*='product']"
```

**XPath selectors:**

```
--selector "//div[@class='product']"
--selector "//a[contains(@href, 'product')]"
```

**Pseudo-elements (like Scrapy):**

```
--selector ".product::text"        # Text content
--selector "a::attr(href)"         # Attribute value
--selector ".price::text::strip"   # Text with whitespace removed
```

**Combined selectors:**

```
--selector ".product .title::text"  # Nested elements
```
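The pseudo-element suffixes are a convention layered on top of plain CSS: everything before the first `::` selects the element, and the remaining parts say what to extract. A toy interpreter makes the convention concrete (an illustration of the syntax only, not Scrapling's parser; the dict-based element is hypothetical):

```python
import re

# Toy interpreter for the ::text / ::attr(name) / ::strip convention
# shown above -- an illustration of the syntax, not Scrapling's parser.

def split_selector(selector: str):
    """Split 'a::attr(href)' into the CSS part and pseudo-element ops."""
    css, *ops = selector.split("::")
    return css, ops

def extract(element: dict, ops: list[str]):
    """Apply pseudo-element ops in order to a toy element (a plain dict)."""
    value = None
    for op in ops:
        if op == "text":
            value = element["text"]
        elif op == "strip":
            value = value.strip()
        else:
            m = re.fullmatch(r"attr\((\w+)\)", op)
            if m:
                value = element["attrs"][m.group(1)]
    return value

link = {"text": "  Buy now  ", "attrs": {"href": "/product/123"}}

css, ops = split_selector("a::attr(href)")
print(css, extract(link, ops))   # a /product/123

css, ops = split_selector(".price::text::strip")
print(extract(link, ops))        # Buy now
```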
## Troubleshooting

**Issue: "Element not found"**
The content may be rendered by JavaScript. Retry with `--dynamic` and a `--wait-for` on the selector.

**Issue: "Cloudflare blocking"**
Retry with `--stealth`, and add a `--delay` between requests.

**Issue: "Login not working"**
Use `--stealth` together with `--login`, and pass a `--session-name` so the authenticated session is saved for reuse.

**Issue: "Selector broke after website update"**
Save the pattern once with `--adaptive-save`, then scrape with `--adaptive` so the element can be relocated after a redesign.
## Examples

### Scrape Hacker News Front Page

```
python scrape.py \
  --url "https://news.ycombinator.com" \
  --selector ".athing" \
  --fields "title:.titleline>a::text,link:.titleline>a::attr(href)" \
  --output hn_stories.json
```

### Scrape Protected Site with Login

```
python scrape.py \
  --url "https://example.com/data" \
  --stealth \
  --login \
  --username "user@example.com" \
  --password "secret" \
  --session-name "example-session" \
  --selector ".data-table tr" \
  --output protected_data.json
```

### Monitor Price Changes
```
# Save initial selector pattern
python scrape.py \
  --url "https://store.com/product/123" \
  --selector ".price" \
  --adaptive-save \
  --output price.txt

# Later, check price (even if page redesigned)
python scrape.py \
  --url "https://store.com/product/123" \
  --adaptive \
  --output price_new.txt
```
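Once both runs have written their files, a small follow-up script can compare them. A hypothetical helper in plain Python (the file names match the commands above; the `$`-prefixed price format is an assumption):

```python
# Hypothetical follow-up to the price-monitoring commands above:
# compare two scraped price strings and report any change.

def parse_price(text: str) -> float:
    """Turn a scraped price string like '$19.99' into a float."""
    return float(text.strip().lstrip("$").replace(",", ""))

def price_change(old_text: str, new_text: str) -> float:
    """Signed difference between two scraped prices (new minus old)."""
    return parse_price(new_text) - parse_price(old_text)

# With real files you would read price.txt and price_new.txt:
# old_text = open("price.txt").read()
# new_text = open("price_new.txt").read()
old_text, new_text = "$19.99\n", "$17.49\n"

delta = price_change(old_text, new_text)
if delta:
    print(f"price changed by {delta:+.2f}")
```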
### Scrape Dynamic JavaScript App

```
python scrape.py \
  --url "https://react-app.com/data" \
  --dynamic \
  --wait-for ".loaded-content" \
  --selector ".item" \
  --fields "name:.name::text,value:.value::text" \
  --output app_data.json
```

## Notes
## Dependencies

Installed automatically when the skill is installed:

- `scrapling` (Python package, installed via `pip`)

Requires `python3` and `pip` on the PATH (see the `requires` block in the frontmatter).
Skill Structure
scrapling/
├── SKILL.md # This file
├── scrape.py # Main CLI script
├── requirements.txt # Python dependencies
├── sessions/ # Saved browser sessions
├── selector_cache.json # Adaptive selector patterns
└── examples/ # Example scripts
├── basic.py
├── stealth.py
├── dynamic.py
└── adaptive.pyAdvanced: Custom Python Scripts
For complex scraping tasks, you can create custom Python scripts in this directory:
```python
# custom_scraper.py
from scrapling.spiders import Spider, Response
import json

class MySpider(Spider):
    name = "custom"
    start_urls = ["https://example.com/page1"]

    async def parse(self, response: Response):
        for item in response.css('.product'):
            yield {
                "title": item.css('.title::text').get(),
                "price": item.css('.price::text').get(),
            }

        # Follow pagination
        next_page = response.css('.next-page::attr(href)').get()
        if next_page:
            yield response.follow(next_page)

# Run spider
result = MySpider().start()
with open('output.json', 'w') as f:
    json.dump(result.items, f, indent=2)
```

Run with:

```
python custom_scraper.py
```

---
**Questions?** Check Scrapling docs: https://scrapling.readthedocs.io