Web Reader Pro - OpenClaw Skill
name: web-reader-pro
by 0xcjl · published 2026-04-01
$ claw add gh:0xcjl/0xcjl-web-reader-pro
---
name: web-reader-pro
description: "Advanced web content extraction skill for OpenClaw using multi-tier fallback strategy (Jina → Scrapling → WebFetch) with intelligent routing, caching, quality scoring, and domain learning. Use when: reading article content, extracting web page text, scraping dynamic JS-heavy pages, or fetching WeChat official account articles."
metadata:
  author: 0xcjl
  version: "1.0.0"
---
# Web Reader Pro - OpenClaw Skill
## Overview
Web Reader Pro is a web content extraction skill for OpenClaw. It tries three extraction tiers in order (Jina → Scrapling → WebFetch) and adds intelligent routing, caching, and extraction quality scoring.
## Features
1. Three-Tier Fallback Strategy
2. Jina Quota Monitoring
3. Smart Cache Layer
4. Extraction Quality Scoring
5. Domain-Level Routing Learning
6. Retry with Exponential Backoff
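The fallback, quality-scoring, and retry features listed above can be combined roughly as in the sketch below. This is an illustration, not the skill's actual implementation: `fetch_with_fallback`, the `Fetcher` type, and `base_delay` are hypothetical names, and the quality check is assumed to be a plain word count compared against `quality_threshold`.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: a tier is any callable that takes a URL and returns text.
Fetcher = Callable[[str], str]


def fetch_with_fallback(
    url: str,
    tiers: List[Tuple[str, Fetcher]],
    quality_threshold: int = 200,
    max_retries: int = 3,
    base_delay: float = 1.0,  # assumed starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict[str, object]]:
    """Try each tier in order; retry failures with exponential backoff."""
    for name, fetcher in tiers:
        for attempt in range(max_retries):
            try:
                content = fetcher(url)
            except Exception:
                # Wait base_delay * 2**attempt before the next attempt,
                # but do not sleep after the final attempt on this tier.
                if attempt < max_retries - 1:
                    sleep(base_delay * 2 ** attempt)
                continue
            # Assumed quality rule: accept only if the word count is high enough.
            if len(content.split()) >= quality_threshold:
                return {"content": content, "tier_used": name, "url": url}
            # A successful but thin result falls through to the next tier.
            break
    return None
```

Passing `sleep` as a parameter keeps the sketch easy to test without real delays.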
## Installation
```bash
# Install dependencies
pip install -r requirements.txt
# Install Scrapling (requires Node.js)
./scripts/install_scrapling.sh
# Or install Scrapling manually
npm install -g @scrapinghub/scrapling
```

## Usage
### Basic Usage
```python
from scripts.web_reader_pro import WebReaderPro

reader = WebReaderPro()
result = reader.fetch("https://example.com")
print(result['title'])
print(result['content'])
```

### Advanced Configuration
```python
from scripts.web_reader_pro import WebReaderPro

reader = WebReaderPro(
    jina_api_key="your-jina-key",  # Optional: set via env JINA_API_KEY
    cache_ttl=3600,                # Cache TTL in seconds (default: 3600)
    quality_threshold=200,         # Min word count for quality (default: 200)
    max_retries=3,                 # Max retries per tier (default: 3)
    enable_learning=True,          # Enable domain learning (default: True)
    scrapling_path="/usr/local/bin/scrapling",  # Path to scrapling binary
)
```

### Result Format
```python
{
    "title": "Page Title",
    "content": "Extracted content in markdown...",
    "url": "https://example.com",
    "tier_used": "jina|scrapling|webfetch",
    "quality_score": 85,
    "cached": False,
    "domain_learned_tier": "jina",
    "extracted_at": "2024-01-01T00:00:00Z"
}
```

## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `JINA_API_KEY` | Jina Reader API key | None (required for Tier 1) |
| `WEB_READER_CACHE_DIR` | Cache directory path | `~/.openclaw/cache/web-reader-pro/` |
| `WEB_READER_LEARNING_DB` | Learning database path | `~/.openclaw/data/web-reader-pro/routes.json` |
| `WEB_READER_JINA_QUOTA` | Jina quota limit | `100000` |
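As a rough illustration of how the variables in the table could be resolved with their documented defaults, consider the sketch below. The `load_settings` helper and its return keys are hypothetical and are not part of the skill's documented API.

```python
import os
from typing import Dict, Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Resolve settings from environment variables, using the table's defaults."""
    env = os.environ if env is None else env
    return {
        # No default: Tier 1 (Jina) needs this key to be set.
        "jina_api_key": env.get("JINA_API_KEY"),
        "cache_dir": os.path.expanduser(
            env.get("WEB_READER_CACHE_DIR", "~/.openclaw/cache/web-reader-pro/")
        ),
        "learning_db": os.path.expanduser(
            env.get("WEB_READER_LEARNING_DB", "~/.openclaw/data/web-reader-pro/routes.json")
        ),
        "jina_quota": int(env.get("WEB_READER_JINA_QUOTA", "100000")),
    }
```

Accepting an explicit mapping instead of always reading `os.environ` makes the defaults easy to check in isolation.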
## API Reference
### `WebReaderPro.fetch(url, force_refresh=False)`
Fetch and extract content from a URL.
**Parameters:** `url`, `force_refresh` (default `False`)
**Returns:** Dict with title, content, metadata
### `WebReaderPro.fetch_with_tier(url, preferred_tier)`
Fetch using a specific tier, bypassing automatic selection.
**Parameters:** `url`, `preferred_tier`
### `WebReaderPro.get_jina_status()`
Get current Jina API quota usage.
**Returns:** Dict with count, limit, percentage, warnings
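A status dict of this shape could be assembled as in the sketch below. The `jina_status` function is hypothetical, and the 80% and 100% warning thresholds and the warning messages are assumptions; the skill's actual thresholds are not documented here.

```python
from typing import Dict, List


def jina_status(count: int, limit: int = 100000) -> Dict[str, object]:
    """Summarize quota usage as count, limit, percentage, and warnings."""
    percentage = round(100 * count / limit, 2) if limit > 0 else 0.0
    warnings: List[str] = []
    # Assumed thresholds: warn at 80% usage, report exhaustion at 100%.
    if percentage >= 100:
        warnings.append("quota exhausted")
    elif percentage >= 80:
        warnings.append("quota above 80%")
    return {
        "count": count,
        "limit": limit,
        "percentage": percentage,
        "warnings": warnings,
    }
```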
### `WebReaderPro.clear_cache(url=None)`
Clear the cache for a specific URL, or for all URLs.
**Parameters:** `url` (default `None`)
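The interaction between `cache_ttl` and `clear_cache(url=None)` can be pictured with a minimal in-memory TTL cache. This is a sketch under assumptions: the `TTLCache` class is hypothetical, and the skill itself appears to use an on-disk cache directory (`WEB_READER_CACHE_DIR`) rather than a dict.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache keyed by a hash of the URL, with per-entry expiry."""

    def __init__(self, ttl: float = 3600, clock: Callable[[], float] = time.time):
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    @staticmethod
    def _key(url: str) -> str:
        return hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> Optional[dict]:
        key = self._key(url)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        # Drop and ignore entries older than the TTL.
        if self.clock() - stored_at > self.ttl:
            del self._entries[key]
            return None
        return value

    def set(self, url: str, value: dict) -> None:
        self._entries[self._key(url)] = (self.clock(), value)

    def clear(self, url: Optional[str] = None) -> None:
        # With no URL, clear everything; otherwise remove just that entry.
        if url is None:
            self._entries.clear()
        else:
            self._entries.pop(self._key(url), None)
```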
### `WebReaderPro.get_domain_routes()`
Get learned domain-to-tier mappings.
**Returns:** Dict of domain -> preferred tier
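One plausible way to learn such a mapping is to count which tier succeeded for each domain and report the most frequent one. The `DomainRoutes` class and its `record_success` method below are hypothetical, and the counting rule is an assumption rather than the skill's documented behavior.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict
from urllib.parse import urlparse


class DomainRoutes:
    """Track successful tiers per domain and report the preferred one."""

    def __init__(self) -> None:
        self._wins: DefaultDict[str, Counter] = defaultdict(Counter)

    def record_success(self, url: str, tier: str) -> None:
        domain = urlparse(url).netloc.lower()
        if domain:  # skip URLs without a host part
            self._wins[domain][tier] += 1

    def get_domain_routes(self) -> Dict[str, str]:
        # For each domain, pick the tier with the most recorded successes.
        return {
            domain: counts.most_common(1)[0][0]
            for domain, counts in self._wins.items()
        }
```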
## Tier Comparison
| Tier | Speed | JS Rendering | Best For | Cost |
|------|-------|--------------|----------|------|
| Jina | Fast | No | Static pages, articles | API calls |
| Scrapling | Medium | Yes | SPAs, dynamic content | CPU |
| WebFetch | Fastest | No | Simple pages, fallbacks | Free |
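The table can also be read as routing data. The sketch below encodes only the JS Rendering column and shows one way to order tiers for a request; `order_tiers`, `needs_js`, and `learned_tier` are hypothetical names, and the ordering rule is an assumption rather than the skill's actual routing logic.

```python
from typing import Dict, List, Optional

# JS rendering capability per tier, taken from the comparison table.
TIERS: Dict[str, Dict[str, bool]] = {
    "jina": {"js": False},
    "scrapling": {"js": True},
    "webfetch": {"js": False},
}
DEFAULT_ORDER: List[str] = ["jina", "scrapling", "webfetch"]


def order_tiers(needs_js: bool = False, learned_tier: Optional[str] = None) -> List[str]:
    """Return the tiers in the order they should be attempted."""
    order = list(DEFAULT_ORDER)
    if needs_js:
        # Stable sort: JS-capable tiers move to the front, others keep their order.
        order.sort(key=lambda tier: not TIERS[tier]["js"])
    if learned_tier in TIERS:
        # A tier learned for this domain takes priority over everything else.
        order.remove(learned_tier)
        order.insert(0, learned_tier)
    return order
```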
## License
MIT