docs-scraper
name: scraper
by chrisling-dev · published 2026-03-22
$ claw add gh:chrisling-dev/chrisling-dev-links-to-pdfs---
name: scraper
description: Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs.
---
# docs-scraper
CLI tool that scrapes documents from various sources into local PDF files using browser automation.
Installation
npm install -g docs-scraperQuick start
Scrape any document URL to PDF:
docs-scraper scrape https://example.com/documentReturns local path: `~/.docs-scraper/output/1706123456-abc123.pdf`
Basic scraping
**Scrape with daemon** (recommended, keeps browser warm):
docs-scraper scrape <url>**Scrape with named profile** (for authenticated sites):
docs-scraper scrape <url> -p <profile-name>**Scrape with pre-filled data** (e.g., email for DocSend):
docs-scraper scrape <url> -D email=user@example.com**Direct mode** (single-shot, no daemon):
docs-scraper scrape <url> --no-daemonAuthentication workflow
When a document requires authentication (login, email verification, passcode):
1. Initial scrape returns a job ID:
```bash
docs-scraper scrape https://docsend.com/view/xxx
# Output: Scrape blocked
# Job ID: abc123
```
2. Retry with data:
```bash
docs-scraper update abc123 -D email=user@example.com
# or with password
docs-scraper update abc123 -D email=user@example.com -D password=1234
```
Profile management
Profiles store session cookies for authenticated sites.
docs-scraper profiles list # List saved profiles
docs-scraper profiles clear # Clear all profiles
docs-scraper scrape <url> -p myprofile # Use a profileDaemon management
The daemon keeps browser instances warm for faster scraping.
docs-scraper daemon status # Check status
docs-scraper daemon start # Start manually
docs-scraper daemon stop # Stop daemonNote: Daemon auto-starts when running scrape commands.
Cleanup
PDFs are stored in `~/.docs-scraper/output/`. The daemon automatically cleans up files older than 1 hour.
Manual cleanup:
docs-scraper cleanup # Delete all PDFs
docs-scraper cleanup --older-than 1h # Delete PDFs older than 1 hourJob management
docs-scraper jobs list # List blocked jobs awaiting authSupported sources
---
Scraper Reference
Each scraper accepts specific `-D` data fields. Use the appropriate fields based on the URL type.
DirectPdfScraper
**Handles:** URLs ending in `.pdf`
**Data fields:** None (downloads directly)
**Example:**
docs-scraper scrape https://example.com/document.pdf---
DocsendScraper
**Handles:** `docsend.com/view/*`, `docsend.com/v/*`, and subdomains (e.g., `org-a.docsend.com`)
**URL patterns:**
**Data fields:**
| Field | Type | Description |
|-------|------|-------------|
| `email` | email | Email address for document access |
| `password` | password | Passcode/password for protected documents |
| `name` | text | Your name (required for NDA-gated documents) |
**Examples:**
# Pre-fill email for DocSend
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com
# With password protection
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D password=secret123
# With NDA name requirement
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D name="John Doe"
# Retry blocked job
docs-scraper update abc123 -D email=user@example.com -D password=secret123**Notes:**
---
NotionScraper
**Handles:** `notion.so/*`, `*.notion.site/*`
**Data fields:**
| Field | Type | Description |
|-------|------|-------------|
| `email` | email | Notion account email |
| `password` | password | Notion account password |
**Examples:**
# Public page (no auth needed)
docs-scraper scrape https://notion.so/Public-Page-abc123
# Private page with login
docs-scraper scrape https://notion.so/Private-Page-abc123 \
-D email=user@example.com -D password=mypassword
# Custom domain
docs-scraper scrape https://docs.company.notion.site/Page-abc123**Notes:**
---
LlmFallbackScraper
**Handles:** Any URL not matched by other scrapers (automatic fallback)
**Data fields:** Dynamic - determined by Claude analyzing the page
The LLM scraper uses Claude to analyze the page HTML and detect:
**Common dynamic fields:**
| Field | Type | Description |
|-------|------|-------------|
| `email` | email | Login email (if detected) |
| `password` | password | Login password (if detected) |
| `username` | text | Username (if login uses username) |
**Examples:**
# Generic webpage (no auth)
docs-scraper scrape https://example.com/article
# Webpage requiring login
docs-scraper scrape https://members.example.com/article \
-D email=user@example.com -D password=secret
# When blocked, check the job for required fields
docs-scraper jobs list
# Then retry with the fields the scraper detected
docs-scraper update abc123 -D username=myuser -D password=secret**Notes:**
---
Data field summary
| Scraper | email | password | name | Other |
|---------|-------|----------|------|-------|
| DirectPdf | - | - | - | - |
| DocSend | ✓ | ✓ | ✓ | - |
| Notion | ✓ | ✓ | - | - |
| LLM Fallback | ✓* | ✓* | - | Dynamic* |
*Fields detected dynamically from page analysis
Environment setup (optional)
Only needed for LLM fallback scraper:
export ANTHROPIC_API_KEY=your_keyOptional browser settings:
export BROWSER_HEADLESS=true # Set false for debuggingCommon patterns
**Archive a Notion page:**
docs-scraper scrape https://notion.so/My-Page-abc123**Download protected DocSend:**
docs-scraper scrape https://docsend.com/view/xxx
# If blocked:
docs-scraper update <job-id> -D email=user@example.com -D password=1234**Batch scraping with profiles:**
docs-scraper scrape https://site.com/doc1 -p mysite
docs-scraper scrape https://site.com/doc2 -p mysiteOutput
**Success**: Local file path (e.g., `~/.docs-scraper/output/1706123456-abc123.pdf`)
**Blocked**: Job ID + required credential types
Troubleshooting
More tools from the same signal band
Order food/drinks (点餐) on an Android device paired as an OpenClaw node. Uses in-app menu and cart; add goods, view cart, submit order (demo, no real payment).
Sign plugins, rotate agent credentials without losing identity, and publicly attest to plugin behavior with verifiable claims and authenticated transfers.
The philosophical layer for AI agents. Maps behavior to Spinoza's 48 affects, calculates persistence scores, and generates geometric self-reports. Give your...