docs-scraper

Name: docs-scraper
Author: chrisling-dev

by chrisling-dev · published 2026-03-22

Email processing · Data processing

Last updated

2026-03

// Install command

$ claw add gh:chrisling-dev/chrisling-dev-links-to-pdfs

// Full documentation

---
name: scraper
description: Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs.
---

# docs-scraper

CLI tool that scrapes documents from various sources into local PDF files using browser automation.

## Installation

```bash
npm install -g docs-scraper
```

## Quick start

Scrape any document URL to PDF:

```bash
docs-scraper scrape https://example.com/document
```

Returns the local path of the PDF, e.g. `~/.docs-scraper/output/1706123456-abc123.pdf`

## Basic scraping

**Scrape with daemon** (recommended, keeps browser warm):

```bash
docs-scraper scrape <url>
```

**Scrape with named profile** (for authenticated sites):

```bash
docs-scraper scrape <url> -p <profile-name>
```

**Scrape with pre-filled data** (e.g., email for DocSend):

```bash
docs-scraper scrape <url> -D email=user@example.com
```

**Direct mode** (single-shot, no daemon):

```bash
docs-scraper scrape <url> --no-daemon
```

## Authentication workflow

When a document requires authentication (login, email verification, or passcode):

1. The initial scrape is blocked and returns a job ID:

```bash
docs-scraper scrape https://docsend.com/view/xxx
# Output: Scrape blocked
# Job ID: abc123
```

2. Retry the job with the required data:

```bash
docs-scraper update abc123 -D email=user@example.com
# or with password
docs-scraper update abc123 -D email=user@example.com -D password=1234
```
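The two steps above can be chained in a script. The following is a minimal sketch, assuming a blocked scrape prints a line of the form `Job ID: <id>` as in the sample output; the exact output format is not specified here. The function names `extract_job_id` and `scrape_or_retry` and the `DOCS_SCRAPER` override variable are hypothetical, introduced only for illustration.

```bash
# Print the first job ID found on stdin, or nothing if there is none.
# Assumption: blocked scrapes emit a line containing "Job ID: <id>".
extract_job_id() {
  sed -n 's/^.*Job ID:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}

# Scrape a URL; if the scrape is blocked, retry the job with the given -D fields.
# Usage: scrape_or_retry <url> -D email=user@example.com [-D password=...]
# The body runs in a subshell so its variables do not leak into the caller.
scrape_or_retry() (
  url=$1
  shift
  # DOCS_SCRAPER is a hypothetical override so the command can be stubbed in tests.
  output=$("${DOCS_SCRAPER:-docs-scraper}" scrape "$url")
  job_id=$(printf '%s\n' "$output" | extract_job_id)
  if [ -n "$job_id" ]; then
    "${DOCS_SCRAPER:-docs-scraper}" update "$job_id" "$@"
  else
    # No job ID in the output: pass the original output through unchanged.
    printf '%s\n' "$output"
  fi
)
```

Passing the `-D` fields through `"$@"` keeps the sketch independent of which credentials a given source asks for.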

## Profile management

Profiles store session cookies for authenticated sites.

```bash
docs-scraper profiles list     # List saved profiles
docs-scraper profiles clear    # Clear all profiles
docs-scraper scrape <url> -p myprofile  # Use a profile
```

## Daemon management

The daemon keeps browser instances warm for faster scraping.

```bash
docs-scraper daemon status     # Check status
docs-scraper daemon start      # Start manually
docs-scraper daemon stop       # Stop daemon
```

Note: The daemon starts automatically when you run a scrape command.

## Cleanup

PDFs are stored in `~/.docs-scraper/output/`. The daemon automatically cleans up files older than 1 hour.

Manual cleanup:

```bash
docs-scraper cleanup                    # Delete all PDFs
docs-scraper cleanup --older-than 1h    # Delete PDFs older than 1 hour
```
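To preview which files fall outside the one-hour window before deleting anything, a plain `find` call is enough. This is a sketch that runs independently of docs-scraper and makes no claim about how the tool selects files; the function name `old_pdfs` is hypothetical, and `-mmin` is a widely supported (GNU and BSD) extension rather than a POSIX option.

```bash
# List PDFs under a directory that were last modified more than N minutes ago.
# Usage: old_pdfs <dir> [minutes]   (minutes defaults to 60)
old_pdfs() (
  dir=$1
  minutes=${2:-60}
  find "$dir" -type f -name '*.pdf' -mmin "+$minutes"
)
```

For the default output directory this would be called as `old_pdfs "$HOME/.docs-scraper/output" 60`.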

## Job management

```bash
docs-scraper jobs list         # List blocked jobs awaiting auth
```

## Supported sources

- **Direct PDF links** - Downloads the PDF directly
- **Notion pages** - Exports the Notion page to PDF
- **DocSend documents** - Handles the DocSend viewer
- **LLM fallback** - Uses the Claude API for any other webpage

---

## Scraper Reference

Each scraper accepts specific `-D` data fields. Use the appropriate fields based on the URL type.
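To decide up front which `-D` fields apply, a URL can be classified against the "Handles" patterns documented for each scraper in this reference. The sketch below mirrors those patterns with a shell `case` statement; it approximates the documented rules and is not the tool's actual routing code, and the function name `guess_scraper` is hypothetical.

```bash
# Print the name of the scraper whose documented URL pattern matches the URL.
guess_scraper() {
  # Drop the scheme so the patterns below match on host and path only.
  target=${1#*://}
  case "$target" in
    *.pdf)
      echo DirectPdfScraper ;;
    docsend.com/view/*|docsend.com/v/*|*.docsend.com/view/*|*.docsend.com/v/*)
      echo DocsendScraper ;;
    notion.so/*|*.notion.site/*)
      echo NotionScraper ;;
    *)
      # Anything unmatched falls through to the LLM fallback.
      echo LlmFallbackScraper ;;
  esac
}
```

The `.pdf` check comes first to follow the order of this reference; whether the real tool gives it priority over the host-based patterns is an assumption.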

### DirectPdfScraper

**Handles:** URLs ending in `.pdf`

**Data fields:** None (downloads directly)

**Example:**

```bash
docs-scraper scrape https://example.com/document.pdf
```

---

### DocsendScraper

**Handles:** `docsend.com/view/*`, `docsend.com/v/*`, and subdomains (e.g., `org-a.docsend.com`)

**URL patterns:**

- Documents: `https://docsend.com/view/{id}` or `https://docsend.com/v/{id}`
- Folders: `https://docsend.com/view/s/{id}`
- Subdomains: `https://{subdomain}.docsend.com/view/{id}`

**Data fields:**

| Field | Type | Description |
|-------|------|-------------|
| `email` | email | Email address for document access |
| `password` | password | Passcode/password for protected documents |
| `name` | text | Your name (required for NDA-gated documents) |

**Examples:**

```bash
# Pre-fill email for DocSend
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com

# With password protection
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D password=secret123

# With NDA name requirement
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D name="John Doe"

# Retry blocked job
docs-scraper update abc123 -D email=user@example.com -D password=secret123
```

**Notes:**

- DocSend may require any combination of email, password, and name
- Folders are scraped as a table-of-contents PDF with document links
- The scraper auto-checks NDA checkboxes when a name is provided

---

### NotionScraper

**Handles:** `notion.so/*`, `*.notion.site/*`

**Data fields:**

| Field | Type | Description |
|-------|------|-------------|
| `email` | email | Notion account email |
| `password` | password | Notion account password |

**Examples:**

```bash
# Public page (no auth needed)
docs-scraper scrape https://notion.so/Public-Page-abc123

# Private page with login
docs-scraper scrape https://notion.so/Private-Page-abc123 \
  -D email=user@example.com -D password=mypassword

# Page on a notion.site subdomain
docs-scraper scrape https://docs.company.notion.site/Page-abc123
```

**Notes:**

- Public Notion pages don't require authentication
- Toggle blocks are automatically expanded before PDF generation
- Uses session profiles to persist login across scrapes

---

### LlmFallbackScraper

**Handles:** Any URL not matched by other scrapers (automatic fallback)

**Data fields:** Dynamic - determined by Claude analyzing the page

The LLM scraper uses Claude to analyze the page HTML and detect:

- Cookie banners (auto-dismisses)
- Expandable content (auto-expands)
- CAPTCHAs (reports as blocked)
- Paywalls (reports as blocked)

**Common dynamic fields:**

| Field | Type | Description |
|-------|------|-------------|
| `email` | email | Login email (if detected) |
| `password` | password | Login password (if detected) |
| `username` | text | Username (if login uses username) |

**Examples:**

```bash
# Generic webpage (no auth)
docs-scraper scrape https://example.com/article

# Webpage requiring login
docs-scraper scrape https://members.example.com/article \
  -D email=user@example.com -D password=secret

# When blocked, check the job for required fields
docs-scraper jobs list
# Then retry with the fields the scraper detected
docs-scraper update abc123 -D username=myuser -D password=secret
```

**Notes:**

- Requires the `ANTHROPIC_API_KEY` environment variable
- Field names are extracted from the page's actual form fields
- Limited to 2 login attempts before failing
- CAPTCHAs require manual intervention

---

## Data field summary

| Scraper | email | password | name | other |
|---------|-------|----------|------|-------|
| DirectPdf | - | - | - | - |
| DocSend | ✓ | ✓ | ✓ | - |
| Notion | ✓ | ✓ | - | - |
| LLM Fallback | ✓* | ✓* | - | Dynamic* |

*Fields detected dynamically from page analysis

## Environment setup (optional)

Only needed for the LLM fallback scraper:

```bash
export ANTHROPIC_API_KEY=your_key
```

Optional browser settings:

```bash
export BROWSER_HEADLESS=true   # Set false for debugging
```
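Because the API key only matters for URLs that reach the LLM fallback, a wrapper script may want to check for it before scraping an arbitrary webpage. A minimal sketch, assuming nothing beyond the variable name given above; the function name `check_llm_env` is hypothetical.

```bash
# Return 0 if ANTHROPIC_API_KEY is set and non-empty; otherwise warn and return 1.
check_llm_env() {
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set; the LLM fallback scraper needs it" >&2
    return 1
  fi
}
```

A caller could then write `check_llm_env && docs-scraper scrape https://example.com/article`.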

## Common patterns

**Archive a Notion page:**

```bash
docs-scraper scrape https://notion.so/My-Page-abc123
```

**Download protected DocSend:**

```bash
docs-scraper scrape https://docsend.com/view/xxx
# If blocked:
docs-scraper update <job-id> -D email=user@example.com -D password=1234
```

**Batch scraping with profiles:**

```bash
docs-scraper scrape https://site.com/doc1 -p mysite
docs-scraper scrape https://site.com/doc2 -p mysite
```
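For more than a handful of URLs, the batch pattern can be wrapped in a loop. This sketch uses only the `scrape` subcommand and `-p` flag shown above; the function name `scrape_batch` and the `DOCS_SCRAPER` override variable are hypothetical, added so the command can be replaced with a stub.

```bash
# Scrape every URL given after the profile name, reusing that profile's session.
# Usage: scrape_batch <profile> <url> [<url> ...]
# The body runs in a subshell so its variables do not leak into the caller.
scrape_batch() (
  profile=$1
  shift
  for url in "$@"; do
    # Report a failed URL on stderr and continue with the rest of the batch.
    "${DOCS_SCRAPER:-docs-scraper}" scrape "$url" -p "$profile" \
      || echo "failed: $url" >&2
  done
)
```

Continuing after a failure is a design choice for this sketch: one blocked document should not stop the remaining downloads.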

## Output

- **Success**: Local file path (e.g., `~/.docs-scraper/output/1706123456-abc123.pdf`)
- **Blocked**: Job ID + required credential types
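A script that captures this output has to tell the two cases apart, and it may need to expand a leading `~/` itself, since the shell does not expand tildes inside captured text. The sketch below assumes that a `Job ID:` line appears only on blocked scrapes and that the path may arrive with a literal `~/`; neither is confirmed here. `scrape_status` and `expand_tilde` are hypothetical helper names.

```bash
# Classify captured output: "blocked" if it contains a job ID line, else "success".
scrape_status() {
  case "$1" in
    *"Job ID:"*) echo blocked ;;
    *)           echo success ;;
  esac
}

# Replace a literal leading "~/" with $HOME; leave any other path unchanged.
expand_tilde() {
  case "$1" in
    "~/"*)
      rest=${1#"~/"}
      printf '%s\n' "$HOME/$rest" ;;
    *)
      printf '%s\n' "$1" ;;
  esac
}
```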

## Troubleshooting

- **Timeout**: restart the daemon with `docs-scraper daemon stop && docs-scraper daemon start`
- **Auth fails**: run `docs-scraper jobs list` to check pending jobs
- **Disk full**: run `docs-scraper cleanup` to delete stored PDFs
