// Skill profile

markdown-proxy URL Fetcher

Name: markdown-proxy URL Fetcher
Author: adisinghstudent

by adisinghstudent · published 2026-04-01

Image generation · Social media

Last updated

2026-04

// Install command

$ claw add gh:adisinghstudent/adisinghstudent-markdown-proxy-url-fetcher

// Full documentation

---
name: markdown-proxy-url-fetcher
description: Fetch any URL as clean Markdown via proxy cascade (r.jina.ai / defuddle.md / agent-fetch) with built-in support for WeChat, Feishu/Lark docs, and login-required pages
triggers:
  - fetch this URL as markdown
  - convert webpage to markdown
  - read this article for me
  - scrape this WeChat article
  - fetch this Feishu doc
  - get content from this URL
  - extract markdown from webpage
  - read this tweet or X post
---

# markdown-proxy URL Fetcher

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

Convert any URL to clean Markdown with automatic fallback across multiple proxy services. Handles login-required pages including X/Twitter, WeChat public accounts, and Feishu/Lark documents.

## What It Does

| URL Type | Method | Notes |
|----------|--------|-------|
| WeChat (`mp.weixin.qq.com`) | Playwright headless browser | Bypasses anti-scraping |
| Feishu/Lark docs (`feishu.cn`, `larksuite.com`) | Feishu Open API | Requires app credentials |
| YouTube | Dedicated YouTube skill | Not handled by this skill |
| All other URLs | Proxy cascade | Free, no API key needed |

### Proxy Cascade Order
1. **r.jina.ai** — Most complete, preserves images
2. **defuddle.md** — Cleaner output with YAML frontmatter
3. **agent-fetch** — Local fallback, no network proxy needed
4. **defuddle CLI** — Local CLI fallback

## Installation

```bash
npx skills add joeseesun/markdown-proxy
```


Verify installation:

```bash
ls ~/.claude/skills/markdown-proxy/SKILL.md
```


## Prerequisites

### Core (always needed)
- `curl` — built-in on macOS/Linux

### WeChat scraping

```bash
pip install playwright beautifulsoup4 lxml
playwright install chromium
```


### Feishu/Lark docs

```bash
export FEISHU_APP_ID=your_app_id
export FEISHU_APP_SECRET=your_app_secret
```


### Proxy fallback

```bash
npx agent-fetch --help  # npx auto-downloads, no pre-install needed
```


## Usage with Claude Code

Just give Claude a URL in natural language:

```
Read this article: https://example.com/post
Fetch this tweet: https://x.com/user/status/123456
Read this WeChat article: https://mp.weixin.qq.com/s/abc123
Convert this Feishu doc: https://company.feishu.cn/docx/AbCdEfGh
Read this Feishu wiki page: https://company.feishu.cn/wiki/AbCdEfGh
```


## How the Proxy Cascade Works

```bash
# Step 1: Try r.jina.ai
curl -s "https://r.jina.ai/https://example.com/article"

# Step 2: If empty/failed, try defuddle.md
curl -s "https://defuddle.md/https://example.com/article"

# Step 3: If still failed, try agent-fetch locally
npx agent-fetch https://example.com/article

# Step 4: Last resort, the defuddle CLI
npx defuddle https://example.com/article
```
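
The fallback order above can be sketched as a small loop over interchangeable fetch steps. This is a minimal illustration, not the skill's actual implementation: `run_cascade` and the stand-in fetchers are hypothetical names, and "failed" is assumed to mean either an exception or empty output.

```python
from typing import Callable, Iterable, Optional, Tuple

Fetcher = Callable[[str], str]


def run_cascade(
    url: str, fetchers: Iterable[Tuple[str, Fetcher]]
) -> Optional[Tuple[str, str]]:
    """Return (step_name, markdown) from the first step that yields text."""
    for name, fetch in fetchers:
        try:
            text = fetch(url)
        except Exception:
            # Any failure (timeout, network error, bad exit) means "try the next step".
            continue
        if text and text.strip():
            return name, text
    return None


# Stand-in fetchers so the example runs offline (hypothetical, for illustration).
def always_empty(url: str) -> str:
    return ""


def always_fails(url: str) -> str:
    raise TimeoutError("simulated timeout")


def fake_local(url: str) -> str:
    return f"# Fetched {url}\n"


result = run_cascade(
    "https://example.com/article",
    [
        ("r.jina.ai", always_empty),
        ("defuddle.md", always_fails),
        ("agent-fetch", fake_local),
    ],
)
print(result)
```

Returning the step name alongside the content makes it easy to see which proxy actually served a page when debugging.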


## WeChat Scraping (Built-in Script)

The skill includes `fetch_wechat.py` using Playwright:

```bash
# The script is bundled; Claude Code invokes it automatically
# Manual usage:
python ~/.claude/skills/markdown-proxy/fetch_wechat.py \
  "https://mp.weixin.qq.com/s/your_article_id"
```


What the script does internally:

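```python
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup


def fetch_wechat_article(url: str) -> str:
    # Render the page in headless Chromium so script-loaded content is present.
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()

    soup = BeautifulSoup(html, "lxml")
    # Extract the #js_content div (WeChat article body).
    article = soup.find(id="js_content")
    if article is None:
        raise ValueError("No #js_content element found in the page")
    # convert_to_markdown is the HTML-to-Markdown helper (not shown in this excerpt).
    return convert_to_markdown(article)
```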


## Feishu/Lark Document Support

The bundled `fetch_feishu.py` script uses the Feishu Open API:

```bash
# Set credentials (required)
export FEISHU_APP_ID=cli_xxxxxxxxxxxxxxxx
export FEISHU_APP_SECRET=your_secret_here

# Manual invocation
python ~/.claude/skills/markdown-proxy/fetch_feishu.py \
  "https://company.feishu.cn/docx/AbCdEfGhIjKl"
```


### Supported Feishu URL Formats

```
https://{tenant}.feishu.cn/docx/{doc_id}      # New docs
https://{tenant}.feishu.cn/docs/{doc_id}      # Legacy docs
https://{tenant}.feishu.cn/wiki/{wiki_id}     # Wiki pages
https://{tenant}.larksuite.com/docx/{doc_id}  # Lark (international)
```
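
To make the URL formats above concrete, here is a hedged sketch of how a URL might be split into its document kind and ID. `parse_feishu_url` and `FeishuRef` are hypothetical names, and the character set accepted for IDs is an assumption. The bundled script may parse URLs differently.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class FeishuRef(NamedTuple):
    kind: str   # "docx", "docs", or "wiki"
    token: str  # the {doc_id} or {wiki_id} path segment


# ASSUMPTION: IDs consist of letters, digits, underscores, and hyphens.
_PATH_RE = re.compile(r"^/(docx|docs|wiki)/([A-Za-z0-9_-]+)")


def parse_feishu_url(url: str) -> Optional[FeishuRef]:
    """Return the document kind and ID, or None if the URL is not recognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if not (host.endswith(".feishu.cn") or host.endswith(".larksuite.com")):
        return None
    match = _PATH_RE.match(parsed.path)
    if match is None:
        return None
    return FeishuRef(kind=match.group(1), token=match.group(2))


print(parse_feishu_url("https://company.feishu.cn/docx/AbCdEfGh"))
print(parse_feishu_url("https://company.larksuite.com/wiki/XyZ123"))
print(parse_feishu_url("https://example.com/docx/AbCdEfGh"))
```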


### Required Feishu App Permissions
- `docx:document:readonly` — for docx and doc files
- `wiki:wiki:readonly` — for wiki pages

### How Feishu API Fetching Works

```python
import os

import requests


def get_feishu_token() -> str:
    """Exchange the app credentials for a tenant access token."""
    resp = requests.post(
        "https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal",
        json={
            "app_id": os.environ["FEISHU_APP_ID"],
            "app_secret": os.environ["FEISHU_APP_SECRET"],
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["tenant_access_token"]


def fetch_doc_blocks(doc_id: str, token: str) -> list:
    """Fetch the block list for a docx document."""
    headers = {"Authorization": f"Bearer {token}"}
    resp = requests.get(
        f"https://open.feishu.cn/open-apis/docx/v1/documents/{doc_id}/blocks",
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["items"]
```
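
Long documents may return their blocks across several pages. The sketch below shows one way to collect them all, assuming the response `data` object carries `has_more` and `page_token` fields; check both names against the Feishu Open API reference before relying on them. `iter_all_blocks` and `FAKE_PAGES` are hypothetical, and the HTTP call is replaced by a stand-in so the example runs offline.

```python
from typing import Callable, Dict, Iterator, Optional

# Takes the page token from the previous response (None for the first page)
# and returns the "data" object of one API response.
PageFetcher = Callable[[Optional[str]], Dict]


def iter_all_blocks(fetch_page: PageFetcher) -> Iterator[dict]:
    """Yield blocks from every page until the API reports no more."""
    page_token: Optional[str] = None
    while True:
        data = fetch_page(page_token)
        yield from data.get("items", [])
        # ASSUMPTION: "has_more" and "page_token" are the pagination fields.
        if not data.get("has_more"):
            return
        page_token = data.get("page_token")
        if not page_token:
            return  # Malformed response: stop rather than loop forever.


# Offline stand-in for the HTTP call: two pages of fake blocks.
FAKE_PAGES = {
    None: {
        "items": [{"block_id": "a"}, {"block_id": "b"}],
        "has_more": True,
        "page_token": "p2",
    },
    "p2": {"items": [{"block_id": "c"}], "has_more": False},
}

blocks = list(iter_all_blocks(lambda token: FAKE_PAGES[token]))
print([b["block_id"] for b in blocks])
```

In real use, `fetch_page` would wrap the `requests.get` call from `fetch_doc_blocks`, passing the token as a query parameter when one is present.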


### Supported Feishu Block Types → Markdown
| Block Type | Markdown Output |
|------------|-----------------|
| Heading 1-9 | `# ` to `######### ` |
| Bullet list | `- item` |
| Ordered list | `1. item` |
| Code block | ` ```lang\ncode\n``` ` |
| Quote | `> text` |
| Todo | `- [ ] task` / `- [x] done` |
| Equation | `$$math$$` |
| Image | `![alt](url)` |
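
The mapping in the table can be illustrated with a small converter. The block dictionaries used here (`type`, `level`, `text`, and so on) are a simplified, hypothetical shape chosen for readability. They are not the real Feishu block schema, and `block_to_markdown` is not the bundled script's function.

```python
FENCE = "`" * 3  # Built at runtime to avoid nesting a literal fence in this example.


def block_to_markdown(block: dict) -> str:
    """Convert one simplified block dict into a Markdown string."""
    kind = block["type"]
    text = block.get("text", "")
    if kind == "heading":
        return "#" * block["level"] + " " + text
    if kind == "bullet":
        return f"- {text}"
    if kind == "ordered":
        return f"1. {text}"
    if kind == "code":
        return f"{FENCE}{block.get('lang', '')}\n{text}\n{FENCE}"
    if kind == "quote":
        return f"> {text}"
    if kind == "todo":
        mark = "x" if block.get("done") else " "
        return f"- [{mark}] {text}"
    if kind == "equation":
        return f"$${text}$$"
    if kind == "image":
        return f"![{block.get('alt', '')}]({block['url']})"
    return text  # Unknown block types fall back to their plain text.


sample = [
    {"type": "heading", "level": 2, "text": "Release notes"},
    {"type": "todo", "text": "ship it", "done": True},
    {"type": "quote", "text": "Looks good"},
]
print("\n".join(block_to_markdown(b) for b in sample))
```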

## Real Code Examples

### Fetch Any URL Programmatically

```python
import subprocess


def _run(cmd: list, timeout: int) -> str:
    """Run a command and return its stdout, or "" if it times out or cannot start."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return ""
    return result.stdout


def fetch_as_markdown(url: str) -> str:
    """Try the proxy cascade to get URL content as Markdown."""
    # Try r.jina.ai first
    output = _run(["curl", "-s", f"https://r.jina.ai/{url}"], timeout=30)
    if output.strip():
        return output

    # Fall back to defuddle.md
    output = _run(["curl", "-s", f"https://defuddle.md/{url}"], timeout=30)
    if output.strip():
        return output

    # Fall back to agent-fetch
    output = _run(["npx", "agent-fetch", url], timeout=60)
    return output or "Failed to fetch content"
```


### Route by URL Type

```python
import subprocess


def smart_fetch(url: str) -> str:
    """Route a URL to the WeChat script, the Feishu script, or the proxy cascade."""
    if "mp.weixin.qq.com" in url:
        return subprocess.run(
            ["python", "fetch_wechat.py", url],
            capture_output=True, text=True
        ).stdout
    elif any(d in url for d in ["feishu.cn", "larksuite.com"]):
        return subprocess.run(
            ["python", "fetch_feishu.py", url],
            capture_output=True, text=True
        ).stdout
    else:
        # fetch_as_markdown is defined in the previous example.
        return fetch_as_markdown(url)
```
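
Substring checks like the ones above also match a domain that appears only in a path or query string. A stricter variant, sketched here under the assumption that routing should depend on the hostname alone, parses the URL first. `classify_url` and its return labels are hypothetical names.

```python
from urllib.parse import urlparse


def classify_url(url: str) -> str:
    """Return "wechat", "feishu", "youtube", or "proxy" based on the hostname."""
    host = (urlparse(url).hostname or "").lower()

    def matches(domain: str) -> bool:
        # Exact domain or any subdomain of it.
        return host == domain or host.endswith("." + domain)

    if matches("mp.weixin.qq.com"):
        return "wechat"
    if matches("feishu.cn") or matches("larksuite.com"):
        return "feishu"
    if matches("youtube.com") or matches("youtu.be"):
        return "youtube"  # Handled by a separate skill, per the table in "What It Does".
    return "proxy"


for candidate in [
    "https://mp.weixin.qq.com/s/abc123",
    "https://company.feishu.cn/docx/AbCdEfGh",
    "https://example.com/?ref=feishu.cn",
]:
    print(candidate, "->", classify_url(candidate))
```

The last URL is classified as `proxy`, whereas a plain substring test would send it to the Feishu script.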


### Using r.jina.ai with Custom Headers

```bash
# Basic fetch
curl -s "https://r.jina.ai/https://example.com/article"

# With X-Return-Format header for specific output
curl -s \
  -H "X-Return-Format: markdown" \
  "https://r.jina.ai/https://example.com/article"

# With timeout
curl -s --max-time 20 "https://r.jina.ai/https://example.com/article"
```
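
The same request can be made from Python with only the standard library. This is a sketch that mirrors the curl flags above. `build_reader_request` and `fetch_via_reader` are hypothetical helper names, and the 20-second default simply copies `--max-time 20`.

```python
import urllib.request


def build_reader_request(url: str, return_format: str = "markdown") -> urllib.request.Request:
    """Build (but do not send) a GET request to r.jina.ai for the given URL."""
    return urllib.request.Request(
        f"https://r.jina.ai/{url}",
        headers={"X-Return-Format": return_format},
    )


def fetch_via_reader(url: str, timeout: float = 20.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_reader_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Inspect the request without touching the network.
req = build_reader_request("https://example.com/article")
print(req.full_url)
print(req.header_items())
```

Separating request construction from sending keeps the header logic easy to check in isolation.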


### Using defuddle.md

```bash
# Returns Markdown with YAML frontmatter (title, author, date)
curl -s "https://defuddle.md/https://example.com/article"

# Output format:
# ---
# title: Article Title
# author: Author Name
# date: 2026-03-21
# ---
# # Article Title
# Content...
```
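
If the frontmatter is needed separately from the body, it can be split off with a few lines of Python. This sketch handles only flat `key: value` pairs like those in the output format above; it is not a full YAML parser, and `split_frontmatter` is a hypothetical name.

```python
from typing import Dict, Tuple


def split_frontmatter(markdown: str) -> Tuple[Dict[str, str], str]:
    """Return (metadata, body); metadata is empty if no frontmatter is found."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        return {}, markdown  # No closing delimiter: treat everything as body.

    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


sample = (
    "---\n"
    "title: Article Title\n"
    "author: Author Name\n"
    "date: 2026-03-21\n"
    "---\n"
    "# Article Title\n"
    "Content..."
)
meta, body = split_frontmatter(sample)
print(meta)
print(body)
```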


## Troubleshooting

| Problem | Fix |
|---------|-----|
| WeChat fetch fails | `playwright install chromium` |
| Feishu returns 403 | Verify `FEISHU_APP_ID` + `FEISHU_APP_SECRET` are set; check app permissions in Feishu admin |
| Feishu wiki fails | Add `wiki:wiki:readonly` permission to your Feishu app |
| r.jina.ai returns empty | Automatic — falls back to defuddle.md |
| All proxies fail | URL has strict auth; try `npx agent-fetch <url>` manually |
| Playwright not found | `pip install playwright && playwright install chromium` |
| agent-fetch slow | First run downloads via npx; subsequent runs are cached |

### Debug Proxy Chain Manually

# Test each step

```bash
# Test each step
echo "=== r.jina.ai ===" && curl -s "https://r.jina.ai/https://example.com" | head -20
echo "=== defuddle.md ===" && curl -s "https://defuddle.md/https://example.com" | head -20
echo "=== agent-fetch ===" && npx agent-fetch https://example.com 2>&1 | head -20
```


### Verify Feishu Credentials

```bash
# Test token generation
curl -X POST "https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal" \
  -H "Content-Type: application/json" \
  -d "{\"app_id\": \"$FEISHU_APP_ID\", \"app_secret\": \"$FEISHU_APP_SECRET\"}"

# Should return: {"code":0,"msg":"ok","tenant_access_token":"...","expire":7200}
```
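
Because the response includes an `expire` value, a token can be reused rather than requested for every document. The sketch below caches it, assuming `expire` is a lifetime in seconds and that `code` is `0` on success, as in the sample response. `TokenCache` is a hypothetical helper, not part of the bundled script.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse a tenant access token until shortly before it expires."""

    def __init__(
        self,
        request_token: Callable[[], dict],
        clock: Callable[[], float] = time.monotonic,
        safety_margin: float = 60.0,
    ) -> None:
        self._request_token = request_token  # Returns the parsed JSON response.
        self._clock = clock
        self._safety_margin = safety_margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            payload = self._request_token()
            if payload.get("code") != 0:
                raise RuntimeError(f"Token request failed: {payload.get('msg')}")
            self._token = payload["tenant_access_token"]
            # ASSUMPTION: "expire" is the token lifetime in seconds.
            self._expires_at = now + payload["expire"] - self._safety_margin
        return self._token


# Offline demonstration with a fake token endpoint.
def fake_endpoint() -> dict:
    return {"code": 0, "msg": "ok", "tenant_access_token": "t-demo", "expire": 7200}


cache = TokenCache(fake_endpoint)
print(cache.get())
```

In real use, `request_token` would perform the POST shown in the curl command and return the decoded JSON.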


## Common Patterns

### Batch Fetch Multiple URLs

```python
urls = [
    "https://example.com/post-1",
    "https://example.com/post-2",
    "https://mp.weixin.qq.com/s/abc123",
]

results = {}
for url in urls:
    results[url] = smart_fetch(url)
    print(f"Fetched: {url} ({len(results[url])} chars)")
```
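
Since each fetch mostly waits on the network, a batch can also run concurrently. This is one possible sketch using a thread pool; `fetch_many` is a hypothetical helper, and the fetch function is passed in so that `smart_fetch` or any other fetcher can be used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_many(
    urls: Iterable[str], fetch: Callable[[str], str], max_workers: int = 4
) -> Dict[str, str]:
    """Fetch URLs in parallel; a failed URL maps to an "ERROR: ..." string."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                # One bad URL should not abort the whole batch.
                results[url] = f"ERROR: {exc}"
    return results


# Offline demonstration with a stand-in fetcher.
def demo_fetch(url: str) -> str:
    return f"# Content of {url}"


fetched = fetch_many(["https://example.com/post-1", "https://example.com/post-2"], demo_fetch)
for url, text in fetched.items():
    print(f"Fetched: {url} ({len(text)} chars)")
```

A small `max_workers` value keeps the load on the free proxy services modest.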


### Save Fetched Markdown to File

```bash
curl -s "https://r.jina.ai/https://example.com/article" > article.md

# Or with defuddle for cleaner frontmatter:
curl -s "https://defuddle.md/https://example.com/article" > article.md
```
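
When saving many pages, a fixed name like `article.md` gets overwritten. The sketch below derives a filename from the URL instead. The naming scheme, `filename_for`, and `save_markdown` are all illustrative choices rather than anything the skill prescribes.

```python
import re
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def filename_for(url: str) -> str:
    """Turn a URL into a filesystem-friendly .md filename."""
    parsed = urlparse(url)
    raw = f"{parsed.hostname or 'page'}{parsed.path}"
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()
    return f"{slug or 'page'}.md"


def save_markdown(url: str, markdown: str, out_dir: Path) -> Path:
    """Write the Markdown to out_dir and return the path of the new file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename_for(url)
    path.write_text(markdown, encoding="utf-8")
    return path


# Demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_markdown("https://example.com/article", "# Article Title\n", Path(tmp))
    print(saved.name)
```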


### Integration with Claude Code Workflow

```
# In Claude Code conversation:
User: "Summarize this paper: https://arxiv.org/abs/2401.12345"

# Claude will:
# 1. Detect it's a standard URL
# 2. Use r.jina.ai proxy to fetch content
# 3. Fall back through cascade if needed
# 4. Return summary based on fetched Markdown
```
