Article Extract
name: article-extract
by caozeal · published 2026-03-22
$ claw add gh:caozeal/caozeal-article-extract
---
name: article-extract
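description: Extracts the main text of web pages such as WeChat Official Account articles, blogs, and news sites, bypassing anti-scraping mechanisms, and outputs plain text.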
---
# Article Extract
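A tool for extracting article content from web pages. It supports WeChat Official Account articles, blogs, news sites, and similar pages, and outputs clean plain text.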
## Features
## Installation
No installation is required; run the script directly with Python 3.
## Usage

    python3 skills/article-extract/scripts/extract.py <url>

## Examples

    # Extract a WeChat Official Account article
    python3 skills/article-extract/scripts/extract.py "https://mp.weixin.qq.com/s/xxxxx"
    # Extract a blog post
    python3 skills/article-extract/scripts/extract.py "https://example.com/blog/post"
    # Save to a file
    python3 skills/article-extract/scripts/extract.py "https://mp.weixin.qq.com/s/xxxxx" > article.txt

## Output
The tool writes the extracted plain text to stdout, so it can be redirected to a file:

    python3 skills/article-extract/scripts/extract.py "https://..." > output.txt

## How it works
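1. Send an HTTP request with a standard browser User-Agent.
2. Parse the HTML and filter out irrelevant tags such as `<script>`, `<style>`, `<nav>`, and `<footer>`.
3. Extract the body text and clean up extra whitespace.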
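
The three steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the actual contents of `extract.py`: the names `fetch_html`, `TextExtractor`, and `html_to_text`, the User-Agent string, the timeout, and the newline handling for block-level tags are all assumptions made for the example.

```python
import re
import urllib.request
from html.parser import HTMLParser

# Assumed values for illustration; the real script may use different ones.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
SKIP_TAGS = {"script", "style", "nav", "footer"}
# Assumption: start a new line at common block-level tags so paragraphs stay separate.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


def fetch_html(url, timeout=15.0):
    """Step 1: request the page with a browser-like User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TextExtractor(HTMLParser):
    """Step 2: collect text while ignoring everything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Steps 2 and 3: drop skipped tags, then collapse extra whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (
        re.sub(r"\s+", " ", line).strip()
        for line in "".join(parser.chunks).splitlines()
    )
    return "\n".join(line for line in lines if line)
```

Under these assumptions, `print(html_to_text(fetch_html(url)))` would mirror the command-line behaviour described earlier: the plain text goes to stdout and can be redirected to a file. A plain tag filter like this one does not handle pages that render their content with JavaScript.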
## Limitations
## Dependencies
## Author
Packaged from OpenClaw community practice.