HomeBrowseUpload
← Back to registry
// Skill profile

Paper Parse

name: paper-parse

by chen-li-17 · published 2026-03-22

数据处理自动化任务
Total installs
0
Stars
★ 0
Last updated
2026-03
// Install command
$ claw add gh:chen-li-17/chen-li-17-paper-parse-figures
View on GitHub
// Full documentation

---

name: paper-parse

description: Parse academic PDF papers into markdown with figure extraction.

metadata:

{

"openclaw":

{

"emoji": "📄",

"requires": { "bins": ["uv"] },

},

}

---

# Paper Parse

Parse academic PDF papers into structured markdown with figure extraction using PyMuPDF.

Usage

uv run {baseDir}/scripts/parse_paper.py --pdf /path/to/paper.pdf [--output-dir ./output]

Output

The tool generates:

  • `{paper_name}_content.md` - Full paper content in markdown
  • `{paper_name}_parsed.json` - Structured metadata including:
  • - Paper title

    - Number of pages

    - Extracted figures with captions and paths

  • `cover_title_authors.png` - First-page snapshot focused on title + authors region
  • `figures/` - Directory containing high-resolution figure screenshots
  • Example

    uv run scripts/parse_paper.py --pdf ~/papers/my-paper.pdf --output-dir ./parsed

    Output structure:

    ./parsed/
    ├── my-paper_content.md
    ├── my-paper_parsed.json
    └── figures/
        ├── figure_1.png
        ├── figure_2.png
        └── ...

    Dependencies

  • PyMuPDF (fitz) - PDF parsing and rendering
  • pymupdf4llm - Markdown conversion
  • These are automatically managed by uv via the inline script metadata.

    // Comments
    Sign in with GitHub to leave a comment.
    // Related skills

    More tools from the same signal band