
// Skill profile

PDF OCR with Layout Preservation

Author: biabia-55

name: pdf-ocr-layout

by biabia-55 · published 2026-04-01

Image generation API integration


Stars: ★ 0

Last updated: 2026-04

// Install command

$ claw add gh:biabia-55/biabia-55-pdf-ocr-layout-free


// Full documentation

---
name: pdf-ocr-layout
description: >
  Full OCR pipeline for scanned PDFs with layout preservation. Use this skill whenever
  the user wants to OCR a PDF, convert a scanned document to searchable text, or preserve
  the original layout of a scanned book/document. Triggers on: "OCR this PDF",
  "用PaddleOCR处理" (process with PaddleOCR), "识别这个PDF" (recognize this PDF),
  "扫描版PDF转文字" (convert a scanned PDF to text), "把这个PDF做OCR" (run OCR on this PDF),
  or when a PDF path is provided alongside any mention of OCR, text recognition, or
  layout preservation.
---

# PDF OCR with Layout Preservation

Automated pipeline: **Split → OCR API → Layout PDF → Merge**

Each original page becomes one PDF page, with text placed at exact bounding-box positions and font sizes calibrated to fill the original block dimensions.
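To illustrate the "Layout PDF" step, here is a minimal sketch that assumes each OCR block has already been converted to PDF coordinates and a font size. The `Block` type, `render_page`, and `make_canvas` are hypothetical helpers, not the skill's actual code. The canvas calls (`setFont`, `drawString`, `showPage`, `save`) are standard ReportLab `Canvas` methods, and `STSong-Light` is one of ReportLab's built-in CJK CID fonts. Each block is drawn as a single line, so multi-line blocks would need wrapping on top of this.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Block:
    """Hypothetical OCR block already mapped to PDF points (origin bottom-left)."""
    x: float          # left edge in points
    y: float          # text baseline in points
    font_size: float  # calibrated font size in points
    text: str


def render_page(c, blocks: Iterable[Block], font_name: str = "STSong-Light") -> int:
    """Draw one OCR page's blocks onto canvas `c`, then close the page.

    Calling showPage() exactly once per OCR page keeps the
    "1 OCR page = 1 PDF page" invariant. Returns the number of blocks drawn.
    """
    drawn = 0
    for b in blocks:
        if not b.text.strip():
            continue  # nothing to draw (e.g. an image-only block)
        c.setFont(font_name, b.font_size)
        c.drawString(b.x, b.y, b.text)
        drawn += 1
    c.showPage()
    return drawn


def make_canvas(path: str):
    """Create an A4 ReportLab canvas with the built-in STSong-Light CJK font registered."""
    # Imported here so render_page can be used with any canvas-like object.
    from reportlab.lib.pagesizes import A4
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.cidfonts import UnicodeCIDFont
    from reportlab.pdfgen import canvas

    pdfmetrics.registerFont(UnicodeCIDFont("STSong-Light"))
    return canvas.Canvas(path, pagesize=A4)
```

Typical use would be `c = make_canvas("out.pdf")`, one `render_page(c, blocks)` call per OCR page, then `c.save()`. Passing the canvas in rather than creating it inside `render_page` keeps the drawing logic independent of where the PDF is written.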

## Quick Start

```bash
python ~/.claude/skills/pdf-ocr-layout/scripts/pipeline.py "/path/to/input.pdf"
```

Output: `input_ocr.pdf` in the same directory. Intermediate files in `input_ocr_work/`.
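The default naming above can be reproduced with `pathlib`. This is an illustrative sketch of the convention, not code taken from `pipeline.py`; the helper name `default_paths` is hypothetical.

```python
from pathlib import Path


def default_paths(input_pdf: str) -> tuple[Path, Path]:
    """Return (output_pdf, work_dir) next to the input.

    For example, book.pdf maps to book_ocr.pdf and book_ocr_work/.
    """
    src = Path(input_pdf)
    output_pdf = src.with_name(f"{src.stem}_ocr.pdf")
    work_dir = src.with_name(f"{src.stem}_ocr_work")
    return output_pdf, work_dir
```

The `--output` and `--work-dir` flags shown below presumably override these two defaults.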

## Full Options

```bash
python ~/.claude/skills/pdf-ocr-layout/scripts/pipeline.py \
  "/path/to/input.pdf" \
  --output "/path/to/output.pdf" \
  --work-dir "/path/to/workdir" \
  --chunk-size 90
```
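`--chunk-size 90` implies that the input is split into consecutive page ranges before submission to the OCR API. Below is a minimal sketch of that Split step with `pypdf`, assuming 0-based, non-overlapping chunks where the last one may be shorter. `chunk_ranges`, `split_pdf`, and the `chunk_{i}.pdf` file names are hypothetical; `pypdf` is imported inside `split_pdf` so the range helper works without it.

```python
from pathlib import Path


def chunk_ranges(num_pages: int, chunk_size: int = 90) -> list[range]:
    """Split page indices 0..num_pages-1 into consecutive ranges of chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    return [range(start, min(start + chunk_size, num_pages))
            for start in range(0, num_pages, chunk_size)]


def split_pdf(input_pdf: str, work_dir: str, chunk_size: int = 90) -> list[Path]:
    """Write one PDF per chunk into work_dir and return their paths in page order."""
    from pypdf import PdfReader, PdfWriter  # pip install pypdf

    reader = PdfReader(input_pdf)
    out_dir = Path(work_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, pages in enumerate(chunk_ranges(len(reader.pages), chunk_size)):
        writer = PdfWriter()
        for p in pages:
            writer.add_page(reader.pages[p])
        path = out_dir / f"chunk_{i}.pdf"
        with open(path, "wb") as fh:
            writer.write(fh)
        paths.append(path)
    return paths
```

With a 200-page input and the default chunk size this yields chunks of 90, 90, and 20 pages.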

## Steps for Claude

1. **Ask for the PDF path** if it was not already provided in the conversation.
2. **Check dependencies** (install only what's missing):

   ```bash
   pip install pypdf reportlab Pillow requests -q
   ```

3. **Run the pipeline** and stream its output to the user:

   ```bash
   python ~/.claude/skills/pdf-ocr-layout/scripts/pipeline.py "{input_pdf}"
   ```

4. **Monitor progress.** The script prints step-by-step progress, including API polling. API jobs typically take 1–5 minutes per 90-page chunk.
5. **Report the output path** when done.
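Step 2 says to install only what's missing. One way to check that from Python is `importlib.util.find_spec`, which returns `None` for top-level modules that cannot be found. Pillow is imported as `PIL`, so the import name and the pip name differ. The `REQUIRED` mapping and the `missing_packages` helper are illustrative and not part of the skill.

```python
import importlib.util

# import name -> pip package name (Pillow provides the `PIL` module)
REQUIRED = {
    "pypdf": "pypdf",
    "reportlab": "reportlab",
    "PIL": "Pillow",
    "requests": "requests",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return pip names of required packages whose modules are not importable."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install -q " + " ".join(missing))
    else:
        print("All dependencies present.")
```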

## Resume / Retry

The pipeline saves its state to the work directory and is fully resumable:

- `jobs.json`: API job IDs (prevents re-submitting chunks that are already queued)
- `chunk_*_results.jsonl`: cached OCR results (skips re-downloading)
- `chunk_*_ocr.pdf`: completed chunk PDFs (skips re-rendering)

If interrupted, simply re-run the same command; it picks up where it left off.
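The `jobs.json` behaviour amounts to "submit a chunk only if no job ID is recorded for it, and persist the ID immediately". This is a minimal sketch under that reading: `submit` stands in for the real OCR API call, whose endpoint and response format are not documented here, and the JSON layout (chunk name mapped to job ID) is an assumption.

```python
import json
from pathlib import Path
from typing import Callable


def submit_pending(work_dir: str, chunk_names: list[str],
                   submit: Callable[[str], str]) -> dict[str, str]:
    """Submit chunks with no recorded job ID and return the full job map.

    jobs.json is rewritten after every submission, so an interruption never
    loses the ID of a chunk that was already queued.
    """
    jobs_path = Path(work_dir) / "jobs.json"
    jobs_path.parent.mkdir(parents=True, exist_ok=True)
    jobs: dict[str, str] = (
        json.loads(jobs_path.read_text(encoding="utf-8")) if jobs_path.exists() else {}
    )
    for name in chunk_names:
        if name in jobs:
            continue  # queued in a previous run; reuse its job ID
        jobs[name] = submit(name)
        jobs_path.write_text(json.dumps(jobs, indent=2), encoding="utf-8")
    return jobs
```

Writing to a temporary file and renaming it would make each update atomic; that step is left out to keep the sketch short.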

## Common Issues

| Problem | Fix |
|---------|-----|
| `ModuleNotFoundError` | Run the `pip install` command above |
| API 4xx error | Check that the PDF isn't password-protected |
| Job stuck in `running` | Normal for large chunks; wait up to 10 min |
| Missing images in output | Expected: images are embedded only when their URL is accessible and left blank otherwise (API images are optional) |
| Font too small/large | Font size auto-calibrates per block; the first page may look different if it's a cover |
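The "wait up to 10 min" advice translates into a polling loop with a deadline. This sketch assumes a hypothetical `get_status(job_id)` callable that returns strings such as `"running"` or `"done"`; the real API's status values and recommended polling interval are not documented here, so `in_progress` and `interval` are placeholders. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Collection


def wait_for_job(job_id: str,
                 get_status: Callable[[str], str],
                 timeout: float = 600.0,
                 interval: float = 10.0,
                 in_progress: Collection[str] = ("running",),
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Poll get_status(job_id) until it leaves the in-progress states.

    Returns the final status string; raises TimeoutError once `timeout`
    seconds have passed (600 s matches the "wait up to 10 min" advice).
    """
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)
        if status not in in_progress:
            return status  # caller decides how to handle e.g. a failure status
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout:.0f} s")
        sleep(interval)
```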

## Output Quality

- **Block positions**: exact (scaled from the 812×1269 px OCR space to A4)
- **Font sizes**: auto-calibrated using `fs = min(√(h×w / n×0.65), h×0.72)`; verified to recover the original ~13–14 pt body text
- **Page numbers, headers, footers**: included (all block types are preserved)
- **Images**: embedded if the URL is accessible, blank if not
- **1 OCR page = 1 PDF page**: always maintained
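The position scaling and the font-size formula can be written out directly. This sketch makes several assumptions. OCR boxes are taken to be `(x0, y0, x1, y1)` pixels with a top-left origin in the 812×1269 OCR space. x and y are scaled independently to A4 (595.28 × 841.89 pt). `h` and `w` are the block height and width in points, and `n` is the block's character count. The expression under the square root is transcribed with ordinary left-to-right precedence, i.e. `(h×w/n)×0.65`; the original script may group it differently.

```python
import math

OCR_W, OCR_H = 812, 1269         # OCR coordinate space in pixels
A4_W, A4_H = 595.2756, 841.8898  # A4 in PDF points (210 x 297 mm)


def bbox_to_pdf(x0: float, y0: float, x1: float, y1: float) -> tuple[float, float, float, float]:
    """Map a top-left-origin pixel box to (left, bottom, width, height) in points.

    PDF coordinates start at the bottom-left corner, so the y axis is flipped.
    """
    sx, sy = A4_W / OCR_W, A4_H / OCR_H
    left = x0 * sx
    bottom = A4_H - y1 * sy
    return left, bottom, (x1 - x0) * sx, (y1 - y0) * sy


def calibrated_font_size(h: float, w: float, n: int) -> float:
    """fs = min(sqrt(h*w / n * 0.65), h * 0.72): fill the box, capped by its height."""
    if n <= 0:
        raise ValueError("block has no characters")
    return min(math.sqrt(h * w / n * 0.65), h * 0.72)
```

The `h×0.72` cap keeps short text in a tall box from being scaled beyond the line height, while the square-root term shrinks the size as more characters share the same area.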
