PDF OCR with Layout Preservation
by biabia-55 · published 2026-04-01
$ claw add gh:biabia-55/biabia-55-pdf-ocr-layout-free
---
name: pdf-ocr-layout
description: >
  Full OCR pipeline for scanned PDFs with layout preservation. Use this skill whenever
  the user wants to OCR a PDF, convert a scanned document to searchable text, or preserve
  the original layout of a scanned book/document. Triggers on: "OCR this PDF",
  "用PaddleOCR处理" ("process with PaddleOCR"), "识别这个PDF" ("recognize this PDF"),
  "扫描版PDF转文字" ("convert scanned PDF to text"), "把这个PDF做OCR" ("run OCR on this PDF"),
  or when a PDF path is provided alongside any mention of OCR, text recognition, or
  layout preservation.
---
# PDF OCR with Layout Preservation
Automated pipeline: **Split → OCR API → Layout PDF → Merge**
Each original page becomes one PDF page, with text placed at exact bounding-box positions
and font sizes calibrated to fill the original block dimensions.
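The font-size calibration idea can be sketched as follows. This is a minimal sketch, not the script's actual logic: it assumes each OCR block holds a single line of text and approximates glyph width as a fixed fraction of the font size (`avg_char_em`, an assumption). The real script may measure actual glyph widths instead (for example with reportlab's `pdfmetrics.stringWidth`). The name `calibrate_font_size` is hypothetical.

```python
def calibrate_font_size(
    text: str,
    box_width: float,
    box_height: float,
    avg_char_em: float = 0.5,  # assumed average glyph width, as a fraction of font size
    min_size: float = 4.0,
) -> float:
    """Return the largest font size (pt) at which `text` fits on one line of the box.

    Hypothetical helper: estimated line width is len(text) * avg_char_em * size,
    and a single line may not be taller than the box.
    """
    if not text or box_width <= 0 or box_height <= 0:
        return min_size
    # Size at which the estimated line width exactly equals the box width.
    width_limited = box_width / (len(text) * avg_char_em)
    # A single line cannot exceed the box height.
    height_limited = box_height
    return max(min_size, min(width_limited, height_limited))


# A 200 pt x 14 pt block holding 40 characters:
# width-limited size = 200 / (40 * 0.5) = 10 pt; height-limited = 14 pt.
print(calibrate_font_size("x" * 40, 200.0, 14.0))  # 10.0
```

Taking the minimum of the two limits keeps text inside the original block in both directions, which is what "fill the original block dimensions" requires.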
## Quick Start

```bash
python ~/.claude/skills/pdf-ocr-layout/scripts/pipeline.py "/path/to/input.pdf"
```

Output: `input_ocr.pdf` in the same directory. Intermediate files are written to `input_ocr_work/`.
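The default naming can be expressed as a small path helper. This sketch assumes the output file and work directory are derived from the input file's stem exactly as stated above; `default_paths` is a hypothetical name, not a function in `pipeline.py`.

```python
from pathlib import Path


def default_paths(input_pdf: str) -> tuple[Path, Path]:
    """Derive the default output PDF and work directory for an input PDF.

    Hypothetical helper: "book.pdf" -> "book_ocr.pdf" and "book_ocr_work/",
    both placed next to the input file.
    """
    src = Path(input_pdf)
    output_pdf = src.with_name(f"{src.stem}_ocr.pdf")
    work_dir = src.with_name(f"{src.stem}_ocr_work")
    return output_pdf, work_dir


out, work = default_paths("/scans/book.pdf")
print(out.as_posix())   # /scans/book_ocr.pdf
print(work.as_posix())  # /scans/book_ocr_work
```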
## Full Options

```bash
python ~/.claude/skills/pdf-ocr-layout/scripts/pipeline.py \
  "/path/to/input.pdf" \
  --output "/path/to/output.pdf" \
  --work-dir "/path/to/workdir" \
  --chunk-size 90
```

## Steps for Claude
1. **Ask for the PDF path** if not already provided in the conversation.
2. **Check dependencies** (install only what's missing):
   ```bash
   pip install pypdf reportlab Pillow requests -q
   ```
3. **Run the pipeline** and stream output to the user:
   ```bash
   python ~/.claude/skills/pdf-ocr-layout/scripts/pipeline.py "{input_pdf}"
   ```
4. **Monitor progress** — the script prints step-by-step progress including API polling.
API jobs typically take 1–5 minutes per 90-page chunk.
5. **Report the output path** when done.
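Step 2 asks to install only what is missing. One way to check this from Python is to probe each module with `importlib.util.find_spec` and print an install command for just the missing packages. This is a sketch assuming the pip-name-to-import-name mapping below (Pillow installs the `PIL` module); `missing_packages` is a hypothetical helper.

```python
import importlib.util

# pip package name -> module name used in `import` (they differ for Pillow).
REQUIRED = {
    "pypdf": "pypdf",
    "reportlab": "reportlab",
    "Pillow": "PIL",
    "requests": "requests",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the pip package names whose import module cannot be found."""
    return [
        pkg
        for pkg, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("pip install -q " + " ".join(missing))
else:
    print("All dependencies are installed.")
```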
## Resume / Retry

The pipeline saves its state to the work directory, so runs are fully resumable: if a run is interrupted, re-run the same command and it picks up where it left off.
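Chunk-level resumption could work roughly like this. This is a minimal sketch, assuming pages are split into fixed-size chunks (as with `--chunk-size 90`) and that finished chunk indices are recorded in a JSON file in the work directory. The file name `state.json`, its `{"done": [...]}` layout, and all function names are hypothetical, not the script's actual format.

```python
import json
import tempfile
from pathlib import Path


def chunk_ranges(total_pages: int, chunk_size: int = 90) -> list[tuple[int, int]]:
    """Split pages 0..total_pages-1 into half-open [start, end) chunks."""
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def _load_state(work_dir: Path) -> dict:
    state_file = work_dir / "state.json"  # hypothetical state file name
    if state_file.exists():
        return json.loads(state_file.read_text())
    return {"done": []}


def pending_chunks(work_dir: Path, total_pages: int, chunk_size: int = 90):
    """Return (index, (start, end)) for chunks not yet marked done."""
    done = set(_load_state(work_dir)["done"])
    return [
        (index, pages)
        for index, pages in enumerate(chunk_ranges(total_pages, chunk_size))
        if index not in done
    ]


def mark_done(work_dir: Path, index: int) -> None:
    """Record a finished chunk so that a re-run skips it."""
    state = _load_state(work_dir)
    if index not in state["done"]:
        state["done"].append(index)
    (work_dir / "state.json").write_text(json.dumps(state))


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    mark_done(work, 0)  # pretend chunk 0 finished before an interruption
    print(pending_chunks(work, 200))  # [(1, (90, 180)), (2, (180, 200))]
```

Recording state after each chunk, rather than once at the end, means an interruption loses at most the chunk that was in progress.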
## Common Issues
| Problem | Fix |
|---------|-----|
| `ModuleNotFoundError` | Run the pip install command above |
| API 4xx error | Check the PDF isn't password-protected |
| Job stuck in `running` | Normal for large chunks; wait up to 10 min |
| Missing images in output | Expected: image regions are left blank by design (image output from the API is optional) |
| Font too small/large | Font size auto-calibrates; the first page may look different if it is a cover page |
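The "wait up to 10 min" advice in the table corresponds to a polling loop with a deadline. This sketch assumes a caller-supplied `get_status()` that returns a job state string such as `"running"` or `"done"`. The state names, the 10-second interval, and `wait_for_job` itself are hypothetical, not the OCR API's actual interface. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,   # 10 minutes
    interval_s: float = 10.0,   # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it is no longer "running" or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status != "running":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still running after {timeout_s:.0f}s")
        sleep(interval_s)


# Simulated job that finishes on the third poll; sleep is a no-op here.
statuses = iter(["running", "running", "done"])
print(wait_for_job(lambda: next(statuses), sleep=lambda s: None))  # done
```

Using `time.monotonic` rather than wall-clock time keeps the deadline correct even if the system clock is adjusted during a long job.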
## Output Quality

Font-size calibration has been verified to recover the original ~13–14pt body text size.
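Recovering point sizes depends on converting OCR bounding boxes from image pixels to PDF points. This is a sketch assuming the OCR API returns boxes as `(x0, y0, x1, y1)` pixels with a top-left origin, and that the page image was rendered at a known DPI (300 here is an assumption). PDF user space uses 72 points per inch with a bottom-left origin, so the y axis must be flipped. `bbox_px_to_pdf` is a hypothetical name.

```python
def bbox_px_to_pdf(
    bbox: tuple[float, float, float, float],
    page_height_px: float,
    dpi: float = 300.0,  # assumed scan/render resolution
) -> tuple[float, float, float, float]:
    """Convert a top-left-origin pixel box to a bottom-left-origin box in points."""
    scale = 72.0 / dpi  # points per pixel
    x0, y0, x1, y1 = bbox
    page_height_pt = page_height_px * scale
    return (
        x0 * scale,
        page_height_pt - y1 * scale,  # bottom edge, measured from the page bottom
        x1 * scale,
        page_height_pt - y0 * scale,  # top edge, measured from the page bottom
    )


# A 56 px tall text line on a 300 DPI scan is 56 * 72 / 300 = 13.44 pt tall.
x0, y0, x1, y1 = bbox_px_to_pdf((300, 600, 1500, 656), page_height_px=3300)
print(round(y1 - y0, 2))  # 13.44
```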