PDF Contract Redactor
name: pdf-contract-redactor
by chayjan · published 2026-03-22
$ claw add gh:chayjan/chayjan-pdf-contract-redactor
---
name: pdf-contract-redactor
description: PDF contract redaction tool. Use when the user needs to redact sensitive information from scanned PDF contracts. The tool performs OCR to extract text, identifies field names and their corresponding values, and redacts only the values while keeping field names visible. Supports Alibaba Cloud OCR API for accurate Chinese text recognition.
---
# PDF Contract Redactor
Redact sensitive values from scanned PDF contracts while preserving field names.
## What It Does
1. **OCR Recognition**: Uses Alibaba Cloud OCR to extract text and positions from scanned PDFs
2. **Field-Value Matching**: Finds field names (e.g., "合同金额", contract amount) and their corresponding values (e.g., "45640元", 45,640 yuan)
3. **Selective Redaction**: Covers only the values with black boxes, keeping field names readable
## Workflow
### Step 1: PDF to Images
Convert PDF pages to high-resolution PNG images (200 DPI) for OCR.
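A minimal sketch of this step, assuming PyMuPDF (installed as `pymupdf`, see Dependencies) does the rendering. The function names `zoom_for_dpi` and `render_pages` and the `page_001.png` naming scheme are illustrative, not taken from the tool's script.

```python
from pathlib import Path

PDF_POINTS_PER_INCH = 72  # PDF user space is defined at 72 units per inch


def zoom_for_dpi(dpi: int) -> float:
    """Return the zoom factor that renders a 72 DPI page at the requested DPI."""
    return dpi / PDF_POINTS_PER_INCH


def render_pages(pdf_path, out_dir, dpi=200):
    """Render every page of pdf_path to a PNG file and return the file paths."""
    import fitz  # PyMuPDF; imported here so the helper above has no dependency

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    zoom = zoom_for_dpi(dpi)
    paths = []
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            # Scale both axes equally so the image keeps the page's aspect ratio.
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            path = out_dir / f"page_{index + 1:03d}.png"  # hypothetical naming
            pix.save(str(path))
            paths.append(path)
    return paths
```

At 200 DPI the zoom factor is 200 / 72, roughly 2.78, so each PDF point becomes about 2.78 pixels.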
### Step 2: OCR with Alibaba Cloud
Call the Alibaba Cloud OCR API to get the recognized text and the position of each text block.
### Step 3: Match Fields to Values
For each field in the field list:
1. Find the field name text block
2. Look for the corresponding value in:
   - **Right side**: Same row, to the right of the field name
   - **Below**: Next row, aligned with the field name
3. Record the field-value pair with both bounding boxes
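The matching step can be sketched as follows, using the two position rules listed under Implementation Notes. The `Block` class, the `find_value` function, and the choice of "closest" as the smallest gap between the two boxes are assumptions made for illustration; the source does not say how distance is measured.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional, Sequence


@dataclass(frozen=True)
class Block:
    """One OCR text block with its bounding box in image pixels (hypothetical shape)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def find_value(field: Block, blocks: Sequence[Block]) -> Optional[Block]:
    """Return the closest block to the right of or below the field, if any."""
    width = field.x1 - field.x0
    height = field.y1 - field.y0

    def gap(block: Block) -> float:
        # Assumed distance metric: horizontal and vertical gap between the boxes.
        dx = max(block.x0 - field.x1, 0.0)
        dy = max(block.y0 - field.y1, 0.0)
        return hypot(dx, dy)

    candidates = []
    for block in blocks:
        if block is field:
            continue
        # Rule 1: starts right of the field, on roughly the same row.
        to_right = block.x0 > field.x1 and abs(block.y0 - field.y0) < height * 2
        # Rule 2: starts below the field, not far left of the field's left edge.
        below = block.y0 > field.y1 and block.x0 >= field.x0 - width * 0.3
        if to_right or below:
            candidates.append(block)
    return min(candidates, key=gap, default=None)


field = Block("合同金额", 100, 100, 180, 120)
blocks = [
    field,
    Block("45640元", 200, 102, 280, 122),  # same row, to the right
    Block("甲方", 100, 150, 160, 170),      # next row, below
]
match = find_value(field, blocks)
print(match.text if match else None)  # prints 45640元
```

Here the value on the same row wins because its gap to the field (20 pixels) is smaller than the gap to the block below (30 pixels).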
### Step 4: Generate Redacted PDF
For each matched value:
1. Convert image coordinates to PDF coordinates
2. Draw a black rectangle over the value area
3. Keep the field name area unchanged
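A sketch of the drawing step, assuming PyMuPDF and value boxes that have already been converted to PDF points. The names `black_out` and `redact_pdf` and the `boxes_by_page` mapping (zero-based page index to a list of boxes) are hypothetical.

```python
BLACK = (0, 0, 0)  # RGB components in the 0..1 range used by PyMuPDF


def black_out(page, pdf_boxes):
    """Draw a filled black rectangle over each (x0, y0, x1, y1) box on the page."""
    count = 0
    for box in pdf_boxes:
        # Only value boxes are passed in, so field names stay untouched.
        page.draw_rect(box, color=BLACK, fill=BLACK)
        count += 1
    return count


def redact_pdf(src, dst, boxes_by_page):
    """Cover the given boxes in src and write the result to dst."""
    import fitz  # PyMuPDF; imported lazily so black_out stays dependency-free

    with fitz.open(src) as doc:
        for page_index, boxes in boxes_by_page.items():
            black_out(doc[page_index], boxes)
        doc.save(dst)
```

Drawing over a page hides the value visually. PyMuPDF also offers redaction annotations (`Page.add_redact_annot` followed by `Page.apply_redactions`) for cases where the covered content must be removed rather than hidden; the source only describes drawing black rectangles.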
## Field List
The following fields are searched and their values are redacted:
## Usage
### Prerequisites
1. Alibaba Cloud account with OCR service enabled
2. AccessKey ID and AccessKey Secret
### Running the Tool

    python scripts/redact_contract.py <input.pdf> <access_key_id> <access_key_secret> [output.pdf]

Example:

    python scripts/redact_contract.py contract.pdf LTAIxxx xxx contract_redacted.pdf

### Output
## Implementation Notes
### OCR API
Uses the Alibaba Cloud "通用文字识别-高精度版" (General Text Recognition, High-Precision Edition) service through the RecognizeAdvanced API.
### Field-Value Matching Logic

    # For a field at (fx0, fy0, fx1, fy1)
    # Look for values that are:
    # 1. To the right: vx0 > fx1 and |vy0 - fy0| < field_height * 2
    # 2. Below: vy0 > fy1 and vx0 >= fx0 - field_width * 0.3
    # Choose the closest match

### Coordinate Transformation
OCR returns coordinates in image space (200 DPI).
Convert to PDF space (72 DPI) using scale factor: `scale = 72 / 200 = 0.36`
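As a worked example of this conversion, the helper below scales a bounding box from image pixels to PDF points. The name `image_to_pdf_box` is illustrative, and the sketch assumes an unrotated page whose origin is the top-left corner in both spaces (as in PyMuPDF), so no vertical flip is applied.

```python
def image_to_pdf_box(box, dpi=200):
    """Scale an (x0, y0, x1, y1) box from image pixels at `dpi` to PDF points."""
    # Multiplying by 72 before dividing keeps whole-number inputs exact.
    return tuple(value * 72 / dpi for value in box)


print(image_to_pdf_box((100, 200, 300, 250)))  # prints (36.0, 72.0, 108.0, 90.0)
```

A value box 200 pixels wide in the 200 DPI image therefore covers 72 points, or one inch, on the PDF page.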
## Dependencies

    pip install pymupdf pillow requests

## Error Handling