// Skill profile

Vision Analyze

by cntuang · published 2026-03-22

Email processing · Image generation

Stars

★ 0

Last updated

2026-03

// Install command

$ claw add gh:cntuang/cntuang-image-vision

// Full documentation

---

name: vision-analyze

description: "Image analysis using multimodal vision models. Use when the user needs to: (1) Describe what's in an image, (2) Extract text from images (OCR), (3) Analyze visual content, (4) Compare images, (5) Answer questions about images. Supports JPG, PNG, GIF, WebP formats."

metadata:

  {
    "openclaw":
      {
        "emoji": "👁️",
        "requires": {}
      }
  }

---

# Vision Analyze

Analyze images using the built-in vision capabilities of multimodal AI models.

## Quick Start

### Analyze an Image

Describe what's in an image:

    # The agent will automatically use vision when you provide an image path
    image("/path/to/image.jpg", prompt="Describe what's in this image")

### Extract Text (OCR)

Extract text from images:

    image("/path/to/document.png", prompt="Extract all text from this image")

### Analyze Multiple Images

Compare or analyze multiple images:

    images(["/path/to/image1.jpg", "/path/to/image2.jpg"],
           prompt="Compare these two images and describe the differences")

## Usage Patterns

### Visual Q&A

Ask specific questions about image content:

    image("menu.jpg", prompt="What are the prices of the main courses?")
    image("chart.png", prompt="What trend does this graph show?")
    image("screenshot.png", prompt="What error message is displayed?")

### Content Moderation

Check whether an image is appropriate for a given context:

    image("upload.jpg", prompt="Is this image appropriate for a professional setting?")

### Data Extraction

Extract structured data from visual content:

    image("receipt.jpg", prompt="Extract the date, total amount, and items purchased")
    image("business_card.png", prompt="Extract name, phone, email, and company")
    image("form.jpg", prompt="Extract all filled fields as key-value pairs")
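When the extracted fields feed another program, it can help to ask for JSON and parse the reply defensively. The following is a minimal Python sketch, assuming the model returns its answer as plain text. `build_extraction_prompt` and `parse_json_reply` are hypothetical helpers that are not part of this skill, and `sample_reply` is made-up text used only for illustration.

```python
import json
import re


def build_extraction_prompt(fields):
    """Build a prompt asking for the given fields as a single JSON object."""
    field_list = ", ".join(fields)
    return (
        f"Extract the following fields: {field_list}. "
        "Reply with one JSON object using exactly these keys; "
        "use null for any field you cannot read."
    )


def parse_json_reply(reply):
    """Return the first JSON object found in a text reply, or None."""
    # A reply may wrap the JSON in surrounding prose, so search for the
    # outermost brace pair instead of parsing the whole string.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None


prompt = build_extraction_prompt(["date", "total", "items"])
# Hypothetical usage in the skill's notation: image("receipt.jpg", prompt=prompt)

# Made-up reply text, standing in for what a model might return.
sample_reply = 'Here is the data: {"date": "2026-03-01", "total": 12.5, "items": ["tea"]}'
print(parse_json_reply(sample_reply))
```

Returning `None` on a malformed reply lets the caller decide whether to retry with a stricter prompt or fall back to manual review.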

### Visual Comparison

Compare images:

    images(["before.jpg", "after.jpg"],
           prompt="What changes were made between these two images?")

## Tips

- **Be specific**: The more specific your prompt, the better the results.
- **Multiple images**: You can analyze up to 20 images at once.
- **Supported formats**: JPG, PNG, GIF, WebP.
- **Size limits**: Large images are automatically resized.
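The format and batch-size tips can be checked before any image is sent. The following is a minimal Python sketch, assuming the 20-image limit and the four formats listed above. `is_supported` and `batches` are hypothetical helper names that are not part of this skill, and treating `.jpeg` as an alias for JPG is an assumption.

```python
from pathlib import Path

# Extensions for the formats listed above; ".jpeg" as a JPG alias is an assumption.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
MAX_IMAGES_PER_CALL = 20  # taken from the "Multiple images" tip


def is_supported(path):
    """Return True if the file extension matches a supported format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def batches(paths, size=MAX_IMAGES_PER_CALL):
    """Yield lists of at most `size` supported paths, preserving order."""
    supported = [p for p in paths if is_supported(p)]
    for start in range(0, len(supported), size):
        yield supported[start:start + size]


paths = [f"scan_{i}.png" for i in range(45)] + ["notes.txt"]
for group in batches(paths):
    # Hypothetical usage in the skill's notation: images(group, prompt="...")
    print(len(group))
```

This check looks only at the file extension, not the file contents, so a mislabeled file would still pass.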

## When to Use

- Reading text from screenshots, documents, or photos
- Describing visual content for accessibility
- Analyzing charts, graphs, or diagrams
- Comparing visual changes
- Extracting data from forms or receipts
- Understanding UI elements or error messages
