Windows Desktop Automation
by civen-cn · published 2026-03-22
$ claw add gh:civen-cn/civen-cn-windows-skills
---
name: windows-skills
description: Windows desktop automation skills - screenshot capture, OCR text extraction, and image-based UI element location. Use when: (1) capturing screen content (2) extracting text from images (3) locating UI elements for automation
---
# Windows Desktop Automation
## Quick Start

### Dependencies

```bash
pip install mss pytesseract pillow pyautogui opencv-python numpy
```

Note: OCR requires [Tesseract OCR](https://github.com/UB-Mannheim/tesseract/wiki) to be installed.
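
Because `pytesseract` only wraps the Tesseract executable, OCR calls fail if that binary cannot be found. The sketch below shows one way to check for it before running OCR. It is not part of this skill: `find_tesseract` and `DEFAULT_CANDIDATES` are hypothetical names, and the fallback path is assumed to be the usual install location of the UB-Mannheim build.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumption: typical install path of the UB-Mannheim build; adjust if yours differs.
DEFAULT_CANDIDATES = (r"C:\Program Files\Tesseract-OCR\tesseract.exe",)


def find_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return a path to the tesseract executable, or None if it cannot be found."""
    # Prefer whatever is already reachable through PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the explicit candidate locations.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found; install it or add it to PATH.")
    else:
        print(f"Using Tesseract at {path}")
        # With pytesseract installed, it can be pointed at the binary explicitly:
        # import pytesseract
        # pytesseract.pytesseract.tesseract_cmd = path
```

If the executable is found outside PATH, setting `pytesseract.pytesseract.tesseract_cmd` as in the commented lines is the usual way to tell `pytesseract` where it lives.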
### Core Features
#### 1. Screenshot
```python
from scripts.screenshot import capture_screen, capture_region, capture_window

# Full screen
capture_screen("output.png")

# Region (x, y, width, height)
capture_region(0, 0, 800, 600, "region.png")

# Window by title
capture_window("Notepad", "notepad.png")
```

#### 2. OCR (Text Recognition)
```python
from scripts.ocr import extract_text

# Extract text from image
text = extract_text("screenshot.png")
print(text)

# Specify language (chi_sim=Chinese, eng=English)
text = extract_text("screenshot.png", lang="chi_sim+eng")
```

#### 3. Image Location
```python
import pyautogui

from scripts.image_locate import locate_on_screen, locate_all

# Find image position (returns center x, y and a confidence score)
pos = locate_on_screen("button.png")
if pos:
    x, y, confidence = pos
    pyautogui.click(x, y)  # Click the found element

# Find all matches
positions = locate_all("icon.png")
```

## Scripts
| Script | Description |
|--------|-------------|
| `screenshot.py` | Screenshot capture |
| `ocr.py` | Text recognition |
| `image_locate.py` | Image-based element location |
| `helpers.py` | Common utilities |
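
How `image_locate.py` finds a match is not shown here. Given the `opencv-python` and `numpy` dependencies, one plausible approach is normalized template matching, sketched below. This is an assumption, not the script's actual code: `locate_all_sketch`, `to_center`, `dedupe`, the 0.8 threshold, and the 10-pixel merge distance are all hypothetical.

```python
from typing import List, Tuple

Match = Tuple[int, int, float]  # (center_x, center_y, confidence)


def to_center(left: int, top: int, width: int, height: int) -> Tuple[int, int]:
    """Convert a match's top-left corner and template size to its center point."""
    return left + width // 2, top + height // 2


def dedupe(matches: List[Match], min_distance: int = 10) -> List[Match]:
    """Keep the highest-confidence hit among hits closer than min_distance pixels."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda item: item[2], reverse=True):
        far_from_all = all(
            abs(m[0] - k[0]) >= min_distance or abs(m[1] - k[1]) >= min_distance
            for k in kept
        )
        if far_from_all:
            kept.append(m)
    return kept


def locate_all_sketch(
    screen_path: str, template_path: str, threshold: float = 0.8
) -> List[Match]:
    """Find every place the template appears in a saved screenshot."""
    # Imported lazily so the pure helpers above work without OpenCV installed.
    import cv2
    import numpy as np

    screen = cv2.imread(screen_path, cv2.IMREAD_GRAYSCALE)
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    if screen is None or template is None:
        raise FileNotFoundError("could not read screenshot or template image")

    height, width = template.shape[:2]
    # Each score cell is the similarity with the template's top-left corner there.
    scores = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(scores >= threshold)
    hits = [
        (*to_center(int(x), int(y), width, height), float(scores[y, x]))
        for x, y in zip(xs, ys)
    ]
    return dedupe(hits)
```

Template matching usually produces a cluster of neighboring cells above the threshold for each real occurrence, which is why the sketch merges nearby hits before returning them.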
## Notes