GUI Agent
name: gui-agent
by alfredjamesli · published 2026-04-01
$ claw add gh:alfredjamesli/alfredjamesli-gui-claw
---
name: gui-agent
description: "GUI automation via visual detection. Clicking, typing, reading content, navigating menus, and filling forms, all through a screenshot → detect → act workflow. Supports macOS and Linux."
---
# GUI Agent
## STEP 0: Activate Platform (MANDATORY FIRST STEP)
Before any GUI operation, run `python3 {baseDir}/scripts/activate.py`.

This detects your OS, sets up the correct action commands, and outputs platform context. After running, `{baseDir}/actions/_actions.yaml` contains your platform's commands.
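To illustrate what "detects your OS and sets up the correct action commands" can mean, here is a minimal sketch. It is not the contents of `activate.py`: the `COMMANDS` table, `detect_platform()`, and `render()` are hypothetical, and the backends (`screencapture`/`cliclick` on macOS, `scrot`/`xdotool` on Linux) are assumed examples. The skill's real commands are whatever `_actions.yaml` contains after activation. The only detection call used is the standard library's `platform.system()`, which returns `"Darwin"` on macOS and `"Linux"` on Linux.

```python
import platform
import shlex

# Hypothetical command templates per OS. After activate.py runs, the skill's
# real commands live in {baseDir}/actions/_actions.yaml; the backends below
# (screencapture/cliclick on macOS, scrot/xdotool on Linux) are assumptions.
COMMANDS = {
    "Darwin": {
        "screenshot": "screencapture -x {path}",
        "click": "cliclick c:{x},{y}",
        "type": "cliclick t:{text}",
    },
    "Linux": {
        "screenshot": "scrot {path}",
        "click": "xdotool mousemove {x} {y} click 1",
        "type": "xdotool type {text}",
    },
}


def detect_platform(system=None):
    """Return platform context: the OS name and its command templates."""
    system = system or platform.system()  # "Darwin" on macOS, "Linux" on Linux
    if system not in COMMANDS:
        raise RuntimeError(f"Unsupported platform: {system}")
    return {"os": system, "commands": COMMANDS[system]}


def render(ctx, action, **params):
    """Fill a command template, shell-quoting every parameter."""
    template = ctx["commands"][action]
    return template.format(**{k: shlex.quote(str(v)) for k, v in params.items()})
```

A caller would pass the rendered string to `subprocess.run(cmd, shell=True)`. Quoting each parameter with `shlex.quote` keeps typed text such as `hello world` from being split into several shell arguments.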
## Workflow

OBSERVE → LEARN → ACT → VERIFY → SAVE

1. **OBSERVE**: Take a screenshot → run OCR + detector → understand the current state.
   → `read {baseDir}/skills/gui-observe/SKILL.md`
2. **LEARN**: First time using an app? Save its components to memory.
   → `read {baseDir}/skills/gui-learn/SKILL.md`
   → `learn_from_screenshot()` automatically outputs app tips if available
3. **ACT**: Pick a target → execute it using the `_actions.yaml` commands → verify.
   → `read {baseDir}/skills/gui-act/SKILL.md`
   → `read {baseDir}/actions/_actions.yaml` for available commands
4. **VERIFY**: Take another screenshot → confirm the action succeeded.
5. **SAVE**: Record state transitions to memory.
   → `read {baseDir}/skills/gui-memory/SKILL.md` for the memory structure
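The OBSERVE → ACT → VERIFY → SAVE loop for a single click can be sketched as follows. This is an illustration under stated assumptions, not the skill's API. `Element`, `Memory`, `describe()`, `pick_target()`, and `click_and_verify()` are hypothetical names. `observe` and `click` are callbacks you would supply, for example wrappers around the screenshot + OCR/detector step and the click command from `_actions.yaml`. A screen state is summarized as its sorted element labels, which is an assumed simplification; the real memory layout is described in `gui-memory/SKILL.md`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Element:
    label: str                      # OCR text or detector class name
    box: tuple[int, int, int, int]  # (x1, y1, x2, y2) in screen pixels

    def center(self) -> tuple[int, int]:
        x1, y1, x2, y2 = self.box
        return (x1 + x2) // 2, (y1 + y2) // 2


@dataclass
class Memory:
    # SAVE: per-app list of (state before, action, state after) transitions
    transitions: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)

    def record(self, app: str, before: str, action: str, after: str) -> None:
        self.transitions.setdefault(app, []).append((before, action, after))


def describe(elements: list[Element]) -> str:
    """Summarize a screen state as its sorted element labels."""
    return ",".join(sorted(el.label for el in elements))


def pick_target(elements: list[Element], label: str) -> Optional[Element]:
    """Return the first element whose label contains `label` (case-insensitive)."""
    for el in elements:
        if label.lower() in el.label.lower():
            return el
    return None


def click_and_verify(
    observe: Callable[[], list[Element]],
    click: Callable[[int, int], None],
    memory: Memory,
    app: str,
    target_label: str,
    expect_label: str,
    retries: int = 2,
) -> bool:
    """Run OBSERVE -> ACT -> VERIFY -> SAVE for one click, retrying on failure."""
    for _ in range(retries + 1):
        before = observe()                  # OBSERVE: screenshot + OCR/detector
        target = pick_target(before, target_label)
        if target is None:
            return False                    # target not on screen; re-plan
        click(*target.center())             # ACT: click the element's center
        after = observe()                   # VERIFY: screenshot again
        if pick_target(after, expect_label) is not None:
            # SAVE: remember which action led from one state to the next
            memory.record(app, describe(before), f"click:{target_label}", describe(after))
            return True
    return False
```

Verification checks for an element the action is *expected* to reveal, rather than trusting that the click landed. A failed check triggers a fresh observation before the next attempt, so a retry never reuses stale coordinates.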
## Core Rules

## Sub-Skills Reference
| Sub-Skill | When to read |
|-----------|-------------|
| `skills/gui-observe/SKILL.md` | Before screenshots or detection |
| `skills/gui-learn/SKILL.md` | Before learning a new app |
| `skills/gui-act/SKILL.md` | Before any click/type action |
| `skills/gui-memory/SKILL.md` | For memory structure details |
| `skills/gui-workflow/SKILL.md` | For multi-step navigation |
| `skills/gui-setup/SKILL.md` | For first-time machine setup |
| `skills/gui-report/SKILL.md` | For task performance reporting |