Desktop Control Skill
description: Advanced desktop automation with mouse, keyboard, and screen control
by breckengan · published 2026-03-22
$ claw add gh:breckengan/breckengan-control
# Desktop Control Skill
**Desktop automation skill for OpenClaw.** Provides pixel-level mouse control, keyboard input at configurable speed, screen capture, window management, and clipboard operations.
## 🎯 Features
- Mouse Control
- Keyboard Control
- Screen Operations
- Window Management
- Safety Features
---
## 🚀 Quick Start
### Installation
First, install required dependencies:
pip install pyautogui pillow opencv-python pygetwindow
### Basic Usage
from skills.desktop_control import DesktopController
# Initialize controller
dc = DesktopController(failsafe=True)
# Mouse operations
dc.move_mouse(500, 300) # Move to coordinates
dc.click() # Left click at current position
dc.click(100, 200, button="right") # Right click at position
# Keyboard operations
dc.type_text("Hello from OpenClaw!")
dc.hotkey("ctrl", "c") # Copy
dc.press("enter")
# Screen operations
screenshot = dc.screenshot()
position = dc.get_mouse_position()
---
## 📋 Complete API Reference
### Mouse Functions
#### `move_mouse(x, y, duration=0, smooth=True)`
Move mouse to absolute screen coordinates.
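**Parameters:**
- `x`, `y`: target position in absolute screen coordinates
- `duration`: movement time in seconds; `0` moves instantly
- `smooth`: smooth the movement path (default `True`)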
**Example:**
# Instant movement
dc.move_mouse(1000, 500)
# Smooth 1-second movement
dc.move_mouse(1000, 500, duration=1.0)
#### `move_relative(x_offset, y_offset, duration=0)`
Move mouse relative to current position.
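**Parameters:**
- `x_offset`: horizontal offset in pixels; positive values move right
- `y_offset`: vertical offset in pixels; positive values move down
- `duration`: movement time in seconds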
**Example:**
# Move 100px right, 50px down
dc.move_relative(100, 50, duration=0.3)
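A relative move can push the target past the edge of the screen. The following is a minimal sketch of clamping the target first. `clamp_relative_target` is a hypothetical helper, not part of the skill's API; it relies only on the `(x, y)` and `(width, height)` tuples that `get_mouse_position()` and `get_screen_size()` are documented to return.

```python
def clamp_relative_target(position, offset, screen_size):
    """Return the absolute (x, y) reached by applying offset, kept on screen.

    Hypothetical helper: not part of DesktopController.
    """
    x, y = position
    dx, dy = offset
    width, height = screen_size
    # Valid pixel coordinates run from 0 to size - 1 on each axis.
    target_x = min(max(x + dx, 0), width - 1)
    target_y = min(max(y + dy, 0), height - 1)
    return target_x, target_y
```

With a controller available, the result could be passed on as `dc.move_mouse(*clamp_relative_target(dc.get_mouse_position(), (100, 50), dc.get_screen_size()))`.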
#### `click(x=None, y=None, button='left', clicks=1, interval=0.1)`
Perform mouse click.
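**Parameters:**
- `x`, `y`: click position; omit both to click at the current mouse position
- `button`: mouse button to use, e.g. `'left'` or `'right'`
- `clicks`: number of clicks; `2` produces a double-click
- `interval`: delay between consecutive clicks, in seconds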
**Example:**
# Simple left click
dc.click()
# Double-click at specific position
dc.click(500, 300, clicks=2)
# Right-click
dc.click(button='right')
#### `drag(start_x, start_y, end_x, end_y, duration=0.5, button='left')`
Drag and drop operation.
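**Parameters:**
- `start_x`, `start_y`: position where the drag begins
- `end_x`, `end_y`: position where the button is released
- `duration`: drag time in seconds
- `button`: mouse button held during the drag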
**Example:**
# Drag file from desktop to folder
dc.drag(100, 100, 500, 500, duration=1.0)
#### `scroll(clicks, direction='vertical', x=None, y=None)`
Scroll mouse wheel.
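**Parameters:**
- `clicks`: number of wheel clicks; positive scrolls up, negative scrolls down
- `direction`: `'vertical'` or `'horizontal'`
- `x`, `y`: optional screen position at which to scroll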
**Example:**
# Scroll down 5 clicks
dc.scroll(-5)
# Scroll up 10 clicks
dc.scroll(10)
# Horizontal scroll
dc.scroll(5, direction='horizontal')
#### `get_mouse_position()`
Get current mouse coordinates.
**Returns:** `(x, y)` tuple
**Example:**
x, y = dc.get_mouse_position()
print(f"Mouse is at: {x}, {y}")
---
### Keyboard Functions
#### `type_text(text, interval=0, wpm=None)`
Type text with configurable speed.
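**Parameters:**
- `text`: string to type
- `interval`: seconds between keystrokes; `0` types instantly
- `wpm`: target typing speed in words per minute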
**Example:**
# Instant typing
dc.type_text("Hello World")
# Human-like typing at 60 WPM
dc.type_text("Hello World", wpm=60)
# Slow typing with 0.1s between keys
dc.type_text("Hello World", interval=0.1)
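How `wpm` maps to a per-key delay is not spelled out here. The following is a rough sketch, assuming the common typing-test convention of five characters per word. That convention is an assumption, not confirmed behavior of `type_text`, and `wpm_to_interval` is a hypothetical name.

```python
CHARS_PER_WORD = 5  # typing-test convention; assumed, not taken from the skill


def wpm_to_interval(wpm):
    """Convert words per minute to seconds between keystrokes."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    chars_per_minute = wpm * CHARS_PER_WORD
    return 60.0 / chars_per_minute
```

Under that assumption, 60 WPM works out to about 0.2 seconds per key, so `dc.type_text("Hello World", interval=wpm_to_interval(60))` would approximate the `wpm=60` call.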
#### `press(key, presses=1, interval=0.1)`
Press and release a key.
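**Parameters:**
- `key`: key name (see Key Names Reference)
- `presses`: number of times to press the key
- `interval`: delay between presses, in seconds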
**Example:**
# Press Enter
dc.press('enter')
# Press Space 3 times
dc.press('space', presses=3)
# Press Down arrow
dc.press('down')
#### `hotkey(*keys, interval=0.05)`
Execute keyboard shortcut.
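**Parameters:**
- `*keys`: key names that make up the shortcut, in order, e.g. `'ctrl', 'c'`
- `interval`: delay between key presses, in seconds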
**Example:**
# Copy (Ctrl+C)
dc.hotkey('ctrl', 'c')
# Paste (Ctrl+V)
dc.hotkey('ctrl', 'v')
# Open Run dialog (Win+R)
dc.hotkey('win', 'r')
# Save (Ctrl+S)
dc.hotkey('ctrl', 's')
# Select All (Ctrl+A)
dc.hotkey('ctrl', 'a')
#### `key_down(key)` / `key_up(key)`
Manually control key state.
**Example:**
# Hold Shift
dc.key_down('shift')
dc.type_text("hello") # Types "HELLO"
dc.key_up('shift')
# Hold Ctrl and click (for multi-select)
dc.key_down('ctrl')
dc.click(100, 100)
dc.click(200, 100)
dc.key_up('ctrl')
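If an exception is raised between `key_down` and `key_up`, the key stays held and later input is affected. One way to guard against that is a context manager that always releases the key. This is a sketch: `held` is a hypothetical helper that assumes only the `key_down(key)` and `key_up(key)` methods documented here.

```python
from contextlib import contextmanager


@contextmanager
def held(controller, key):
    """Hold `key` for the duration of the with-block, then always release it."""
    controller.key_down(key)
    try:
        yield
    finally:
        # Runs even if the body raises, so the key is never left stuck down.
        controller.key_up(key)
```

The multi-select example could then be written as `with held(dc, 'ctrl'):` followed by the two `dc.click(...)` calls in the block body.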
---
### Screen Functions
#### `screenshot(region=None, filename=None)`
Capture screen or region.
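**Parameters:**
- `region`: optional `(left, top, width, height)` tuple; omit to capture the full screen
- `filename`: optional path; when given, the capture is saved to that file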
**Returns:** PIL Image object
**Example:**
# Full screen
img = dc.screenshot()
# Save to file
dc.screenshot(filename="screenshot.png")
# Capture specific region
img = dc.screenshot(region=(100, 100, 500, 300))
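Example 2 in this document labels the `region` tuple as `(left, top, width, height)`, while Pillow's `Image.crop` expects `(left, upper, right, lower)`. If you crop a full screenshot yourself, the conversion might look like this sketch. `region_to_box` is a hypothetical helper, not part of the skill.

```python
def region_to_box(region):
    """Convert (left, top, width, height) to (left, top, right, bottom)."""
    left, top, width, height = region
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return left, top, left + width, top + height
```

For instance, `dc.screenshot().crop(region_to_box((100, 100, 500, 300)))` would cut the same area out of a full capture, given that `screenshot()` returns a PIL Image as stated.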
#### `get_pixel_color(x, y)`
Get color of pixel at coordinates.
**Returns:** RGB tuple `(r, g, b)`
**Example:**
r, g, b = dc.get_pixel_color(500, 300)
print(f"Color at (500, 300): RGB({r}, {g}, {b})")
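Exact color comparisons tend to be brittle because anti-aliasing and display scaling can shift channel values slightly. The following is a small sketch of a tolerance check on the `(r, g, b)` tuple returned above. `color_matches` is a hypothetical helper, not part of the skill.

```python
def color_matches(actual, expected, tolerance=0):
    """True if every RGB channel of `actual` is within `tolerance` of `expected`."""
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))
```

A script could then branch on `color_matches(dc.get_pixel_color(500, 300), (255, 0, 0), tolerance=10)` instead of testing for strict equality.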
#### `find_on_screen(image_path, confidence=0.8)`
Find image on screen (requires OpenCV).
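**Parameters:**
- `image_path`: path to the image file to search for
- `confidence`: match threshold between 0 and 1; higher values require a closer match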
**Returns:** `(x, y, width, height)` or None
**Example:**
# Find button on screen
location = dc.find_on_screen("button.png")
if location:
    x, y, w, h = location
    # Click center of found image
    dc.click(x + w//2, y + h//2)
#### `get_screen_size()`
Get screen resolution.
**Returns:** `(width, height)` tuple
**Example:**
width, height = dc.get_screen_size()
print(f"Screen: {width}x{height}")
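Hard-coded pixel coordinates break when a script runs at a different resolution. One possible approach is to store positions as fractions of the screen and convert them at run time. This is a sketch: `to_pixels` is a hypothetical helper that uses only the `(width, height)` tuple returned above.

```python
def to_pixels(fraction_x, fraction_y, screen_size):
    """Map fractions in [0, 1] to pixel coordinates on a screen of the given size."""
    if not (0.0 <= fraction_x <= 1.0 and 0.0 <= fraction_y <= 1.0):
        raise ValueError("fractions must be between 0 and 1")
    width, height = screen_size
    # Scale onto the last valid pixel index so 1.0 maps to width - 1, not width.
    return round(fraction_x * (width - 1)), round(fraction_y * (height - 1))
```

A click near the screen center then becomes `dc.click(*to_pixels(0.5, 0.5, dc.get_screen_size()))`.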
---
### Window Functions
#### `get_all_windows()`
List all open windows.
**Returns:** List of window titles
**Example:**
windows = dc.get_all_windows()
for title in windows:
    print(f"Window: {title}")
#### `activate_window(title_substring)`
Bring window to front by title.
**Parameters:**
- `title_substring`: text contained in the title of the window to activate
**Example:**
# Activate Chrome
dc.activate_window("Chrome")
# Activate VS Code
dc.activate_window("Visual Studio Code")
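A short substring such as "Chrome" can match several windows, and this document does not say which one wins or whether matching is case-sensitive. The following sketch lists the candidates first so that ambiguity can be checked. `find_matching_titles` is a hypothetical helper that operates on the list of titles `get_all_windows()` is documented to return.

```python
def find_matching_titles(titles, title_substring, case_sensitive=False):
    """Return the titles that contain `title_substring`, in their original order."""
    if case_sensitive:
        return [t for t in titles if title_substring in t]
    needle = title_substring.lower()
    return [t for t in titles if needle in t.lower()]
```

If `find_matching_titles(dc.get_all_windows(), "Chrome")` returns more than one title, a longer, more specific substring is probably the safer argument for `activate_window`.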
#### `get_active_window()`
Get currently focused window.
**Returns:** Window title (str)
**Example:**
active = dc.get_active_window()
print(f"Active window: {active}")
---
### Clipboard Functions
#### `copy_to_clipboard(text)`
Copy text to clipboard.
**Example:**
dc.copy_to_clipboard("Hello from OpenClaw!")
#### `get_from_clipboard()`
Get text from clipboard.
**Returns:** str
**Example:**
text = dc.get_from_clipboard()
print(f"Clipboard: {text}")
---
## ⌨️ Key Names Reference
### Alphabet Keys
`'a'` through `'z'`
### Number Keys
`'0'` through `'9'`
### Function Keys
`'f1'` through `'f24'`
### Special Keys
### Arrow Keys
### Modifier Keys
### Lock Keys
### Punctuation
---
## 🛡️ Safety Features
### Failsafe Mode
Move mouse to **any corner** of the screen to abort all automation.
# Enable failsafe (enabled by default)
dc = DesktopController(failsafe=True)
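With failsafe enabled, a script that deliberately moves into a corner would abort itself. How the skill detects a corner is not documented here, so the following pre-check is only a sketch under a stated assumption: a corner is the exact corner pixel plus an optional margin. `in_corner` is a hypothetical helper.

```python
def in_corner(position, screen_size, margin=0):
    """True if `position` lies within `margin` pixels of any screen corner."""
    x, y = position
    width, height = screen_size
    # A corner requires being at the edge on both axes at once.
    near_x_edge = x <= margin or x >= width - 1 - margin
    near_y_edge = y <= margin or y >= height - 1 - margin
    return near_x_edge and near_y_edge
```

Checking planned targets with `in_corner(target, dc.get_screen_size(), margin=5)` before moving may help avoid accidental aborts.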
### Pause Control
# Pause all automation for 2 seconds
dc.pause(2.0)
# Check if automation is safe to proceed
if dc.is_safe():
    dc.click(500, 500)
### Approval Mode
Require user confirmation before actions:
dc = DesktopController(require_approval=True)
# This will ask for confirmation
dc.click(500, 500) # Prompt: "Allow click at (500, 500)? [y/n]"
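To illustrate the idea behind approval mode, here is a minimal sketch of a confirmation gate. It is not the skill's implementation. `ask_approval` and `run_if_approved` are hypothetical names, and the prompt wording simply mirrors the example above.

```python
def ask_approval(description, ask=input):
    """Return True only if the user answers 'y' or 'yes' to the prompt."""
    answer = ask(f"Allow {description}? [y/n] ")
    return answer.strip().lower() in ("y", "yes")


def run_if_approved(description, action, ask=input):
    """Call `action()` after approval; return its result, or None if declined."""
    if ask_approval(description, ask=ask):
        return action()
    return None
```

Passing the prompt function as `ask` keeps the gate testable without a terminal; in interactive use the default `input` applies, for example `run_if_approved("click at (500, 500)", lambda: dc.click(500, 500))`.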
---
## 🎨 Advanced Examples
### Example 1: Automated Form Filling
dc = DesktopController()
# Click name field
dc.click(300, 200)
dc.type_text("John Doe", wpm=80)
# Tab to next field
dc.press('tab')
dc.type_text("john@example.com", wpm=80)
# Tab to password
dc.press('tab')
dc.type_text("SecurePassword123", wpm=60)
# Submit form
dc.press('enter')
### Example 2: Screenshot Region and Save
# Capture specific area
region = (100, 100, 800, 600) # left, top, width, height
img = dc.screenshot(region=region)
# Save with timestamp
import datetime
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
img.save(f"capture_{timestamp}.png")
### Example 3: Multi-File Selection
# Hold Ctrl and click multiple files
dc.key_down('ctrl')
dc.click(100, 200) # First file
dc.click(100, 250) # Second file
dc.click(100, 300) # Third file
dc.key_up('ctrl')
# Copy selected files
dc.hotkey('ctrl', 'c')
### Example 4: Window Automation
import time

# Activate Calculator
dc.activate_window("Calculator")
time.sleep(0.5)
# Type calculation
dc.type_text("5+3=", interval=0.2)
time.sleep(0.5)
# Take screenshot of result
dc.screenshot(filename="calculation_result.png")
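Fixed sleeps are either too short on a slow machine or wasteful on a fast one. An alternative is to poll for a condition with a timeout. This is a sketch: `wait_until` is a hypothetical helper and not part of the skill.

```python
import time


def wait_until(probe, timeout=5.0, poll_interval=0.1):
    """Call `probe()` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

In the Calculator example, `wait_until(lambda: "Calculator" in (dc.get_active_window() or ""))` could stand in for the first `time.sleep(0.5)`; the `or ""` guards against the case, not documented here, where no window title is returned.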
### Example 5: Drag & Drop File
# Drag file from source to destination
dc.drag(
    start_x=200, start_y=300,  # File location
    end_x=800, end_y=500,      # Folder location
    duration=1.0,              # Smooth 1-second drag
)
---
## ⚡ Performance Tips
1. **Use instant movements** for speed: `duration=0`
2. **Batch operations** instead of individual calls
3. **Cache screen positions** instead of recalculating
4. **Disable failsafe** only when you need maximum performance; without it, moving the mouse to a corner no longer aborts a runaway script
5. **Use hotkeys** instead of menu navigation
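As an illustration of tip 3, the following sketch caches the result of an expensive lookup such as an image search. `PositionCache` is a hypothetical class, and the `locate` callable is whatever lookup you supply, for example a wrapper around `find_on_screen`.

```python
class PositionCache:
    """Remember the result of an expensive lookup, keyed by name."""

    def __init__(self, locate):
        self._locate = locate  # callable: name -> (x, y) or None
        self._positions = {}

    def get(self, name):
        # Only successful lookups are stored, so a miss is retried next time.
        if name not in self._positions:
            position = self._locate(name)
            if position is None:
                return None
            self._positions[name] = position
        return self._positions[name]

    def invalidate(self, name=None):
        """Forget one entry, or everything when called without a name."""
        if name is None:
            self._positions.clear()
        else:
            self._positions.pop(name, None)
```

Cached positions go stale when a window moves or is resized, so calling `invalidate()` after such changes is advisable.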
---
## ⚠️ Important Notes
---
## 🔧 Troubleshooting
### Mouse not moving to correct position
### Keyboard input not working
### Failsafe triggering accidentally
### Permission errors
---
## 📦 Dependencies
Install all:
pip install pyautogui pillow opencv-python pygetwindow
---
**Built for OpenClaw** - The ultimate desktop automation companion 🦞