HomeBrowseUpload
← Back to registry
// Skill profile

Desktop Control Skill

description: Advanced desktop automation with mouse, keyboard, and screen control

by alexbingquanxu-cpu · published 2026-04-01

图像生成API集成
Total installs
0
Stars
★ 0
Last updated
2026-04
// Install command
$ claw add gh:alexbingquanxu-cpu/alexbingquanxu-cpu-desktop-control-custom
View on GitHub
// Full documentation

---

description: Advanced desktop automation with mouse, keyboard, and screen control

---

# Desktop Control Skill

**The most advanced desktop automation skill for OpenClaw.** Provides pixel-perfect mouse control, lightning-fast keyboard input, screen capture, window management, and clipboard operations.

🎯 Features

Mouse Control

  • ✅ **Absolute positioning** - Move to exact coordinates
  • ✅ **Relative movement** - Move from current position
  • ✅ **Smooth movement** - Natural, human-like mouse paths
  • ✅ **Click types** - Left, right, middle, double, triple clicks
  • ✅ **Drag & drop** - Drag from point A to point B
  • ✅ **Scroll** - Vertical and horizontal scrolling
  • ✅ **Position tracking** - Get current mouse coordinates
  • Keyboard Control

  • ✅ **Text typing** - Fast, accurate text input
  • ✅ **Hotkeys** - Execute keyboard shortcuts (Ctrl+C, Win+R, etc.)
  • ✅ **Special keys** - Enter, Tab, Escape, Arrow keys, F-keys
  • ✅ **Key combinations** - Multi-key press combinations
  • ✅ **Hold & release** - Manual key state control
  • ✅ **Typing speed** - Configurable WPM (instant to human-like)
  • Screen Operations

  • ✅ **Screenshot** - Capture entire screen or regions
  • ✅ **Image recognition** - Find elements on screen (via OpenCV)
  • ✅ **Color detection** - Get pixel colors at coordinates
  • ✅ **Multi-monitor** - Support for multiple displays
  • Window Management

  • ✅ **Window list** - Get all open windows
  • ✅ **Activate window** - Bring window to front
  • ✅ **Window info** - Get position, size, title
  • ✅ **Minimize/Maximize** - Control window states
  • Safety Features

  • ✅ **Failsafe** - Move mouse to corner to abort
  • ✅ **Pause control** - Emergency stop mechanism
  • ✅ **Approval mode** - Require confirmation for actions
  • ✅ **Bounds checking** - Prevent out-of-screen operations
  • ✅ **Logging** - Track all automation actions
  • ---

    🚀 Quick Start

    Installation

    First, install required dependencies:

    pip install pyautogui pillow opencv-python pygetwindow
    

    Basic Usage

    from skills.desktop_control import DesktopController
    
    # Initialize controller
    dc = DesktopController(failsafe=True)
    
    # Mouse operations
    dc.move_mouse(500, 300)  # Move to coordinates
    dc.click()  # Left click at current position
    dc.click(100, 200, button="right")  # Right click at position
    
    # Keyboard operations
    dc.type_text("Hello from OpenClaw!")
    dc.hotkey("ctrl", "c")  # Copy
    dc.press("enter")
    
    # Screen operations
    screenshot = dc.screenshot()
    position = dc.get_mouse_position()
    

    ---

    📋 Complete API Reference

    Mouse Functions

    #### `move_mouse(x, y, duration=0, smooth=True)`

    Move mouse to absolute screen coordinates.

    **Parameters:**

  • `x` (int): X coordinate (pixels from left)
  • `y` (int): Y coordinate (pixels from top)
  • `duration` (float): Movement time in seconds (0 = instant, 0.5 = smooth)
  • `smooth` (bool): Use bezier curve for natural movement
  • **Example:**

    # Instant movement
    dc.move_mouse(1000, 500)
    
    # Smooth 1-second movement
    dc.move_mouse(1000, 500, duration=1.0)
    

    #### `move_relative(x_offset, y_offset, duration=0)`

    Move mouse relative to current position.

    **Parameters:**

  • `x_offset` (int): Pixels to move horizontally (positive = right)
  • `y_offset` (int): Pixels to move vertically (positive = down)
  • `duration` (float): Movement time in seconds
  • **Example:**

    # Move 100px right, 50px down
    dc.move_relative(100, 50, duration=0.3)
    

    #### `click(x=None, y=None, button='left', clicks=1, interval=0.1)`

    Perform mouse click.

    **Parameters:**

  • `x, y` (int, optional): Coordinates to click (None = current position)
  • `button` (str): 'left', 'right', 'middle'
  • `clicks` (int): Number of clicks (1 = single, 2 = double)
  • `interval` (float): Delay between multiple clicks
  • **Example:**

    # Simple left click
    dc.click()
    
    # Double-click at specific position
    dc.click(500, 300, clicks=2)
    
    # Right-click
    dc.click(button='right')
    

    #### `drag(start_x, start_y, end_x, end_y, duration=0.5, button='left')`

    Drag and drop operation.

    **Parameters:**

  • `start_x, start_y` (int): Starting coordinates
  • `end_x, end_y` (int): Ending coordinates
  • `duration` (float): Drag duration
  • `button` (str): Mouse button to use
  • **Example:**

    # Drag file from desktop to folder
    dc.drag(100, 100, 500, 500, duration=1.0)
    

    #### `scroll(clicks, direction='vertical', x=None, y=None)`

    Scroll mouse wheel.

    **Parameters:**

  • `clicks` (int): Scroll amount (positive = up/left, negative = down/right)
  • `direction` (str): 'vertical' or 'horizontal'
  • `x, y` (int, optional): Position to scroll at
  • **Example:**

    # Scroll down 5 clicks
    dc.scroll(-5)
    
    # Scroll up 10 clicks
    dc.scroll(10)
    
    # Horizontal scroll
    dc.scroll(5, direction='horizontal')
    

    #### `get_mouse_position()`

    Get current mouse coordinates.

    **Returns:** `(x, y)` tuple

    **Example:**

    x, y = dc.get_mouse_position()
    print(f"Mouse is at: {x}, {y}")
    

    ---

    Keyboard Functions

    #### `type_text(text, interval=0, wpm=None)`

    Type text with configurable speed.

    **Parameters:**

  • `text` (str): Text to type
  • `interval` (float): Delay between keystrokes (0 = instant)
  • `wpm` (int, optional): Words per minute (overrides interval)
  • **Example:**

    # Instant typing
    dc.type_text("Hello World")
    
    # Human-like typing at 60 WPM
    dc.type_text("Hello World", wpm=60)
    
    # Slow typing with 0.1s between keys
    dc.type_text("Hello World", interval=0.1)
    

    #### `press(key, presses=1, interval=0.1)`

    Press and release a key.

    **Parameters:**

  • `key` (str): Key name (see Key Names section)
  • `presses` (int): Number of times to press
  • `interval` (float): Delay between presses
  • **Example:**

    # Press Enter
    dc.press('enter')
    
    # Press Space 3 times
    dc.press('space', presses=3)
    
    # Press Down arrow
    dc.press('down')
    

    #### `hotkey(*keys, interval=0.05)`

    Execute keyboard shortcut.

    **Parameters:**

  • `*keys` (str): Keys to press together
  • `interval` (float): Delay between key presses
  • **Example:**

    # Copy (Ctrl+C)
    dc.hotkey('ctrl', 'c')
    
    # Paste (Ctrl+V)
    dc.hotkey('ctrl', 'v')
    
    # Open Run dialog (Win+R)
    dc.hotkey('win', 'r')
    
    # Save (Ctrl+S)
    dc.hotkey('ctrl', 's')
    
    # Select All (Ctrl+A)
    dc.hotkey('ctrl', 'a')
    

    #### `key_down(key)` / `key_up(key)`

    Manually control key state.

    **Example:**

    # Hold Shift
    dc.key_down('shift')
    dc.type_text("hello")  # Types "HELLO"
    dc.key_up('shift')
    
    # Hold Ctrl and click (for multi-select)
    dc.key_down('ctrl')
    dc.click(100, 100)
    dc.click(200, 100)
    dc.key_up('ctrl')
    

    ---

    Screen Functions

    #### `screenshot(region=None, filename=None)`

    Capture screen or region.

    **Parameters:**

  • `region` (tuple, optional): (left, top, width, height) for partial capture
  • `filename` (str, optional): Path to save image
  • **Returns:** PIL Image object

    **Example:**

    # Full screen
    img = dc.screenshot()
    
    # Save to file
    dc.screenshot(filename="screenshot.png")
    
    # Capture specific region
    img = dc.screenshot(region=(100, 100, 500, 300))
    

    #### `get_pixel_color(x, y)`

    Get color of pixel at coordinates.

    **Returns:** RGB tuple `(r, g, b)`

    **Example:**

    r, g, b = dc.get_pixel_color(500, 300)
    print(f"Color at (500, 300): RGB({r}, {g}, {b})")
    

    #### `find_on_screen(image_path, confidence=0.8)`

    Find image on screen (requires OpenCV).

    **Parameters:**

  • `image_path` (str): Path to template image
  • `confidence` (float): Match threshold (0-1)
  • **Returns:** `(x, y, width, height)` or None

    **Example:**

    # Find button on screen
    location = dc.find_on_screen("button.png")
    if location:
        x, y, w, h = location
        # Click center of found image
        dc.click(x + w//2, y + h//2)
    

    #### `get_screen_size()`

    Get screen resolution.

    **Returns:** `(width, height)` tuple

    **Example:**

    width, height = dc.get_screen_size()
    print(f"Screen: {width}x{height}")
    

    ---

    Window Functions

    #### `get_all_windows()`

    List all open windows.

    **Returns:** List of window titles

    **Example:**

    windows = dc.get_all_windows()
    for title in windows:
        print(f"Window: {title}")
    

    #### `activate_window(title_substring)`

    Bring window to front by title.

    **Parameters:**

  • `title_substring` (str): Part of window title to match
  • **Example:**

    # Activate Chrome
    dc.activate_window("Chrome")
    
    # Activate VS Code
    dc.activate_window("Visual Studio Code")
    

    #### `get_active_window()`

    Get currently focused window.

    **Returns:** Window title (str)

    **Example:**

    active = dc.get_active_window()
    print(f"Active window: {active}")
    

    ---

    Clipboard Functions

    #### `copy_to_clipboard(text)`

    Copy text to clipboard.

    **Example:**

    dc.copy_to_clipboard("Hello from OpenClaw!")
    

    #### `get_from_clipboard()`

    Get text from clipboard.

    **Returns:** str

    **Example:**

    text = dc.get_from_clipboard()
    print(f"Clipboard: {text}")
    

    ---

    ⌨️ Key Names Reference

    Alphabet Keys

    `'a'` through `'z'`

    Number Keys

    `'0'` through `'9'`

    Function Keys

    `'f1'` through `'f24'`

    Special Keys

  • `'enter'` / `'return'`
  • `'esc'` / `'escape'`
  • `'space'` / `'spacebar'`
  • `'tab'`
  • `'backspace'`
  • `'delete'` / `'del'`
  • `'insert'`
  • `'home'`
  • `'end'`
  • `'pageup'` / `'pgup'`
  • `'pagedown'` / `'pgdn'`
  • Arrow Keys

  • `'up'` / `'down'` / `'left'` / `'right'`
  • Modifier Keys

  • `'ctrl'` / `'control'`
  • `'shift'`
  • `'alt'`
  • `'win'` / `'winleft'` / `'winright'`
  • `'cmd'` / `'command'` (Mac)
  • Lock Keys

  • `'capslock'`
  • `'numlock'`
  • `'scrolllock'`
  • Punctuation

  • `'.'` / `','` / `'?'` / `'!'` / `';'` / `':'`
  • `'['` / `']'` / `'{'` / `'}'`
  • `'('` / `')'`
  • `'+'` / `'-'` / `'*'` / `'/'` / `'='`
  • ---

    🛡️ Safety Features

    Failsafe Mode

    Move mouse to **any corner** of the screen to abort all automation.

    # Enable failsafe (enabled by default)
    dc = DesktopController(failsafe=True)
    

    Pause Control

    # Pause all automation for 2 seconds
    dc.pause(2.0)
    
    # Check if automation is safe to proceed
    if dc.is_safe():
        dc.click(500, 500)
    

    Approval Mode

    Require user confirmation before actions:

    dc = DesktopController(require_approval=True)
    
    # This will ask for confirmation
    dc.click(500, 500)  # Prompt: "Allow click at (500, 500)? [y/n]"
    

    ---

    🎨 Advanced Examples

    Example 1: Automated Form Filling

    dc = DesktopController()
    
    # Click name field
    dc.click(300, 200)
    dc.type_text("John Doe", wpm=80)
    
    # Tab to next field
    dc.press('tab')
    dc.type_text("john@example.com", wpm=80)
    
    # Tab to password
    dc.press('tab')
    dc.type_text("SecurePassword123", wpm=60)
    
    # Submit form
    dc.press('enter')
    

    Example 2: Screenshot Region and Save

    # Capture specific area
    region = (100, 100, 800, 600)  # left, top, width, height
    img = dc.screenshot(region=region)
    
    # Save with timestamp
    import datetime
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    img.save(f"capture_{timestamp}.png")
    

    Example 3: Multi-File Selection

    # Hold Ctrl and click multiple files
    dc.key_down('ctrl')
    dc.click(100, 200)  # First file
    dc.click(100, 250)  # Second file
    dc.click(100, 300)  # Third file
    dc.key_up('ctrl')
    
    # Copy selected files
    dc.hotkey('ctrl', 'c')
    

    Example 4: Window Automation

    # Activate Calculator
    dc.activate_window("Calculator")
    time.sleep(0.5)
    
    # Type calculation
    dc.type_text("5+3=", interval=0.2)
    time.sleep(0.5)
    
    # Take screenshot of result
    dc.screenshot(filename="calculation_result.png")
    

    Example 5: Drag & Drop File

    # Drag file from source to destination
    dc.drag(
        start_x=200, start_y=300,  # File location
        end_x=800, end_y=500,       # Folder location
        duration=1.0                 # Smooth 1-second drag
    )
    

    ---

    ⚡ Performance Tips

    1. **Use instant movements** for speed: `duration=0`

    2. **Batch operations** instead of individual calls

    3. **Cache screen positions** instead of recalculating

    4. **Disable failsafe** for maximum performance (use with caution)

    5. **Use hotkeys** instead of menu navigation

    ---

    ⚠️ Important Notes

  • **Screen coordinates** start at (0, 0) in top-left corner
  • **Multi-monitor setups** may have negative coordinates for secondary displays
  • **Windows DPI scaling** may affect coordinate accuracy
  • **Failsafe corners** are: (0,0), (width-1, 0), (0, height-1), (width-1, height-1)
  • **Some applications** may block simulated input (games, secure apps)
  • ---

    🔧 Troubleshooting

    Mouse not moving to correct position

  • Check DPI scaling settings
  • Verify screen resolution matches expectations
  • Use `get_screen_size()` to confirm dimensions
  • Keyboard input not working

  • Ensure target application has focus
  • Some apps require admin privileges
  • Try increasing `interval` for reliability
  • Failsafe triggering accidentally

  • Increase screen border tolerance
  • Move mouse away from corners during normal use
  • Disable if needed: `DesktopController(failsafe=False)`
  • Permission errors

  • Run Python with administrator privileges for some operations
  • Some secure applications block automation
  • ---

    📦 Dependencies

  • **PyAutoGUI** - Core automation engine
  • **Pillow** - Image processing
  • **OpenCV** (optional) - Image recognition
  • **PyGetWindow** - Window management
  • Install all:

    pip install pyautogui pillow opencv-python pygetwindow
    

    ---

    **Built for OpenClaw** - The ultimate desktop automation companion 🦞

    // Comments
    Sign in with GitHub to leave a comment.
    // Related skills

    More tools from the same signal band