/claim #21
Overview
This PR adds a Rust example that captures the UI context of the currently focused application and processes it with a local LLM. It addresses bounty request #21, which called for a tool that can:
- Detect keyboard shortcuts
- Capture application UI trees
- Send them to a local LLM
- Copy the processed context to the clipboard
Implementation Details
Core Functionality
Global Hotkey Detection
- Uses global-hotkey to register and detect global keyboard shortcuts
- Default shortcut: Ctrl+Shift+C (configurable via CLI)
- Keyboard events processed asynchronously for responsiveness
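
To illustrate the configurable shortcut, here is a minimal, std-only sketch of how a CLI value such as Ctrl+Shift+C could be split into modifiers and a key before it is registered. The names Modifiers, Shortcut, and parse_shortcut are hypothetical and are not taken from this PR or from the global-hotkey crate; the crate's own hotkey types would replace them in the real code.

```rust
/// Modifier keys held as part of a shortcut (hypothetical type for illustration).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Modifiers {
    pub ctrl: bool,
    pub shift: bool,
    pub alt: bool,
    pub meta: bool,
}

/// A parsed shortcut: a set of modifiers plus exactly one non-modifier key.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Shortcut {
    pub modifiers: Modifiers,
    pub key: String,
}

/// Parses a '+'-separated shortcut such as "Ctrl+Shift+C" (case-insensitive).
/// Returns an error for empty segments, missing keys, or more than one key.
pub fn parse_shortcut(spec: &str) -> Result<Shortcut, String> {
    let mut modifiers = Modifiers::default();
    let mut key: Option<String> = None;

    for part in spec.split('+') {
        let token = part.trim();
        if token.is_empty() {
            return Err(format!("empty key name in shortcut {:?}", spec));
        }
        match token.to_ascii_lowercase().as_str() {
            "ctrl" | "control" => modifiers.ctrl = true,
            "shift" => modifiers.shift = true,
            "alt" | "option" => modifiers.alt = true,
            "meta" | "cmd" | "super" | "win" => modifiers.meta = true,
            other => {
                if key.is_some() {
                    return Err(format!("more than one non-modifier key in {:?}", spec));
                }
                // Normalize the key name so "c" and "C" compare equal.
                key = Some(other.to_ascii_uppercase());
            }
        }
    }

    match key {
        Some(key) => Ok(Shortcut { modifiers, key }),
        None => Err(format!("no non-modifier key in {:?}", spec)),
    }
}
```

Calling parse_shortcut("Ctrl+Shift+C") yields a Shortcut with ctrl and shift set and key "C"; a value with no non-modifier key, such as "Ctrl+Shift", is rejected with an error instead of being registered.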
UI Tree Capture
- Implements Windows-specific UI capture via Windows Accessibility APIs
- Creates a traversable UI tree of the focused application
- Captures properties: role, name, value, and hierarchical structure
- Includes a fallback for non-Windows platforms for demo purposes
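
As a rough picture of the captured data, the sketch below models a UI tree node carrying the role, name, and value properties listed above, plus its children. UiNode, outline, and fallback_tree are hypothetical names chosen for illustration; the actual types in src/windows_ui.rs may differ, and the fallback contents here are invented demo data.

```rust
/// One element of a captured UI tree (hypothetical shape for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct UiNode {
    pub role: String,
    pub name: Option<String>,
    pub value: Option<String>,
    pub children: Vec<UiNode>,
}

impl UiNode {
    pub fn new(role: &str, name: Option<&str>, value: Option<&str>) -> Self {
        UiNode {
            role: role.to_string(),
            name: name.map(str::to_string),
            value: value.map(str::to_string),
            children: Vec::new(),
        }
    }

    /// Builder-style helper that appends a child and returns the parent.
    pub fn with_child(mut self, child: UiNode) -> Self {
        self.children.push(child);
        self
    }

    /// Total number of nodes in this subtree, including `self`.
    pub fn count(&self) -> usize {
        1 + self.children.iter().map(UiNode::count).sum::<usize>()
    }

    /// Depth-first, indented text outline of the subtree.
    pub fn outline(&self) -> String {
        let mut out = String::new();
        self.write_outline(0, &mut out);
        out
    }

    fn write_outline(&self, depth: usize, out: &mut String) {
        out.push_str(&"  ".repeat(depth));
        out.push_str(&self.role);
        if let Some(name) = &self.name {
            out.push_str(&format!(" \"{}\"", name));
        }
        if let Some(value) = &self.value {
            out.push_str(&format!(" = {}", value));
        }
        out.push('\n');
        for child in &self.children {
            child.write_outline(depth + 1, out);
        }
    }
}

/// Placeholder tree a non-Windows build could return for demo purposes
/// (assumed contents, not the PR's actual fallback data).
pub fn fallback_tree() -> UiNode {
    UiNode::new("window", Some("Demo Window"), None)
        .with_child(UiNode::new("button", Some("OK"), None))
        .with_child(UiNode::new("edit", Some("Search"), Some("hello")))
}
```

A real build would likely select between the Windows capture and the fallback with a cfg(target_os = "windows") attribute; that wiring is omitted here.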
Local LLM Processing with Ollama
- Integrates with Ollama for local LLM processing
- Default model: Gemma 2B (configurable)
- Customizable system prompt
- Sends the UI tree as structured JSON so the LLM can parse it reliably
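
For reference, this is a std-only sketch of the JSON body such an integration might POST to Ollama's /api/generate endpoint (by default served at http://localhost:11434). The fields model, system, prompt, and stream are part of Ollama's generate API; the helper names json_escape and build_generate_body, and the model tag gemma:2b used in the usage note, are assumptions for illustration. The PR itself may use a different endpoint or an HTTP/JSON crate instead of hand-built strings.

```rust
/// Escapes a string for embedding inside a JSON string literal.
pub fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + 2);
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \uXXXX in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a non-streaming request body for Ollama's /api/generate.
/// `ui_tree_json` is the serialized UI tree, passed to the model as the prompt.
pub fn build_generate_body(model: &str, system: &str, ui_tree_json: &str) -> String {
    format!(
        "{{\"model\":\"{}\",\"system\":\"{}\",\"prompt\":\"{}\",\"stream\":false}}",
        json_escape(model),
        json_escape(system),
        json_escape(ui_tree_json)
    )
}
```

For example, build_generate_body("gemma:2b", "Summarize the UI.", tree_json) produces a single JSON object in which the UI tree is embedded as an escaped string. Setting stream to false asks Ollama to return one complete response rather than a sequence of chunks, which keeps the clipboard step simple.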
Clipboard Integration
- Uses arboard to copy output to clipboard
- Immediate user feedback after copy
- Enables pasting into other apps or AI tools seamlessly
Project Structure
- src/main.rs – Core app logic, hotkey handling, Ollama integration
- src/windows_ui.rs – Windows-specific UI capture logic
- README.md – Setup and usage documentation
- run.bat – Windows helper script
Benefits
- Privacy-Focused: All processing is local, with no external server calls
- Cross-Platform: Runs on Windows and macOS (full UI capture on Windows; macOS uses the demo fallback)
- Configurable: Custom model, hotkey, and system prompt
- Easy to Use: Single hotkey triggers full pipeline, result copied to clipboard
How It Works
- Application listens for hotkey (Ctrl+Shift+C by default)
- On trigger:
  - Captures UI structure of the focused application
  - Sends UI data to local Ollama instance
  - LLM (e.g., Gemma 2B) generates human-readable summary
  - Result is copied to clipboard
Users can then paste the context into any AI tool, IDE, or notes.
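
The flow above can be sketched as three small traits and one function that chains them. This is a simplified, std-only outline under stated assumptions: UiCapture, LocalLlm, ClipboardSink, and handle_hotkey are hypothetical names, and the in-memory stand-ins replace the real Windows capture, Ollama call, and arboard clipboard so the control flow can be read in isolation.

```rust
/// Captures the focused application's UI as a JSON string.
pub trait UiCapture {
    fn capture_focused(&self) -> Result<String, String>;
}

/// Turns captured UI JSON into a human-readable summary.
pub trait LocalLlm {
    fn summarize(&self, system_prompt: &str, ui_json: &str) -> Result<String, String>;
}

/// Receives the final text (the system clipboard in the real app).
pub trait ClipboardSink {
    fn set_text(&mut self, text: &str) -> Result<(), String>;
}

/// Runs one capture -> LLM -> clipboard pass; any failing step aborts the pass.
pub fn handle_hotkey(
    capture: &dyn UiCapture,
    llm: &dyn LocalLlm,
    clipboard: &mut dyn ClipboardSink,
    system_prompt: &str,
) -> Result<String, String> {
    let ui_json = capture.capture_focused()?;
    let summary = llm.summarize(system_prompt, &ui_json)?;
    clipboard.set_text(&summary)?;
    Ok(summary)
}

// In-memory stand-ins used only to exercise the flow.

/// Always returns the same pre-built UI JSON.
pub struct FixedCapture(pub String);

impl UiCapture for FixedCapture {
    fn capture_focused(&self) -> Result<String, String> {
        Ok(self.0.clone())
    }
}

/// Echoes its inputs instead of calling a model.
pub struct EchoLlm;

impl LocalLlm for EchoLlm {
    fn summarize(&self, system_prompt: &str, ui_json: &str) -> Result<String, String> {
        Ok(format!("[{}] {}", system_prompt, ui_json))
    }
}

/// Stores the last text written to it.
#[derive(Default)]
pub struct MemoryClipboard {
    pub contents: Option<String>,
}

impl ClipboardSink for MemoryClipboard {
    fn set_text(&mut self, text: &str) -> Result<(), String> {
        self.contents = Some(text.to_string());
        Ok(())
    }
}
```

Keeping each stage behind a trait is one way to let the non-Windows fallback and alternative models slot in without touching the hotkey handler; whether the PR is organized this way is not stated in the description.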
Testing
Tested on both Windows and macOS:
- Windows: Captures detailed UI info from various apps
- macOS: Fallback logic runs correctly
- LLM Processing: Works reliably with Gemma 2B, also tested with Llama & Mistral