caydyan/ComputerUseAgent-macOS

AiAgentTest1

A native macOS application that lets users control their computer with natural-language commands, using OpenAI's Computer Use capabilities.

Features

  • 🧠 AI-Powered Control: Uses OpenAI's GPT-4 with Computer Use tools to understand and execute computer tasks
  • 🖥️ Native macOS Interface: Clean, modern SwiftUI interface inspired by ChatGPT's design
  • 🔧 Comprehensive Control: Take screenshots, click, type, scroll, press keys, and move the mouse
  • 🛡️ Privacy-Focused: All computer control happens locally on your machine
  • ⚙️ Flexible Configuration: Support for multiple OpenAI models (GPT-4, GPT-4o, etc.)

Requirements

  • macOS 13.0 or later
  • Xcode 15.0+ (for building from source)
  • OpenAI API key
  • cliclick (optional, for enhanced mouse/keyboard control)

Installation

Option 1: Build from Source

  1. Clone or download this repository
  2. Open Terminal and navigate to the project directory
  3. Run the build script:
    ./build.sh
  4. Launch the app:
    open AiAgentTest1.app

Option 2: Using Swift Package Manager

swift run

Setup

  1. Install cliclick (recommended for better control):

    brew install cliclick
  2. Grant Permissions:

    • When you first run the app, macOS will request accessibility permissions
    • Go to System Settings > Privacy & Security > Accessibility (named System Preferences before macOS 13)
    • Add the AiAgentTest1 app to the list of allowed applications
  3. Configure API Key:

Usage

Simply type natural language commands like:

  • "Take a screenshot of my desktop"
  • "Open Safari and navigate to google.com"
  • "Click on the first search result"
  • "Type 'Hello World' in the active text field"
  • "Scroll down on the current page"
  • "Press the Enter key"

The AI will understand your intent, take screenshots to see the current state, and perform the necessary actions step by step.
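
The step-by-step behavior described above can be pictured as an observe, plan, act loop. The following is a minimal sketch, not the app's actual code: `AgentStep`, `runAgent`, and the three closures are hypothetical names introduced for illustration, and the step cap is an assumed safeguard.

```swift
import Foundation

// Hypothetical result of one model turn; the real app's types may differ.
enum AgentStep: Equatable {
    case perform(String)   // name of an action to run, e.g. "click"
    case finished(String)  // final message for the user
}

// Observe-plan-act loop: capture the screen, ask the planner for the next
// step, execute it, and repeat until the planner reports completion.
// `maxSteps` (an assumption) keeps a confused model from looping forever.
func runAgent(command: String,
              maxSteps: Int = 20,
              capture: () -> Data,
              plan: (String, Data) -> AgentStep,
              execute: (String) -> Void) -> String {
    for _ in 0..<maxSteps {
        let screenshot = capture()
        switch plan(command, screenshot) {
        case .perform(let action):
            execute(action)
        case .finished(let message):
            return message
        }
    }
    return "Stopped after \(maxSteps) steps without finishing."
}
```

Passing the capture, planning, and execution steps as closures keeps the loop testable with scripted stand-ins, without a network call or real input events.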

Architecture

The application follows OpenAI's Computer Use documentation and implements:

  • Computer Interface: Handles environment-specific interactions
  • Tool System: Implements the required computer use tools (screenshot, click, type, scroll, key, move)
  • Action Handlers: Executes computer actions using both cliclick and native CGEvent APIs
  • Chat Controller: Manages the conversation flow and tool execution
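
As an illustration of how an action handler might split work between cliclick and native APIs, the sketch below maps actions to cliclick command strings. The `ComputerAction` enum and `cliclickArguments(for:)` function are hypothetical names, not taken from this repository. The `c:`, `m:`, `t:`, and `kp:` prefixes are cliclick's click, move, type, and key-press commands. The sketch assumes screenshots and scrolling are left to the native path and returns `nil` for them.

```swift
import Foundation

// Hypothetical model of the six supported actions.
enum ComputerAction: Equatable {
    case screenshot
    case click(x: Int, y: Int)
    case type(text: String)
    case scroll(dx: Int, dy: Int)
    case key(name: String)
    case move(x: Int, y: Int)
}

// Returns the cliclick arguments for an action, or nil when the action
// is assumed to be handled by a native API instead.
func cliclickArguments(for action: ComputerAction) -> [String]? {
    switch action {
    case .click(let x, let y):
        return ["c:\(x),\(y)"]      // click at coordinates
    case .move(let x, let y):
        return ["m:\(x),\(y)"]      // move the cursor
    case .type(let text):
        return ["t:\(text)"]        // type text
    case .key(let name):
        return ["kp:\(name)"]       // press a named key, e.g. "return"
    case .screenshot, .scroll:
        return nil                  // fall back to the native path
    }
}
```

The returned array could be passed as the `arguments` of a Foundation `Process` whose `executableURL` points at the installed cliclick binary. The install location depends on the Homebrew setup, so it is left out here.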

Supported Actions

  • screenshot: Capture the current screen
  • click: Click at specific coordinates
  • type: Type text into the active field
  • scroll: Scroll in any direction
  • key: Press keyboard keys
  • move: Move the mouse cursor
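
One way to turn a tool call into one of these actions is to decode it with `Codable`. The JSON shape used below (`action`, `x`, `y`, `text`) is an assumption made for this sketch and may not match the schema OpenAI actually returns. `ActionPayload` and `decodeAction(from:)` are likewise hypothetical names.

```swift
import Foundation

// Hypothetical payload shape; the real tool-call schema may differ.
struct ActionPayload: Decodable, Equatable {
    // The six action names listed above.
    enum Kind: String, Decodable {
        case screenshot, click, type, scroll, key, move
    }
    let action: Kind
    let x: Int?        // present for click and move
    let y: Int?
    let text: String?  // present for type
}

// Decodes a JSON string into a payload; returns nil for malformed JSON
// or an action name outside the supported set.
func decodeAction(from json: String) -> ActionPayload? {
    try? JSONDecoder().decode(ActionPayload.self, from: Data(json.utf8))
}
```

Rejecting unknown action names at the decoding step means the rest of the app only ever sees the six supported actions.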

Safety & Privacy

  • All computer control happens locally on your machine
  • Screenshots are sent to OpenAI's API only when necessary for task completion
  • Your API key is stored locally and never shared
  • The application follows OpenAI's safety guidelines for computer use

Development

The project structure:

Sources/
├── Models/          # Data models for messages and OpenAI API
├── Services/        # OpenAI API service and computer control
├── Views/           # SwiftUI views and interface
├── Controllers/     # Chat and business logic controllers
└── Extensions/      # Utility extensions

Contributing

This application implements OpenAI's Computer Use capabilities as specified in their official documentation. Contributions should maintain compatibility with the OpenAI Computer Use API specification.

License

MIT License - see LICENSE file for details.

Disclaimer

This application can control your computer through AI commands. Use it with caution, and always review what the AI is doing before allowing it to act on important systems or data.
