Open Interface Introduction
Open Interface is software that puts a computer on autopilot using LLMs (Large Language Models). It sends the user's request to an LLM backend (such as GPT-4V), works out the steps required to complete the task, and executes them automatically by simulating keyboard and mouse input. It can also course-correct by sending the LLM a current screenshot of the screen as needed.
Open Interface Features
Self-Driving Software for All Your Computers
Open Interface sends each user request to an LLM backend to determine the necessary steps, then executes those steps autonomously by simulating keyboard and mouse input. In effect, it acts as self-driving software for your computer.
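The request, execute, and re-check cycle described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names run_request, plan, execute, and screenshot are hypothetical, and the LLM call, input simulation, and screen capture are passed in as plain callables so the loop can be run without any real backend.

```python
from typing import Callable, Dict, List

# All names here (Step, run_request, plan, execute, screenshot) are
# hypothetical and chosen for illustration only.
Step = Dict[str, object]


def run_request(
    request: str,
    plan: Callable[[str, bytes], List[Step]],
    execute: Callable[[Step], None],
    screenshot: Callable[[], bytes],
    max_rounds: int = 5,
) -> int:
    """Ask the planner for steps, run them, and repeat until it returns none.

    Returns the total number of steps executed.
    """
    executed = 0
    for _ in range(max_rounds):
        # Send the request plus the current screen so the planner can see progress.
        steps = plan(request, screenshot())
        if not steps:  # assumption: an empty plan means the task is considered done
            break
        for step in steps:
            execute(step)
            executed += 1
    return executed
```

Capping the number of rounds is one simple way to stop the loop if the planner never reports completion; the limit of 5 is an arbitrary choice for this sketch.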
Demo
To see Open Interface in action, ask it to "Make me a meal plan in Google Docs." The software will then carry out the task autonomously.
Install
macOS
For macOS users, download the macOS binary from the latest release. Unzip the file and move Open Interface to the Applications folder. On Apple Silicon (M-series) Macs, Open Interface will ask for Accessibility and Screen Recording access. On Intel Macs, you might need to manually allow Open Interface in System Preferences -> Security & Privacy.
Linux
Linux users can download the Linux zip file from the latest release. Extract the executable and run it from the Terminal.
Windows
Windows users can download the Windows zip file from the latest release. Unzip the folder, move the exe to the desired location, and double-click to open.
Setup
To use Open Interface, you need an OpenAI API key. Get your key and save it in the Open Interface settings. Optionally, you can also set up a custom LLM in the Advanced Settings.
Open Interface Use Cases
Stuff It’s Bad At (For Now)
Currently, Open Interface may struggle with accurate spatial reasoning and clicking buttons, with keeping track of its position in tabular contexts such as Excel and Google Sheets, and with navigating complex GUI-rich applications.
Future
With better models trained on video walkthroughs such as YouTube tutorials, Open Interface should be able to handle more complex tasks, such as creating bass samples in GarageBand, editing code on GitHub, creating party playlists on Spotify, and making montages in iMovie.
System Diagram
The system diagram of Open Interface shows how the GUI, Core, LLM (GPT-4V), Interpreter, and Executer interact to carry out tasks.
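One way to picture the Interpreter and Executer roles is sketched below. This is an illustrative guess at the division of labor, not the project's actual implementation: the JSON reply shape ({"steps": [...], "done": ...}), the interpret function, and the handler registry are all assumptions made for the example.

```python
import json
from typing import Callable, Dict, List


def interpret(llm_response: str) -> List[dict]:
    """Parse a JSON reply of the assumed form {"steps": [...], "done": bool}."""
    payload = json.loads(llm_response)
    return list(payload.get("steps", []))


class Executer:
    """Dispatches each step to a handler registered under its function name."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[..., None]] = {}

    def register(self, name: str, handler: Callable[..., None]) -> None:
        self.handlers[name] = handler

    def run(self, step: dict) -> None:
        handler = self.handlers.get(step["function"])
        if handler is None:
            # Refuse unknown actions rather than guessing what the model meant.
            raise ValueError(f"unsupported function: {step['function']}")
        handler(**step.get("parameters", {}))
```

In a real application the registered handlers would simulate keyboard and mouse input; keeping them behind a registry means the parsing and dispatch logic can be exercised with harmless stand-ins.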
Star History
Open Interface has seen 8 releases so far, with continuous improvements and updates.
Open Interface FAQs
How much does Open Interface cost?
Each user request costs roughly $0.05 to $0.20. This is expected to drop substantially once GPT-4V enables an assistant/stateful mode.
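As a rough worked example of the per-request range quoted above, the following sketch multiplies it out for a given number of requests. The function name estimate_cost is hypothetical, and actual spending depends on the task and on the LLM provider's pricing.

```python
from typing import Tuple

# Per-request price range quoted above, in US dollars.
LOW_PER_REQUEST = 0.05
HIGH_PER_REQUEST = 0.20


def estimate_cost(requests: int) -> Tuple[float, float]:
    """Return the (low, high) estimated cost in dollars for a number of requests."""
    if requests < 0:
        raise ValueError("requests must be non-negative")
    return round(requests * LOW_PER_REQUEST, 2), round(requests * HIGH_PER_REQUEST, 2)


low, high = estimate_cost(40)
print(f"40 requests: ${low:.2f} to ${high:.2f}")  # prints "40 requests: $2.00 to $8.00"
```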
Can I interrupt the app?
Yes, you can interrupt the app at any time by pressing the Stop button or by dragging your cursor to any of the screen corners.
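The corner gesture can be thought of as a simple position check. The sketch below is only an illustration of that idea: how Open Interface actually detects the gesture is not described here, and the function name in_screen_corner and the margin parameter are hypothetical. Coordinates are assumed to start at (0, 0) in the top-left of the screen.

```python
def in_screen_corner(x: int, y: int, width: int, height: int, margin: int = 0) -> bool:
    """Return True if (x, y) lies within `margin` pixels of any screen corner."""
    near_left = x <= margin
    near_right = x >= width - 1 - margin
    near_top = y <= margin
    near_bottom = y >= height - 1 - margin
    # A corner requires being at a horizontal edge and a vertical edge at once.
    return (near_left or near_right) and (near_top or near_bottom)
```

An automation loop could call such a check with the current cursor position before each simulated action and stop as soon as it returns True.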
Can Open Interface work with multiple monitors?
With multiple monitors, Open Interface can only see your primary display. If the cursor or focus is on a secondary screen, it might keep retrying the same actions because it cannot see its progress.
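To illustrate why a secondary screen causes trouble, the sketch below checks whether a cursor position falls inside the primary display. It assumes the common convention that the primary display's top-left corner is at (0, 0) in desktop coordinates, with secondary displays at negative or larger offsets; the function name on_primary_display is hypothetical and not part of Open Interface.

```python
def on_primary_display(x: int, y: int, primary_width: int, primary_height: int) -> bool:
    """Return True if (x, y) is inside a primary display anchored at (0, 0)."""
    return 0 <= x < primary_width and 0 <= y < primary_height


# A cursor at x=2500 sits on a monitor to the right of a 1920x1080 primary display,
# so a screenshot of the primary display alone would not show what it is pointing at.
print(on_primary_display(2500, 300, 1920, 1080))  # prints "False"
```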