I Built a Real AI Jarvis That Controls My Computer
Riley Brown

⏱ 21 min video · 3 min read · 1 Jul 2026 · Worth watching
TL;DR
Riley Brown builds a voice-controlled AI desktop companion called 'Ricky' using Cursor and OpenAI's GPT-4o Realtime voice model, with zero coding experience required. The agent can search the web via Exa, generate and edit images, create Mermaid diagrams, and fully control the computer — all built iteratively through natural-language prompts to Cursor.
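
To make the "callable by voice" idea concrete, here is a minimal sketch of how a voice agent might route a model's tool call to local code. It is an illustration under stated assumptions, not Ricky's actual implementation: the function names `web_search`, `make_diagram`, and `dispatch` are hypothetical, the search handler is a stub rather than a real Exa call, and the diagram handler only emits basic Mermaid flowchart text.

```python
import json
from typing import Callable, Dict, List


def web_search(query: str) -> str:
    """Hypothetical stub; a real handler would call a search API such as Exa."""
    return f"[stub] search results for: {query}"


def make_diagram(steps: List[str]) -> str:
    """Build Mermaid flowchart source from an ordered list of step labels."""
    lines = ["flowchart TD"]
    for i, label in enumerate(steps):
        lines.append(f"    S{i}[{label}]")
    for i in range(len(steps) - 1):
        lines.append(f"    S{i} --> S{i + 1}")
    return "\n".join(lines)


# Registry of tool names (assumed, for illustration) mapped to local handlers.
TOOLS: Dict[str, Callable[..., str]] = {
    "web_search": web_search,
    "make_diagram": make_diagram,
}


def dispatch(tool_name: str, arguments_json: str) -> str:
    """Route one tool call (name plus JSON-encoded arguments) to its handler."""
    handler = TOOLS.get(tool_name)
    if handler is None:
        return f"unknown tool: {tool_name}"
    return handler(**json.loads(arguments_json))


if __name__ == "__main__":
    print(dispatch("make_diagram", '{"steps": ["Listen", "Search", "Answer"]}'))
```

In a setup like this, the voice model decides which tool to call and with what arguments, and the app's job is only to run the matching handler and return the result. Step labels containing brackets or quotes would need escaping before being placed in Mermaid source.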
Key points
1. The entire app was built using Cursor with no coding experience required, iterating through a handful of natural-language prompts over one session.
2. The voice agent uses OpenAI's GPT-4o Realtime voice model for low-latency, interruptible real-time conversation with an animated face that syncs mouth movement to speech.
3. Web search is powered by the Exa API, image generation and editing use OpenAI's image model, and Mermaid.js renders structured diagrams, all callable by voice.
4. Computer use mode lets the agent shrink to a small overlay and directly control the desktop: opening apps like Codex, typing prompts, and submitting them.
5. A multi-image thumbnail workflow was built on the fly: the agent generates images in a numbered grid, allows per-slot editing by voice, and supports parallel generation.
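
The numbered grid with per-slot editing and parallel generation can be sketched as follows. This is a minimal illustration, not the code from the video: `ThumbnailGrid` and `fake_generate` are hypothetical names, and `fake_generate` stands in for a real image-generation API call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fake_generate(prompt: str) -> str:
    """Hypothetical placeholder for a real image-generation API call."""
    return f"image<{prompt}>"


class ThumbnailGrid:
    """Numbered slots (1-based) so a voice command can say 'edit number 3'."""

    def __init__(self, generate: Callable[[str], str] = fake_generate):
        self.generate = generate
        self.slots: Dict[int, str] = {}

    def fill(self, prompts: List[str]) -> None:
        # Generate all images concurrently; map() returns results in prompt order.
        with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
            results = list(pool.map(self.generate, prompts))
        self.slots = {i: img for i, img in enumerate(results, start=1)}

    def edit(self, slot: int, instruction: str) -> None:
        # Regenerate a single slot, leaving the others untouched.
        if slot not in self.slots:
            raise KeyError(f"no slot {slot}")
        self.slots[slot] = self.generate(f"{self.slots[slot]} + {instruction}")


if __name__ == "__main__":
    grid = ThumbnailGrid()
    grid.fill(["cat", "dog", "robot"])
    grid.edit(2, "add red text")
    print(grid.slots)
```

Threads suit this case because each real generation call would spend most of its time waiting on the network, so running them side by side shortens the total wait without changing the slot numbering.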
Actionable insights
Get your OpenAI API key at openai.com/api-keys and your Exa API key from Exa's site before starting — both are required for full functionality.
Paste the full prompt (linked in the video description) into Cursor using GPT-4o or GPT-5.5 to scaffold the entire Electron desktop app in one go, then iterate with follow-up prompts for design fixes and new features.
Use computer use mode carefully — the agent defaults to asking for explicit approval before sensitive actions like typing or submitting; you can prompt Cursor to remove that restriction if you want fully autonomous control.
The project is uploaded to GitHub so you can clone it directly and feed the repo link to Cursor to recreate or customise Ricky without starting from scratch.
Extend the agent further by connecting it to email, Slack, or other APIs — Riley specifically calls out email integration and team-shared agents as high-value next steps.
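
The approval behaviour described for computer use mode can be pictured with a small gate like the one below. It is a sketch under assumptions: the action names (`open_app`, `type_text`, `submit`), the `run_action` helper, and the `autonomous` flag are all hypothetical, and the function only logs what it would do instead of controlling a real desktop.

```python
from typing import Callable, List

# Assumed set of sensitive actions; the summary mentions typing and submitting.
SENSITIVE_ACTIONS = {"type_text", "submit"}


def run_action(
    action: str,
    payload: str,
    ask_user: Callable[[str], bool],
    log: List[str],
    autonomous: bool = False,
) -> bool:
    """Run one desktop action, pausing for approval on sensitive ones.

    Returns True if the action ran and False if the user declined it.
    """
    needs_approval = action in SENSITIVE_ACTIONS and not autonomous
    if needs_approval and not ask_user(f"Allow {action}: {payload!r}?"):
        log.append(f"blocked {action}")
        return False
    # A real agent would drive the mouse and keyboard here; this sketch only logs.
    log.append(f"ran {action}")
    return True


if __name__ == "__main__":
    history: List[str] = []
    run_action("open_app", "Codex", lambda q: False, history)
    run_action("type_text", "Build a landing page", lambda q: True, history)
    print(history)
```

Setting `autonomous=True` corresponds to removing the restriction: sensitive actions then run without a prompt, which is convenient but gives up the safety check.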
Notable quotes

How much coding experience is required? Well, the answer is 0 out of 10. You can be a complete novice to coding and you'll be able to create a voice agent for your personal use or for your business.

One thing I would love to see is someone connect it to their email, and someone connect it to some of their other tools. I think that would be really cool.

Worth watching the full video?
Worth watching if you want to see the live build process and watch the agent actually control a desktop in real time — all the key steps and prompts are captured here, but the footage of computer use mode and the thumbnail grid workflow is genuinely impressive to see in action.
Topics
AI & Tech, OpenAI
