Earlier today, OpenAI posted a video on social media announcing the $6.4 billion acquisition of io, Jony Ive’s “nascent hardware company,” as well as a collaboration with his design studio LoveFrom. The video showcased a joint interview over coffee at Francis Ford Coppola’s Café Zoetrope about the future not just of AI, but of our overall relationship with technology.
Ive is famous for these sorts of “deep thinker” musings, questioning the fundamental nature of our relationship with the physical world of product design. He couldn’t have picked a better business partner than Altman, who co-founded OpenAI nine years ago, when AI was more Skynet than term-paper helper, and then landed center stage after the public launch of ChatGPT in November 2022. Both are futurists more interested in pushing boundaries than in delivering attractive quarterly earnings reports.
In the video, Altman notes that the human-machine interface has remained relatively unchanged for almost fifty years. He describes wanting to ask ChatGPT something from an earlier part of their conversation: “I would get on my laptop, I’d open it up, I’d launch a web browser, I’d start typing, and I’d have to, like, explain that thing. And I would hit enter, and I would wait, and I would get a response.” The launch of the iPhone nearly 18 years ago provided the most recent form-factor alternative, but it still relies on an app and (for most people) typing into a text box. Apple’s Siri offered another, albeit limited, alternative when it launched in 2011, but it wasn’t until over a decade later that OpenAI released Advanced Voice Mode, arguably the first real AI-powered voice assistant.
We’ve had to wait a long time for this. Star Trek had a voice interface for the U.S.S. Enterprise computer as far back as the first season in the 1960s, and it has been a staple of the franchise ever since. The lack of such an interface provided a comic moment in the 1986 film Star Trek IV: The Voyage Home, when Scotty picks up a mouse like a microphone and attempts to verbally greet a Macintosh Plus at Plexicorp. When told to use the keyboard, he retorts, “A keyboard… how quaint.”
Indeed, four decades later, Scotty would still find our relationship with computers mostly confined to a “quaint” keyboard, with voice assistants remaining a curiosity used for setting cooking timers, getting map directions, or placing a call while driving. One modern alternative, the Rabbit r1, ditched the keyboard entirely but received mixed reviews despite winning major design awards. OpenAI’s acquisition of io puts two Silicon Valley heavyweights under the same well-funded roof: Altman and Ive bring both visionary ambition and the credibility to attract funding for a moonshot.
So, what would this new human-machine interface look like? OpenAI reportedly won’t release anything from its collaboration with Ive and LoveFrom until 2026, but we can guess. Traditional peripherals like keyboards, mice, and even touchscreens will likely become fallback options rather than primary interfaces. The replacement is likely to be the most natural mode of communication for human beings: your voice.
Humans have been speaking to one another for at least 50,000 years, and our bodies and brains have adapted to it over that time. It’s natural, and we’re predisposed to it. Babies don’t need to be explicitly taught how to speak; it’s instinctual, like learning to walk. Provided they’re developmentally typical and raised in a social environment rather than in isolation, babies pick up speech through the natural process of language acquisition.
Writing and reading are significantly more recent, going back only about 5,000 years. While less natural than speaking and listening, reading is in many ways much more efficient. Humans can read, on average, two to three times faster than they can listen. Reading is also much better suited to skimming and scanning, such as when reviewing a dashboard. With reading, you can access information selectively, whereas listening is an inherently linear mode of communication.
Despite this efficiency gain, inputting information via text requires learning to type and to navigate a graphical user interface. Storing information means typing letters and numbers into forms, emails, documents, spreadsheets, ERP systems, search engines, URL bars, and so on. All of these systems need to be built and maintained, and because each is optimized for one workflow, or at best a handful of ways of presenting data, even a small change in how it’s used can mean weeks or months of development work, sometimes for nothing more than a one-off request to view the data in a different way.
Voice + screen interfaces thus present the best of both worlds. Instead of building a program to input, store, process, and present information, you can just speak naturally. The applications become obvious once you pair this with agentic AI. Instead of writing a query, you might one day be able to ask the AI to “Check the impact of the Shanghai port congestion on our Q3 delivery schedule for our high-margin SKUs, and show me which suppliers are at risk of missing their delivery windows.” Instead of listening to an AI voice rattle off a long list of suppliers, the user would be presented with an easily scannable table that they could then ask, by voice, to sort or filter.
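To make that loop a little more concrete, here is a minimal, purely illustrative Python sketch of the shape such a system might take. The supplier records, field names, and the interpret() stub are all invented for this example; a real system would hand the transcript to an LLM-based agent with access to live data rather than a hard-coded list.

```python
# Hypothetical sketch of the voice -> agent -> screen loop described above.
# Everything here (the supplier records, the interpret() stub, the render step)
# is illustrative, not a real product's API.

from dataclasses import dataclass

@dataclass
class Supplier:
    name: str
    sku: str
    margin_pct: float
    days_late: int
    revenue_at_risk: float

# Toy stand-in for data an agent might pull from an ERP system or data lake.
SUPPLIERS = [
    Supplier("Acme Logistics", "SKU-114", 42.0, 9, 1_250_000),
    Supplier("Pacific Freight", "SKU-207", 35.5, 3, 480_000),
    Supplier("Harbor Supply Co", "SKU-114", 41.0, 14, 2_100_000),
]

def interpret(transcript: str) -> dict:
    """Stand-in for the agent step: turn a spoken request into a structured intent.
    A real implementation would use an LLM with tool calling, not a fixed dict."""
    return {"min_margin_pct": 35.0, "sort_by": "days_late", "descending": True}

def run_request(transcript: str) -> list[Supplier]:
    """Filter and order the data according to the interpreted intent."""
    intent = interpret(transcript)
    rows = [s for s in SUPPLIERS if s.margin_pct >= intent["min_margin_pct"]]
    rows.sort(key=lambda s: getattr(s, intent["sort_by"]), reverse=intent["descending"])
    return rows

def render_table(rows: list[Supplier]) -> None:
    """The 'screen' half of voice + screen: a scannable table, not a spoken list."""
    print(f"{'Supplier':<18}{'SKU':<10}{'Days late':<11}{'Revenue at risk':>16}")
    for s in rows:
        print(f"{s.name:<18}{s.sku:<10}{s.days_late:<11}{s.revenue_at_risk:>16,.0f}")

if __name__ == "__main__":
    render_table(run_request(
        "Check the impact of the Shanghai port congestion on our Q3 deliveries "
        "for high-margin SKUs and show me which suppliers are at risk."
    ))
```

In this framing, the interpret step is where the real intelligence would live; the table is just one of many views the agent could choose to generate from the same answer.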
None of this graphical output would need to be preprogrammed or designed in advance, or require any manual input from the user. Rather, it could be created on request by the AI. The user could also drill down in our hypothetical example: “Show me the top three suppliers with the highest cumulative revenue at risk due to this delay, and break it down by affected SKUs, delivery method, and inventory levels at our North American distribution centers,” and be presented with that information instantly, without a query, filtering, or even a prebuilt program. Provided your AI agent has access to a data lake containing this information, it can take your natural-language “query”, process the data available to it, and show you exactly what you asked for.
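Continuing in the same hypothetical vein, the drill-down is just another pass over whatever data the agent already has: a new grouping and sort generated on request, with no prebuilt report behind it. Again, the rows and field names below are invented purely for illustration.

```python
# Hypothetical sketch of the drill-down step: a follow-up question becomes a
# fresh aggregation over data the agent already holds, generated on the fly.
from collections import defaultdict

# Invented rows standing in for what the agent pulled for the first question.
rows = [
    {"supplier": "Acme Logistics",   "sku": "SKU-114", "revenue_at_risk": 1_250_000},
    {"supplier": "Harbor Supply Co", "sku": "SKU-114", "revenue_at_risk": 2_100_000},
    {"supplier": "Pacific Freight",  "sku": "SKU-207", "revenue_at_risk": 480_000},
]

def top_exposures(rows: list[dict], group_by: str, top_n: int = 3) -> list[tuple[str, float]]:
    """'Top three ... broken down by SKU' becomes a grouping, a sum, and a sort."""
    totals: dict[str, float] = defaultdict(float)
    for r in rows:
        totals[r[group_by]] += r["revenue_at_risk"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# The spoken follow-up request turns into a new view over the same data.
for key, at_risk in top_exposures(rows, group_by="sku"):
    print(f"{key}: ${at_risk:,.0f} at risk")
```

The point of the sketch is that nothing here is a saved query or a dashboard widget; each follow-up request simply reshapes the data the agent already has into whatever view was asked for.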
Voice does have some limitations, such as the need to speak out loud, potentially in a public place, where a keyboard fallback would still be needed. Eventually, brain-computer interfaces (BCIs) could eliminate the need to speak out loud entirely; the user would just think about what they want to accomplish. Traditionally, direct neurological connections have involved brain implants or sensors inserted close to the brain through a blood vessel, but several companies, including Meta, are working on non-invasive BCIs using techniques such as magnetoencephalography (MEG) and electroencephalography (EEG). This technology already exists in limited form, but more powerful iterations are likely still several years away. In the meantime, a traditional keyboard fallback for noisy or noise-sensitive environments will likely remain the go-to for technology companies pursuing voice AI.
What OpenAI’s acquisition of io and collaboration with LoveFrom ultimately produce remains to be seen. Altman called one of their early prototypes “the coolest piece of technology the world will have ever seen,” which could be either Steve Jobs-like hyperbole or a serious assessment by one of the most influential technology leaders today. One thing is for sure, though: the era of the keyboard and mouse as our primary interface to technology is coming to an end. After over sixty years of waiting, it’s time to say, “Hello, Computer…”