A thought experiment: Let’s say AI really takes off, like really, and it’s built into everything everywhere. Certainly our giant tech companies would love for that to happen—it’s why they’re investing billions of dollars into building the largest, best language models they possibly can, even if they’re not sure exactly why they’re doing it (clearly the market isn’t sure, either).
In this scenario, the main interface for using a computer becomes “conversation plus.” You type something, and the computer does something. You describe a picture and the computer draws it. You describe a spreadsheet and the computer generates it. You describe an essay and the computer writes it. Maybe you upload a bunch of PDFs first, or a URL for a web page (or a thousand web pages), and the computer reads all that before it responds.
But the basic idea is: You do something, the computer does something in return. And that’s going to change the world! But my guess is that we are just at the simple, relatively tidy beginning of something huge and messy—that this is the “command line” era of AI. Consider:
- The command-line interface was largely replaced by the mouse-driven windowing interface in the 1980s, which led to an absolutely vast consumer revolution in tech.
- Then, in the early 2010s, people were pretty sure that the “conversational” interface would take over the world. Apple released Siri, Microsoft released Cortana, Amazon released Alexa. Those assistants are more useful than they were initially, but they’re not the central way we interact with computers.
- More recently, Slack developed a rich set of widgets (Blocks) that you could use to build applications. People did build lots of Slack apps, but Slack is still widely understood to be a “chat app,” not an OS.
And so forth.
I’m not saying the command-line interface was a failure; it’s still relevant and useful. It’s just that we are multi-modal humans. We use as many senses as we can. I like to interact with my computer by typing things into a terminal, using a text editor, talking to my phone, tapping my earphones, and listening to what it says back, to name a few. I do lots of work on my phone, but some on a laptop or desktop, too. I use hearing, vision, and touch. (When I dabble with electronics I do sometimes test a battery by electrocuting my tongue, and when I screw up voltages, blue smoke comes out and it smells bad, so I guess taste and smell are sort of in there. Not really, though.) Some people really like those Facebook Ray-Ban eyeglasses that take pictures.
It sounds reductive to say this, but it’s important: Model-driven AI is a technology, not a platform, product, or solution. Think about databases: They have command-line interfaces that let you issue queries. But they get interesting when you build something on top that lets people poke around and explore them, whether that’s Amazon’s catalog of products or Google Maps.
The reason AI grew up in the form of chatbots and through weird Discord interfaces is that a command line is almost always the easiest interface to build for a complicated new technology, especially when that technology is really, really slow, as databases used to be. Input followed by output is much, much simpler to develop than the alternative, which is “computer is constantly sensing and reacting to user input.”
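To make the difference concrete, here is a minimal Python sketch of the two styles. It is an illustration under loud assumptions, not how any real product works: the names `respond` and `run_event_loop`, the event kinds, and the handlers are all made up, and the "model" is a stand-in that just echoes its input.

```python
from collections import deque
from typing import Callable

# Style 1: "input followed by output." One request in, one response out,
# and nothing happens between calls.
def respond(prompt: str) -> str:
    # Hypothetical stand-in for a slow model call; no real model is used.
    return f"response to: {prompt}"

# Style 2: constantly sensing and reacting. The program keeps pulling
# events off a queue and hands each one to the handler for its kind.
Handler = Callable[[str], str]

def run_event_loop(events: deque, handlers: dict) -> list:
    reactions = []
    # This sketch exits when the queue is empty; a real loop would
    # block and wait for the next event instead.
    while events:
        kind, payload = events.popleft()
        handler = handlers.get(kind)
        if handler is not None:  # events with no handler are ignored
            reactions.append(handler(payload))
    return reactions

# Made-up event kinds and reactions, purely for illustration.
handlers = {
    "keypress": lambda key: f"suggest completion after '{key}'",
    "paste": lambda text: f"strip formatting from '{text}'",
    "voice": lambda words: f"act on spoken request '{words}'",
}

events = deque([
    ("keypress", "a"),
    ("paste", "bold text"),
    ("scroll", "down"),  # no handler registered, so nothing happens
    ("voice", "draw a cat"),
])

batch_output = respond("draw a cat")
loop_output = run_event_loop(events, handlers)

print(batch_output)
for reaction in loop_output:
    print(reaction)
```

The first style is one function call. The second needs a queue, a dispatch table, and a decision about what to do with events nobody is listening for, and that is before any of the handlers gets slow, which is a fair hint at why the first style tends to show up first.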
However, we live in an event-loop world. Your web browser, your email client, your operating system—they’re constantly looping and waiting for you. A lot of AI technologies just can’t work that fast yet, but I would expect we’re headed that way. You could walk down the street with your smart goggles on and an LLM could turn the world around you into a cartoon. Which sounds useless—but imagine if the cartoon was Pokémon and it was a new version of Pokémon GO that Pokémonized all of reality in real time. People might lose their Pikachu-addled minds.
Or a graduate student could set up an angry thesis committee of bots, each with different levels of expertise, to yell at them in real time as they defend their thesis, picking apart each argument exhaustively. Or you could simply ask your word processor to change its instructions so that when you paste text, it removes the formatting, and one of the greatest challenges of our age would finally be over. Right now you feed AI a few hundred pages and quiz it; what happens when that becomes a few million pages?
When I try to figure out where this technology is going, I don’t think about infinite conversations as much as what the world might look like if there were dozens of AI-ish things happening at once. A system that waits for input and reacts in lots of different ways—draws pictures, fills in blanks, reads web pages, makes suggestions—or one that makes software for you as you coach it along.
Because all of those things it creates for you will not just be images or paragraphs, but will actually be live software—and those things will be waiting for you to do more stuff, and reacting in their own ways. It’s exciting but also kind of unsettling, because we now know that when the tech industry gets its heart set on something it’s hard to let go—and once AI moves out of “batch mode” it’s going to be an enormous mess. No one knows how it’s really going to work. But that’s also why it rules.