the aboard newsletter

What Are World Models?

And do we need to care about them?

by -
[Image: a blueish disco ball illuminating a light blue pattern on a darker blue surface.]

It’s a big, simulated party in here.

There is much talk lately about world models—especially since ur-AI researcher Yann LeCun, claiming that LLMs are a dead end, left Meta to boot up his very own French World Model company, AMI Labs. It’s worth billions at the jump.

LLMs are large language models, which means that they turn huge piles of text into a network of numeric connections, then they search that network to generate yet more text. World models turn tons of representations of “reality”—for example, millions of hours of video of factories—into “world states,” which get absorbed into a huge network of numeric connections and can be used to generate more world states. 
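To make the contrast concrete, here's a toy sketch of the two loops. Everything in it is made up for illustration—no lab's actual architecture or API—but it captures the shape of the idea: an LLM maps a sequence of tokens to the next token, while a world model maps a world state (plus an action) to the next world state.

```python
# Illustrative only: an LLM predicts the next token from prior tokens;
# a world model predicts the next world state from state + action.

def llm_step(tokens):
    # Stand-in for "search the network of numeric connections":
    # extend the sequence with one more predicted token.
    return tokens + ["<next-token>"]

def world_model_step(state, action):
    # Stand-in for a learned transition model. Here "state" is just
    # a dict of toy physical quantities.
    next_state = dict(state)
    next_state["time"] += 1
    if action == "push":
        next_state["position"] += next_state["velocity"]
    return next_state

state = {"time": 0, "position": 0, "velocity": 2}
for _ in range(3):
    state = world_model_step(state, "push")

print(state)  # {'time': 3, 'position': 6, 'velocity': 2}
```

The point of the toy: text in, text out versus world in, world out. Both are "just" huge learned functions; they differ in what they take and what they emit.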

Let’s say you want a robot to, I don’t know, knit a hat. You tell an LLM, “Hey, get a robot to knit me a hat.” If I ask Claude, here’s what I get:

Ha! I appreciate the creative request, but I don’t have any robot-knitting capabilities at my disposal. My tools are more in the “search the web, write code, browse websites, query your HubSpot” category rather than the “command physical robots to craft knitwear” department.

“Ha!” indeed, Claude, you non-knitting piece of LLM garbage, thanks for absolutely nothing. 

With a world model (note that this is the promise, not the product) you’d conceivably set up a 3D model of a robot and give it virtual knitting needles and virtual yarn and give it a prompt, like, “simulate knitting a hat until you’ve knitted a great hat.” It would use its vast network of world states, its knowledge of physics, its database of hats, its awareness of knitting patterns cadged from centuries of sweater magazines, and maybe a knowledge of the effect of humidity on yarn, to simulate making hats 10,000 times, until it knew exactly how a real-world hat would need to be made, and how to avoid dropping the needles. Then, ultimately, it might give you code to program the real hat-knitting robot—or rather a thousand hat-knitting robots—and now you would be in a position to completely corner the knitted hat market. If they made a movie about you, it would be called Triumph of the Wool.

Other, non-hat promises would be: Not just emitting text based on the medical literature, but simulating diseases. Or using a “code world model” to simulate what happens when you run code…on a computer…running code…which…okay, I don’t get it either, next one. Or instead of creating goofy images, Google will be able to create whole goofy 3D worlds you’ll want to explore for several minutes. Big thinkers are excited!

This idea—that consciousness is embodied and the brain and body are intertwined, and thus to simulate consciousness you need to simulate reality—is a steady babbling brook that has run through AI research and cognitive science for the last 70 years. Or back to Heidegger or Merleau-Ponty if you must—and yoga, of course.

In some ways, it makes perfect sense: Language describes the world, but, postmodern arguments aside, it doesn’t practically simulate the world the way, say, the Unreal Engine simulates physics for Gears of War or Fortnite. So in many ways, this big debate is an extremely high-stakes repeat of an old disagreement. Plus it’s a way for nerds to out-nerd each other and say, LLMs will never become true consciousness! AGI needs virtual digestive systems! Your bot cannot poop and is thus Cartesian and irrelevant.

An LLM apologist might say, “I can get an open-source physics library or a game library and it’ll write code for me to simulate the world and I can evaluate that code. So, you know, it might take me some steps but I’ll get there anyway. Maybe calm down a little?” 

But a world model partisan might say, “My sweet summer child, while you are begging Claude to finish even five lines of code, I will have a field of five million virtual robots testing every possible way to make a pom-pom…as soon as my funding comes through.”

And a venture capitalist might say—well, they can’t say anything, because they’re drooling so hard. It impedes their speech. They can only produce bubbles (in both meanings of the word). 

Here’s why: If you unlock world models, you unlock that robot hatmaker. For decades, we’ve been cycling through the Internet of Things, the 3D-printing revolution, and building robots galore (often in Boston). We’ve all seen the robots fall on the floor, and we laughed, hoping they couldn’t hear us.

The promise of the world model is that you’d be able to stop those robots from falling—that you wouldn’t need all that research or all those prototypes. And so what? Well, multiply it, then multiply that. Look at what LLMs can do to coding. Now apply that to automobile manufacturing. The big money is seeing that the entire world could become one enormous CNC machine. You talk into a phone and tell it what you want, and it makes it. Even if it’s, like, an airplane. Is this possible? Not today. Perhaps not at all. But it feels closer, right? Everything is possible right now, albeit mostly bad things.

My guess is that VCs are doing a lot of Google searches like “size of global textile industry,” seeing it’s in the trillions, and doing the Antonio Banderas fist-to-mouth leanback in their Herman Miller chairs. You know how Carl Sagan said, “If you wish to make an apple pie from scratch, you must first invent the universe”? The idea here is that you can just go ahead and make the universe you need, out of your Digital Universe Kit, and then instruct PieBot4000 to make a perfect apple pie. Then PieDrive3000 can swing by and pick it up and use PieSight2000 to navigate to the house of a willing pie recipient. And your home PieGuy1000 can use a fork to feed it to you while you work on building more world models on deadline. And that’s how the soft rain falls.

Everyone you’d expect to be doing something with world models is here. Nvidia has Cosmos. AMI has competitors like World Labs, run by the Stanford-affiliated Fei-Fei Li. Meta is training robots by showing them videos, kind of. Google has Genie, maker of those goofy 3D worlds mentioned above; Waymo is all in; and on and on and on. When the air starts to come out of the big LLM balloon, people will be ready to jump right into world models and keep the party going.

What do I think? Thank you for asking. I think that (1) people are excited; (2) by the big, big money; (3) and that software development is accelerating; (4) which could accelerate the timeline of world model development; (5) but the problems seem very, very hard; (6) though if it starts to come to fruition, it will mean that all knowledge work and all physical work will be done by digital machines; (7) and society is not built for that; (8) but there are many, many other things keeping me awake all night. My tech industry anxiety alarms aren’t going off yet, like they did with GPT-4.

World model progress remains incremental, and the physical world is unbearably expensive to retool compared to the purely digital world. But could I be ordering a shirt via a prompt in ten years, with no humans involved until it arrives in my mailbox? It’s…a lot less ridiculous-sounding than it might have seemed before.