A Comforting Software Mess

Bauhaus marionette composed largely of black squares and rectangles.

An amazing Bauhaus marionette, from Wikipedia.

In a recent podcast, we discussed how Claude from Anthropic is now able to directly control a computer. “Developers can direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text.” This feature is called…“computer use,” which is not a great name. They should have asked their own product to name it.

This sounded pretty wild! Imagine if AI could open a web browser and fill out any form. Given that AI tech can now figure out captchas, it could be a recipe for cultural disaster/hackery—a zillion AI bots pasting AI-generated images into Instagram all day long. On the other hand, it could allow little conversations that clean up your desktop, rename files, take screenshots of websites, maybe even do stuff like “schedule an appointment with my dentist.”

So I took it for a spin, following Anthropic’s “quickstart,” and I have amazing news: It doesn’t work yet. Which is fair—they promise that it’s very experimental. It definitely kind of works! The tool can move your mouse around, and report back on what it’s seeing. It takes a lot of screenshots as it goes, to figure out what to do next. The demo/quickstart uses a very minimal installation of Linux. The whole thing feels very 1997.

Screenshot of Claude's "computer use" feature in action

This gives me a certain kind of joy, because this isn’t some magic futuristic wonder-device. This is good old software—a janky Linux desktop, running inside a web browser, with a chat window on the left that produces as many errors as good results. I tried to get it to:

  • Open a web page and copy the text on that page (worked with coaching)
  • Open a drawing program (worked) and draw a circle (did not work)
  • Run some command line utilities and review the output (worked fine)
  • Install a text editor (tried, but package manager broke)

And so forth. If you’re nerdy in this way, I’d recommend playing with this tool—I love when you can see the seams in software. If past is prologue, this is going to get better and better: You’ll be able to describe outcomes and it’ll write code and do things for you like drawing pictures, sending emails, and so forth. And when it works, it will be both very powerful and catastrophic. But for right now you can actually see all the pieces clanking noisily along, and it’s both edifying and fun.