Building Is Better Than Chatting

[Photo: an actively erupting volcano, with spectators seated in the foreground. Caption: “A different kind of funnel.”]

GenAI isn’t reliable. If you ask it to make a list of very big volcanoes, it’ll make a solid list. But as you get more specific—asking it for individual volcano heights, or diameters, or other kinds of lava lore—it will produce equally specific results, and that specificity isn’t the same as accuracy. The more specific the results, the more you need to fact-check.

One of my hopes with this new technology is that I’ll be able to upload a file and do substantive things with it. For example, I’d like to upload a really long PDF and generate an index of concepts. In a perfect world I’d be able to add that index back to the end of the PDF with clickable links, and have a much better PDF produced as a result. But with the current crop of tools, you can’t be sure that it really did extract relevant terms, and the “generate PDF” part is hard to pull off. You can use chat interfaces to get a sense of the PDF. An LLM can describe contents, list data types, write documentation, and provide various kinds of guidance. These are modern miracles—but they don’t seal the deal. 

When I find a wall that AI can’t jump over, I change context. I stop asking questions and start building tools. In the example above, after getting it to describe the PDF, I might say: “Write a Python program that will open a long PDF, parse the contents, extract terms, and build a multi-page index with links to the original terms. Include instructions on how to use the code, with documentation. Make it executable on the command line.”
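To give a flavor of what comes back, here’s a stripped-down sketch of that kind of script. The pypdf library and the crude capitalized-phrase notion of a “term” are stand-ins of mine, not what any particular model will actually hand you:

```python
# index_pdf.py - rough sketch of a term-index builder for a long PDF.
# Assumes the pypdf library is installed; run as: python index_pdf.py input.pdf > index.md
import re
import sys
from collections import defaultdict

from pypdf import PdfReader


def extract_terms(text):
    """Crude stand-in for 'term extraction': multi-word capitalized phrases."""
    return set(re.findall(r"\b(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b", text))


def build_index(path):
    """Map each term to the set of page numbers it appears on."""
    reader = PdfReader(path)
    index = defaultdict(set)
    for page_number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        for term in extract_terms(text):
            index[term].add(page_number)
    return index


if __name__ == "__main__":
    for term, pages in sorted(build_index(sys.argv[1]).items()):
        print(f"- {term}: pages {', '.join(map(str, sorted(pages)))}")
```

The part that writes the index back into the PDF with working links is the genuinely fiddly bit, which is exactly the kind of plumbing I’d rather have the robot take a first pass at.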

The code it produces will do roughly that. Maybe with bugs, and not with perfect fidelity, but it’ll save me days of researching libraries and cutting-and-pasting code and figuring out how to use command-line frameworks. I still have to read and understand the code, but code is much easier to read than to write. And once you solve one problem, you can ask for more specific things. I can have the robot write code that breaks the document into chunks, then farm those chunks out to simpler AI tools that run on my local machine.
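The chunking half of that is short enough to sketch, too. In a sketch like this one, the 2,000-character chunk size and the do-nothing local-model stub are placeholders for whatever small model actually runs on your machine:

```python
# chunk_pdf.py - split a long PDF's text into chunks sized for a small local model.
# Run as: python chunk_pdf.py input.pdf
import sys

from pypdf import PdfReader


def chunk_text(text, max_chars=2000):
    """Yield chunks of roughly max_chars, breaking on blank-line boundaries."""
    chunk, size = [], 0
    for piece in text.split("\n\n"):
        if size + len(piece) > max_chars and chunk:
            yield "\n\n".join(chunk)
            chunk, size = [], 0
        chunk.append(piece)
        size += len(piece)
    if chunk:
        yield "\n\n".join(chunk)


def summarize_locally(chunk):
    """Placeholder: hand the chunk to whatever small model runs on the local machine."""
    raise NotImplementedError


if __name__ == "__main__":
    pages = (page.extract_text() or "" for page in PdfReader(sys.argv[1]).pages)
    full_text = "\n\n".join(pages)
    for chunk in chunk_text(full_text):
        print(len(chunk))  # sanity check the chunk sizes before wiring up the local model
```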

You can start with a large, messy system like ChatGPT or Claude—which will spew information and code about anything—and use it to evolve a small, useful system that is highly reliable and well-understood. You can create classic, well, software. 

My guess is that we’ll see more and more of this pattern, because it combines two worlds effectively: the new world where you say something and the computer generates all kinds of artifacts in response, and the old world where you build reusable components and tools that solve very specific problems. In some ways, tools like Cursor or GitHub Copilot are the first developer-friendly attempt to bridge those two worlds. At Aboard, we’re hoping to build a bridge between AI, code, and businesspeople. (A three-way bridge may sound complex, but we’re New Yorkers and used to weird traffic patterns.)

Given the way LLMs work, we may never really be able to trust them to be accurate—they can only simulate thinking and reasoning. But a computer can’t tell the difference between simulated code and human-written code; it’ll just run it. This means you can solve all kinds of problems way more quickly than you used to, because computers are many things, good and bad, but at their best they’re really, really fast.