Use Worse Tools

I can’t really understand LLMs unless I keep taking steps backwards.

by Paul Ford - January 22, 2025

I’ve been on this big quest to learn how to write software with AI. Not so much because I need to write code; I’m a co-founder, which means I’m in sales. But I used to be a programmer, and I do want to fully understand how these new systems work, and their limitations. It’s surprisingly hard to get good guidance here. The space is evolving very quickly. You just have to try stuff.

For a while I was using Claude and ChatGPT, which are the big players in the space. I used them through “aider,” an intermediary program that runs in the terminal on your computer. It sits between you and the chatbot, and turns the chatbot’s rambling utterances into artifacts like code files or documentation. You ask it to write some code, it asks the chat for the code, then it makes a directory and saves the new code. Often it works!
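To make that middle step concrete, here is a rough Python sketch of the idea. It is not aider's actual implementation. It assumes the chatbot's reply puts a filename on its own line right before each fenced code block; that reply format and the name `save_code_blocks` are made up for illustration.

```python
import re
from pathlib import Path

# Assumed (hypothetical) reply format: a filename on its own line,
# followed immediately by a fenced code block.
FENCE = "`" * 3
BLOCK_RE = re.compile(
    r"^(?P<name>[\w./-]+)\n" + FENCE + r"\w*\n(?P<body>.*?)\n" + FENCE,
    re.MULTILINE | re.DOTALL,
)


def save_code_blocks(reply: str, out_dir: Path) -> list[Path]:
    """Write each named code block found in the reply to a file under out_dir."""
    written = []
    for match in BLOCK_RE.finditer(reply):
        target = out_dir / match.group("name")
        # Create any missing directories, then save the code.
        # A real tool would also validate the path before writing.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(match.group("body") + "\n")
        written.append(target)
    return written
```

Everything in the reply that is not a named code block (the rambling) is simply ignored, which is roughly the service the intermediary provides.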

Recently a new competitor to those big LLMs, DeepSeek, has been gaining ground. It’s from a Chinese company, made with cheaper chips—so it cost much, much less to create, and it’s quite good. Not as good as Claude at writing code, but much cheaper to run. I gave it a try. Several weeks later, I’m still using it, even though it’s clunkier in spots.

I wrote about it a little bit before, and it is incredibly cheap—but frankly I’ve only saved $30 using it, which is what programmers typically spend on edgy canned water per day. It’s not the money. I’m using DeepSeek because I’m learning more from a slightly less advanced model, and I’m trying to understand the math behind LLMs (the book Why Machines Learn is good for that).

DeepSeek has more obvious seams than Claude or ChatGPT. Here are some:

It forgets more things. Sometimes I ask it to debug some code and it will forget that we’re coding and just keep trying to find more bugs.
It’s not as good at keeping track of where different code objects are. It’s bad at importing stuff and I need to go in and figure it out myself.
It’s very easy for it to get caught in loops in general.

Does it sound like I’m complaining? I’m not. I really like working with DeepSeek, because it’s more…visible. The current crop of technologies can seem like a magic show. Let me draw you a picture! Write you some code? Didn’t work? Have more! I built you an app right here in the browser. They have so many tricks and the magic comes so fast that by the time you learn what’s going on, they’ve updated the entire model. So going just one step back has helped me stop focusing on the shiny new things and understand more.

The big thing I’m learning is how to manage and think about context windows. LLMs don’t know what they know. They forget as they go along. To use them effectively, you need to constantly remind them of what they learned. The way you do this, I’m finding, is to mentally organize your session as an outline.

Let’s say you are (1) building a to-do app; (2) using a bunch of JavaScript libraries; (3) working on the “new task” functionality. The model writes the code! And then you go: “Let’s do the next part, and build the ability to delete a task.” The last one went really well, but on this next step, it seems to have forgotten everything it learned. But if you remind it of all the levels in the hierarchy, it will work. Every time you do something new, you need to go, “We are building a to-do list and we are using these libraries…and now we’re going to work on task deletion. Please review the last bit of code and propose a plan.” And the results tend to be better! So much of technology means traversing up and down a hierarchical tree.
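If you wanted to stop retyping that preamble, the outline could live in a small data structure. This is only a sketch of the habit described above, in Python; the class `SessionOutline`, its fields, and the library names are all placeholders I made up, not part of any real tool.

```python
from dataclasses import dataclass, field


@dataclass
class SessionOutline:
    """The levels of the hierarchy to restate at the start of every request."""

    project: str
    libraries: list[str] = field(default_factory=list)
    current_task: str = ""

    def build_prompt(self, request: str) -> str:
        """Prefix the request with every level of the outline, top down."""
        lines = [f"We are building {self.project}."]
        if self.libraries:
            lines.append(
                "We are using these libraries: " + ", ".join(self.libraries) + "."
            )
        if self.current_task:
            lines.append(f"We are now working on {self.current_task}.")
        lines.append(request)
        return "\n".join(lines)


outline = SessionOutline(
    project="a to-do app",
    libraries=["react", "zustand"],  # placeholder names, not from the article
    current_task="the new-task functionality",
)

# Moving to the next feature: change one level, keep the rest of the outline.
outline.current_task = "task deletion"
prompt = outline.build_prompt(
    "Please review the last bit of code and propose a plan."
)
print(prompt)
```

The point of the structure is that only the bottom level changes from step to step, while the upper levels get repeated verbatim in every request.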

Anyway, as I wrote this newsletter, DeepSeek released a much better model, called R1. I was tempted to ignore it, but my desire to upgrade things is so strong that I gave in. I’m using it now. It seems to be thinking much harder. It’s still very cheap.

But I am going to start looking for smaller, dumber, more specific models that run on my laptop (DeepSeek is open source, but too big for your personal computer). I don’t think I can really understand this technology unless I keep taking steps backwards. Otherwise it’s too easy to get impressed with what is, after all, not a magical star-baby come to save us, but just a big bunch of dumb software spitting out tokens.