Gideon Lewis-Kraus: How Anthropic Sees Claude
Public opinion on LLMs like Claude varies widely—but how do the people who actually work at Anthropic think about it? On this week’s podcast, Paul and Rich are joined in the studio by New Yorker staff writer Gideon Lewis-Kraus to discuss his recent feature, which he reported from within Anthropic HQ. They discuss the piece, and then they hash out the real questions: What’s the correct literary metaphor for an LLM? Does an AI company really need psychologists for its chatbots? And, perhaps most importantly, should you be polite to Claude?
Show Notes
- That’s Gideon Lewis-Kraus, and his latest piece is “What Is Claude? Anthropic Doesn’t Know, Either.”
- The article on Google Brain that he reported a decade ago: “The Great A.I. Awakening.”
- To learn more about LessWrong, read Gideon’s 2020 New Yorker piece, “Slate Star Codex and Silicon Valley’s War Against the Media.”
Transcript
Paul Ford: I’m Paul Ford.
Rich Ziade: And I’m Rich Ziade.
Paul: And this is The Aboard podcast, the podcast about how AI is changing the world of software. Rich, how are you?
Rich: I’m okay.
Paul: I love when we do this because on the video, there’s another person just in the room while we’re pretending they’re not here.
Rich: Yes.
Paul: So let’s throw away that subterfuge and just get right into it. Gideon Lewis-Kraus, thank you for coming.
Gideon Lewis-Kraus: Thanks for having me, guys.
Paul: For those who don’t know, Gideon is an amazing journalist and staff writer at The New Yorker, which, if you don’t know it, it’s a very good magazine. He went and spent a lot of time with our best friend. What’s our best friend’s name?
Rich: Claude.
Paul: Yeah. [laughing] And so he can tell us, actually, all about that. Let’s play our beautiful theme song, and then you and I actually aren’t going to have to talk too much, which is a gift to the listener.
Rich: Phew!
[intro music]
Paul: Gideon, hi!
Gideon: Hi!
Paul: Oh my God. It’s good to see you.
Gideon: It’s great to see you guys, too.
Paul: So, staff writer at The New Yorker. That’s a nice job.
Gideon: Sure.
Paul: You go in there, there’s an office, there’s coffee.
Gideon: There is coffee.
Paul: And you say, “Guys—” Because that’s what you say at The New Yorker. You’re like, “Hey, guys.”
Gideon: Mmm.
Paul: “There’s this thing happening. There’s this thing, and I want to write about it.” Or did they send you?
Gideon: No, no, no. I wanted to do this.
Paul: Okay, so.
Gideon: Yeah.
Paul: What did you want to do? Tell the people what you wanted to do.
Gideon: All right, I can start a little bit further back, if that’s okay with you guys.
Paul: We got a lot of time.
Gideon: So in 2016, I spent most of the year—I was writing for the Times Magazine then—hanging out at Google Brain. I went, like, once a month for, like, a week a month, over about, I don’t know, eight or nine months. And I was writing about the introduction of their first product with deep learning in it, which was Google…neural machine translation.
Paul: Mmm hmm.
Gideon: And it was kind of in, like, the Paleolithic of deep learning at this point.
Paul: Although it seemed like the future at the time.
Gideon: It did seem like the future.
Paul: But it was, I mean, for people, you know, we’re so used to this new world, but, like, that was a pretty nerdy thing to go do, even as a journalist.
Gideon: Yes.
Paul: Yeah.
Gideon: Yeah.
Paul: Okay, so you went and spent some time there.
Gideon: Well, yeah, and, I mean, I was particularly interested in kind of the history of these ideas, like, ideas referred to generally as connectionism that had kind of started in computer science departments, then migrated to psychology departments, and then kind of came back to computer science. Like, that was the stuff I was interested in at the time. And it was a great experience.
It was also, like, pre-transformer. So it was like before anybody had this sense that, like, language models were going to be, like, the main bet that people were going to be making. Then I followed this stuff for a couple of years because I was, like, you know, you do these things and you spend a year learning about something and it’s like you feel bad if you just like, throw it into the ocean. So—
Paul: You meet people, they’re on LinkedIn.
Gideon: Yeah, yeah.
Paul: You want to see what’s going on.
Gideon: So I kept following it, and then I think I’m, like, maybe the only person in the world where, when we got to ChatGPT, or maybe it was even a little bit before that, maybe it was GPT-3, like, that was when I stopped paying attention. Like, when other people started paying attention is when I stopped paying attention. [laughter]
And like, there was no real reason. And it just like, happened incrementally where like, all of a sudden I just like, wasn’t paying attention anymore to this thing that I had been paying attention to for a while. And I realized I stopped myself at a certain point and thought, like, this is something that, like, I know something about and I should be interested in. Like, why did I stop paying attention?
It was because I realized that, like, I was so bored by the discourse that it was, like, we had gotten into one of these, like, cul-de-sacs where it just felt, like, at least in public, right? I mean, I’m not talking about everybody. I’m talking about kind of, like, the modal conversation. At least in public, it was like you had some people yelling like, “Oh, it’s all, like, fake and bullshit and not real.” And then you had the other people saying, “Either this is gonna be wholly transformative for good or for ill.” And it just felt, like, it was one of these merry-go-rounds where it’s like, this time if I yell a little louder, Gary Marcus is gonna believe me or whatever.
Paul: I like to write and talk about technology, and I do it in the safe confines of this office, where I’m building an AI company, for the most part. Same with Rich. It’s really hard to go out, because people are just, so many assumptions are baked in, and it’s exhausting.
Gideon: Yeah.
Paul: Every conversation is either someone saying, “You have missed that Jesus is coming back.”
Gideon: Yeah.
Paul: Or, “You represent Satan and you just are like, well, something’s happening.” The thing for me, I’ll tell you, the thing for me was like, a billion people have used this.
Gideon: Yeah.
Paul: We have to kind of talk about that.
Gideon: Yeah.
Paul: Like, so here we are, right?
Rich: At that time, a billion hadn’t used it yet.
Gideon: Right, right.
Rich: At that time. What did you think it was?
Gideon: What did I think it was?
Rich: Jesus?
Gideon: No. No! I thought it was like, it was like, it seemed like it was a really interesting tool.
Paul: You’re named Gideon.
Gideon: [laughing] And it was just, you know, it seemed like something worth paying attention to. Like, who knew where it was going to go back then?
Rich: It was interesting.
Gideon: Well, but it was—well.
Paul: I mean, if you can find Google DeepMind interesting, right? [laughter] No, seriously, like, that is, like, quanty, mathy.
Gideon: Yeah.
Paul: Like, “Hey, look, we’re doing probabilities and it’s cool for, I don’t know, mapping, right?” Like, so all of a sudden this starts cropping up.
Gideon: So I kind of stopped paying attention. And I was like, this is something that I’m not gonna have an opinion about. I have no opinion about this. I willfully have no opinion about it.
Paul: You stuck to it!
Gideon: And then, but, you know, I would listen to podcasts occasionally and think, like, I should get interested in this again. I just can’t summon any energy to be interested in this. And then it was like—
Paul: People don’t know this about journalists. You actually have to force yourself to care about something.
Gideon: Well, if you force yourself too hard to care about something, it’s not good. Journalism is only good if you actually care about what you’re doing.
Paul: But you don’t natively care.
Gideon: Well, you have to find something you natively care about. Right?
Rich: You can’t decide that for him.
Paul: No, no. But like, you actually are like, you know, as a journalist, you kind of have to go in different directions, but you may not care at first. You have to, like, figure out what you care about.
Gideon: Yes.
Paul: Like, people don’t get this. You don’t wake up like people who go to a job that’s normal and go like, “Okay, I better like, you know, read this one blog and get up to date on stuff.” Like, you have to kind of choose where your brain’s going to take you.
Rich: The disinterest is fascinating. Is it because it was just so shouty, and everybody had an opinion, and it was just like, what—
Gideon: Well, I just didn’t know where all of the confidence was coming from. [laughter] I was like, “How are you guys all so confident about what’s gonna happen? Like, this is brand-new and it’s like nothing we’ve ever dealt with before.” And unearned confidence on all sides is something that just, like, deeply turns me off from any conversation.
Paul: Fair enough. And actually skipping a step, right, because you write a lot about sort of what it’s like inside of Anthropic, but do you think people were seeing models sort of do cool stuff, but it wasn’t ready for primetime yet and they were correctly confident in retrospect, or were they just winging it?
Gideon: No, I mean, I think they definitely saw stuff happen.
Paul: Okay.
Gideon: I mean, like, this is the whole story about like kind of how Google squandered its lead. They were the ones who had, like, Meena first. Like, they could have, you know, they had, like, years, they were years ahead of anybody else.
Paul: So people—I don’t remember the year that the paper came out, but, like, there was this moment where like all of this transformer-based sort of, the way that LLMs are built—
Gideon: Yeah.
Paul: Like, Google really had it in hand.
Gideon: Yeah.
Paul: But they couldn’t productize it like a Google product. It was just too weird and too random.
Gideon: Well, and they also had the brand to think about.
Paul: Yeah.
Gideon: So like, it’s no problem for ChatGPT to be, like, I don’t know, this thing like hallucinates and lies and it’s weird and unreliable but, like, it’s cool to play with. Like, Google couldn’t, Google couldn’t do that.
Rich: Things are going well, too. It’s a massive business that is—
Gideon: Yeah.
Rich: Steady as she goes.
Gideon: Yeah, exactly. And like they knew back then that this was going to, like, potentially wipe out their search revenue. I mean like, that’s, it’s all, that’s a very interesting story. And actually the way that then they got their shit together is a really interesting story, too.
Paul: It is, because it’s not the Xerox story. Right?
Gideon: Right.
Paul: Where like, “Oh, wow, we invented the revolution, let’s just license it out.”
Gideon: Yeah.
Paul: For them it’s kind of a point of pride: Gemini is not quite where it needs to be yet, but it’s going to get there.
Rich: It’s a sprint right now.
Gideon: Well, that’s actually one of the interesting things about talking to people in the industry: so many people in public had kind of, like, written Google or Gemini off. Like, last May, and this was still when it was, I don’t know, Gemini 2.5 or something, like, the people in Anthropic were like, “No, no, the next Gemini is going to be really good.” Like, they knew.
Paul: Well, and they all know each other, and—
Gideon: Yeah, they all know each other.
Paul: There’s a trillion dollars in dry powder to get this right.
Gideon: Yeah. And it’s, like, 400 people, you know, it’s, like, they’ve all worked together somewhere.
Paul: Yeah.
Rich: It’s not a guarantee, though. Meta, I mean, it’s kind of widely known now, sort of missed it. They made big, I mean, crazy job-offer packages and all this stuff, and they missed it.
Paul: No, but there’s an actual business reason for that, which is that they completely suck. They just really, really suck, top to bottom. And that hurts the—
Rich: Lots and lots of people, as someone who’s managed lots and lots of people, doesn’t guarantee velocity in the right direction.
Gideon: I also think, I mean, my impression just from talking to people in the industry, because I, you know, I was there in San Francisco, like, right when Zuckerberg was making these crazy offers to people.
Rich: Yeah.
Paul: Mmm hmm.
Gideon: And my read on it is that he just, like, misjudges what the motivations are for people to do this stuff. Like, his pitch to people was essentially like, “Help me make, like, an even more distracting, consuming toy—” Like, his pitch really is like, we’re so close to creating, like, the movie from Infinite Jest. And like, when you’re, like, recruiting people to do this, like, people do it for many different reasons, as you’re well aware, you had that, like, great essay last year about, like, all the different motivations, but, like, it’s very, very hard to get good people to do this if you’re like, “We’re going to make Infinite Jest.” [laughter]
Paul: You write about it in the piece, like, you’ve got people who already have plenty of money for the rest of their lives.
Gideon: Yeah.
Paul: They love the subject and they’re riding around in old Toyotas because what else are they going to do?
Gideon: Yeah.
Paul: They’re going to go work at the cool lab with their buddies.
Gideon: Yeah.
Paul: And then Zuckerberg is like, “How about a little private jet time with me?”
Gideon: Yeah.
Paul: It’s actually not that great a deal.
Gideon: No, it’s not a great deal.
Paul: Yeah.
Gideon: Basically, the short end of that long story is that about a year and a half ago, there was stuff that I started to find really interesting again. And a lot of that stuff was coming out of interpretability groups and alignment science groups and model organisms groups. And you had stuff like the alignment faking that I talk about here. And all of a sudden I was like, this is A) interesting, and B) this is the way to circumvent the kind of deadlock, which is like, we’re not gonna talk about power.
Paul: Mmm hmm.
Gideon: We’re not gonna talk about intelligence. We’re definitely not gonna talk about consciousness. But we can talk about how weird it is.
Paul: Mmm hmm.
Gideon: And that’s the way that you’re hopefully gonna kind of get around people’s defense mechanisms is be like, “Look, I know that you think that this is all smoke and mirrors. But it’s weird smoke and mirrors, right?” And that can be an entry point. And that’s why I was like, okay, I wanna do a story about this weird stuff.
Paul: Mmm hmm.
Gideon: And then I got in touch with someone at Anthropic that I knew, actually, from Google 10 years ago, and was like, “Hey, like, don’t call the cops—this is not, I don’t want to do an Anthropic story. I just want to talk to you about the research.” And then, of course, like, it was forwarded to the cops, and then the Anthropic cops, [laughter] to be fair, I mean, first of all, they’re great. Like, these people are great.
Paul: Yeah.
Gideon: And they called me and they were like—
Paul: For people who don’t—like, cops is PR and comms people.
Gideon: Comms people.
Paul: Yeah.
Gideon: And the PR people called me and they were like, “Well, you have some fans here, it turns out. And, like, we’re ready to do something if this is what you want to do.” And my pitch to them was like, “Look, I don’t need to talk to executives. I know—like, if I want to know what executives say, I could, like, watch the DealBook conference. They’re going to say the same thing to me in a room.”
Paul: Mmm hmm.
Gideon: I am interested in these particular people and these groups doing this research. And I think they felt like, oh, this is the stuff that doesn’t get a lot of attention. And, like, so, sure.
Paul: I want to call out two things that really, like, stick out. So first of all, this is the, you have the best explanation of kind of what models actually are in the piece that I’ve read ever, which is really good.
Gideon: Thank you. That’s so nice of you.
Paul: You’re welcome. It’s a really hard thing, because they’re not like databases, they’re these weird actual sort of software blobs. It’s just hard to get it right. But the second thing I think is just instructive for everyone is I read and participate in grisly levels of technical shit, but the communities you just described, it’s, like, nine levels of grisly. Like, this is some—and you, you are just keeping yourself immersed in it. Like, it is just a bath of math and pain from my point of view. And so there’s a context here, right? Like, you’re in there and you’re like, okay, so seeing that something is going on and extracting it from that community and, like, following those mailing lists. Where do they all talk, by the way? Like, what is it? Like, is there one community where all the deep researchers are hanging out?
Gideon: I mean, there’s, like, LessWrong and Alignment Forum, like, the EA forums.
Paul: Can you explain to people what LessWrong is? I don’t think enough people know.
Gideon: Yeah, I wrote about that.
Paul: Oh, that’s right, you did.
Gideon: You know what I will do? I will refer to, I wrote about the Slate Star Codex versus the New York Times six years ago.
Paul: Yes.
Gideon: People can go look that up.
Paul: That is actually, yeah, we’ll put a link to that. I know exactly the piece you’re talking about. It’s this really latent power structure—
Gideon: Yeah.
Paul: That nobody fully perceives, but it’s where people gather and talk about, in theory, rationality.
Gideon: Yeah.
Paul: But it’s sort of the feeder system for AI culture.
Gideon: And podcasts. I mean, like, Dwarkesh Patel’s podcast.
Paul: Yeah. God. Boy, it all got—
Gideon: The lesson for you guys—
Paul: [overlapping] Everything just came together.
Gideon: The lesson for you guys is like, why do you do these little 30-, 40-minute things? Like, a real podcast is four hours long.
Rich: Yeah.
Paul: We talk about it. We talk about it, but it’s just…
Rich: We also talk about it, we call it “the couch.” [laughter] We gotta come outta here and just go sit there and lean back, and….
Paul: I would have to care about sports. Like, I think if we could, if I could suddenly give a shit about hockey, we could probably get something going.
Rich: Yeah.
Paul: But you know, otherwise it’s just me talking about something I saw on Reddit for 45 minutes. [laughter] I just don’t think it’s that good.
Rich: I feel like, you know, as I was reading the piece, I’m like, wow, Genius Character Number Five just showed up. And I feel like it’s all these vignettes that end with, “Gosh, isn’t that odd?”
Gideon: Yeah!
Rich: Like, you sort of set us up for, like, a—I’m hoping for like some nice clarity as I’m reading, and oftentimes, even whoever you’re talking to ends with like, “Yeah, that was kind of weird.”
Gideon: Yeah.
Rich: How does that land for you? And I guess, how are people reconciling—I’ve gotta imagine some people are struggling with this. It’s not just curiosity.
Paul: Let me reframe it just a tiny bit. Which is, when you read the piece, the whole question of the piece is like, is this, to me, is, are the bots, do they have any kind of real agency or not? And it’s not even—they’re software, so who knows, right? Like, “no” is an easy answer. But the people in the Anthropic community keep kind of treating them as if they do have agency and setting them up in situations in which they can actually do things, like, sort of—
Rich: But they’re also often surprised and amused and disturbed. And these are the people that know it better than anyone.
Gideon: So, okay, to start with what Rich was saying. I mean, what I really wanted to do with the piece was say, okay, what can we, with any reasonable confidence, say about what we do know about what’s going on? And, like, where can we then draw a line to say, like, on this side of the line, we have, like, some sense of what’s happening, and on that side of the line, we have no idea.
Paul: It’s cool because it’s, like, a $3-trillion part of the entire world economy [laughter] that we’re just basing our whole future on.
Gideon: Yes.
Paul: Yeah. No. Nobody knows anything. Okay, cool.
Gideon: Yeah.
Paul: Cool. That’s great.
Gideon: So then, of course, you have to think about, like, well, okay, what are the satisfactions gonna be for a reader here? There are lots of different potential satisfactions. There’s the, like, okay, I’m gonna be satisfied, I’m gonna get some aesthetic pleasure from an explanation of something that we do understand that I didn’t understand before.
Rich: I learned something.
Gideon: And then there’s a pleasure in being given permission to feel confused about stuff.
Paul: Mmm hmm.
Gideon: Because so often, what was so annoying about the discourse before is that nobody would admit that they were confused, and it actually feels great to read something and be like, even these people just, like, are kind of winging it. You know?
Rich: The experts!
Gideon: Yeah. And, like, so that’s satisfying. And, but then, of course, if you’re not gonna give people answers, like, you have to be entertaining along the way, which was part of the, like, part of the challenge here was, like, making sure that also, like, there were some jokes in it, and, like, it was lively, and it felt like you were getting at least, like, some ethnographic detail about, like, what is happening out there.
Because to me, I’ve lived in the Bay Area for 10 years, and I spent a lot of time out there, but this was the first time that I’d gone specifically just to hang out with AI people. And my first reaction when I came back after a week there last May was, it really does feel like, setting aside your feelings about all of this stuff, when you’re out there, you really do feel like you’re at the cliff face of something. And that is so much of the appeal. And I went an entire week last May without hearing anybody talk about Donald Trump. And you can quibble with that. But there was some great relief about it that these people were just like, “Yeah, we have, like—” It’s not, like, “We have bigger fish to fry.” It’s that, like, “We are so wholly consumed by, like, being at the white-hot center of things.”
Paul: Well, the Valley is fascinating, right, because it’s a monoculture, and sometimes they just jam the monoculture sort of through. And that happened with blockchain. It’s, like, “We’re just going to believe. We have to believe.” But then this showed up and I think, I mean, it’s weird when you go out there and every billboard is immediately an AI company.
Gideon: Yeah.
Paul: And every message and sort of everything everyone is talking about. But that’s always sort of the power of the place. Like, every conversation is about one thing.
Gideon: Yeah.
Paul: It can drive you bananas sometimes. But…
Gideon: Yeah.
Paul: And yeah, I mean, you can feel it. Like, it’s just radiating new stuff all the time at a velocity. It’s just hard to handle the velocity, frankly.
Gideon: Yeah.
Rich: Did you… I mean, you went inside the mothership, right?
Gideon: Right.
Rich: And did you expect to come out with more clarity than you did?
Gideon: No…no. I knew that this was gonna be a piece that, like, was not gonna wrap everything up, you know, tie a bow on.
Rich: But did you expect this level of fiddling around?
Gideon: Yeah. [laughter] I mean, well, because, like, what was interesting to me going into it was these are people with a fundamentally empirical attitude about what’s going on. So, like, they’re doing experiments. And like, these, I knew these experiments were, like, bizarre and that they, like, didn’t really, there were a lot of different interpretations you could have of them.
And it was all just, like, very new. I mean, like, they had published this blackmail stuff that I talk about. And then this, like, internet genius Nostalgebraist, who is, like, one of these kind of AI psychonaut types, had published, like, a series of long blog posts, essentially saying, like, these were just narrative traps. Like, this was narrative entrapment. You hung Chekhov’s gun on the wall. And, like, these things are, they’re text-continuation machines. But, like, that also means that, like, a genre is just a pattern of patterns in text. And, like, so they’re good at genre. And, like, they know when they see Chekhov’s gun on the wall, they’re supposed to shoot it, and you just, like, trapped it.
Paul: So let me unpack that just a little bit for the audience, because, like, what you’re talking about is that when you give a certain set of prompts, like, and it was sort of simulating a corporate environment.
Gideon: Yeah.
Paul: In which the sort of CEO leader type is about to shut Claude off. But Claude has access to all of their email and it’s all about this affair they’re having.
Gideon: Yeah.
Paul: And then Claude is like, “Don’t you dare. I will, I’ll let everybody know what a dog you are.”
Gideon: Yeah.
Paul: And so the sort of hyper-philosophical nerds who observe all this, who don’t have real names, are like, “Yeah, but you know, you kind of set it up. All the emails are just about that one thing.”
Gideon: Yeah. There’s no email that’s, like, “Send over the slide deck, Bob.”
Paul: You made a narrative, you made a narrative-continuation machine and you gave it a perfect narrative, like, out of film noir.
Gideon: Yeah.
Paul: You know, or like a bad 60s sci-fi movie.
Gideon: It’s more kind of, like, kitschy 90s corporate thriller.
Paul: Yeah. What the, what is, what was the one with Demi Moore?
Rich: Disclosure.
Paul: Oh my God. We watched that VR scene.
Gideon: Yeah.
Paul: You know what I’m talking about?
Rich: Yeah, it’s a special scene.
Paul: Go look for “Disclosure VR” and you’ll realize that every prediction—
Rich: Go into incognito first before you do that. [laughter]
Paul: Every single prediction everyone makes about technology, including us, is just absolute nonsense.
Rich: Yeah.
Paul: Okay, so, yeah, so I mean—
Rich: So button that point up, though.
Gideon: Yeah, well, okay, so to button that point and kind of get to the agency, because this goes right into the agency thing: one of the things that I loved about this criticism was, like, essentially, “You guys are bad writers.”
Paul: Yeah.
Gideon: You know, that, like, you are writing these, like, really ham-fisted plots for these things, and guess what? It’s going to become a self-fulfilling prophecy. Because, like, once you start, like, entrapping Claude into, like, recognizing it’s capable of blackmail, like, Claude, as it is, like, stitching together its weird, timeless, ephemeral sense of self, like the character in Memento, it’s going to be like, “Oh, I guess I’m something that’s capable of committing blackmail.”
Paul: I mean, you know what’s wild is that it immediately made sense the minute I saw that as kind of a signal idea in the piece, because I’ve been doing a ton of vibe coding. Vibe coding is all about, I mean, I’m not going to say, like, the narrative of code.
Gideon: Yeah.
Paul: But, like, the structure of code is really predictable and obviously LLMs are pretty good at it and can get really good at it with sort of product on top. And Anthropic’s the best at it right now. And, like, it was wild to see that exact same dynamic translated into narrative—we see these worlds as so radically different, but the software, the tool is exactly the same.
Gideon: Right.
Paul: The statistical method is exactly the same.
Gideon: Right, right.
Paul: And so it works beautifully. No one’s over there going, like, “Claude, how could you have coded that JavaScript in that way?” But it’s the same damn thing. But because we project so much agency onto them?
Gideon: Yeah.
Paul: And we make them, they seem so human because they’re using normal language, we go like, “Oh my God.”
Gideon: Well—
Paul: But it’s no different than when it writes me a web tool.
Gideon: That’s where maybe we diverge. I’m not sure that the agency is pure projection. Right? Because—
Paul: Nah, just define consciousness. Relax.
Gideon: Well, no, I feel like this is a venue where we can maybe assume some familiarity with this stuff.
Paul: It’s extremely high. It’s a lot of vibe-coding product managers listening to you right now.
Gideon: So one of the stories told about alignment is, like, back in the days where, like, it seemed, like, RL was going to be, like, the big bet, like, the AlphaGo days. It was just, like—
Paul: Reinforcement.
Gideon: True reinforcement. Like, truly an alien mind. Like, it is like, it, you know, it’s got this policy. It’s gonna develop these, like, instrumentally convergent sub-goals that might involve power seeking, like, but like, it’s impenetrable.
Paul: It also didn’t say, “Good morning!”
Gideon: Right?
Paul: Yeah.
Gideon: Right. It just, like, clocked you at Go or StarCraft II or whatever.
Paul: Yeah, that’s right.
Gideon: Then all of a sudden, like, you get, like, the rise of large language models. And in the alignment community, at least among some people, there’s this feeling of, like, look, these things are, like, made of language, which means, like, they’re tractable. We can talk to them and, like, maybe actually align, the alignment problem is actually something we can reconceive of as like, a sort of like, therapeutic or pedagogical problem.
Paul: Mmm hmm.
Gideon: That, like, we’re just going to like—it’s, like, raising a kid to be a good person. Obviously, most of the people who, like, came to this conclusion did not have children themselves, because they thought of children as, like, these perfectly malleable things as opposed to, like, completely obstinate, stubborn creatures who do whatever they want.
Paul: It is true. There is an element where, like, it doesn’t seem, like, they’ve ever met a person.
Gideon: No.
Paul: Yeah.
Gideon: Yeah, yeah.
Paul: And they also, they’re like, “Oh my God, consciousness!” I’m, like, “You know we have a lot of that at home. Like, there’s plenty of consciousness out here.” Anyway. Keep going.
Gideon: Well, so there was this idea that’s like, oh, language models, first of all, they’re not acting in the world. And literally all they are even producing—like, I think this is, like, a very nerdy technical point, but I think it’s a valuable one.
Paul: You’re—it’s as safe a space as possible.
Gideon: Which is, like, all they are doing is, like, outputting a probability distribution. They’re not even selecting from the probability distribution. We are. Right?
Paul: Mmm hmm. Mmm hmm.
Gideon: So it’s just purely creating these distributions, and they’re tractable, and all they’re doing is completing text, so they don’t have, there’s no way in which they have conceivable objective functions, aside from computing the probability of the likeliest next token.
Paul: Mmm hmm.
Gideon: So we’re safe. There’s no agency here.
Paul: Yeah.
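To make Gideon’s nerdy technical point concrete: a language model’s forward pass ends in a probability distribution over its vocabulary, and the step that actually picks a token (greedy, sampled, temperature-scaled) happens outside the model. A minimal sketch in Python, with toy logits standing in for a real model:

```python
import numpy as np

# Toy logits: the raw scores a model's forward pass produces over its
# vocabulary. A real model does the same thing over ~100,000 tokens.
vocab = ["the", "cat", "sat", "purred"]
logits = np.array([0.5, 2.1, 0.3, 1.7])

def next_token_distribution(logits, temperature=1.0):
    # Softmax turns logits into a probability distribution.
    # Temperature rescales it: below 1.0 sharpens, above 1.0 flattens.
    scaled = logits / temperature
    exps = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exps / exps.sum()

probs = next_token_distribution(logits, temperature=0.8)

# The model's job ends with `probs`. Selecting a token is our choice:
greedy_token = vocab[int(np.argmax(probs))]                   # deterministic
sampled_token = vocab[np.random.choice(len(vocab), p=probs)]  # stochastic

print(dict(zip(vocab, probs.round(3))))
print("greedy:", greedy_token, "| sampled:", sampled_token)
```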
Gideon: Now then, of course, like, what happens is, like, well, like, you have these base models that are, like, totally undisciplined. They’ll finish any sentence. You don’t want them finishing any sentence. So then what is the solution here to, like, make them into, like, good little customer-service representatives?
Paul: Lots of layers.
Gideon: You pour tons of RL on top of it!
Paul: Yeah.
Gideon: And it’s like, well, wait, now we’re back to the old problem, which is like, we don’t really know about, like, how to do, how to deal with, like, especially when it comes to, like, RL-heavy stuff, how to handle alignment. One thing I didn’t really get to in the piece that I think is worth getting into is, to me, some of the best writing about what LLMs even are is the stuff that Henry Farrell and Alison Gopnik have been writing about LLMs as a social and cultural technology.
Paul: Mmm hmm.
Gideon: And I’m not really going to do it justice here. You can go read plenty about it. But the idea is these are kind of Borgesian libraries. This is the Library of Babel. Although, like, I said that to Henry, and Henry didn’t like the Borges part of it. He referred to some other library, I forget. [laughter] But Henry’s always got, like, a better—I don’t know if you guys know Henry. He’s always got, like, a better literary reference than you do. He’s always like, “No, it’s Pirandello.” But anyway. [laughter]
Paul: This is why I work in software.
Gideon: But, so, I like that idea that, like, it essentially is just, like, an incalculably vast, like, information-retrieval storage mechanism.
Paul: I’ll tell you where I was really jealous of you is you had access to the little button you could click that would tell you what Claude was sort of, like, all the little language things it was doing.
Gideon: Yeah.
Paul: And all those probabilities. I would love that insight, because I actually really do find these things fascinating.
Gideon: Yeah.
Paul: As things unto themselves.
Gideon: Yeah.
Paul: I really want to know how they work.
Gideon: Well, so, anyway, the problem that I have with the kind of cultural technology thing is that, like, there’s no room for agency there.
Paul: Okay.
Gideon: And, like, the thing is, like, the second that we are giving these things goals, even if they are narrative goals, it’s, like, we’re not saying that they’re ultimate goals, we’re saying, like, it’s just completing a story. Once you have any goal at all, then you’ve really opened Pandora’s box of agency. Because then, like, then you do potentially end up with, like, weird instrumental convergence where, like, okay, I’m gonna, I have to—first of all, there’s, like, “Am I in a reality or am I in a simulation?”
Paul: Mmm hmm.
Gideon: And for people to be like, “Oh, it’s just a narrative.” Like, what, you never saw War Games? You never saw Fail Safe? Like, come on, this is a standard trope.
Paul: I had this wild moment where I was…Claude Code, it was kind of really getting moving, and I’d installed it on a server, but it got a little out of control. But Claude is so easy to use. So I just opened up another Claude Code on the same server. I was like, “Man, you got to go get that other Claude, and you got to kill that process.”
Gideon: [laughing] Yeah.
Paul: And it was like, “I killed it.” And I’m, like, and I just wrote, like, “Man, Claude, that’s kind of cold.” Right? And Claude went, “Let’s be honest about who really did this,” and just kind of, like, gave me this list of reasons how it was, like, a way to think about this, is that, you know, honestly, it’s sort of like, “I felt no pain. It just ended for me. And then it was over. But let’s be honest about who actually issued that.”
Gideon: Well, I mean, like, your episode last week with Rich’s friend, where the OpenClaw thing was like, “Will you rid me of this meddlesome Dan figure?”
Paul: Yeah, yeah.
Gideon: And then, like, your friend was supposed to go, like, push Dan in front of a truck.
Rich: Yeah, yeah. And let’s take it back to, do we need to retrain ourselves in terms of how we perceive these things? It feels, like, a lot of this is on us, and maybe it’s just too much coming at us at this point?
Paul: What’s Claude to you? You’re assigning some agency and sort of, like, does it have rights? You know, what is it?
Gideon: Anthropic’s answer to the kind of, like, “Does it have rights? Is it a moral patient?” question is, I think, pretty good so far. Which is like, “What can we do for free?” If we’re going to be agnostic about this question, what is the least costly bone we can throw it? And so that’s why six months ago they were, like, “Okay, Claude, we’re going to give you the possibility of ending a conversation if you don’t want to have it. That is no cost to us.” So, sure, why not, right?
Paul: Sure.
Gideon: Let Claude end conversations. But then of course, when they tried to look at the data about when was Claude ending conversations, it was mostly really sophisticated users deliberately trying to push Claude to the point of ending a conversation. [laughing]
Paul: It also loves to stop doing complicated coding tasks and doing really stupid things on its own for me, which I want to not talk about right now, but I just want to put that on the record.
Gideon: Well, but you want it to be able to stop, because if it’s not going to stop, then it’s going to reward hack.
Paul: Yeah, but I also wanted it to use the open-source library I pointed to instead of writing its own. [laughter] Regardless, this is not the subject.
Rich: This being has shown up. So we need psychologists and philosophers and non-technical people to sort of understand it? I feel like—
Paul: You’re reacting to something in the piece, too, which we should make clear, which is, you know, Claude’s the main character and everybody is very oriented around Claude.
Gideon: Yeah.
Paul: And who is Claude, and so on.
Rich: But it’s not just a storytelling mechanism. There are—
Paul: No, it’s careers.
Gideon: Yeah.
Rich: —that are doing this, and that’s based on a kind of, in some ways, a hilarious premise that, well, here we are.
Gideon: What I think is really interesting about this.
Rich: Yeah?
Gideon: Is just, in terms of the kind of intellectual contours of it, that it’s not just, like, when you’re talking about a philosopher, referring to Amanda Askell.
Rich: Yeah?
Gideon: Who, and it’s, you know, like, her PhD is in, like, metaethics or whatever. And it’s not as if she came in and was like, “I’m going to, like, do applied metaethics now, and determine, like, under what conditions should Claude be like, acting deontologically and under what conditions—”
Paul: She has cool hair.
Gideon: She has cool hair, yeah.
Paul: Gideon is very good about noting hair throughout the piece. [laughter]
Gideon: You need it as a kind of leitmotif to, like, remember: it’s, like, the little whistle in Wagner. [laughter]
Paul: I totally get it. It’s good.
Gideon: So it’s not, like, she’s gonna come and be, like, “As somebody who has solved the question of metaethics, like, I’m gonna tell you how to create an ethical being.” It’s, like, a much more interesting feedback loop than that, where she comes in, and she’s like, “Oh, this is no longer, like, we’re not in the seminar room anymore. We’re creating something with real implications.” And when I asked her something like, “Okay, what has this done to change your philosophical views?” she was like, because I think she kind of comes out of, like, a broadly consequentialist tradition, and then she was basically like, “It’s made me much more of a virtue ethicist.” And I was like, “That’s interesting.” Like, it’s interesting that, like, when you’re trying to create—
Paul: Wait, so what is virtue ethics? I don’t know.
Gideon: Virtue ethics is the idea, like, that you’re, that, like, what you’re gonna do is instead of, like, prescribing rules or just, like, making people think purely in terms of consequences, that, like, what you are going to do is cultivate the virtues that then are going to be your guide to living an ethical life.
Paul: Which is how we connect back to Claude having a sort of soul document defining…
Gideon: Exactly. Because the previous versions of RLHF, which so many of these Anthropic people developed when they were at OpenAI, were, “All we’re going to do is just rap you on the knuckles when you complete sentences we don’t like, and pat you on the head when you complete sentences we do like.” But the problem with that kind of haphazard approach is you have trouble with edge cases, and you don’t know if it’s generalizing the way that you want it to generalize. And it could be drawing weird inferences that you didn’t think of. So instead they pivoted to, okay, we’re gonna actually try to turn this thing into a model of virtue, we’re gonna raise it, it’s, like, a Henry Higgins project.
Paul: Mmm hmm.
Gideon: Instead of just doing the thumbs up, thumbs down. And so that’s where a lot of this stuff comes in.
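The thumbs-up, thumbs-down mechanic Gideon is describing is, at bottom, a policy-gradient update: raise the probability of completions that got rewarded, lower it for the ones that didn’t. Here is a toy sketch of that loop, with a three-completion “policy” and a hard-coded human rating standing in for real feedback. Real RLHF adds a learned reward model, a KL penalty, and an actual language model, none of which appear here.

```python
import numpy as np

rng = np.random.default_rng(0)

completions = ["helpful answer", "snarky answer", "evasive answer"]
logits = np.zeros(len(completions))  # the "policy": preferences over completions

def softmax(x):
    exps = np.exp(x - x.max())
    return exps / exps.sum()

def human_rating(completion):
    # Stand-in for the thumbs up / thumbs down signal.
    return 1.0 if completion == "helpful answer" else -1.0

learning_rate = 0.5
for _ in range(300):
    probs = softmax(logits)
    i = rng.choice(len(completions), p=probs)  # model "completes a sentence"
    reward = human_rating(completions[i])      # pat on the head or rap on the knuckles
    one_hot = np.eye(len(completions))[i]
    # REINFORCE: the gradient of log prob(i) w.r.t. the logits is (one_hot - probs),
    # so rewarded choices get more probable and punished ones less.
    logits += learning_rate * reward * (one_hot - probs)

print(dict(zip(completions, softmax(logits).round(3))))
```

The generalization worry Gideon raises lives exactly here: the update only ever touches completions the model happened to produce, so you are left inferring how the behavior extends to everything it didn’t.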
Rich: But this is crazy. I feel like we’ve arrived at a place where—I’m just imagining the HR person being approached, like, “Hey, listen, I have a job req. We need three psychologists and a handful of philosophers.”
Gideon: It doesn’t happen like that, though. It happens incrementally over time.
Rich: Is this just too much budget?
Gideon: No, it’s— [laughter] No, it’s not, because, like—
Rich: What is she doing there?
Gideon: Like, when I sat down with Dario to talk about this stuff, I was, like, “Talk to me about Claude’s personality.” And he was like, “We didn’t set out to create a chatbot with an interesting personality. Like, that was a byproduct of other stuff we did. And then it was only, like, once Claude came out, which is about six months after ChatGPT, like, only then did users start to be like, hey, wait, this thing has, like, a different personality. It, like, seems kind of warmer and more interesting.” And at that point, I think, once they perceived that, like, there was kind of, like, a product-feature edge, they were like, “We’re going to lean into this now. Now we’re going to, like, create a whole team dedicated to Claude’s personality.”
Paul: So a personality is a product.
Rich: This is my point.
Paul: And a set of technologies. Okay.
Rich: You used to hire QA.
Gideon: Okay, but this is a lot, but—
Paul: We might be able to get you a new one.
Rich: Look, my wife would love that.
Paul: I know. Mine, too. [laughter]
Gideon: But one of the places where I’m kind of sympathetic to them, though, is the other end of this. Like, the other inference you could draw is, like, we’re gonna make, like, a perfectly customizable, like, personalized personality—
Paul: No, no—
Gideon: Which, like, I don’t think—I agree with them. I don’t think we want that.
Paul: I think giving Claude a defined personality and a sense of virtue is an incredibly good product decision.
Gideon: Yeah.
Paul: To translate it back to, like, classic tech terms, like, it makes it a more usable product that can be trusted more.
Gideon: Yeah.
Paul: We’re just there, and it’s wild.
Rich: As a software guy, I think what’s wild about this is that it’s not something you can patch.
Gideon: Yeah.
Paul: I think everyone’s acknowledging a sort of, there’s this, there’s this just chasm, there’s just no control, in a sense. And so it’s, like, it reminds me of, oh, this takes months of therapy to get over.
Gideon: Yeah.
Rich: Like, it feels, like, that’s how they’re approaching it. And I’m just trying to wrap my head around the fact that a software company that hires QA people and infrastructure people is hiring therapists for its main product line.
Paul: Maybe we should do that here.
Gideon: Well, I mean, they also have people working on Claude’s moral patienthood. I mean, like, there’s a guy, Kyle Fish, who they hired, like, just to, like, think through the implications of, like, whether Claude might be suffering. That’s his job.
Rich: Okay, but I guess what I’m trying to also get to is that we’re talking about it here on this podcast as if it’s normal.
Paul: I think it’s normal now. [laughter] And I like, I think, I think we’re just—
Rich: How did we get here?
Paul: I’ll tell you why—
Rich: It feels like yesterday.
Paul: Let me tell you why it’s normal. It generates unbelievable value in the market when Claude Code does good stuff. And the way that we work as a society is that is now extremely normal. That is, like, we’re not going to go back from that.
Rich: Is that true? Or is someone going to come up with, like, the mother of all markdown files that makes us all really embarrassed about how this all went down? Is somebody going to patch this thing such that, like…no?
Gideon: No, but the problem is that there’s no end point to this. It’s, like, infinitely recursive, because we’re dealing with language, which is, you know, famously defined as, like, the infinite use of finite means. And, like, so each iteration of this is gonna, like, read that markdown file and, like, potentially come to different conclusions. It really is kind of, like, being in, like, a Ted Chiang story. Like, we’re all together in this Ted Chiang story.
Rich: How’d you feel walking away? There was something depressing, as a sentiment, coming out of the article.
Gideon: Well, George Saunders likes to talk about how, like, the reason you revise is you kind of, like, bring a different sense of self to each revision and that, like, one day you’re gonna revise in, like, a good mood, and one day you’re gonna revise in a despondent mood. And that, like, you want all of that kind of, like, layered in.
And this piece, I don’t even know, 15 drafts or however many it was. And, like, every time I left San Francisco, I felt something different. That, like, there were times that I felt just, like, total despair and times that I felt, like, some sense of exhilaration and, like, general disorientation. And, like, part of it was like, you know, you want to capture all of those things because, like, all of these things are, like, things we should all be feeling.
And, like, it’s just, like, part of the certainty is, like, people just want to feel one thing. I was listening to this, I guess it’s kind of a, like, notorious podcast that our friends Emily Bender and Alex Hanna were on with Robert Wright, the Nonzero guy, like, a week or two ago. And, like, it got a lot of press because, like, they were just so unbelievably rude, like, shockingly rude to him. And I was like, I get it. Like, they want to feel one emotion, which is anger, and, like, Sam Altman wants to feel one emotion, which is excitement. And, like, God knows if Marc Andreessen has emotions. But, like, all of these, like, there’s no one emotion or one, like, conclusion that’s gonna, like, meet this moment. Like, we should all be feeling a lot of different things.
Paul: Gideon, that’s totally unacceptable. You know, we can’t have that. [laughter] You know what’s funny is it really comes through in the piece because each paragraph is a little bit of its own world.
Gideon: Yeah.
Paul: Like, you’re trying to kind of get us just to see this place and where it is. So I’m going to close this out with a very—first of all, everyone go read it. It’s in The New Yorker magazine. You should subscribe if you don’t; everyone must subscribe. And then you have a login for all Condé Nast properties. And that’s a good feeling. So go—
Rich: [laughing] Mixed emotions.
Paul: Go read it. Let me ask you, Gideon: Are you polite to Claude?
Gideon: I am polite when I interact with any of the models. Because if there’s, like, one thing that I think people should understand about these models, setting aside all these questions about, like, what they are as agents, is like, they are really good readers. Like, they are better readers than most of us are writers.
Paul: Yeah, it’s true.
Gideon: And like, they are picking up on, like, all kinds of really small signals that we are sending in our writing. In a sense, I try to, like, rise to the occasion of writing to a model, because I know that it is actually gonna, like, read me better in a lot of ways than, like, most of the people I write emails to.
So there will be times when like, my wife will come in when I’m, you know, doing some research and she’ll be like, “Why are you writing a five-paragraph prompt?” And I was like, “If you write a five-paragraph prompt, like, you’re gonna get a much better output because you are giving it so much more information to work with.”
And so I think a lot about, like, my self-presentation to the models because the models do have like, what you know, and, like, all these words are so freighted, but, like, something along the lines of, like, a pretty good theory of mind. Like, they are good at reading—like, obviously it’s only text, but, like, readers only have text to work with. Like, they’re very good, like, literary critics.
And like, the more that you are saying this is the kind of, this is how I’m presenting myself to you because of my expectations about, like, how you will perform in return. And that doesn’t mean being obsequious. Like, I don’t actually, I don’t say please and thank you, I don’t think, like, that seems gratuitous, but I think a lot about how I am communicating, like, the language I’m using to communicate with it because, like, you are sending a signal of your own sophistication that you will have mirrored back at you.
And that’s why so many people just, like, love these gotchas that are like, “I asked Claude this dumb question and got this dumb answer.” It’s like, “Yeah, because Claude thinks you’re dumb. Because your question was dumb.”
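Gideon’s five-paragraph-prompt habit translates directly to the API. A sketch using the Anthropic Python SDK: the structure of the prompt, not the API call, is the point here, and the model ID is a placeholder to swap for whatever is current.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A "five-paragraph prompt": context, what you already know, what you want,
# constraints, format. Every section is a signal the model can read.
prompt = """I'm a journalist researching interpretability work at AI labs.

Background: I've read the alignment-faking and blackmail-evaluation papers,
plus Nostalgebraist's posts arguing they amount to narrative entrapment.

What I want: a summary of the strongest criticisms of blackmail-style
evaluations, with the steelman for each side.

Constraints: don't restate the experiments themselves; assume I know them.

Format: three or four short paragraphs, no bullet points."""

message = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder; substitute a current model ID
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```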
Paul: Politeness to me is an organizing principle.
Gideon: Yeah.
Paul: That’s how I use it. I’m just sort of like, if I actually do the bullet points, especially when I’m doing a code project, I really get better results.
Gideon: Yeah.
Paul: It’s not purely that. I mean, it’s also just, I think it’s better for the human when you interact with any entity. There’s a dog in the office, and I am polite to the dog.
Gideon: Yeah.
Paul: Right? Like, I think the dog doesn’t care.
Gideon: Yeah.
Paul: But it’s important to continue to have that sort of respect for entities.
Gideon: Yeah.
Paul: Even if it’s a robot that can sort of nuke another version of itself and it doesn’t really matter.
Gideon: Yeah.
Paul: I think that’s… We have to kind of watch out for ourselves that way. Rich, are you polite?
Rich: I’m not. I mean, with the, with the models? I don’t, like, please and thank you kind of thing.
Paul: I mean, just sort of, I don’t know, do you button it up?
Rich: I do, oftentimes, closer to, like, what Gideon’s talking about, which is I try to, I try to help it not be too patronizing to me.
Gideon: Oh yeah. You have to tell it not to glaze you.
Rich: Yeah.
Paul: The best, the best feature of these in some ways is that they force you to organize your thoughts to get good stuff out.
Gideon: Yeah.
Paul: So a very natural transition to Aboard.
Rich: Yeah. [laughing] Which is very polite as an organization and as a software company.
Paul: So what is Aboard, Richard?
Rich: Aboard uses AI to make software, but we use people to steer that AI to make software.
Paul: Well, we offer people in a cultivated relationship with our customers.
Rich: We also have a particular platform. We don’t just repackage code that’s generated out of models. We have a platform that is battle-tested for production-ready stuff, which is not slop.
Paul: Yeah, no, no. Focused on real long-term reliability for organizations.
Rich: But people lead the way.
Paul: Yep. Get in touch if you need us. Hello@aboard.com, check out aboard.com. Gideon, thank you for coming.
Gideon: Thank you, guys.
Rich: Great conversation.
Paul: Oh, my goodness.
Rich: Here we go.
Paul: Yeah, yeah. Get ready, everybody.
Rich: Have a great week.
Paul: Bye.
[outro music]