[MUSIC PLAYING]
From New York Times Opinion, this is “The Ezra Klein Show.”
[MUSIC PLAYING]
This feels wrong to me. But I have checked the dates. It was barely more than a year ago that I wrote this piece about A.I., with the title “This Changes Everything.” I ended up reading it on the show, too. And the piece was about the speed with which A.I. systems were improving. It argued that we can usually trust that tomorrow is going to be roughly like today, that next year is going to be roughly like this year. That’s not what we’re seeing here. These systems are growing in power and capabilities at an astonishing rate.
The growth is exponential, not linear. When you look at surveys of A.I. researchers, their timeline for how quickly A.I. is going to be able to do basically anything a human does better and more cheaply than a human — that timeline is accelerating, year by year, on these surveys. When I do my own reporting, talking to the people inside these companies, people at this strange intersection of excited and terrified of what they’re building, no one tells me they are seeing a reason to believe progress is going to slow down.
And you might think that’s just hype, but a lot of them want it to slow down. A lot of them are scared of how quickly it is moving. They don’t think that society is ready for it, that regulation is ready for it. They think the competitive pressures between the companies and the countries are dangerous. They wish something would happen to make it all go slower. But what they are seeing is they are hitting the milestones faster, that we’re getting closer and closer to truly transformational A.I., that there is so much money and talent and attention flooding into the space that that is becoming its own accelerant. They are scared. We should at least be paying attention.
And yet, I find living in this moment really weird, because as much as I know this wildly powerful technology is emerging beneath my fingertips, as much as I believe it’s going to change the world I live in profoundly, I find it really hard to just fit it into my own day to day work. I consistently sort of wander up to the A.I., ask it a question, find myself somewhat impressed or unimpressed at the answer. But it doesn’t stick for me. It is not a sticky habit. It’s true for a lot of people I know.
And I think that failure matters. I think getting good at working with A.I. is going to be an important skill in the next few years. I think having an intuition for how these systems work is going to be important just for understanding what is happening to society. And you can’t do that if you don’t get over this hump in the learning curve, if you don’t get over this part where it’s not really clear how to make A.I. part of your life.
So I’ve been on a personal quest to get better at this. And in that quest, I have a guide. Ethan Mollick is a professor at the Wharton School of the University of Pennsylvania. He studies and writes about innovation and entrepreneurship. But he has this newsletter, One Useful Thing, that has become, really, I think, the best guide how to begin using, and how to get better at using A.I. He’s also got a new book on the subject, “Co-Intelligence.” And so I asked him on the show to walk me through what he’s learned.
This is going to be, I should say, the first of three shows on this topic. This one is about the present. The next is about some things I’m very worried about in the near future, particularly around what A.I. is going to do to our digital commons. And then, we’re going to have a show that is a little bit more about the curve we are all on about the slightly further future, and the world we might soon be living in.
As always, my email for guest suggestions, thoughts, feedback, ezrakleinshow@nytimes.com.
Ethan Mollick, welcome to the show.
Thanks for having me.
So let’s assume I’m interested in A.I. And I tried ChatGPT a bunch of times, and I was suitably impressed and weirded out for a minute. And so I know the technology is powerful. I’ve heard all these predictions about how it will take everything over, or become part of everything we do. But I don’t actually see how it fits into my life, really, at all. What am I missing?
So you’re not alone. This is actually very common. And I think part of the reason is that the way ChatGPT works isn’t really set up for you to understand how powerful it is. You really do need to use the paid version, they are significantly smarter. And you can almost think of this — like, GPT-3, which was — nobody really paid attention to when it came out, before ChatGPT, was about as good as a sixth grader at writing. GPT-3.5, the free version of ChatGPT, is about as good as a high school, or maybe even a college freshman or sophomore.
And GPT-4 is often as good as a Ph.D. in some forms of writing. Like, there’s a general smartness that increases. But even more than that, ability seems to increase. And you’re much more likely to get that feeling that you are working with something amazing as a result. And if you don’t work with the frontier models, you can lose track of what these systems can actually do. On top of that, you need to start just using it. You kind of have to push past those first three questions.
My advice is usually bring it to every table that you come to in a legal and ethical way. So I use it for every aspect of my job in ways that I legally and ethically can, and that’s how I learn what it’s good or bad at.
When you say, bring it to every table you’re at, one, that sounds like a big pain, because now I’ve got to add another step of talking to the computer constantly. But two, it’s just not obvious to me what that would look like. So what does it look like? What does it look like for you, or what does it look like for others — that you feel is applicable widely?
So I just finished this book. It’s my third book. I keep writing books, even though I keep forgetting that writing books is really hard. But this was, I think, my best book, but also the most interesting to write. And it was thanks to A.I. And there’s almost no A.I. writing in the book, but I used it continuously. So things that would get in the way of writing — I think I’m a much better writer than A.I. — hopefully, people agree. But there’s a lot of things that get in your way as a writer. So I would get stuck on a sentence. I couldn’t do a transition. Give me 30 versions of this sentence in radically different styles. There’s 200 different citations. I had the A.I. read through the papers that I read through, write notes on them, and organize them for me. I had the A.I. suggest analogies that might be useful. I had the A.I. act as readers, and in different personas, read through the paper from the perspective of, is there some example I could give that’s better? Is this understandable or not? And that’s very typical of the kind of way that I would, say, bring it to the table. Use it for everything, and you’ll find its limits and abilities.
Let me ask you one specific question on that, because I’ve been writing a book. And on some bad days of writing the book, I decided to play around with GPT-4. And of the things that it got me thinking about was the kind of mistake or problem these systems can help you see and the kind they can’t. So they can do a lot of, give me 15 versions of this paragraph, 30 versions of this sentence. And every once in a while, you get a good version or you’ll shake something a little bit loose.
But almost always when I am stuck, the problem is I don’t know what I need to say. Oftentimes, I have structured the chapter wrong. Oftentimes, I’ve simply not done enough work. And one of the difficulties for me about using A.I. is that A.I. never gives me the answer, which is often the true answer — this whole chapter is wrong. It is poorly structured. You have to delete it and start over. It’s not feeling right to you because it is not right.
And I actually worry a little bit about tools that can see one kind of problem and trick you into thinking it’s this easier problem, but make it actually harder for you to see the other kind of problem that maybe if you were just sitting there, banging your head against the wall of your computer, or the wall of your own mind, you would eventually find.
I think that’s a wise point. I think there’s two or three things bundled there. The first of those is A.I. is good, but it’s not as good as you. It is, say, at the 80th percentile of writers based on some results, maybe a little bit higher. In some ways, if it was able to have that burst of insight and to tell you this chapter is wrong, and I’ve thought of a new way of phrasing it, we would be at that sort of mythical AGI level of A.I. as smart as the best human. And it just isn’t yet.
I think the second issue is also quite profound, which is, what does using this tool shape us to do and not do? One nice example that you just gave is writing. And I think a lot of us think about writing as thinking. We don’t know if that’s true for everybody, but for writers, that’s how they think. And sometimes, getting that shortcut could shortcut the thinking process. So I’ve had to change sometimes a little bit how I think when I use A.I., for better or for worse. So I think these are both concerns to be taken seriously.
For most people — right, if you’re just going to pick one model, what would you pick? What do you recommend to people? And second, how do you recommend they access it? Because something going on in the A.I. world is there are a lot of wrappers on these models. So ChatGPT has an app. Claude does not have an app. Obviously, Google has its suite of products. And there are organizations that have created a different spin on somebody else’s A.I. — so Perplexity, which is, I believe, built on GPT-4 now, you can pay for it.
And it’s more like a search engine interface, and has some changes made to it. For a lot of people, the question of how easy and accessible the thing is to access really matters. So which model do you recommend to most people? And which entry door do you recommend to most people? And do they differ?
It’s a really good question. I recommend working with one of the models as directly as possible, through the company that creates them. And there’s a few reasons for that. One is you get as close to the unadulterated personality as possible. And second, that’s where features tend to roll out first. So if you like sort of intellectual challenge, I think Claude 3 is the most intellectual of the models, as you said.
The biggest capability set right now is GPT-4, so if you do any math or coding work, it does coding for you. It has some really interesting interfaces. That’s what I would use — and because GPT-5 is coming out, that’s fairly powerful. And Google is probably the most accessible, and plugged into the Google ecosystem. So I don’t think you can really go wrong with any of these. Generally, I think Claude 3 is the most likely to freak you out right now. And GPT-4 is probably the most likely to be super useful right now.
So you say it takes about 10 hours to learn a model. Ten hours is a long time, actually. What are you doing in that 10 hours? What are you figuring out? How did you come to that number? Give me some texture on your 10 hour rule.
So first off, I want to indicate the 10 hours is as arbitrary as 10,000 steps. Like, there’s no scientific basis for it. This is an observation. But it also does move you past the, I poked at this for an evening, and it moves you towards using this in a serious way. I don’t know if 10 hours is the real limit, but it seems to be somewhat transformative. The key is to use it in an area where you have expertise, so you can understand what it’s good or bad at, learn the shape of its capabilities.
When I taught my students this semester how to use A.I., and we had three classes on that, they learned the theory behind it. But then I gave them an assignment, which was to replace themselves at their next job. And they created amazing tools, things that filed flight plans or did tweeting, or did deal memos. In fact, one of the students created a way of creating user personas, which is something that you do in product development, that’s been used several thousand times in the last couple of weeks in different companies.
So they were able to figure out uses that I never thought of to automate their job and their work because they were asked to do that. So part of taking this seriously in the 10 hours is, you’re going to try and use it for your work. You’ll understand where it’s good or bad, what it can automate, what it can’t, and build from there.
Something that feels to me like a theme of your work is that the way to approach this is not learning a tool. It is building a relationship. Is that fair?
A.I. is built like a tool. It’s software. It’s very clear at this point that it’s an emulation of thought. But because of how it’s built, because of how it’s constructed, it is much more like working with a person than working with a tool. And when we talk about it this way, I almost feel kind of bad, because there’s dangers in building a relationship with a system that is purely artificial, and doesn’t think and have emotions. But honestly, that is the way to go forward. And that is sort of a great sin, anthropomorphization, in the A.I. literature, because it can blind you to the fact that this is software with its own sets of foibles and approaches.
But if you think about it like programming, then you end up in trouble. In fact, there’s some early evidence that programmers are the worst people at using A.I. because it doesn’t work like software. It doesn’t do the things you would expect a tool to do. Tools shouldn’t occasionally give you the wrong answer, shouldn’t give you different answers every time, shouldn’t insult you or try to convince you they love you.
And A.I.s do all of these things. And I find that teachers, managers, even parents, editors, are often better at using these systems, because they’re used to treating this as a person. And they interact with it like a person would, giving feedback. And that helps you. And I think the second piece of that “not tool” piece is that when I talk to OpenAI or Anthropic, they don’t have a hidden instruction manual. There is no list of how you should use this as a writer, or as a marketer, or as an educator. They don’t even know what the capabilities of these systems are. They’re all sort of being discovered together. And that is also not like a tool. It’s more like a person with capabilities that we don’t fully know yet.
So you’ve done this with all the big models. You’ve done, I think, much more than this, actually, with all the big models. And one thing you describe feeling is that they don’t just have slightly different strengths and weaknesses, but they have different — for lack of a better term, and to anthropomorphize — personalities, and that the 10 hours in part is about developing an intuition not just for how they work, but kind of how they are and how they talk, the sort of entity you’re dealing with.
So give me your high level on how GPT-4 and Claude 3 and Google’s Gemini are different. What are their personalities like to you?
It’s important to know the personalities not just as personalities, but because there are tricks. Those are tunable approaches that the system makers decide. So it’s weird to have this — in one hand, don’t anthropomorphize, because you’re being manipulated, because you are. But on the other hand, the only useful way is to anthropomorphize. So keep in mind that you are dealing with the choices of the makers.
So for example, Claude 3 is currently the warmest of the models. And it is the most allowed by its creators, Anthropic, I think, to act like a person. So it’s more willing to give you its personal views, such as they are. And again, those aren’t real views. Those are views to make you happy — than other models. And it’s a beautiful writer, very good at writing, kind of clever — closest to humor, I’ve found, of any of the A.I.s. Less dad jokes and more actual almost jokes.
GPT-4 feels like a workhorse at this point. It is the most neutral of the approaches. It wants to get stuff done for you. And it will happily do that. It doesn’t have a lot of time for chitchat. And then we’ve got Google’s Bard, which feels like — or Gemini now — which feels like it really, really wants to help. We use this for teaching a lot. And we build these scenarios where the A.I. actually acts like a counterparty in a negotiation. So you get to practice the negotiation by negotiating with the A.I. And it works incredibly well. I’ve been building simulations for 10 years, can’t imagine what a leap this has been. But when we try and get Google to do that, it keeps leaping in on the part of the students, to try and correct them and say, no, you didn’t really want to say this. You wanted to say that. And I’ll play out the scenario as if it went better. And it really wants to kind of make things good for you.
So these interactions with the A.I. do feel like you’re working with people, both in skills and in personality.
You were mentioning a minute ago that what the A.I.s do reflect decisions made by their programmers. They reflect guardrails, what they’re going to let the A.I. say. Very famously, Gemini came out and was very woke. You would ask it to show you a picture of soldiers in Nazi Germany, and it would give you a very multicultural group of soldiers, which is not how that army worked. But that was something that they had built in to try to make more inclusive photography generation.
But there are also things that happen in these systems that people don’t expect, that the programmers don’t understand. So I remember the previous generation of Claude, which is from Anthropic, that when it came out, something that the people around it talked about was, for some reason, Claude was just a little bit more literary than the other systems. It was better at rewriting things in the voices of literary figures. It just had a slightly artsier vibe.
And the people who trained it weren’t exactly sure why. Now, that still feels true to me. Right now, of the ones I’m using, I’m spending the most time with Claude 3. I just find it the most congenial. They all have different strengths and weaknesses, but there is a funny dimension to these where they are both reflecting the guardrails and the choices of the programmers. And then deep inside the training data, deep inside the way the various algorithms are combining, there is some set of emergent qualities to them, which gives them this at least edge of chance, of randomness, of something — yeah, that does feel almost like personality.
I think that’s a very important point. And fundamental about A.I. is the idea that we technically know how LLMs work, but we don’t know how they work the way they do, or why they’re as good as they are. They’re really — we don’t understand it. The theories range from everyone — from it’s all fooling us, to they’ve emulated the way humans think because the structure of language is the structure of human thought. So even though they don’t think, they can emulate it. We don’t know the answer.
But you’re right, there’s these emergent sets of personalities and approaches. When I talk to A.I. design companies, they often can’t explain why the A.I. stops refusing answering a particular kind of question. When they tune the A.I. to do something better, like answer a math better, it suddenly does other things differently. It’s almost like adjusting the psychology of a system rather than tuning parameters.
So when I said that Claude is allowed to be more personable, part of that is that the system prompt in Claude, which is the initial instructions it gets, allow it to be more personable than, say, Microsoft’s Copilot, formerly Bing, which has explicit instructions after a fairly famous blow up a while ago, that it’s never supposed to talk about itself as a person or indicate feelings. So there’s some instructions, but that’s on top of these roiling systems that act in ways that even the creators don’t expect.
One thing people know about using these models is that hallucinations, just making stuff up, is a problem. Has that changed at all as we’ve moved from GPT-3.5 to 4, as we move from Claude 2 to 3. Like, has that become significantly better? And if not, how do you evaluate the trustworthiness of what you’re being told?
So those are a couple of overlapping questions. The first of them is, it getting better over time? So there is a paper in the field of medical citations that indicated that around 80 to 90 percent of citations had an error, were made up with GPT-3.5. That’s the free version of Chat. And that drops for GPT-4.
So hallucination rates are dropping over time. But the A.I. still makes stuff up because all the A.I. does is hallucinate. There is no mind there. All it’s doing is producing word after word. They are just making stuff up all the time. The fact that they’re right so often is kind of shocking in a lot of ways.
And the way you avoid hallucination is not easily. So one of the things we document in one of our research papers is we purposely designed for a group of Boston Consulting Group consultants — so an elite consulting company — we did a lot of work with them. And one of the experiments we did was we created a task where the A.I. would be confident but wrong. And when we gave people that task to do, and they had access to A.I., they got the task wrong more often than people who didn’t use A.I., because the A.I. misled them, because they fell asleep at the wheel. And all the early research we have on A.I. use suggests that when A.I.s get good enough, we just stop paying attention.
But doesn’t this make them unreliable in a very tricky way? 80 percent — you’re, like, it’s always hallucinating. 20 percent, 5 percent, it’s enough that you can easily be lulled into overconfidence. And one of the reasons it’s really tough here is you’re combining something that knows how to seem extremely persuasive and confident — you feed into the A.I. a 90-page paper on functions and characteristics of right wing populism in Europe, as I did last night.
And within seconds, basically, you get a summary out. And the summary certainly seems confident about what’s going on. But on the other hand, you really don’t know if it’s true. So for a lot of what you might want to use it for, that is unnerving.
Absolutely, and I think hard to grasp, because we’re used to things like type II errors, where we search for something on the internet and don’t find it. We’re not used to type I errors, where we search for something and get an answer back that’s made up. This is a challenge. And there’s a couple things to think about. One of those is — I advocate the BAH standard, best available human. So is the A.I. more or less accurate than the best human you could consult in that area?
And what does that mean for whether or not it’s an appropriate question to ask? And that’s something that we kind of have to judge collectively. It’s valuable to have these studies being done by law professors and medical professionals and people like me and my colleagues in management. They’re trying to understand, how good is the A.I.? And the answer is pretty good, right? So it makes mistakes. “Does it make more or less mistakes than a human” is probably a question we should be asking a lot more.
And the second thing is the kind of tasks that you judge it for. I absolutely agree with you. When summarizing information, it may make errors. Less than an intern you assign to it is an open question, but you have to be aware of that error rate. And that goes back to the 10 hour question. The more you use these A.I.s, the more you start to know when to be suspicious and when not to be. That doesn’t mean you’re eliminating errors.
But just like if you assigned it to an intern, and you’re, like, this person has a sociology degree. They’re going to do a really good job summarizing this, but their biases are going to be focused on the sociological facts and not the political facts. You start to learn these things. So I think, again, that person model helps, because you don’t expect 100 percent reliability out of a person. And that changes the kind of tasks you delegate.
But it also reflects something interesting about the nature of the systems. You have a quote here that I think is very insightful. You wrote, “the core irony of generative A.I.s is that A.I.s were supposed to be all logic and no imagination. Instead, we get A.I.s that make up information, engage in seemingly emotional discussions, and which are intensely creative.” And that last fact is one that makes many people deeply uncomfortable.
There is this collision between what a computer is in our minds and then this strange thing we seem to have invented, which is an entity that emerges out of language, an entity that almost emerges out of art. This is the thing I have the most trouble keeping in my mind, that I need to use the A.I. as an imaginative, creative partner and not as a calculator that uses words.
I love the phrase “a calculator that uses words.” I think we have been let down by science fiction, both in the utopias and apocalypses that A.I. might bring, but also, even more directly, in our view of how machines should work. People are constantly frustrated, and give the same kinds of tests to A.I.s over and over again, like doing math, which it doesn’t do very well — they’re getting better at this.
And on the other hand, saying, well, creativity is a uniquely human spark that we can’t touch, and that A.I., on any creativity test we give it — which, again, are all limited in different ways, blows out humans in almost all measures of creativity that we have. Or all the measures are bad, but that still means something.
But we were using those measures five years ago, even though they were bad. That’s a point you make that I think is interesting and slightly unsettling.
Yeah, we never had to differentiate humans from machines before. It was always easy. So the idea that we had to have a scale that worked for people and machines, who had that? We had the Turing test, which everyone knew was a terrible idea. But since no machine could pass it, it was completely fine. So the question is, how do we measure this? This is an entirely separate set of issues. Like, we don’t even have a definition of sentience or consciousness.
And I think that you’re exactly right on the point, being that we are not ready for this kind of machine, so our intuition is bad.
So one of the things I will sometimes do, and did quite recently, is give the A.I. a series of personal documents, emails I wrote to people I love that were very descriptive of a particular moment in my life. And then I will ask the A.I. about them, or ask the A.I. to analyze me off of them.
And sometimes, it’s a little breathtaking. Almost every moment of true metaphysical shock — to use a term somebody else gave me — I’ve had here has been relational, at how good the A.I. can be — almost like a therapist, right? Sometimes it will see things, the thing I am not saying, in a letter, or in a personal problem. And it will zoom in there, right? It will give, I think, quicker and better feedback in an intuitive way that is not simply mimicking back what I said and is dealing with a very specific situation. It will do better than people I speak to in my life around that.
Conversely, I’m going to read a bit of it later. I tried mightily to make Claude 3 a useful partner in prepping to speak to you, and also in prepping for another podcast recently. And I functionally never have a moment there where I’m all that impressed.
That makes complete sense. I think the weird expectations — we call it the jagged frontier of A.I., that it’s good at some stuff and bad at other stuff. It’s often unexpected. It can lead to these weird moments of disappointment, followed by elation or surprise. And part of the reason why I advocate for people to use it in their jobs is, it isn’t going to outcompete you at whatever you’re best at. I mean, I cannot imagine it’s going to do a better job prepping someone for an interview than you’re doing. And that’s not me just — I’m trying to be nice to you because you’re interviewing me, but because you’re a good interviewer. You’re a famous interviewer. It’s not going to be as good as that. Now, there’s questions about how good these systems get that we don’t know, but we’re kind of at a weirdly comfortable spot in A.I., which is, maybe it’s the 80th percentile of many performances. But I talk to Hollywood writers. It’s not close to writing like a Hollywood writer. It’s not close to being as good an analyst.
It’s not — but it’s better than the average person. And so it’s great as a supplement to weakness, but not to strength. But then, we run back into the problem you talked about, which is, in my weak areas, I have trouble assessing whether the A.I. is accurate or not. So it really becomes sort of a eating its own tail kind of problem.
But this gets to this question of, what are you doing with it? The A.I.s right now seem much stronger as amplifiers and feedback mechanisms and thought partners for you than they do as something you can really outsource your hard work and your thinking to. And that, to me, is one of the differences between trying to spend more time with these systems — like, when you come into them initially, you’re like, OK, here’s a problem, give me an answer.
Whereas when you spend time with them, you realize actually what you’re trying to do with the A.I. is get it to elicit a better answer from you.
And that’s why the book’s called “Co-Intelligence.” For right now, we have a prosthesis for thinking. That’s, like, new in the world. We haven’t had that before — I mean, coffee, but aside from that, not much else. And I think that there’s value in that. I think learning to be partner with this, and where it can get wisdom out of you or not — I was talking to a physics professor at Harvard. And he said, all my best ideas now come from talking to the A.I. And I’m like, well, it doesn’t do physics that well. He’s like, no, but it asks good questions. And I think that there is some value in that kind of interactive piece.
It’s part of why I’m so obsessed with the idea of A.I. in education, because a good educator — and I’ve been working on interactive education skill for a long time — a good educator is eliciting answers from a student. And they’re not telling students things.
So I think that that’s a really nice distinction between co-intelligence, and thought partner, and doing the work for you. It certainly can do some work for you. There’s tedious work that the A.I. does really well. But there’s also this more brilliant piece of making us better people that I think is, at least in the current state of A.I., a really awesome and amazing thing.
[MUSIC PLAYING]
We’ve already talked a bit about — Gemini is helpful, and ChatGPT-4 is neutral, and Claude is a bit warmer. But you urge people to go much further than that. You say to give your A.I. a personality. Tell it who to be. So what do you mean by that, and why?
So this is actually almost more of a technical trick, even though it sounds like a social trick. When you think about what A.I.s have done, they’ve trained on the collective corpus of human knowledge. And they know a lot of things. And they’re also probability machines. So when you ask for an answer, you’re going to get the most probable answer, sort of, with some variation in it. And that answer is going to be very neutral. If you’re using GPT-4, it’ll probably talk about a rich tapestry a lot. It loves to talk about rich tapestries. If you ask it to code something artistic, it’ll do a fractal. It does very normal, central A.I. things. So part of your job is to get the A.I. to go to parts of this possibility space where the information is more specific to you, more unique, more interesting, more likely to spark something in you yourself. And you do that by giving it context, so it doesn’t just give you an average answer. It gives you something that’s specialized for you. The easiest way to provide context is a persona. You are blank. You are an expert at interviewing, and you answer in a warm, friendly style. Help me come up with interview questions. It won’t be miraculous in the same way that we were talking about before. If you say you’re Bill Gates, it doesn’t become Bill Gates. But that changes the context of how it answers you. It changes the kinds of probabilities it’s pulling from and results in much more customized and better results.
OK, but this is weirder, I think, than you’re quite letting on here. So something you turned me on to is there’s research showing that the A.I. is going to perform better on various tasks, and differently on them, depending on the personality. So there’s a study that gives a bunch of different personality prompts to one of the systems, and then tries to get it to answer 50 math questions. And the way it got the best performance was to tell the A.I. it was a Starfleet commander who was charting a course through turbulence to the center of an anomaly.
But then, when it wanted to get the best answer on 100 math questions, what worked best was putting it in a thriller, where the clock was ticking down. I mean, what the hell is that about?
“What the hell” is a good question. And we’re just scratching the surface, right? There’s a nice study actually showing that if you emotionally manipulate the A.I., you get better math results. So telling it your job depends on it gets you better results. Tipping, especially $20 or $100 — saying, I’m about to tip you if you do well, seems to work pretty well. It performs slightly worse in December than May, and we think it’s because it has internalized the idea of winter break.
I’m sorry, what?
Well, we don’t know for sure, but —
I’m holding you up here.
Yeah.
People have found the A.I. seems to be more accurate in May, and the going theory is that it has read enough of the internet to think that it might possibly be on vacation in December?
So it produces more work with the same prompts, more output, in May than it does in December. I did a little experiment where I would show it pictures of outside. And I’m like, look at how nice it is outside? Let’s get to work. But yes, the going theory is that it has internalized the idea of winter break and therefore is lazier in December.
I want to just note to people that when ChatGPT came out last year, and we did our first set of episodes on this, the thing I told you was this was going to be a very weird world. What’s frustrating about that is that — I guess I can see the logic of why that might be. Also, it sounds probably completely wrong, but also, I’m certain we will never know. There’s no way to go into the thing and figure that out.
But it would have genuinely never occurred to me before this second that there would be a temporal difference in the amount of work that GPT-4 would do on a question held constant over time. Like, that would have never occurred to me as something that might change at all.
And I think that that is, in some ways, both — as you said, the deep weirdness of these systems. But also, there’s actually downside risks to this. So we know, for example, there is an early paper from Anthropic on sandbagging, that if you ask the A.I. dumber questions, it would get you less accurate answers. And we don’t know the ways in which your grammar or the way you approach the A.I. — we know the amount of spaces you put gets different answers.
So it is very hard, because what it’s basically doing is math on everything you’ve written to figure out what would come next. And the fact that what comes next feels insightful and humane and original doesn’t change that that’s what the math that’s doing is. So part of what I actually advise people to do is just not worry about it so much, because I think then it becomes magic spells that we’re incanting for the A.I. Like, I will pay you $20, you are wonderful at this. It is summer. Blue is your favorite color. Sam Altman loves you. And you go insane.
So acting with it conversationally tends to be the best approach. And personas and contexts help, but as soon as you start evoking spells, I think we kind of cross over the line into, “who knows what’s happening here?”
Well, I’m interested in the personas, although I just — I really find this part of the conversation interesting and strange. But I’m interested in the personalities you can give the A.I. for a different reason. I prompted you around this research on how a personality changes the accuracy rate of an A.I. But a lot of the reason to give it a personality, to answer you like it is Starfleet Commander, is because you have to listen to the A.I. You are in relationship with it.
And different personas will be more or less hearable by you, interesting to you. So you have a piece on your newsletter which is about how you used the A.I. to critique your book. And one of the things you say in there, and give some examples of, is you had to do so in the voice of Ozymandias because you just found that to be more fun. And you could hear that a little bit more easily.
So could you talk about that dimension of it, too, making the A.I. not just prompting you to be more accurate, but giving it a personality to be more interesting to you?
The great power of A.I. is as a kind of companion. It wants to make you happy. It wants to have a conversation. And that can be overt or covert.
So, to me, actively shaping what I want the A.I. to act like, telling it to be friendly or telling it to be pompous, is entertaining, right? But also, it does change the way I interact with it. When it has a pompous voice, I don’t take the criticism as seriously. So I can think about that kind of approach. I could get pure praise out of it, too, if I wanted to do it that way.
But the other factor that’s also super weird, while we’re on the way of super weird A.I. things, is that if you don’t do that, it’s going to still figure something out about you. It is a cold reader. And I think a lot about the very famous piece by Kevin Roose, the New York Times technology reporter, about Bing about a year ago, when Bing, which was GPT-4 powered, came out and had this personality of Sydney.
And Kevin has this very long description that got published in The New York Times about how Sydney basically threatened him, and suggested he leaves his wife, and very dramatic, kind of very unsettling interaction. And I was working with — I didn’t have anything quite that intense, but I got into arguments with Sydney around the same time, where it would — when I asked her to do work for me, it said you should do the work yourself. Otherwise, it’s dishonest. And it kept accusing me of plagiarism, which felt really unusual.
But the reason why Kevin ended up in that situation is the A.I. knows all kinds of human interactions and wants to slot into a story with you.
So a great story is jealous lover who’s gone a little bit insane, and the man who won’t leave his wife, or student and teacher, or two debaters arguing with each other, or grand enemies. And the A.I. wants to do that with you. So if you’re not explicit, it’s going to try and find a dialogue.
And I’ve noticed, for example, that if I talk to the A.I. and I imply that we’re having a debate, it will never agree with me. If I imply that I’m a teacher and it’s a student, even as much as saying I’m a professor, it is much more pliable.
So part of why I like assigning a personality is to have an explicit personality you’re operating with, so it’s not trying to cold read and guess what personality you’re looking for.
Kevin and I have talked a lot about that conversation with Sydney. And one of the things I always found fascinating about it is, to me, it revealed an incredibly subtle level of read by Sydney Bing, which is, what was really happening there? When you say the A.I. wants to make you happy, it has to read on some level what it is you’re really looking for, over time.
And what was Kevin? What is Kevin? Kevin is a journalist. And Kevin was nudging and pushing that system to try to do something that would be a great story. And it did that. It understood, on some level — again, the anthropomorphizing language there. But it realized that Kevin wanted some kind of intense interaction. And it gave him, like, the greatest A.I. story anybody has ever been given. I mean, an A.I. story that we are still talking about a year later, an A.I. story that changed the way A.I.s were built, at least for a while.
And people often talked about what Sydney was revealing about itself. But to me, what was always so unbelievably impressive about that was its ability to read the person, and its ability to make itself into the thing, the personality, the person was trying to call forth.
And now, I think we’re more practiced at doing this much more directly. But I think a lot of people have their moment of sleeplessness here. That was my Rubicon on this. I didn’t know something after that I didn’t know before it in terms of capabilities.
But when I read that, I thought that the level of — interpersonal isn’t the right word, but the level of subtlety it was able to display in terms of giving a person what it wanted, without doing so explicitly — right, without saying, “we’re playing this game now,” was really quite remarkable.
It’s a mirror. I mean, it’s trained on our stuff. And one of the revealing things about that, that I think we should be paying a lot more attention to, is the fact that because it’s so good at this, right now, none of the frontier A.I. models with the possible exception of Inflection’s Pi, which has been basically acquired in large part by Microsoft now, were built to optimize around keeping us in a relationship with the A.I. They just accidentally do that. There are other A.I. models that aren’t as good that have been focused on this, but that has been something explicit from the frontier models they’ve been avoiding till now. Claude sort of breaches that line a little bit, which is part of why I think it’s engaging. But I worry about the same kind of mechanism that inevitably reined in social media, which is, you can make a system more addictive and interesting. And because it’s such a good cold reader, you could tune A.I. to make you want to talk to it more.
It’s very hands off and sort of standoffish right now. But if you use the voice system in ChatGPT-4 on your phone, where you’re having a conversation, there’s moments where you’re like, oh, you feel like you’re talking to a person. You have to remind yourself. So to me, that persona aspect is both its great strength, but also one of the things I’m most worried about that isn’t a sort of future science fiction scenario.
I want to hold here for a minute, because we’ve been talking about how to use frontier models, I think implicitly talking about how to use A.I. for work. But the way that a lot of people are using it is using these other companies that are explicitly building for relationships. So I’ve had people at one of the big companies tell me that if we wanted to tune our system relationally, if we wanted to tune it to be your friend, your lover, your partner, your therapist, like, we could blow the doors off that. And we’re just not sure it’s ethical.
But there are a bunch of people who have tens of millions of users, Replika, Character.AI, which are doing this. And I tried to use Replika about six, eight months ago. And honestly, I found it very boring. They had recently lobotomized it because people were getting too erotic with their Replikants. But I just couldn’t get into it. I’m probably too old to have A.I. friends, in the way that my parents were probably too old to get really in to talking to people on AOL Instant Messenger.
But I have a five-year-old, and I have a two-year-old. And by the time my five-year-old is 10 and my two-year-old is 7, they’re not necessarily going to have the weirdness I’m going to have about having A.I. friends. And I don’t think we even have any way to think about this.
I think that is an absolute near-term certainty, and sort of an unstoppable one, that we are going to have A.I. relationships in a broader sense. And I think the question is, just like we’ve just been learning — I mean, we’re doing a lot of social experiments at scale we’ve never done before in the last couple of decades, right? Turns out social media brings out entirely different things in humans that we weren’t expecting. And we’re still writing papers about echo chambers and tribalism and facts, and what we agree or disagree with. We’re about to have another wave of this. And we have very little research. And you could make a plausible story up, that what’ll happen is it’ll help mental health in a lot of ways for people, and then there’ll be more social outside, that there might be a rejection of this kind of thing.
I don’t know what’ll happen. But I do think that we can expect with absolute certainty that you will have A.I.s that are more interesting to talk to, and fool you into thinking, even if you know better, that they care about you in a way that is incredibly appealing. And that will happen very soon. And I don’t know how we’re going to adjust to it. But it seems inevitable, as you said.
I was worried we were getting off track in the conversation, but I realized we were actually getting deeper on the track I was trying to take us down.
We were talking about giving the A.I. personality, right — telling Claude 3, hey, I need you to act as a sardonic podcast editor, and then Claude 3’s whole persona changes. But when you talk about building your A.I. on Kindroid, on Character, on Replika — so I just created a Kindroid one the other day. And Kindroid is kind of interesting, because its basic selling point is we’ve taken the guardrails largely off. We are trying to make something that is not lobotomized, that is not perfectly safe for work. And so the personality can be quite unrestrained. So I was interested in what that would be like.
But the key thing you have to do at the beginning of that is tell the system what its personality is. So you can pick from a couple that are preset, but I wrote a long one myself — you know, you live in California. You’re a therapist. You like all these different things. You have a highly intellectual style of communicating. You’re extremely warm, but you like ironic humor. You don’t like small talk. You don’t like to say things that are boring or generic. You don’t use a lot of emoticons and emojis. And so now it talks to me the way people I talk to talk.
And the thing I want to bring this back to is that one of the things that requires you to know is what kind of personalities work with you, for you to know yourself and your preferences a little bit more deeply.
I think that’s a temporary state of affairs, like extremely temporary. I think a GPT-4 class model — we actually already know this. They can guess your intent quite well. And I think that this is a way of giving you a sense of agency or control in the short term. I don’t think you’re going to need to know yourself at all. And I think you wouldn’t right now if any of the GPT-4 class models allowed themselves to be used in this way, without guardrails, which they don’t, I think you would already find it’s just going to have a conversation with you and morph into what you want.
I think that for better or worse, the “insight” in these systems is good enough that way. It’s sort of why I also don’t worry so much about prompt crafting in the long term, to go back to the other issue we were talking about, because I think that they will work on intent. And there’s a lot of evidence that they’re good at guessing intent. So I like this period, because I think it does value self reflection. And our interaction with the A.I. is somewhat intentional because we can watch this interaction take place.
But I think there’s a reason why some of the worry you hear out of the labs is about superhuman levels of manipulation. There’s a reason why the whistleblower from Google was all about that — sort of fell for the chat bot, and that’s why they felt it was alive. Like, I think we’re deeply trickable in this way. And A.I. is really good at figuring out what we want without us being explicit.
So that’s a little bit chilling, but I’m nevertheless going to stay in this world we’re in, because I think we’re going to be in it for at least a little while longer, where you do have to do all this prompt engineering. What is a prompt, first? And what is prompt engineering?
So a prompt is — technically, it is the sentence, the command you’re putting into the A.I. What it really is is the beginning part of the A.I.s text that it’s processing. And then it’s just going to keep adding more words or tokens to the end of that reply, until it’s done. So a prompt is the command you’re giving the A.I. But in reality, it’s sort of a seed from which the A.I. builds.
And when you prompt engineer, what are some ways to do that? Maybe one to begin with, because it seems to work really well, is chain of thought.
Just to take a step back, A.I. prompting remains super weird. Again, strange to have a system where the companies making the systems are writing papers as they’re discovering how to use the systems, because nobody knows how to make them work better yet. And we found massive differences in our experiments on prompt types. So for example, we were able to get the A.I. to generate much more diverse ideas by using this chain of thought approach, which we’ll talk about.
But also, it turned out to generate a lot better ideas if you told it it was Steve Jobs than if you told it it was Madame Curie. And we don’t know why. So there’s all kinds of subtleties here. But the idea, basically, of chain of thought, that seems to work well in almost all cases, is that you’re going to have the A.I. work step by step through a problem. First, outline the problem, you know, the essay you’re going to write. Second, give me the first line of each paragraph. Third, go back and write the entire thing. Fourth, check it and make improvements.
And what that does is — because the A.I. has no internal monologue, it’s not thinking. When the A.I. isn’t writing something, there’s no thought process. All it can do is produce the next token, the next word or set of words. And it just keeps doing that step by step. Because there’s no internal monologue, this in some ways forces a monologue out in the paper. So it lets the A.I. think by writing before it produces the final result. And that’s one of the reasons why chain of thought works really well.
So just step-by-step instructions is a good first effort.
Then you get an answer, and then what?
And then — what you do in a conversational approach is you go back and forth. If you want work output, what you’re going to do is treat it like it is an intern who just turned in some work to you. Actually, could you punch up paragraph two a little bit? I don’t like the example in paragraph one. Could you make it a little more creative, give me a couple of variations? That’s a conversational approach trying to get work done.
If you’re trying to play, you just run from there and see what happens. You can always go back, especially with a model like GPT-4, to an earlier answer, and just pick up from there if your heads off in the wrong direction.
So I want to offer an example of how this back and forth can work. So we asked Claude 3 about prompt engineering, about what we’re talking about here. And the way it described it to us is, quote, “It’s a shift from the traditional paradigm of human-computer interaction, where we input explicit commands and the machine executes them in a straightforward way, to a more open ended, collaborative dialogue, where the human and the A.I. are jointly shaping the creative process,” end quote. And that’s pretty good, I think. That’s interesting. It’s worth talking about. I like that idea that it’s a more collaborative dialogue. But that’s also boring, right? Even as I was reading it, it’s a mouthful. It’s wordy. So I kind of went back and forth with it a few times. And I was saying, listen, you’re a podcast editor. You’re concise, but also then I gave it a couple examples of how I punched up questions in the document, right? This is where the question began. Here’s where it ended. And then I said, try again, and try again, and try again, and make it shorter. And make it more concise.
And I got this: quote, “OK, so I was talking to this A.I., Claude, about prompt engineering, you know, this whole art of crafting prompts to get the best out of these A.I. models. And it said something that really struck me. It called prompt engineering a new meta skill that we’re all picking up as we play with A.I., kind of like learning a new language to collaborate with it instead of just bossing it around. What do you think, is prompt engineering the new must have skill?” End Claude.
And that second one, I have to say, is pretty damn good. That really nailed the way I speak in questions. And it gets it at this way where if you’re willing to go back and forth, it does learn how to echo you.
So I am at a loss about when you went to Claude and when it was you, to be honest. So I was ready to answer at like two points along the way, so that was pretty good from my perspective, sitting here, talking to you. That felt interesting, and felt like the conversation we’ve been having. And I think there’s a couple of interesting lessons there.
The first, by the way, of — interestingly, you asked A.I. about one of its weakest points, which is about A.I. And everybody does this, but because its knowledge window doesn’t include that much stuff about A.I., it actually is pretty weak in terms of knowing how to do good prompting, or what a prompt is, or what A.I.s do well. But you did a good job with that. And I love that you went back and forth and shaped it. One of the techniques you used to shape it, by the way, was called few-shot, which is giving an example. So the two most powerful techniques are chain of thought, which we just talked about, and few-shot, giving it examples. Those are both well supported in the literature. And then, I’d add personas. So we’ve talked about, I think, the basics of prompt crafting here overall. And I think that the question was pretty good.
But you keep wanting to not talk about the future. And I totally get that. But I think when we’re talking about learning something, where there is a lag, where we talk about policy — should prompt crafting be taught in schools? I think it matters to think six months ahead. And again, I don’t think a single person in the A.I. labs I’ve ever talked to thinks prompt crafting for most people is going to be a vital skill, because the A.I. will pick up on the intent of what you want much better.
One of the things I realized trying to spend more time with the A.I. is that you really have to commit to this process. You have to go back and forth with it a lot. If you do, you can get really good questions, like the one I just did — or, I think, really good outcomes. But it does take time.
And I guess in a weird way it’s like the same problem of any relationship, that it’s actually hard to state your needs clearly and consistently and repeatedly, sometimes because you have not even articulated them in words yourself. At least the A.I., I guess, doesn’t get mad at you for it.
But I’m curious if you have advice, either at a practical level or principles level, about how to communicate to these systems what you want from them.
One set of techniques that work quite well is to speed run to where you are in the conversation. So you can actually pick up an older conversation where you got the A.I.‘s mindset where you want and work from there. You can even copy and paste that into a new window. You can ask the A.I. to summarize where you got in that previous conversation, and the tone the A.I. was taking, and then when you give a new instruction say the interaction I like to have with you is this, so have it solve the problem for you by having it summarize the tone that you happen to like at the end.
So there are a bunch of ways of building on your work as you start to go forward, so you’re not starting from scratch every time. And I think you’ll start to get shorthands that get you to that right kind of space. For me, there are chats that I pick up on. And actually, I assign these to my students too. I have some ongoing conversations that they’re supposed to have with the A.I., but then there’s a lot of interactions they’re supposed to have that are one off.
So you start to divide the work into, this is a work task. And we’re going to handle this in a single chat conversation. And then I’m going to go back to this long standing discussion when I want to pick it up, and it’ll have a completely different tone. So I think in some ways, you don’t necessarily want convergence among all your A.I. threads. You kind of want them to be different from each other.
You did mention something important there, because they’re already getting much bigger in terms of how much information they can hold. Like, the earlier generations could barely hold a significant chat. Now, Claude 3 can functionally hold a book in its memory. And it’s only going to go way, way, way up from here. And I know I’ve been trying to keep us in the present, but this feels to me really quickly like where this is both going and how it’s going to get a lot better.
I mean, you imagine Apple building Siri 2030, and Siri 2030 scanning your photos and your Journal app — Apple now has a Journal app. You have to assume they’re thinking about the information they can get from that, if you allow it — your messages, anything you’re willing to give it access to. It then knows all of this information about you, keeps all of that in its mind as it talks to you and acts on your behalf. I mean, that really seems to me to be where we’re going, an A.I. that you don’t have to keep telling it who to be because it knows you intimately and is able to hold all that knowledge all at the same time constantly.
It’s not even going there. Like, it’s already there. Gemini 1.5 can hold an entire movie, books. But like, it starts to now open up entirely new ways of working. I can show it a video of me working on my computer, just screen capture. And it knows all the tasks I’m doing and suggests ways to help me out. It starts watching over my shoulder and helping me. I put in all of my work that I did prior to getting tenure and said, write my tenure statement. Use exact quotes.
And it was much better than any of the previous models because it wove together stuff, and because everything was its memory. It doesn’t hallucinate as much. All the quotes were real quotes, and not made up. And already, by the way, GPT-4 has been rolling out a model of ChatGPT that has a private note file the A.I. takes — you can access it — but it takes notes on you as it goes along, about things you liked or didn’t like, and reads those again at the beginning of any chat. So this is present, right? It’s not even in the future.
And Google also connects to your Gmail, so it’ll read through your Gmail. I mean, I think this idea of a system that knows you intimately, where you’re picking up a conversation as you go along, is not a 2030 thing. It is a 2024 thing if you let the systems do it.
One thing that feels important to keep in front of mind here is that we do have some control over that. And not only do we have some control over it, but business models and policy are important here. And one thing we know from inside these A.I. shops is these A.I.s already are, but certainly will be, really super persuasive.
And so if the later iterations of the A.I. companions are tuned on the margin to try to encourage you to be also out in the real world, that’s going to matter, versus whether they have a business model that all they want is for you to spend a maximum amount of time talking to your A.I. companion, whether you ever have a friend who is flesh and blood be damned. And so that’s an actual choice, right? That’s going to be a programming decision. And I worry about what happens if we leave that all up to the companies, right? At some point, there’s a lot of venture capital money in here right now. At some point, the venture capital runs out. At some point, people need to make big profits. At some point, they’re in competition with other players who need to make profits. And that’s when things — you get into what Cory Doctorow calls the “enshitification” cycle, where things that were once adding a lot of value to the user begin extracting a lot of value to the user.
These systems, because of how they can be tuned, can lead to a lot of different outcomes. But I think we’re going to have to be much more comfortable than we’ve been in the past deciding what we think is a socially valuable use and what we think is a socially destructive use.
I absolutely agree. I think that we have agency here. We have agency in how we operate this in businesses, and whether we use this in ways that encourage human flourishing and employees, or are brutal to them. And we have agency over how this works socially. And I think we abrogated that responsibility with social media, and that is an example. Not to be bad news, because I generally have a lot of mixed optimism and pessimism about parts of A.I., but the bad news piece is there are open source models out there that are quite good.
The internet is pretty open. We would have to make some pretty strong choices to kill A.I. chat bots as an option. We certainly can restrict the large American companies from doing that, but a Llama 2 or Llama 3 is going to be publicly available and very good. There’s a lot of open source models. So the question also is how effective any regulation will be, which doesn’t mean we shouldn’t regulate it.
But there’s also going to need to be some social decisions being made about how to use these things well as a society that are going to have to go beyond just the legal piece, or companies voluntarily complying.
I see a lot of reasons to be worried about the open source models. And people talk about things like bioweapons and all that. But for some of the harms I’m talking about here, if you want to make money off of American kids, we can regulate you. So sometimes I feel like we almost, like, give up the fight before it begins. But in terms of what a lot of people are going to use, if you want to be having credit card payments processed by a major processor, then you have to follow the rules.
I mean, individual people or small groups can do a lot of weird things with an open source model, so that doesn’t negate every harm. But if you’re making a lot of money, then you have relationships we can regulate.
I couldn’t agree more. And I don’t think there’s any reason to give up hope on regulation. I think that we can mitigate. And I think part of our job, though, is also not just to mitigate the harms, but to guide towards the positive viewpoints, right? So what I worry about is that the incentive for profit making will push for A.I. that acts informally as your therapist or your friend, while our worries about experimentation, which are completely valid, are slowing down our ability to do experiments to find out ways to do this right. And I think it’s really important to have positive examples, too. I want to point to the A.I. systems acting ethically as your friend or companion, and figure out what that is, so there’s a positive model to look for. So I’m not just — this is not to denigrate the role of regulation, which I think is actually going to be important here, and self regulation, and rapid response from government, but also the companion problem of, “we need to make some sort of decisions about what are the paragons of this, what is acceptable as a society?”
[MUSIC PLAYING]
So I want to talk a bit about another downside here, and this one more in the mainstream of our conversation, which is on the human mind, on creativity. So a lot of the work A.I. is good at automating is work that is genuinely annoying, time consuming, laborious, but often plays an important role in the creative process. So I can tell you that writing a first draft is hard, and that work on the draft is where the hard thinking happens.
And it’s hard because of that thinking. And the more we outsource drafting to A.I., which I think it is fair to say is a way a lot of people intuitively use it — definitely, a lot of students want to use it that way — the fewer of those insights we’re going to have on those drafts. Look, I love editors. I am an editor in one respect. But I can tell you, you make more creative breakthroughs as a writer than an editor. The space for creative breakthrough is much more narrow once you get to editing.
And I do worry that A.I. is going to make us all much more like editors than like writers.
I think the idea of struggle is actually a core one in many things. I’m an educator. And one thing that keeps coming out in the research is that there is a strong disconnect between what students think they’re learning and when they learn. So there was a great controlled experiment at Harvard in intro science classes, where students either went to a pretty entertaining set of lectures, or else they were forced to do active learning, where they actually did the work in class.
The active learning group reported being unhappier and not learning as much, but did much better on tests, because when you’re confronted with what you don’t know, and you have to struggle, when you feel, like, bad, you actually make much more progress than if someone spoon feeds you an entertaining answer. And I think this is a legitimate worry that I have. And I think that there’s going to have to be some disciplined approach to writing as well, like, I don’t use the A.I.
Not just because, by the way, it makes the work easier, but also because you mentally anchor on the A.I.‘s answer. And in some ways, the most dangerous A.I. application, in my mind, is the fact that you have these easy co-pilots in Word and Google Docs, because any writer knows about the tyranny of the blank page, about staring at a blank page and not knowing what to do next, and the struggle of filling that up. And when you have a button that produces really good words for you, on demand, you’re just going to do that. And it’s going to anchor your writing. We can teach people about the value of productive struggle, but I think that during the school years, we have to teach people the value of writing — not just assign an essay and assume that the essay does something magical, but be very intentional about the writing process and how we teach people about how to do that, because I do think the temptation of what I call “the button” is going to be there otherwise, for everybody.
But I worry this stretches, I mean, way beyond writing. So the other place I worry about this, or one of the other places I worry about this a lot, is summarizing. And I mean, this goes way back. When I was in school, you could buy Sparknotes. And they were these little, like, pamphlet sized descriptions of what’s going on in “War and Peace” or what’s going on in “East of Eden.”
And reading the Sparknotes often would be enough to fake your way through the test, but it would not have any chance, like, not a chance, of changing you, of shifting you, of giving you the ideas and insights that reading “Crime and Punishment” or “East of Eden” would do.
And one thing I see a lot of people doing is using A.I. for summary. And one of the ways it’s clearly going to get used in organizations is for summary — summarize my email, and so on.
And here too, one of the things that I think may be a real vulnerability we have, as we move into this era — my view is that the way we think about learning and insights is usually wrong. I mean, you were saying a second ago we can teach a better way. But I think we’re doing a crap job of it now, because I think people believe that — it’s sort of what I call the matrix theory of the human mind, if you could just jack the information into the back of your head and download it, you’re there.
But what matters about reading a book, and I see this all the time preparing for this show, is the time you spend in the book, where over time, like, new insights and associations for you begin to shake loose. And so I worry it’s coming into an efficiency-obsessed educational and intellectual culture, where people have been imagining forever, what if we could do all this without having to spend any of the time on it? But actually, there’s something important in the time.
There’s something important in the time with a blank page, with the hard book. And I don’t think we lionize intellectual struggle. In some ways, I think we lionize the people for whom it does not seem like a struggle, the people who seem to just glide through and be able to absorb the thing instantly, the prodigies. And I don’t know. When I think about my kids, when I think about the kind of attention and creativity I want them to have, this is one of the things that scares me most, because kids don’t like doing hard things a lot of the time.
And it’s going to be very hard to keep people from using these systems in this way.
So I don’t mean to push back too much on this.
No, please, push back a lot.
But I think you’re right.
Imagine we’re debating and you are a snarky. A.I. [LAUGHS]
Fair enough. With that prompt —
With that prompt engineering.
— yeah, I mean, I think that this is the eternal thing about looking back on the next generation, we worry about technology ruining them. I think this makes ruining easier. But as somebody who teaches at universities, like, lots of people are summarizing. Like, I think those of us who enjoy intellectual struggle are always thinking everybody else is going through the same intellectual struggle when they do work. And they’re doing it about their own thing. They may or may not care the same way.
So this makes it easier, but before A.I., there were — best estimates from the U.K. that I could find, 20,000 people in Kenya whose full time job was writing essays for students in the U.S. and U.K. People have been cheating and Sparknoting and everything for a long time. And I think that what people will have to learn is that this tool is a valuable co-intelligence, but is not a replacement for your own struggle.
And the people who found shortcuts will keep finding shortcuts. Temptation may loom larger, but I can’t imagine that — my son is in high school, doesn’t like to use A.I. for anything. And he just doesn’t find it valuable for the way he’s thinking about stuff. I think we will come to that kind of accommodation. I’m actually more worried about what happens inside organizations than I am worried about human thought, because I don’t think we’re going to atrophy as much as we think. I think there’s a view that every technology will destroy our ability to think.
And I think we just choose how to use it or not. Like, even if it’s great at insights, people who like thinking like thinking.
Well, let me take this from another angle. One of the things that I’m a little obsessed with is the way the internet did not increase either domestic or global productivity for any real length of time. So I mean, it’s a very famous line. You can see the IT revolution anywhere but in the productivity statistics. And then you do get, in the ‘90s, a bump in productivity that then peters out in the 2000s.
And if I had told you what the internet would be, like, I mean everybody, everywhere would be connected to each other. You could collaborate with anybody, anywhere, instantly. You could teleconference. You would have access to, functionally, the sum total of human knowledge in your pocket at all times. I mean, all of these things that would have been genuine sci-fi, you would have thought would have been — led to a kind of intellectual utopia. And it kind of doesn’t do that much, if you look at the statistics.
You don’t see a huge step change. And my view — and I’d be curious for your thoughts on this, because I know this is the area you study in — my view is it everything we said was good happened. I mean, as a journalist, Google and things like that make me so much more productive. It’s not that it didn’t give us the gift. It’s that it also had a cost — distraction, checking your email endlessly, being overwhelmed with the amount of stuff coming into you, the sort of endless communication task list, the amount of internal communications and organizations, now with Slack and everything else.
And so some of the time that was given to us back was also taken back. And I see a lot of dynamics like this that could play out with A.I. — I wouldn’t even just say if we’re not careful, I just think they will play out and already are. I mean, the internet is already filling with mediocre crap generated by A.I. There is going to be a lot of destructive potential, right? You are going to have your sex bot in your pocket, right? There’s a million things — and not just that, but inside organizations, there’s going to be people padding out what would have been something small, trying to make it look more impressive by using the A.I. to make something bigger. And then, you’re going to use the A.I. to summarize it back down. The A.I. researcher, Jonathan Frankel, described this to me as, like, the boring apocalypse version of A.I., where you’re just endlessly inflating and then summarizing, and then inflating and then summarizing the volume of content between different A.I.
My ChatGPT is making my presentation bigger and more impressive, and your ChatGPT is trying to summarize it down to bullet points for you. And I’m not saying this has to happen. But I am saying that it would require a level of organizational and cultural vigilance to stop, that nothing in the internet era suggests to me that we have.
So I think there’s a lot there to chew on. And I also have spent a lot of time trying to think about why the internet didn’t work as well. I was an early Wikipedia administrator.
Thank you for your service.
[LAUGHS] Yeah, it was very scarring. But I think a lot about this. And I think A.I. is different. I don’t know if it’s different in a positive way. And I think we talked about some of the negative ways it might be different. And I think it’s going to be many things at once, happening quite quickly. So I think the information environment’s going to be filled up with crap. We will not be able to tell the difference between true and false anymore. It will be an accelerant on all the kinds of problems that we have there.
On the other hand, it is an interactive technology that adapts to you. From an education perspective, I have lived through the entire internet will change education piece. I have MOOCs, massive online courses, with — quarter million people have taken them. And in the end, you’re just watching a bunch of videos. Like, that doesn’t change education.
But I can have an A.I. tutor that actually can teach you — and we’re seeing it happen — and adapt to you at your level of education, and your knowledge base, and explain things to you. But not just explain, elicit answers from you, interactively, in a way that actually learns things.
The thing that makes A.I. possibly great is that it’s so very human, so it interacts with our human systems in a way that the internet did not. We built human systems on top of it, but A.I. is very human. It deals with human forms and human issues and our human bureaucracy very well. And that gives me some hope that even though there’s going to be lots of downsides, that the upsides of productivity and things like that are real. Part of the problem with the internet is we had to digitize everything. We had to build systems that would make our offline world work with our online world. And we’re still doing that. If you go to business schools, digitizing is still a big deal 30 years on from early internet access. A.I. makes this happen much quicker because it works with us. So I’m a little more hopeful than you are about that, but I also think that the downside risks are truly real and hard to anticipate.
Somebody was just pointing out that Facebook is now 100 percent filled with algorithmically generated images that look like their actual grandparents, making things who are saying, like, what do you think of my work? Because that’s a great way to get engagement. And the other grandparents in there have no idea it’s A.I. generated.
Things are about to get very, very weird in all the ways that we talked about, but that doesn’t mean the positives can’t be there as well.
I think that is a good place to end. So always our final question, what are three books you’d recommend to the audience?
OK, so the books I’ve been thinking about are not all fun, but I think they’re all interesting. One of them is “The Rise and Fall of American Growth,” which is — it’s two things. It’s an argument about why we will never have the kind of growth that we did in the first part of the Industrial Revolution again, but I think that’s less interesting than the first half of the book, which is literally how the world changed between 1870 or 1890 and 1940, versus 1940 and 1990, or 2000.
And the transformation of the world that happened there — in 1890, no one had plumbing in the U.S.. And the average woman was carrying tons of water every day. And you had no news, and everything was local, and everyone’s bored all the time — to 1940, where the world looks a lot like today’s world, was fascinating. And I think it gives you a sense of what it’s like to be inside a technological singularity, and I think worth reading for that reason — or at least the first half.
The second book I’d recommend is “The Knowledge,” by Dartnell, which is a really interesting book. It is ostensibly almost a survival guide, but it is how to rebuild industrial civilization from the ground up, if we were to collapse. And I don’t recommend it as a survivalist. I recommend it because it is fascinating to see how complex our world is, and how many interrelated pieces we’ve managed to build up as a society. And in some ways, it gives me a lot of hope to think about how all of these interconnections work.
And then the third one is science fiction, and I was debating — I read a lot of science fiction, and there’s a lot of interesting A.I.s in science fiction. Everyone talks about — who’s in the science fiction world — Iain Banks, who wrote about the Culture, which is really interesting, about what it’s like to live beside super intelligent A.I. Vernor Vinge just died yesterday, when we were recording this, and wrote these amazing books about — he coined the term singularity.
But I want to recommend a much more depressing book that’s available for free, which is Peter Watts’s “Blindsight.” And it is not a fun book, but it is a fascinating thriller set on an interstellar mission to visit an alien race. And it’s essentially a book about sentience, and it’s a book about the difference between consciousness and sentience, and about intelligence and the different ways of perceiving the world in a setting where that is the sort of centerpiece of the thriller. And I think in a world where we have machines that might be intelligent without being sentient, it is a relevant, if kind of chilling, read.
Ethan Mollick, your book is called “Co-Intelligence.” Your Substack is One Useful Thing. Thank you very much.
Thank you.
[MUSIC PLAYING]
This episode of “The Ezra Klein Show” was produced by Kristin Lin. Fact checking by Michelle Harris. Our senior engineer is Jeff Geld with additional mixing from Efim Shapiro. Our senior editor is Claire Gordon. The show’s production team also includes Annie Galvin and Rollin Hu. Original music by Isaac Jones. Audience strategy by Kristina Samulewski and Shannon Busta. The executive producer of New York Times Opinion Audio is Annie-Rose Strasser, and special thanks to Sonia Herrero.