www.newyorker.com/magazine/2021/12/06/the-science-of-mind-reading

The Science of Mind Reading

James Somers, 11/24/2021

As the woman watched the slide show, the scanner tracked patterns of activation among her neurons. These patterns would be analyzed in terms of “voxels”—areas of activation that are roughly a cubic millimetre in size. In some ways, the fMRI data was extremely coarse: each voxel represented the oxygen consumption of about a million neurons, and could be updated only every few seconds, significantly more slowly than neurons fire. But, Norman said, “it turned out that that information was in the data we were collecting—we just weren’t being as smart as we possibly could about how we’d churn through that data.” The breakthrough came when researchers figured out how to track patterns playing out across tens of thousands of voxels at a time, as though each were a key on a piano, and thoughts were chords.

The origins of this approach, I learned, dated back nearly seventy years, to the work of a psychologist named Charles Osgood. When he was a kid, Osgood received a copy of Roget’s Thesaurus as a gift. Poring over the book, Osgood recalled, he formed a “vivid image of words as clusters of starlike points in an immense space.” In his postgraduate days, when his colleagues were debating how cognition could be shaped by culture, Osgood thought back on this image. He wondered if, using the idea of “semantic space,” it might be possible to map the differences among various styles of thinking.

Osgood conducted an experiment. He asked people to rate twenty concepts on fifty different scales. The concepts ranged widely: BOULDER, ME, TORNADO, MOTHER. So did the scales, which were defined by opposites: fair-unfair, hot-cold, fragrant-foul. Some ratings were difficult: is a TORNADO fragrant or foul? But the idea was that the method would reveal fine and even elusive shades of similarity and difference among concepts. “Most English-speaking Americans feel that there is a difference, somehow, between ‘good’ and ‘nice’ but find it difficult to explain,” Osgood wrote. His surveys found that, at least for nineteen-fifties college students, the two concepts overlapped much of the time. They diverged for nouns that had a male or female slant. MOTHER might be rated nice but not good, and COP vice versa. Osgood concluded that “good” was “somewhat stronger, rougher, more angular, and larger” than “nice.”

Osgood became known not for the results of his surveys but for the method he invented to analyze them. He began by arranging his data in an imaginary space with fifty dimensions—one for fair-unfair, a second for hot-cold, a third for fragrant-foul, and so on. Any given concept, like TORNADO, had a rating on each dimension—and, therefore, was situated in what was known as high-dimensional space. Many concepts had similar locations on multiple axes: kind-cruel and honest-dishonest, for instance. Osgood combined these dimensions. Then he looked for new similarities, and combined dimensions again, in a process called “factor analysis.”
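The idea of combining scales that move together can be sketched in a few lines of Python. This is a simplified stand-in, not factor analysis proper and not Osgood's procedure: it greedily groups scales whose ratings correlate strongly and averages each group. The ratings, the five unnamed concepts they describe, the 0.8 threshold, and the function names are all invented for illustration.

```python
from math import sqrt

# Hypothetical ratings (-3..+3) of five concepts on four scales.
# Each list holds one scale's ratings across the same five concepts.
RATINGS = {
    "kind-cruel":       [3, -2, 2, -3, 1],
    "honest-dishonest": [2, -3, 3, -2, 1],
    "large-small":      [2, 3, -2, -3, 0],
    "strong-weak":      [3, 2, -3, -2, 1],
}

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def merge_correlated(ratings, threshold=0.8):
    """Group scales that correlate above `threshold` with a group's first
    member, then average each group into a single combined dimension."""
    groups = []
    for name, values in ratings.items():
        for group in groups:
            if pearson(values, ratings[group[0]]) > threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no similar group found: start a new one
    merged = {}
    for group in groups:
        per_concept = zip(*(ratings[name] for name in group))
        merged["+".join(group)] = [sum(c) / len(c) for c in per_concept]
    return merged

merged = merge_correlated(RATINGS)
for dimension, values in merged.items():
    print(dimension, values)
```

With these made-up numbers, the four scales collapse into two combined dimensions, one blending kind-cruel with honest-dishonest and one blending large-small with strong-weak.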

When you reduce a sauce, you meld and deepen the essential flavors. Osgood did something similar with factor analysis. Eventually, he was able to map all the concepts onto a space with just three dimensions. The first dimension was “evaluative”—a blend of scales like good-bad, beautiful-ugly, and kind-cruel. The second had to do with “potency”: it consolidated scales like large-small and strong-weak. The third measured how “active” or “passive” a concept was. Osgood could use these three key factors to locate any concept in an abstract space. Ideas with similar coördinates, he argued, were neighbors in meaning.
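To make "neighbors in meaning" concrete, here is a minimal sketch that places a few concepts in a three-dimensional space and finds each one's closest neighbor by straight-line distance. The coordinates are invented, not Osgood's measurements, and Euclidean distance is an assumed choice of metric.

```python
import math

# Hypothetical (evaluation, potency, activity) coordinates on a -3..+3 scale.
# These values are made up for illustration only.
CONCEPTS = {
    "MOTHER":  (2.5, 0.5, 0.5),
    "ME":      (2.0, 0.0, 1.0),
    "BOULDER": (0.0, 2.5, -2.5),
    "TORNADO": (-2.5, 3.0, 3.0),
}

def distance(a, b):
    """Euclidean distance between two named concepts."""
    return math.dist(CONCEPTS[a], CONCEPTS[b])

def nearest(name):
    """Return the other concept that lies closest to `name`."""
    others = (c for c in CONCEPTS if c != name)
    return min(others, key=lambda c: distance(name, c))

for concept in CONCEPTS:
    print(concept, "->", nearest(concept))
```

Under these assumed coordinates, MOTHER and ME sit less than one unit apart, while TORNADO is far from everything else.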

For decades, Osgood’s technique found modest use in a kind of personality test. Its true potential didn’t emerge until the nineteen-eighties, when researchers at Bell Labs were trying to solve what they called the “vocabulary problem.” People tend to employ lots of names for the same thing. This was an obstacle for computer users, who accessed programs by typing words on a command line. George Furnas, who worked in the organization’s human-computer-interaction group, described using the company’s internal phone book. “You’re in your office, at Bell Labs, and someone has stolen your calculator,” he said. “You start putting in ‘police,’ or ‘support,’ or ‘theft,’ and it doesn’t give you what you want. Finally, you put in ‘security,’ and it gives you that. But it actually gives you two things: something about the Bell Savings and Security Plan, and also the thing you’re looking for.” Furnas’s group wanted to automate the finding of synonyms for commands and search terms.

They updated Osgood’s approach. Instead of surveying undergraduates, they used computers to analyze the words in about two thousand technical reports. The reports themselves—on topics ranging from graph theory to user-interface design—suggested the dimensions of the space; when multiple reports used similar groups of words, their dimensions could be combined. In the end, the Bell Labs researchers made a space that was more complex than Osgood’s. It had a few hundred dimensions. Many of these dimensions described abstract or “latent” qualities that the words had in common—connections that wouldn’t be apparent to most English speakers. The researchers called their technique “latent semantic analysis,” or L.S.A.
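A toy version of the technique fits in a short script. The sketch below assumes NumPy is installed and uses a truncated singular value decomposition, the standard linear-algebra step behind L.S.A. The seven terms, five "reports," and raw unweighted counts are invented, and two dimensions stand in for the few hundred the Bell Labs researchers used.

```python
import numpy as np

TERMS = ["graph", "vertex", "edge", "user", "menu", "screen", "button"]

# Rows are terms, columns are five hypothetical reports (invented counts).
# "vertex" and "edge" never share a report, but each appears with "graph".
COUNTS = np.array([
    [1, 1, 0, 0, 0],  # graph
    [1, 0, 0, 0, 0],  # vertex
    [0, 1, 0, 0, 0],  # edge
    [0, 0, 1, 1, 1],  # user
    [0, 0, 1, 0, 0],  # menu
    [0, 0, 0, 1, 0],  # screen
    [0, 0, 0, 0, 1],  # button
], dtype=float)

def lsa_word_vectors(counts, k):
    """Keep the k strongest latent dimensions of the term-report matrix."""
    u, s, _vt = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vectors = lsa_word_vectors(COUNTS, k=2)
i, j = TERMS.index("vertex"), TERMS.index("edge")

raw_similarity = cosine(COUNTS[i], COUNTS[j])        # compares raw counts
latent_similarity = cosine(vectors[i], vectors[j])   # compares latent vectors
print(f"raw: {raw_similarity:.2f}  latent: {latent_similarity:.2f}")
```

In the raw counts, "vertex" and "edge" look unrelated because they never appear in the same report. In the reduced space they point the same way, because both keep company with "graph." That is the sense in which the dimensions are latent.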

At first, Bell Labs used L.S.A. to create a better internal search engine. Then, in 1997, Susan Dumais, one of Furnas’s colleagues, collaborated with a Bell Labs cognitive scientist, Thomas Landauer, to develop an A.I. system based on it. After processing Grolier’s American Academic Encyclopedia, a work intended for young students, the A.I. scored respectably on the multiple-choice Test of English as a Foreign Language. That year, the two researchers co-wrote a paper that addressed the question “How do people know as much as they do with as little information as they get?” They suggested that our minds might use something like L.S.A., making sense of the world by reducing it to its most important differences and similarities, and employing this distilled knowledge to understand new things. Watching a Disney movie, for instance, I immediately identify a character as “the bad guy”: Scar, from “The Lion King,” and Jafar, from “Aladdin,” just seem close together. Perhaps my brain uses factor analysis to distill thousands of attributes—height, fashion sense, tone of voice—into a single point in an abstract space. The perception of bad-guy-ness becomes a matter of proximity.

In the following years, scientists applied L.S.A. to ever-larger data sets. In 2013, researchers at Google unleashed a descendant of it onto the text of the whole World Wide Web. Google’s algorithm turned each word into a “vector,” or point, in high-dimensional space. The vectors generated by the researchers’ program, word2vec, are eerily accurate: if you take the vector for “king” and subtract the vector for “man,” then add the vector for “woman,” the closest nearby vector is “queen.” Word vectors became the basis of a much improved Google Translate, and enabled the auto-completion of sentences in Gmail. Other companies, including Apple and Amazon, built similar systems. Eventually, researchers realized that the “vectorization” made popular by L.S.A. and word2vec could be used to map all sorts of things. Today’s facial-recognition systems have dimensions that represent the length of the nose and the curl of the lips, and faces are described using a string of coördinates in “face space.” Chess A.I.s use a similar trick to “vectorize” positions on the board. The technique has become so central to the field of artificial intelligence that, in 2017, a new, hundred-and-thirty-five-million-dollar A.I. research center in Toronto was named the Vector Institute. Matthew Botvinick, a professor at Princeton whose lab was across the hall from Norman’s, and who is now the head of neuroscience at DeepMind, Alphabet’s A.I. subsidiary, told me that distilling relevant similarities and differences into vectors was “the secret sauce underlying all of these A.I. advances.”

In 2001, a scientist named Jim Haxby brought machine learning to brain imaging: he realized that voxels of neural activity could serve as dimensions in a kind of thought space. Haxby went on to work at Princeton, where he collaborated with Norman. The two scientists, together with other researchers, concluded that just a few hundred dimensions were sufficient to capture the shades of similarity and difference in most fMRI data. At the Princeton lab, the young woman watched the slide show in the scanner. With each new image—beach, cave, forest—her neurons fired in a new pattern. These patterns would be recorded as voxels, then processed by software and transformed into vectors. The images had been chosen because their vectors would end up far apart from one another: they were good landmarks for making a map. Watching the images, my mind was taking a trip through thought space, too.

The larger goal of thought decoding is to understand how our brains mirror the world. To this end, researchers have sought to watch as the same experiences affect many people’s minds simultaneously. Norman told me that his Princeton colleague Uri Hasson has found movies especially useful in this regard. They “pull people’s brains through thought space in synch,” Norman said. “What makes Alfred Hitchcock the master of suspense is that all the people who are watching the movie are having their brains yanked in unison. It’s like mind control in the literal sense.”

One afternoon, I sat in on Norman’s undergraduate class “fMRI Decoding: Reading Minds Using Brain Scans.” As students filed into the auditorium, setting their laptops and water bottles on tables, Norman entered wearing tortoiseshell glasses and earphones, his hair dishevelled.

He had the class watch a clip from “Seinfeld” in which George, Susan (an N.B.C. executive he is courting), and Kramer are hanging out with Jerry in his apartment. The phone rings, and Jerry answers: it’s a telemarketer. Jerry hangs up, to cheers from the studio audience.

“Where was the event boundary in the clip?” Norman asked. The students yelled out in chorus, “When the phone rang!” Psychologists have long known that our minds divide experiences into segments; in this case, it was the phone call that caused the division.

Norman showed the class a series of slides. One described a 2017 study by Christopher Baldassano, one of his postdocs, in which people watched an episode of the BBC show “Sherlock” while in an fMRI scanner. Baldassano’s guess going into the study was that some voxel patterns would be in constant flux as the video streamed—for instance, the ones involved in color processing. Others would be more stable, such as those representing a character in the show. The study confirmed these predictions. But Baldassano also found groups of voxels that held a stable pattern throughout each scene, then switched when it was over. He concluded that these constituted the scenes’ voxel “signatures.”
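One simple way to picture a pattern that holds steady and then switches is to compare each moment's voxel pattern with the one before it and flag the moments where the two stop resembling each other. The sketch below does that with a plain correlation and a cutoff. It is an illustrative stand-in, not the analysis used in the study, and the six time points, four voxels, activation values, and 0.5 threshold are invented.

```python
from math import sqrt

# Hypothetical data: each row is one time point, each column one voxel.
PATTERNS = [
    [1.0, 0.2, 0.1, 0.9],  # scene A
    [0.9, 0.1, 0.2, 1.0],  # scene A
    [1.0, 0.1, 0.1, 0.8],  # scene A
    [0.1, 1.0, 0.9, 0.2],  # scene B begins here
    [0.2, 0.9, 1.0, 0.1],  # scene B
    [0.1, 0.8, 1.0, 0.2],  # scene B
]

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def find_boundaries(patterns, threshold=0.5):
    """Return each time index whose pattern correlates with the previous
    time point's pattern below `threshold`."""
    return [
        t for t in range(1, len(patterns))
        if pearson(patterns[t - 1], patterns[t]) < threshold
    ]

print(find_boundaries(PATTERNS))  # prints [3]
```

Within each made-up scene, consecutive patterns correlate above 0.9. At the fourth time point the correlation turns negative, which this sketch reports as a boundary.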