
The Polyglot Neuroscientist Resolving How the Brain Parses Language

Posted by qimuai · compiled translation



Source: https://www.quantamagazine.org/the-polyglot-neuroscientist-resolving-how-the-brain-parses-language-20251205/

Summary:

[Neuroscience finding: the human brain carries a built-in "language processor" that functions much like a large language model]

At a time when AI language models are everywhere, 15 years of brain research suggest that the human brain may come equipped with a biological processor resembling a large language model (LLM). A team led by MIT cognitive neuroscientist Ev Fedorenko has used brain imaging to systematically map, for the first time, the brain's core language-processing network.

This system, known as the "language network," sits in the brain's left hemisphere and comprises three regions of the frontal cortex plus several regions along the middle temporal gyrus. Though all of its tissue together would only be about the size of a strawberry, it performs the key work of decoding language: storing the correspondences between words and meanings and combining words according to grammatical rules. As Fedorenko puts it, it is "a glorified parser" that assembles linguistic pieces into structure, while the actual thinking happens outside the network's boundaries.

Unlike an AI language model, the brain's language network does not simply generate fluent text. It acts as a translation interface between external perception (speech, writing, sign language) and the brain's internal representations of meaning (episodic memory, social cognition and so on). Research shows that even for a nonsensical sentence like "Colorless green ideas sleep furiously," the network responds just as strongly as it does to ordinary sentences, indicating that it attends to linguistic structure itself rather than to the logic of the content.

Scanning the brains of more than 1,400 people, Fedorenko's team found that while the network's exact location varies across individuals, this stable network is present in every typical adult brain. When it is damaged, patients can develop aphasia even with higher cognition intact: thoughts remain trapped in the brain, unable to be expressed, and others' speech cannot be accurately understood.

Fedorenko, who speaks six languages, stresses that the language network is fundamentally different from Broca's area, which is responsible for articulation. Broca's area mainly plans the movements of the mouth muscles, whereas the language network's core function is to map form to meaning, converting between thoughts and word sequences in both directions. "Learning a language is essentially a matter of continually updating this store of mappings," Fedorenko explains.

The work offers a new perspective on the relationship between language and thought. Fedorenko admits that she, too, initially believed language to be a core component of high-level thought, but the empirical evidence shows that the brain does have a dedicated, largely automatic system for processing linguistic structure. This may explain why some people can speak fluently while saying nothing coherent, much like early large language models.

The study, published as a review in Nature Reviews Neuroscience, defines the human language network as a "natural kind": like the digestive system, it is a distinct functional unit of the brain. Going forward, the team plans to explore whether the network contains neurons tuned to specific linguistic features, which could open a new window onto the biological mechanisms of human language acquisition.


Original article:

The Polyglot Neuroscientist Resolving How the Brain Parses Language
Introduction
Even in a world where large language models (LLMs) and AI chatbots are commonplace, it can be hard to fully accept that fluent writing can come from an unthinking machine. That’s because, to many of us, finding the right words is a crucial part of thought — not the outcome of some separate process.
But what if our neurobiological reality includes a system that behaves something like an LLM? Long before the rise of ChatGPT, the cognitive neuroscientist Ev Fedorenko began studying how language works in the adult human brain. The specialized system she has described, which she calls “the language network,” maps the correspondences between words and their meanings. Her research suggests that, in some ways, we do carry around a biological version of an LLM — that is, a mindless language processor — inside our own brains.
“You can think of the language network as a set of pointers,” Fedorenko said. “It’s like a map, and it tells you where in the brain you can find different kinds of meaning. It’s basically a glorified parser that helps us put the pieces together — and then all the thinking and interesting stuff happens outside of [its] boundaries.”
Fedorenko has been gathering biological evidence of this language network for the past 15 years in her lab at the Massachusetts Institute of Technology. Unlike a large language model, the human language network doesn’t string words into plausible-sounding patterns with nobody home; instead, it acts as a translator between external perceptions (such as speech, writing and sign language) and representations of meaning encoded in other parts of the brain (including episodic memory and social cognition, which LLMs don’t possess). Nor is the human language network particularly large: If all of its tissue were clumped together, it would be about the size of a strawberry. But when it is damaged, the effect is profound. An injured language network can result in forms of aphasia in which sophisticated cognition remains intact but trapped within a brain unable to express it or distinguish incoming words from others.
Fedorenko came by her interest in language early. In the 1980s, when she was growing up in the Soviet Union, her mother made her learn five languages (English, French, German, Spanish and Polish) in addition to her native Russian. Despite significant privations related to the fall of communism in that country — Fedorenko “lived through a few years of being hungry,” she said — she was a strong student and earned a full scholarship to Harvard University. There, she initially planned to study linguistics but later added a second major in psychology. “The [linguistics] classes were interesting, but they felt kind of like puzzle-solving, not really figuring out how things work in reality,” she said.
Three years into her graduate studies at MIT, Fedorenko pivoted again, this time into neuroscience. She began collaborating with Nancy Kanwisher, who had first identified the fusiform face area, a brain region specialized for facial recognition. Fedorenko wanted to find the same thing for language. She had her work cut out for her. “At that point, it was possible to read pretty much everything that was published [on the subject], and I thought the foundations were pretty weak,” Fedorenko said. “As you can imagine, that [assessment] was not so popular with some people. But after a while they saw I was not going away.”
Following a steady stream of findings, in 2024 Fedorenko published a comprehensive review in Nature Reviews Neuroscience defining the human language network as a “natural kind”: an integrated set of regions, exclusively specialized for language, that resides in “every typical adult human brain,” she wrote.
Quanta spoke to Fedorenko about how the language network is like the digestive system, what she knows about how the language decoder works, and whether she really believes that people have LLMs inside their heads. The conversation has been condensed and edited for clarity.
What is the language network?
There’s a core set of areas in adult brains that acts as an interconnected system for computing linguistic structure. They store the mappings between words and meanings, and rules for how to put words together. When you learn a language, that’s what you learn: You learn these mappings and the rules. And that allows us to use this “code” in incredibly flexible ways. You can convert between a thought and a word sequence in any language that you know.
That sounds very abstract. But you call the language network a “natural kind” — does that mean it’s something physical you can point to, like the digestive system?
That’s exactly right. These systems that people have discovered [in the brain], including the language network and some parts of the visual system, are like organs. For example, the fusiform face area is a natural kind: It’s meaningfully definable as a unit. In the language network, there are basically three areas in the frontal cortex in most people. All three of them are on the side of the left frontal lobe. There’s also a couple of areas that fall along the side of the middle temporal gyrus, this big hunk of meat that goes along the whole temporal lobe. Those are the core areas.
You can see the unity in a few different ways. For example, if you put people in an [fMRI, or functional magnetic resonance imaging] scanner, you can look at responses to language versus some control condition. Those regions always go together. We’ve now scanned about 1,400 people, and we can build up a probabilistic map, which estimates where those regions will tend to be. The topography is a little bit variable across people, but the general patterns are very consistent. Somewhere within those broad frontal and temporal areas, everybody will have some tissue that is reliably doing linguistic computations.
How is this different from other parts of brain anatomy known to be associated with language, such as Broca’s area?
Broca’s area is actually incredibly controversial. I would not call it a language region; it’s an articulatory motor-planning region. Right now, it’s being engaged to plan the movements of my mouth muscles in a way that allows me to say what I’m saying. But I could say a bunch of nonsense words, and it would be just as engaged. So it’s an area that takes some sound-level representation of speech and figures out the set of motor movements you would need [to produce it]. It’s a downstream region that the language network sends information to.
You’ve also said that language isn’t the same as thought. So if the language network isn’t producing speech, and it’s also not involved in thinking, what is it doing?
The language network is basically an interface between lower-level perceptual and motor components and the higher-level, more abstract representations of meaning and reasoning.
There are two things we do with language. In language production, you have this fuzzy thought, and then you have a vocabulary — not just of words, but larger constructions, and rules for how to connect them. You search through it to find a way to express the meaning you’re trying to convey using a structured sequence of words. Once you have that utterance, then you go to the motor system to say it out loud, write it or sign it.
In language comprehension, it’s the inverse. It starts with sound waves hitting your ear or light hitting your retina. You do some basic perceptual crunching of that input to extract a word sequence or utterance. Then the language network parses that, finding familiar chunks in the utterance and using them as pointers to stored representations of meaning.
For both cases, the language network is a store of these form-to-meaning mappings. It’s a fluid store that we keep updating throughout our lives. But as soon as we know this code, we can flexibly use it to both take a thought and express it, and take somebody else’s word sequence and decode meaning from it.
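As a loose illustration only (not from the article, and not a model of real neural computation), the "fluid store of form-to-meaning mappings" Fedorenko describes can be caricatured as a dictionary that supports lookup in both directions; every class and name below is invented for the sketch:

```python
# Toy caricature of the "form-to-meaning store": comprehension maps familiar
# chunks to pointers at stored meanings; production is the inverse lookup.
# All names here are invented for illustration.

class LanguageNetwork:
    def __init__(self):
        self.form_to_meaning = {}  # word form -> pointer to a stored concept

    def learn(self, word, meaning):
        """Learning a language means continually updating these mappings."""
        self.form_to_meaning[word] = meaning

    def comprehend(self, utterance):
        """Parse a word sequence into meaning pointers; unknown forms yield None."""
        return [self.form_to_meaning.get(w) for w in utterance.split()]

    def produce(self, meanings):
        """Inverse direction: find word forms for the meanings to convey."""
        meaning_to_form = {m: w for w, m in self.form_to_meaning.items()}
        return " ".join(meaning_to_form[m] for m in meanings)

net = LanguageNetwork()
net.learn("ideas", "CONCEPT_IDEA")
net.learn("sleep", "CONCEPT_SLEEP")
print(net.comprehend("ideas sleep"))   # ['CONCEPT_IDEA', 'CONCEPT_SLEEP']
print(net.produce(["CONCEPT_SLEEP"]))  # sleep
```

The point of the caricature is only that the store maps form to meaning and back again; per the interview, the actual reasoning about those meanings happens outside the network.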
Why do we have this system? So we can take our thoughts and share them. There’s no telepathy, right?
How far down does this biological specialization go? Are there individual cells in the language network that respond to certain utterances, akin to how concept neurons only respond to specific concepts?
I suspect it’s a bit distributed within the system because language is very contextualized. But yes, there may well be cells that respond to particular aspects of language.
There’s a preprint, from Itzhak Fried’s group at UCLA, looking at single cells and finding some of the same properties that we found with [fMRI] imaging and population-level intracranial recordings. For example, cells will respond to both written and auditory language in similar ways. And the language network is where you would look for those cells.
What kinds of patterns or features get learned?
The brain’s general object-recognition machinery is at the same level of abstractness as the language network. It’s not so different from some higher-level visual areas such as the inferotemporal cortex storing bits of object shapes, or the fusiform face area storing a basic face template. You use those representations to help you recognize objects in the world, but they’re disconnected from our world knowledge.
[Linguist Noam] Chomsky’s famous example of a nonsense sentence — “Colorless green ideas sleep furiously” — comes in handy here. You kind of know what it means, but you can’t relate it to anything about the world because it doesn’t make sense. We and a few other groups have evidence that the language network will respond just as strongly to those “colorless green”–type sentences as it does to plausible sentences that tell us something meaningful. I don’t want to call it “dumb,” but it’s a pretty shallow system.
It almost sounds like you’re saying there’s essentially an LLM inside everyone’s brain. Is that what you’re saying?
Pretty much. I think the language network is very similar in many ways to early LLMs, which learn the regularities of language and how words relate to each other. It’s not so hard to imagine, right? I’m sure you’ve encountered people who produce very fluent language, and you kind of listen to it for a while, and you’re like: There’s nothing coherent there. But it sounds very fluent. And that’s with no physical injury to their brain!
Still, the idea that humans produce language with something mindless, like ChatGPT, seems counterintuitive.
Yes — including to me! When I started [this research], I thought that language is a really core part of high-level thought. There was this notion that maybe humans are just really good at representing and extracting hierarchical structures, which of course are a key signature of language, but are also present in other domains like math and music and aspects of social cognition. So I was fully expecting that some parts of this network would be these very domain-general, hierarchical processors. And that just turns out empirically not to be the case. Back in 2011, it was already clear that all parts of the system are quite specialized for language. If you’re a scientist, you just update your beliefs and roll with it.
