Distinct AI Models Seem To Converge On How They Encode Reality

Content summary:
AI models are showing signs of "cognitive convergence," which may point to a unifying pattern in their internal representations.
Recently, the "Platonic representation hypothesis," proposed by a research team at MIT, has sparked wide discussion in the AI community. The hypothesis holds that although different kinds of AI models are trained on very different data (such as text and images), their internal representations of the real world gradually converge as the models become more capable.
The idea takes its inspiration from Plato's famous allegory of the cave: AI models are likened to prisoners in a cave who can perceive the world only through streams of data, the "shadows" of the real world. The team points out that when a language model and a vision model separately process text and images about the same concept (such as "dog"), the mathematical representations they generate internally (high-dimensional vectors) show similar geometric structure. As the paper's senior author, Phillip Isola, puts it, the language and vision models align "because they're both shadows of the same world."
To test the idea, the researchers used representational similarity analysis, comparing how different models' internal vectors for the same set of concepts (such as animal words) cluster, and found that more capable models have more similar representations. Ilia Sucholutsky, an AI researcher at New York University, describes the procedure as "measuring the similarity of similarities."
The hypothesis is far from a consensus, however. Critics note that current experiments rely mostly on tightly matched image-text datasets (such as captioned Wikipedia pictures) and do not capture the kinds of real-world information that resist translation across modalities. As Alexei Efros, a researcher at the University of California, Berkeley, puts it, people go to art museums rather than just reading the catalog precisely because some experiences cannot simply be converted.
Despite the debate, the research direction has already prompted practical explorations. Last summer, researchers managed to translate sentence representations between different language models, and the partial interchangeability of representations across modalities suggests new ways to train multimodal models that learn from both text and images.
The field broadly agrees that no single theory can fully explain how AI systems with trillions of parameters work. Jeff Clune, a researcher at the University of British Columbia, stresses that "the answers are going to be complicated." But, as Isola says, the endeavor of science is "to find the universals." Whatever the eventual verdict, the debate over the nature of AI cognition is pushing the field to look more deeply into the mathematics of intelligence.
English source:
Distinct AI Models Seem To Converge On How They Encode Reality
Introduction
Read a story about dogs, and you may remember it the next time you see one bounding through a park. That’s only possible because you have a unified concept of “dog” that isn’t tied to words or images alone. Bulldog or border collie, barking or getting its belly rubbed, a dog can be many things while still remaining a dog.
Artificial intelligence systems aren’t always so lucky. These systems learn by ingesting vast troves of data in a process called training. Often, that data is all of the same type — text for language models, images for computer vision systems, and more exotic kinds of data for systems designed to predict the odor of molecules or the structure of proteins. So to what extent do language models and vision models have a shared understanding of dogs?
Researchers investigate such questions by peering inside AI systems and studying how they represent scenes and sentences. A growing body of research has found that different AI models can develop similar representations, even if they’re trained using different datasets or entirely different data types. What’s more, a few studies have suggested that those representations are growing more similar as models grow more capable. In a 2024 paper, four AI researchers at the Massachusetts Institute of Technology argued that these hints of convergence are no fluke. Their idea, dubbed the Platonic representation hypothesis, has inspired a lively debate among researchers and a slew of follow-up work.
The team’s hypothesis gets its name from a 2,400-year-old allegory by the Greek philosopher Plato. In it, prisoners trapped inside a cave perceive the world only through shadows cast by outside objects. Plato maintained that we’re all like those unfortunate prisoners. The objects we encounter in everyday life, in his view, are pale shadows of ideal “forms” that reside in some transcendent realm beyond the reach of the senses.
The Platonic representation hypothesis is less abstract. In this version of the metaphor, what’s outside the cave is the real world, and it casts machine-readable shadows in the form of streams of data. AI models are the prisoners. The MIT team’s claim is that very different models, exposed only to the data streams, are beginning to converge on a shared “Platonic representation” of the world behind the data.
“Why do the language model and the vision model align? Because they’re both shadows of the same world,” said Phillip Isola, the senior author of the paper.
Not everyone is convinced. One of the main points of contention involves which representations to focus on. You can’t inspect a language model’s internal representation of every conceivable sentence, or a vision model’s representation of every image. So how do you decide which ones are, well, representative? Where do you look for the representations, and how do you compare them across very different models? It’s unlikely that researchers will reach a consensus on the Platonic representation hypothesis anytime soon, but that doesn’t bother Isola.
“Half the community says this is obvious, and the other half says this is obviously wrong,” he said. “We were happy with that response.”
The Company Being Kept
If AI researchers don’t agree on Plato, they might find more common ground with his predecessor Pythagoras, whose philosophy supposedly started from the premise “All is number.” That’s an apt description of the neural networks that power AI models. Their representations of words or pictures are just long lists of numbers, each indicating the degree of activation of a specific artificial neuron.
To simplify the math, researchers typically focus on a single layer of a neural network in isolation, which is akin to taking a snapshot of brain activity in a specific region at a specific moment in time. They write down the neuron activations in this layer as a geometric object called a vector — an arrow that points in a particular direction in an abstract space. Modern AI models have many thousands of neurons in each layer, so their representations are high-dimensional vectors that are impossible to visualize directly. But vectors make it easy to compare a network’s representations: Two representations are similar if the corresponding vectors point in similar directions.
Within a single AI model, similar inputs tend to have similar representations. In a language model, for instance, the vector representing the word “dog” will be relatively close to vectors representing “pet,” “bark,” and “furry,” and farther from “Platonic” and “molasses.” It’s a precise mathematical realization of an idea memorably expressed more than 60 years ago by the British linguist John Rupert Firth: “You shall know a word by the company it keeps.”
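To make the geometry concrete, here is a minimal sketch in Python with NumPy of comparing representations by direction, using cosine similarity. The four-dimensional word vectors below are invented for illustration only; they are not taken from any real model, which would use thousands of dimensions.

```python
import numpy as np

def cosine_similarity(u, v):
    """Compare two representations by direction: 1.0 means they point the
    same way, 0.0 means they are orthogonal, -1.0 means they are opposite."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy "activation vectors" -- real models use thousands of dimensions,
# but the comparison works the same way.
vectors = {
    "dog":      np.array([0.9, 0.8, 0.1, 0.0]),
    "pet":      np.array([0.8, 0.9, 0.2, 0.1]),
    "bark":     np.array([0.7, 0.6, 0.3, 0.0]),
    "molasses": np.array([0.0, 0.1, 0.9, 0.8]),
}

# "Dog" keeps company with "pet" and "bark", not with "molasses".
for word in ("pet", "bark", "molasses"):
    print(word, round(cosine_similarity(vectors["dog"], vectors[word]), 3))
```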
What about representations in different models? It doesn’t make sense to directly compare activation vectors from separate networks, but researchers have devised indirect ways to assess representational similarity. One popular approach is to embrace the lesson of Firth’s pithy quote and measure whether two models’ representations of an input keep the same company.
Imagine that you want to compare how two language models represent words for animals. First, you’ll compile a list of words — dog, cat, wolf, jellyfish, and so on. You’ll then feed these words into both networks and record their representations of each word. In each network, the representations will form a cluster of vectors. You can then ask: How similar are the overall shapes of the two clusters?
“It can kind of be described as measuring the similarity of similarities,” said Ilia Sucholutsky, an AI researcher at New York University.
In this simple example, you’d expect some similarity between the two models — the “cat” vector would probably be close to the “dog” vector in both networks, for instance, and the “jellyfish” vector would point in a different direction. But the two clusters probably won’t look exactly the same. Is “dog” more like “cat” than “wolf,” or vice versa? If your models were trained on different datasets, or built on different network architectures, they might not agree.
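A minimal sketch of this comparison, assuming you already have an embedding function for each model. It scores similarity as the correlation between the two models' within-model similarity structures, which is one common choice (often called representational similarity analysis); published studies use a variety of related metrics, and the `embed` helpers in the usage note are hypothetical.

```python
import numpy as np

def similarity_matrix(embeddings):
    """Pairwise cosine similarities among all items within one model."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return X @ X.T

def similarity_of_similarities(emb_a, emb_b):
    """Correlate two models' similarity structures over the same item list.
    emb_a and emb_b have shapes (n_items, dim_a) and (n_items, dim_b); the
    embedding dimensions may differ, only the item order must match."""
    sim_a = similarity_matrix(np.asarray(emb_a, dtype=float))
    sim_b = similarity_matrix(np.asarray(emb_b, dtype=float))
    iu = np.triu_indices_from(sim_a, k=1)  # off-diagonal entries only
    return float(np.corrcoef(sim_a[iu], sim_b[iu])[0, 1])

# Usage sketch (model_a.embed and model_b.embed are hypothetical helpers
# that return one vector per word):
# words = ["dog", "cat", "wolf", "jellyfish"]
# score = similarity_of_similarities(model_a.embed(words), model_b.embed(words))
```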
Researchers began to explore representational similarity among AI models with this approach in the mid-2010s and found that different models’ representations of the same concepts were often similar, though far from identical. Intriguingly, a few studies found that more powerful models seemed to have more similarities in their representations than weaker ones. One 2021 paper dubbed this the “Anna Karenina scenario,” a nod to the opening line of the classic Tolstoy novel. Perhaps successful AI models are all alike, and every unsuccessful model is unsuccessful in its own way.
That paper, like much of the early work on representational similarity, focused only on computer vision, which was then the most popular branch of AI research. The advent of powerful language models was about to change that. For Isola, it was also an opportunity to see just how far representational similarity could go.
Convergent Evolution
The story of the Platonic representation hypothesis paper began in early 2023, a turbulent time for AI researchers. ChatGPT had been released a few months before, and it was increasingly clear that simply scaling up AI models — training larger neural networks on more data — made them better at many different tasks. But it was unclear why.
“Everyone in AI research was going through an existential life crisis,” said Minyoung Huh, an OpenAI researcher who was a graduate student in Isola’s lab at the time. He began meeting regularly with Isola and their colleagues Brian Cheung and Tongzhou Wang to discuss how scaling might affect internal representations.
Imagine a case where multiple models are trained on the same data, and the stronger models learn more similar representations. This isn’t necessarily because these models are creating a more accurate likeness of the world. They could just be better at grasping quirks of the training dataset.
Now consider models trained on different datasets. If their representations also converge, that would be more compelling evidence that models are getting better at grasping shared features of the world behind the data. Convergence between models that learned from entirely different data types, such as language and vision models, would provide even stronger evidence.
A year after their initial conversations, Isola and his colleagues decided to write a paper reviewing the evidence for convergent representations and presenting an argument for the Platonic representation hypothesis.
By then, other researchers had started studying similarities between vision and language model representations. Huh conducted his own experiment, in which he tested a set of five vision models and 11 language models of varying sizes on a dataset of captioned pictures from Wikipedia. He would feed the pictures into the vision models and the captions into the language models, and then compare clusters of vectors in the two types. He observed a steady increase in representational similarity as models became more powerful. It was exactly what the Platonic representation hypothesis predicted.
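The article does not spell out the exact alignment metric used in this experiment, so the sketch below is only an illustrative stand-in: given image embeddings and caption embeddings for the same image-caption pairs, it scores alignment by how much each item's nearest neighbors agree across the two embedding spaces. The choice of cosine similarity and the neighborhood size `k` are assumptions, not details from the paper.

```python
import numpy as np

def knn_indices(embeddings, k):
    """Indices of each item's k nearest neighbors by cosine similarity,
    excluding the item itself."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)
    return np.argsort(-sims, axis=1)[:, :k]

def neighbor_overlap_alignment(image_emb, caption_emb, k=10):
    """Average overlap of nearest-neighbor sets across the two spaces.
    Row i of image_emb and caption_emb must come from the same
    image-caption pair; 1.0 means identical neighborhood structure."""
    nn_img = knn_indices(np.asarray(image_emb, dtype=float), k)
    nn_txt = knn_indices(np.asarray(caption_emb, dtype=float), k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_img, nn_txt)]
    return float(np.mean(overlaps))
```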
Find the Universals
Of course, it’s never so simple. Measurements of representational similarity invariably involve a host of experimental choices that can affect the outcome. Which layers do you look at in each network? Once you have a cluster of vectors from each model, which of the many mathematical methods do you use to compare them? And which representations do you measure in the first place?
“If you only test one dataset, you don’t necessarily know how [the result] generalizes,” said Christopher Wolfram, a researcher at the University of Chicago who has studied representational similarity in language models. “Who knows what would happen if you did some weirder dataset?”
Isola acknowledged that the issue is far from settled. It’s not a question that any one paper can resolve: In principle, you can measure models’ representations of any picture or any sentence. To him, cases where models do exhibit convergence are more compelling than cases where they may not.
“The endeavor of science is to find the universals,” Isola said. “We could study the ways in which models are different or disagree, but that somehow has less explanatory power than identifying the commonalities.”
Other researchers argue that it’s more productive to focus on where models’ representations differ. Among them is Alexei Efros, a researcher at the University of California, Berkeley, who has been an adviser to three of the four members of the MIT team.
“They’re all good friends and they’re all very, very smart people,” Efros said. “I think they’re wrong, but that’s what science is about.”
Efros noted that in the Wikipedia dataset that Huh used, the images and text contained very similar information by design. But most data we encounter in the world has features that resist translation. “There is a reason why you go to an art museum instead of just reading the catalog,” he said.
Any intrinsic sameness across models doesn’t have to be perfect to be useful. Last summer, researchers devised a method to translate internal representations of sentences from one language model to another. And if language and vision model representations are to some extent interchangeable, that could lead to new ways to train models that learn from both data types. Isola and others explored one such training scheme in a recent paper.
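The article does not describe how the representation-to-representation translation works, so the following is only a simple baseline under an extra assumption: that you have embeddings of the same sentences from both models, in which case a least-squares linear map already gives a rough translator. The published method may avoid paired data or use something more elaborate.

```python
import numpy as np

def fit_linear_translator(src_emb, tgt_emb):
    """Fit a matrix W so that src_emb @ W approximates tgt_emb, where the
    rows are embeddings of the same sentences from two different models."""
    W, *_ = np.linalg.lstsq(np.asarray(src_emb, dtype=float),
                            np.asarray(tgt_emb, dtype=float), rcond=None)
    return W

def translate_representation(src_vec, W):
    """Map a new sentence representation from the source model's space
    into (an approximation of) the target model's space."""
    return np.asarray(src_vec, dtype=float) @ W
```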
Despite these promising developments, other researchers think it’s unlikely that any single theory will fully capture the behavior of modern AI models.
“You can’t reduce a trillion-parameter system to simple explanations,” said Jeff Clune, an AI researcher at the University of British Columbia. “The answers are going to be complicated.”
Article title: Distinct AI Models Seem To Converge On How They Encode Reality
Article link: https://blog.qimuai.cn/?post=2786
All articles on this site are original; please do not use them for commercial purposes without authorization.