
OpenAI releases a new large language model that exposes the secrets of how AI really works

Posted by qimuai · First-hand translation



Source: https://www.technologyreview.com/2025/11/13/1127914/openais-new-llm-exposes-the-secrets-of-how-ai-really-works/

Summary:

OpenAI has announced an experimental language model called a "weight-sparse transformer." The model cannot compete with today's mainstream products on performance, but its distinctive value lies in making an AI system's inner workings transparent to analysis.

Today's mainstream large language models are black boxes: even their developers cannot fully understand their internal decision-making. The new model's neural-network architecture ties specific functions to specific groups of neurons, letting researchers trace every step it takes to complete a task. "As AI systems become more deeply embedded in critical domains, making sure they are safe is essential," said OpenAI research scientist Leo Gao.

Although the experimental model only reaches roughly the level of 2018's GPT-1 and runs far slower than commercial products, its interpretability opens a new path toward understanding why models hallucinate and go off the rails. Boston College mathematician Elisenda Grigsby commented: "I'm sure the methods this research introduces will have a significant impact."

The team acknowledges that the technique cannot yet be applied to cutting-edge models such as GPT-5, but they expect that within a few years it could yield a fully interpretable model at the level of GPT-3, laying a foundation for understanding how these systems make decisions and for building safe, reliable AI.


Original article:

OpenAI’s new LLM exposes the secrets of how AI really works
The experimental model won't compete with the biggest and best, but it could tell us why they behave in weird ways—and how trustworthy they really are.
ChatGPT maker OpenAI has built an experimental large language model that is far easier to understand than typical models.
That’s a big deal, because today’s LLMs are black boxes: Nobody fully understands how they do what they do. Building a model that is more transparent sheds light on how LLMs work in general, helping researchers figure out why models hallucinate, why they go off the rails, and just how far we should trust them with critical tasks.
“As these AI systems get more powerful, they’re going to get integrated more and more into very important domains,” Leo Gao, a research scientist at OpenAI, told MIT Technology Review in an exclusive preview of the new work. “It’s very important to make sure they’re safe.”
This is still early research. The new model, called a weight-sparse transformer, is far smaller and far less capable than top-tier mass-market models like the firm’s GPT-5, Anthropic’s Claude, and Google DeepMind’s Gemini. At most it’s as capable as GPT-1, a model that OpenAI developed back in 2018, says Gao (though he and his colleagues haven’t done a direct comparison).
But the aim isn’t to compete with the best in class (at least, not yet). Instead, by looking at how this experimental model works, OpenAI hopes to learn about the hidden mechanisms inside those bigger and better versions of the technology.
It’s interesting research, says Elisenda Grigsby, a mathematician at Boston College who studies how LLMs work and who was not involved in the project: “I’m sure the methods it introduces will have a significant impact.”
Lee Sharkey, a research scientist at AI startup Goodfire, agrees. “This work aims at the right target and seems well executed,” he says.
Why models are so hard to understand
OpenAI’s work is part of a hot new field of research known as mechanistic interpretability, which is trying to map the internal mechanisms that models use when they carry out different tasks.
That’s harder than it sounds. LLMs are built from neural networks, which consist of nodes, called neurons, arranged in layers. In most networks, each neuron is connected to every other neuron in its adjacent layers. Such a network is known as a dense network.
Dense networks are relatively efficient to train and run, but they spread what they learn across a vast knot of connections. The result is that simple concepts or functions can be split up between neurons in different parts of a model. At the same time, specific neurons can also end up representing multiple different features, a phenomenon known as superposition (a term borrowed from quantum physics). The upshot is that you can’t relate specific parts of a model to specific concepts.
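To make the dense case concrete, here is a minimal sketch (illustrative only, not OpenAI's code) of a dense layer in which every output neuron reads from every input neuron, so whatever the layer learns is spread across the full weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense layer: the weight matrix has an entry for every
# (input neuron, output neuron) pair, so each of the 512
# outputs depends on all 512 inputs. What the layer learns
# is smeared across all of these connections.
n_in, n_out = 512, 512
W_dense = rng.normal(size=(n_in, n_out))

x = rng.normal(size=n_in)        # activations from the previous layer
h = np.tanh(x @ W_dense)         # every output mixes every input
print(f"nonzero weights: {np.count_nonzero(W_dense)} of {W_dense.size}")
```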
“Neural networks are big and complicated and tangled up and very difficult to understand,” says Dan Mossing, who leads the mechanistic interpretability team at OpenAI. “We’ve sort of said: ‘Okay, what if we tried to make that not the case?’”
Instead of building a model using a dense network, OpenAI started with a type of neural network known as a weight-sparse transformer, in which each neuron is connected to only a few other neurons. This forced the model to represent features in localized clusters rather than spread them out.
Their model is far slower than any LLM on the market. But it is easier to relate its neurons or groups of neurons to specific concepts and functions. “There’s a really drastic difference in how interpretable the model is,” says Gao.
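The article does not describe the implementation, but one simple way to realize weight sparsity is a fixed binary mask that zeroes out all but a handful of each neuron's incoming weights. The sketch below is hypothetical (the connection count k of 4 is arbitrary) and contrasts with the dense layer above:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, k = 512, 512, 4     # hypothetical: k incoming connections per neuron

# Binary mask that keeps only k randomly chosen inputs per
# output neuron; every other weight is pinned to zero, so
# features must live in small, local clusters of connections.
mask = np.zeros((n_in, n_out))
for j in range(n_out):
    mask[rng.choice(n_in, size=k, replace=False), j] = 1.0

W_sparse = rng.normal(size=(n_in, n_out)) * mask

x = rng.normal(size=n_in)
h = np.tanh(x @ W_sparse)        # each output now reads from just k inputs
print(f"nonzero weights: {np.count_nonzero(W_sparse)} of {W_sparse.size}")
```

In training, such a mask would be reapplied after every update so the connectivity stays sparse; that locality is what makes it easier to tie a neuron or small cluster to a single concept.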
Gao and his colleagues have tested the new model with very simple tasks. For example, they asked it to complete a block of text that opens with quotation marks by adding matching marks at the end.
It’s a trivial request for an LLM. The point is that figuring out how a model does even a straightforward task like that involves unpicking a complicated tangle of neurons and connections, says Gao. But with the new model, they were able to follow the exact steps the model took.
“We actually found a circuit that’s exactly the algorithm you would think to implement by hand, but it’s fully learned by the model,” he says. “I think this is really cool and exciting.”
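The article does not spell out the hand-written algorithm Gao has in mind, but for a quote-closing task it would presumably look something like this toy function (an assumption for illustration, not the learned circuit itself):

```python
def close_quote(text: str) -> str:
    """Append the closing mark that matches the opening quotation
    mark - the kind of algorithm one would write by hand, which
    the sparse model's learned circuit turned out to mirror."""
    pairs = {'"': '"', "'": "'", '“': '”', '‘': '’'}
    opening = text[0]
    if opening not in pairs:
        raise ValueError("text does not open with a known quotation mark")
    return text + pairs[opening]

print(close_quote('“To be, or not to be'))   # -> “To be, or not to be”
```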
Where will the research go next?
Grigsby is not convinced the technique would scale up to larger models that have to handle a variety of more difficult tasks.
Gao and Mossing acknowledge that this is a big limitation of the model they have built so far and agree that the approach will never lead to models that match the performance of cutting-edge products like GPT-5. And yet OpenAI thinks it might be able to improve the technique enough to build a transparent model on a par with GPT-3, the firm's breakthrough 2020 LLM.
“Maybe within a few years, we could have a fully interpretable GPT-3, so that you could go inside every single part of it and you could understand how it does every single thing,” says Gao. “If we had such a system, we would learn so much.”
