AI-powered hacking is approaching an "inflection point."

Source: https://www.wired.com/story/ai-models-hacking-inflection-point/
Summary:
Vlad Ionescu and Ariel Herbert-Voss, cofounders of the cybersecurity startup RunSybil, were briefly puzzled last November when Sybil, the AI tool they built, alerted a customer to a hidden vulnerability in its systems. The flaw involved the customer's deployment of federated GraphQL (a query language that specifies how data is accessed over the web through APIs) and meant that confidential information was being inadvertently exposed.
What surprised them was that spotting the flaw required deep expertise spanning several different systems. The RunSybil team says it has since found the same problem in other GraphQL deployments, ahead of any public disclosure. "We scoured the internet, and it didn't exist," Herbert-Voss says, calling the discovery a step change in the models' reasoning capabilities.
The case points to a growing risk: as AI models get smarter, their ability to find zero-day bugs and other security flaws is improving rapidly, and the same intelligence that detects vulnerabilities can also be used to exploit them.
Dawn Song, a computer scientist at UC Berkeley who specializes in AI and security, notes that recent advances have made models markedly better at finding flaws. Simulated reasoning, which breaks complex problems into pieces, and agentic capabilities, such as searching the web or installing and running software tools, have strengthened models' cybersecurity analysis. "The cybersecurity capabilities of frontier models have increased drastically in the last few months. This is an inflection point," she says.
Last year Song's team cocreated a benchmark called CyberGym to measure how well large language models find vulnerabilities in large open-source projects; it covers 1,507 known vulnerabilities across 188 projects. By October 2025 the newer Claude Sonnet 4.5 could identify 30 percent of the vulnerabilities in the benchmark, up sharply from the 20 percent that Claude Sonnet 4 detected that July. "AI agents are able to find zero-days, and at very low cost," she warns.
In response, Song calls for new countermeasures, including using AI to assist cybersecurity defenders. She proposes two paths: frontier AI companies could share models with security researchers before launch, so bugs can be found and fixed ahead of release, or software development could be rethought from the ground up. Her lab has shown that AI can generate code that is more secure than what most programmers write today, and she argues that in the long run this secure-by-design approach will genuinely help defenders.
The RunSybil team cautions that in the near term, AI's coding abilities may give attackers the upper hand. "AI can generate actions on a computer and generate code, and those are two things that hackers do," Herbert-Voss says. "If those capabilities accelerate, that means offensive security actions will also accelerate."
(Adapted from Will Knight's AI Lab newsletter; previous editions are available online.)
Original article:
Vlad Ionescu and Ariel Herbert-Voss, cofounders of the cybersecurity startup RunSybil, were momentarily confused when their AI tool, Sybil, alerted them to a weakness in a customer’s systems last November.
Sybil uses a mix of different AI models—as well as a few proprietary technical tricks—to scan computer systems for issues that hackers might exploit, like an unpatched server or a misconfigured database.
In this case, Sybil flagged a problem with the customer’s deployment of federated GraphQL, a language used to specify how data is accessed over the web through application programming interfaces (APIs). The issue meant that the customer was inadvertently exposing confidential information.
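The article does not describe the customer's schema or Sybil's internals, but the class of problem is easy to sketch. Below is a minimal, hypothetical illustration of one check a GraphQL scanner might run: posting the standard introspection query, which production deployments often leave enabled, and flagging schema fields whose names hint at confidential data. The endpoint URL and the field-name heuristics are assumptions made for demonstration.

    # Hypothetical sketch, not Sybil's actual method: probe a GraphQL
    # endpoint with the standard introspection query and flag
    # suspiciously named schema fields.
    import requests

    INTROSPECTION_QUERY = """
    {
      __schema {
        types {
          name
          fields { name }
        }
      }
    }
    """

    # Field names that often hint at sensitive data; purely illustrative.
    SENSITIVE_HINTS = ("ssn", "password", "secret", "token", "apikey")

    def audit_graphql(endpoint):
        """Return Type.field entries whose names suggest confidential data."""
        resp = requests.post(endpoint, json={"query": INTROSPECTION_QUERY},
                             timeout=10)
        resp.raise_for_status()
        findings = []
        for gql_type in resp.json()["data"]["__schema"]["types"]:
            for field in gql_type.get("fields") or []:
                if any(h in field["name"].lower() for h in SENSITIVE_HINTS):
                    findings.append(f"{gql_type['name']}.{field['name']}")
        return findings

    if __name__ == "__main__":
        # example.com/graphql is a placeholder, not a real target.
        for finding in audit_graphql("https://example.com/graphql"):
            print("possible exposure:", finding)

In a federated setup, the gateway composes the subgraphs' schemas into a single supergraph, so a field one team intended only for internal, subgraph-to-subgraph use can end up queryable by any client unless it is explicitly excluded from the composed schema.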
What puzzled Ionescu and Herbert-Voss was that spotting the issue required a remarkably deep knowledge of several different systems and how those systems interact. RunSybil says it has since found the same problem with other deployments of GraphQL—before anybody else made it public. “We scoured the internet, and it didn’t exist,” Herbert-Voss says. “Discovering it was a reasoning step in terms of models’ capabilities—a step change.”
The situation points to a growing risk. As AI models continue to get smarter, their ability to find zero-day bugs and other vulnerabilities also continues to grow. The same intelligence that can be used to detect vulnerabilities can also be used to exploit them.
Dawn Song, a computer scientist at UC Berkeley who specializes in both AI and security, says recent advances in AI have produced models that are better at finding flaws. Simulated reasoning, which involves splitting problems into constituent pieces, and agentic AI, like searching the web or installing and running software tools, have amped up models’ cyber abilities.
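As a rough illustration of that agentic pattern (a generic sketch, not a description of any particular frontier model or product), the loop below lets a planner decompose a task into tool calls and feed each result back into the next decision. The planner is a hard-coded stub standing in for a model call, and the lone "shell" tool is an assumption for demonstration.

    # Generic agent-loop sketch: plan a step, run a tool, observe, repeat.
    # The planner below is a stub; a real agent would query a model.
    import subprocess
    from typing import List, Optional, Tuple

    def run_tool(name, arg):
        """Execute one tool call; only a 'shell' tool is wired up here."""
        if name == "shell":
            result = subprocess.run(arg, shell=True, capture_output=True,
                                    text=True)
            return result.stdout
        raise ValueError(f"unknown tool: {name}")

    def stub_planner(task, history):
        """Stand-in for a model: emit one canned step, then declare done."""
        if not history:
            return ("shell", "echo probing target for known issues...")
        return None

    def run_agent(task):
        """Ask the planner for the next tool call until it signals completion."""
        history = []
        step = stub_planner(task, history)
        while step is not None:
            tool_name, tool_arg = step
            history.append(run_tool(tool_name, tool_arg))
            step = stub_planner(task, history)
        return history

    print(run_agent("audit example.com"))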
“The cyber security capabilities of frontier models have increased drastically in the last few months,” she says. “This is an inflection point.”
Last year, Song cocreated a benchmark called CyberGym to determine how well large language models find vulnerabilities in large open-source software projects. CyberGym includes 1,507 known vulnerabilities found in 188 projects.
In July 2025, Anthropic’s Claude Sonnet 4 was able to find about 20 percent of the vulnerabilities in the benchmark. By October 2025, a new model, Claude Sonnet 4.5, was able to identify 30 percent. “AI agents are able to find zero-days, and at very low cost,” Song says.
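For a sense of scale, those detection rates can be turned into rough absolute counts against the benchmark's 1,507 vulnerabilities; this is back-of-the-envelope arithmetic, since the article reports only percentages.

    # Approximate vulnerability counts implied by the reported rates.
    TOTAL = 1507  # known vulnerabilities in CyberGym
    for model, rate in [("Claude Sonnet 4 (July 2025)", 0.20),
                        ("Claude Sonnet 4.5 (October 2025)", 0.30)]:
        print(f"{model}: ~{round(TOTAL * rate)} of {TOTAL} vulnerabilities")

That works out to roughly 301 versus 452 vulnerabilities, a gain of about 150 in three months.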
Song says this trend shows the need for new countermeasures, including having AI help cybersecurity experts. “We need to think about how to actually have AI help more on the defense side, and one can explore different approaches,” she says.
One idea is for frontier AI companies to share models with security researchers before launch, so they can use the models to find bugs and secure systems prior to a general release.
Another countermeasure, says Song, is to rethink how software is built in the first place. Her lab has shown that it is possible to use AI to generate code that is more secure than what most programmers use today. “In the long run we think this secure-by-design approach will really help defenders,” Song says.
The RunSybil team says that, in the near term, the coding skills of AI models could mean that hackers gain the upper hand. “AI can generate actions on a computer and generate code, and those are two things that hackers do,” Herbert-Voss says. “If those capabilities accelerate, that means offensive security actions will also accelerate.”
This is an edition of Will Knight’s AI Lab newsletter. Read previous newsletters here.