Google launched its deepest AI research agent yet, on the same day OpenAI released GPT-5.2.

Posted by qimuai on 2025-12-12 10:01 | Views: 90 | First-hand compilation

Source: https://techcrunch.com/2025/12/11/google-launched-its-deepest-ai-research-agent-yet-on-the-same-day-openai-dropped-gpt-5-2/

Summary:

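On Thursday, Google released a rebuilt version of its research agent, Gemini Deep Research, built on its much-publicized frontier foundation model, Gemini 3 Pro. Beyond generating research reports, the new agent for the first time opens its deep research capabilities to developers through the new Interactions API, letting them integrate Google's SOTA (state-of-the-art) model research features into their own applications and giving them more control in the coming agentic AI era.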

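The tool is designed to synthesize large volumes of information and handle complex, multi-step tasks. Google says customers already use it for work ranging from due diligence to drug toxicity safety research. The company also plans to integrate it soon into Google Search, Google Finance, the Gemini app, and the popular NotebookLM, another step toward a future in which AI agents, rather than humans, do the searching.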

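Google stresses that the tool benefits from Gemini 3 Pro being its "most factual" model, which reduces hallucinations on complex tasks. This matters most for long-running, multi-step deep reasoning tasks, where a single hallucination can invalidate the final conclusion.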

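To back up its performance claims, Google open-sourced a new benchmark called DeepSearchQA and took the lead on Humanity's Last Exam, an independent general-knowledge benchmark. The results nonetheless show OpenAI's ChatGPT 5 Pro close behind on several measures and slightly ahead on the browser-task benchmark. Notably, on the same day Google published its results, OpenAI launched GPT 5.2, codenamed "Garlic", claiming it beats rivals including Google on a range of benchmarks.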

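The timing is telling: with the industry awaiting OpenAI's new model, Google put out its own news on the same day, underscoring how heated competition in AI has become.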

Full translation:

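On Thursday, Google released a "reimagined" version of its research agent, Gemini Deep Research, built on its much-publicized state-of-the-art foundation model, Gemini 3 Pro. The new agent is not limited to producing research reports, although it can still do that; it also lets developers embed Google's SOTA-model research capabilities in their own applications. That capability comes through Google's new Interactions API, which is designed to give developers more control in the coming agentic AI era.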

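The new Gemini Deep Research tool is an agent built to synthesize large volumes of information and handle a large context dump in the prompt. Google says customers use it for tasks ranging from due diligence to drug toxicity safety research.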

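Google also says it will soon integrate the deep research agent into services including Google Search, Google Finance, the Gemini app, and its popular NotebookLM. This is another step toward a world in which people no longer search for themselves and leave it to their AI agents.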

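The tech giant says Deep Research benefits from Gemini 3 Pro being its "most factual" model, one trained to minimize hallucinations on complex tasks. Hallucinations, where an LLM fabricates information, are especially critical for long-running, deep-reasoning agentic tasks, which require autonomous decisions over minutes, hours, or longer. The more choices the model has to make, the higher the risk that a single hallucinated choice invalidates the entire output.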

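To demonstrate its progress, Google created a new benchmark called DeepSearchQA (though the AI field hardly seems short of benchmarks). It is designed to test agents on complex, multi-step information-seeking tasks and has been open-sourced. Google also evaluated Deep Research on two independent benchmarks: Humanity's Last Exam, a general-knowledge test packed with extremely niche tasks, and BrowseComp, a benchmark for browser-based agentic tasks.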

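As expected, Google's new agent beat the competition on its own benchmark and on Humanity's Last Exam. More surprisingly, OpenAI's ChatGPT 5 Pro was a close second across the board and slightly ahead of Google on BrowseComp.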

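Those comparisons, however, were nearly obsolete by the time Google published them, because on the same day OpenAI launched its highly anticipated GPT 5.2, codenamed "Garlic". OpenAI says its newest model beats rivals, Google in particular, on a suite of standard benchmarks, including OpenAI's own.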

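Perhaps the most interesting part of the announcement is its timing. Knowing the world was awaiting the release of Garlic, Google chose to release AI news of its own at the same moment.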

English source:

Google released on Thursday a “reimagined” version of its research agent Gemini Deep Research based on its much-ballyhooed state-of-the-art foundation model, Gemini 3 Pro.
This new agent isn’t just designed to produce research reports, although it can still do that. It now allows developers to embed Google’s SOTA-model research capabilities into their own apps. That capability is made possible through Google’s new Interactions API, which is designed to give devs more control in the coming agentic AI era.
The new Gemini Deep Research tool is an agent equipped to synthesize mountains of information and handle a large context dump in the prompt. Google says it’s used by customers for tasks ranging from due diligence to drug toxicity safety research.
Google also says it will soon be integrating this new deep research agent into services, including Google Search, Google Finance, its Gemini App, and its popular NotebookLM. This is another step toward preparing for a world where humans don’t Google anything anymore — their AI agents do.
The tech giant says that Deep Research benefits from Gemini 3 Pro’s status as its “most factual” model that is trained to minimize hallucinations during complex tasks.
AI hallucinations — where the LLM just makes stuff up — are an especially crucial issue for long-running, deep reasoning agentic tasks, in which many autonomous decisions are made over minutes, hours, or longer. The more choices an LLM has to make, the greater the chance that even one hallucinated choice will invalidate the entire output.
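The compounding effect described above can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each decision fails independently with the same probability; neither that assumption nor the 1% error rate comes from Google or the article, and `p_any_failure` is a hypothetical helper name.

```python
def p_any_failure(per_step_error: float, steps: int) -> float:
    """Chance that at least one of `steps` decisions is hallucinated.

    Illustrative assumption: every decision fails independently with the
    same probability `per_step_error`. Real agents will not behave this
    neatly, so treat the result as a rough intuition, not a measurement.
    """
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # P(at least one failure) = 1 - P(every step is correct)
    return 1.0 - (1.0 - per_step_error) ** steps


if __name__ == "__main__":
    # Hypothetical 1% per-decision error rate over growing task lengths.
    for steps in (1, 10, 100, 1000):
        print(f"{steps:>5} steps -> {p_any_failure(0.01, steps):.3f}")
```

Under these assumptions, even a small per-decision error rate grows into a substantial chance of at least one bad decision once a task involves a hundred or more choices, which is why factual reliability matters more for long-running agents than for single-turn answers.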
To prove its progress claims, Google has also created yet another benchmark (as if the AI world needs another one). The new benchmark is unimaginatively named DeepSearchQA and is intended to test agents on complex, multi-step information-seeking tasks. Google has open sourced this benchmark.
Google also tested Deep Research on Humanity’s Last Exam, a much more interestingly named, independent benchmark of general knowledge filled with impossibly niche tasks, and on BrowseComp, a benchmark for browser-based agentic tasks.
As you might expect, Google’s new agent bested the competition on its own benchmark and on Humanity’s Last Exam. However, OpenAI’s ChatGPT 5 Pro was a surprisingly close second all the way around and slightly bested Google on BrowseComp.
But those benchmark comparisons were obsolete almost the moment Google published them, because on the same day OpenAI launched its highly anticipated GPT 5.2, codenamed Garlic. OpenAI says its newest model bests its rivals, especially Google, on a suite of the typical benchmarks, including OpenAI’s homegrown one.
Perhaps one of the most interesting parts of this announcement was the timing. Knowing that the world was awaiting the release of Garlic, Google dropped some AI news of its own.
