Generative UI: A rich, custom, visual interactive user experience for any prompt

Summary:
Google unveils generative UI: AI builds dynamic user experiences in real time
On November 18, 2025, Google Research announced that a team led by Google Fellow Yaniv Leviathan has developed a novel implementation of generative UI. The technology enables AI models to generate immersive visual experiences and interactive tools on the fly in response to user prompts, and is now rolling out experimentally in the Gemini app and in AI Mode in Google Search.
The innovation, called "generative UI", marks a significant shift in human-computer interaction. Unlike traditional AI models that produce only static content, the system dynamically creates complete interactive interfaces, including web pages, games, tools, and applications, all designed automatically in response to the user's prompt. In the paper "Generative UI: LLMs are Effective UI Generators", the team reports that, when generation speed is ignored, human raters strongly prefer generative UI over standard AI output formats.
The technology currently ships as two experimental features. In the Gemini app, "dynamic view" adapts its interaction design to the user's context; explaining the microbiome to a child, for example, calls for an entirely different visual presentation than explaining it to an adult. In Google Search, AI Mode instantly generates custom interactive tools and simulations for a user's question. Note that these features are currently available only to Google AI Pro and Ultra subscribers in the U.S.
The implementation rests on three additions: tool access, which lets the model call key resources such as image generation; carefully crafted system instructions, which keep output quality high; and post-processing, which corrects common errors. Evaluations show that generative UI output quality is second only to websites built by human experts, and well ahead of traditional search results and standard AI outputs.
Although the technology is still at an early stage, with generation sometimes slow and occasionally inaccurate, the team says it will continue to optimize performance. The work illustrates a research-to-product innovation cycle and may eventually connect to a broader set of services, improving the interactive experience through human-AI collaboration.
Original article:
Generative UI: A rich, custom, visual interactive user experience for any prompt
November 18, 2025
Yaniv Leviathan, Google Fellow, Dani Valevski, Senior Staff Software Engineer, and Yossi Matias, Vice President & Head of Google Research
We introduce a novel implementation of generative UI, enabling AI models to create immersive experiences and interactive tools and simulations, all generated completely on the fly for any prompt. This is now rolling out in the Gemini app and Google Search, starting with AI Mode.
Generative UI is a powerful capability in which an AI model generates not only content but an entire user experience. Today we introduce a novel implementation of generative UI, which dynamically creates immersive visual experiences and interactive interfaces — such as web pages, games, tools, and applications — that are automatically designed and fully customized in response to any question, instruction, or prompt. These prompts can be as simple as a single word, or as long as needed for detailed instructions. These new types of interfaces are markedly different from the static, predefined interfaces in which AI models typically render content.
In our new paper, “Generative UI: LLMs are Effective UI Generators”, we describe the core principles that enabled our implementation of generative UI and demonstrate the viability of this new paradigm. Our evaluations indicate that, when ignoring generation speed, the interfaces from our generative UI implementation are strongly preferred by human raters over standard LLM outputs. This work represents a first step toward fully AI-generated user experiences, where users automatically get dynamic interfaces tailored to their needs, rather than having to select from an existing catalog of applications.
Our research on generative UI, also referred to as generative interfaces, comes to life today in the Gemini app through an experiment called dynamic view and in AI Mode in Google Search.
Bringing generative UI to Google products
Generative UI capabilities will be rolled out as two experiments in the Gemini app: dynamic view and visual layout. When using dynamic view, an experience built upon our generative UI implementation, Gemini designs and codes a fully customized interactive response for each prompt, using Gemini’s agentic coding capabilities. It customizes the experience with an understanding that explaining the microbiome to a 5-year-old requires different content and a different set of features than explaining it to an adult, just as creating a gallery of social media posts for a business requires a completely different interface than generating a plan for an upcoming trip.
Dynamic view can be used for a wide range of scenarios, from learning about probability to helping with practical tasks like event planning and getting fashion advice. The interfaces allow users to learn, play, or explore interactively. Dynamic view, along with visual layout, is rolling out today. To help us learn about these experiments, users may initially see only one of them.
Generative UI experiences are also integrated into Google Search starting with AI Mode, unlocking dynamic visual experiences with interactive tools and simulations that are generated specifically for a user’s question. Now, thanks to Gemini 3’s unparalleled multimodal understanding and powerful agentic coding capabilities, Gemini 3 in AI Mode can interpret the intent behind any prompt to instantly build bespoke generative user interfaces. By generating interactive tools and simulations on the fly, it creates a dynamic environment optimized for deep comprehension and task completion. Generative UI capabilities in AI Mode are available for Google AI Pro and Ultra subscribers in the U.S. starting today. Select "Thinking" from the model drop-down menu in AI Mode to try it out.
How the generative UI implementation works
Our generative UI implementation, described in the paper, uses Google’s Gemini 3 Pro model with three important additions (a sketch of how they fit together follows the list):
- Tool access: A server provides access to several key tools, like image generation and web search. Tool results can be returned to the model to increase quality, or sent directly to the user’s browser to improve efficiency.
- Carefully crafted system instructions: The system is guided by detailed instructions that include the goal, planning, examples and technical specifications, including formatting, tool manuals, and tips for avoiding common errors.
- Post-processing: The model’s outputs are passed through a set of post-processors to address potential common issues.
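To make the three additions concrete, here is a minimal sketch of how they might fit together around the model call. Every name and convention below (generate_ui, call_model, TOOLS, the TOOL(...) marker) is an illustrative assumption; the paper does not publish this API.

```python
# Hypothetical sketch of a generative UI pipeline combining the three
# additions above; all names and conventions are assumptions.

# Carefully crafted system instructions: goal, planning, examples,
# technical spec, tool manuals, and tips for avoiding common errors.
SYSTEM_INSTRUCTIONS = """\
You generate a complete, self-contained HTML page answering the user's prompt.
Goal / planning / examples: ...
Technical spec: output valid HTML; request tools with TOOL(<name>);
avoid these common errors: ...
"""

# Tool access: the server exposes key tools. Results can be fed back to
# the model (quality) or streamed straight to the browser (efficiency).
TOOLS = {
    "image_generation": lambda prompt: f"https://img.example/{abs(hash(prompt))}.png",
}

def call_model(system: str, prompt: str) -> str:
    # Stand-in for the underlying LLM call (e.g., Gemini 3 Pro).
    return f"<html><body><h1>{prompt}</h1><img src='TOOL(image_generation)'></body></html>"

def post_process(html: str) -> str:
    # Post-processing: patch common issues in the raw model output.
    if not html.startswith("<!DOCTYPE html>"):
        html = "<!DOCTYPE html>\n" + html
    return html.replace("http://", "https://")

def generate_ui(prompt: str) -> str:
    html = call_model(SYSTEM_INSTRUCTIONS, prompt)
    # Resolve any tool markers the model emitted (greatly simplified).
    for name, tool in TOOLS.items():
        html = html.replace(f"TOOL({name})", tool(prompt))
    return post_process(html)

print(generate_ui("explain the microbiome to a 5-year-old"))
```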
For some products, it might be preferable to consistently see results in specific styles. Our implementation could be configured for these products so that all results, including generated assets, are created in a consistent style for all users. Without specific styling instructions, the generative UI will select a style automatically, or the user can influence styling in their prompt, as in the case of dynamic view in the Gemini app.
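One plausible way to realize this, offered purely as an assumption about the mechanism, is to append a product-wide style block to the system instructions; HOUSE_STYLE and build_instructions below are hypothetical.

```python
# Hypothetical: a product-wide style block appended to the system
# instructions so every generated page and asset shares one look.
HOUSE_STYLE = """\
Style requirements (apply to all generated pages and assets):
- Palette: white background, #1A73E8 accents
- Typography: system sans-serif, 16px base size
- Components: rounded cards with 8px corner radius
"""

def build_instructions(base: str, style: str | None = None) -> str:
    # With no style block, the model picks a style itself, or the user's
    # prompt can steer it (as with dynamic view in the Gemini app).
    return base if style is None else f"{base}\n{style}"
```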
Generative UI outputs are strongly preferred over standard formats
To facilitate consistent evaluations and comparisons of generative UI implementations, we created PAGEN, a dataset of websites made by human experts, which we will soon release to the research community.
To evaluate user preferences, we compared our new generative UI experience against several other formats: a website designed for the specific prompt by human experts, the top Google Search result for the query, and baseline LLM outputs in raw text or standard Markdown format.
The sites designed by human experts had the highest preference rates. These were followed closely by the results from our generative UI implementation, with a substantial gap from all other output methods. This evaluation did not take into account generation speed. We also show that the performance of generative UI strongly depends on the performance of the underlying model, and that our newest models perform substantially better. See more details in the paper.
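For readers who want to see how such a pairwise preference comparison could be tallied, here is a minimal sketch; the rater record format and the preference_rates helper are assumptions, not the paper's evaluation code.

```python
from collections import Counter

# Hypothetical rater records, each (format_a, format_b, winner); the
# actual evaluation format is not published, so this is illustrative.
ratings = [
    ("expert_site", "generative_ui", "expert_site"),
    ("generative_ui", "markdown", "generative_ui"),
    ("generative_ui", "search_result", "generative_ui"),
]

def preference_rates(ratings):
    wins, appearances = Counter(), Counter()
    for a, b, winner in ratings:
        appearances[a] += 1
        appearances[b] += 1
        wins[winner] += 1
    # Fraction of comparisons each format won.
    return {fmt: wins[fmt] / appearances[fmt] for fmt in appearances}

print(preference_rates(ratings))
# -> expert_site 1.0, generative_ui ~0.67, markdown 0.0, search_result 0.0
```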
Opportunities ahead
We are still in the early days of generative UI, and important opportunities for improvement remain. For example, our current implementation can sometimes take a minute or more to generate results, and there are occasional inaccuracies in the outputs; these are areas of ongoing research. Generative UI is an example of the magic cycle of research, where research breakthroughs lead to product innovation, which opens up new opportunities for addressing user needs and in turn fuels further research. We see potential in extending generative UI to access a wider set of services, adapt to additional context and human feedback, and deliver increasingly helpful visual and interactive interfaces. We are excited about the further opportunities ahead for generative UI.