How Pokémon Go is giving delivery robots an inch-perfect view of the world

Published by qimuai on 2026-3-11 07:01. First-hand compilation.

Source: https://www.technologyreview.com/2026/03/10/1134099/how-pokemon-go-is-helping-robots-deliver-pizza-on-time/

Summary:

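From catching Pokémon to precise food delivery: how game data is redrawing the robot navigation map

Pokémon Go, the augmented-reality game that once swept the world, is turning its vast store of image data into the “eyes” of next-generation robots. Launched by Niantic in 2016, the game sent hundreds of millions of players into the streets to catch virtual Pokémon superimposed on real scenes through their phone cameras. Without intending to, they produced an unparalleled crowdsourced collection of images of urban landmarks.

Now Niantic Spatial, an AI company spun out of Niantic, is using this trove (tens of billions of images of urban environments, taken by players worldwide and tagged with super-accurate location markers) to train a high-precision visual positioning model. The company says the model can pinpoint a user’s location on a map to within a few centimeters from just a handful of snapshots of nearby buildings or landmarks.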

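Solving the “urban canyon” navigation problem

The first major application of the technology is not augmented reality but logistics robots. Niantic Spatial has partnered with Coco Robotics, which deploys last-mile delivery robots in the US and Europe. Coco’s robots travel along sidewalks at around 8 kilometers per hour (five miles per hour) and have completed more than 500,000 deliveries. But GPS is unreliable in “urban canyons,” where radio signals bounce off buildings and interfere with each other. CEO Zach Rash says GPS never really works in the dense areas, full of high-rises, underpasses, and freeways, where Coco delivers, and Brian McClendon of Niantic Spatial adds that a position fix there can drift by 50 meters, enough to keep a robot from arriving precisely where it should.

“It turns out that getting Pikachu to realistically run around and getting Coco’s robot to safely and accurately move through the world is actually the same problem,” says John Hanke, CEO of Niantic Spatial. The company’s visual positioning system determines where a device is by recognizing what its camera sees.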

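The data foundation: tens of billions of images for a “living map”

Niantic Spatial’s advantage lies in its enormous data foundation. Its model was trained on 30 billion images captured in urban environments, clustered in particular around in-game hot spots such as Pokémon battle arenas. Each hot spot has accumulated many thousands of images taken at different times, from different angles, and in different weather conditions, each with detailed metadata on the phone’s pose.

“We had a million-plus locations around the world where we can locate you precisely,” says Brian McClendon, CTO of Niantic Spatial. “We know where you’re standing within several centimeters of accuracy and, most importantly, where you’re looking.” Even in areas outside the hot spots, where data is scarcer, the model can infer location from what it has already learned.

Coco’s robots, which are fitted with four cameras, will use the model alongside their existing GPS to make positioning more reliable. Rash expects the technology to let robots stop at the designated pickup spots outside restaurants and arrive right at the customer’s door rather than a few steps away.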

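From AR to robots: map evolution amid a “Cambrian explosion”

Hanke says the company originally developed its visual positioning system for augmented-reality applications but is now seeing a “Cambrian explosion” in robotics. For robots to operate smoothly in spaces they share with people, such as sidewalks, they need a level of spatial understanding similar to that of humans.

The partnership with Coco is only the start. Hanke envisions a continuously updated “living map”: a hyper-detailed virtual simulation of the world that changes as the real world changes. In the future, robots moving through the streets will themselves become a new source of map data, making the digital replica of the world ever more detailed and current.

At the same time, the role of maps is changing fundamentally. For machines, maps may need to be more like guidebooks, full of the rich semantic information that humans take for granted. Companies such as Niantic Spatial are working to tag the objects in a map with descriptions of their properties, building a model of the world that machines can understand, so that AI has not only knowledge but also a common-sense grasp of real environments.

At a time when large language models lack common sense about the physical world, Niantic Spatial is taking a different path: starting from vast amounts of real-world visual data, it aims to re-create reality faithfully and lay a foundation of geospatial intelligence for machines that act autonomously. A game that began with catching Pokémon is quietly reshaping how machines perceive reality.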

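Translation:

How Pokémon Go is giving delivery robots a centimeter-level view of the world
Niantic’s AI spinout is training a new world model using 30 billion images of urban landmarks crowdsourced from players.

Pokémon Go was the world’s first augmented-reality (AR) megahit. Released in 2016 by the Google spinout Niantic, this AR game built on the juggernaut Pokémon franchise quickly swept the globe. From Chicago to Oslo to Enoshima, players took to the streets in the urgent hope of catching a Jigglypuff, a Squirtle, or (with a great deal of luck) an ultra-rare Galarian Zapdos hovering in the real world.

In short, a huge number of people were pointing their phones at a huge number of buildings. “Five hundred million people installed that app in 60 days,” says Brian McClendon, CTO of Niantic Spatial, an AI company that Niantic spun out in May last year. According to Scopely, the video-game firm that bought Pokémon Go at the same time, the game still drew more than 100 million players in 2024, eight years after launch.

Now Niantic Spatial is using that unprecedented trove of crowdsourced data (images of urban landmarks, tagged with precise locations and taken from the phones of hundreds of millions of players around the world) to build a kind of world model, a much-hyped new technology that aims to ground the intelligence of large language models in real environments.

The company says its latest model can pinpoint a user’s location on a map to within a few centimeters from just a handful of snapshots of the buildings or landmarks in view. The technology is meant to help robots navigate more precisely in places where GPS is unreliable.

In the first big test of its technology, Niantic Spatial has just teamed up with Coco Robotics, a startup that deploys last-mile delivery robots in a number of cities across the US and Europe. “Everybody thought that AR was the future, that AR glasses were coming,” says McClendon. “And then robots became the audience.”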

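From Pikachu to pizza delivery
Coco Robotics deploys around 1,000 flight-case-size robots in Los Angeles, Chicago, Jersey City, Miami, and Helsinki, each able to carry up to eight extra-large pizzas or four grocery bags. According to CEO Zach Rash, the robots have made more than 500,000 deliveries, covering a few million miles in all weather conditions.

But to compete with human couriers, Coco’s robots, which travel along sidewalks at around 8 kilometers per hour, must be as reliable as possible. “The best way we can do our job is by arriving exactly when we told you we were going to arrive,” says Rash. And that means not getting lost.

The problem Coco faces is that it cannot rely on GPS, which can be weak in cities because radio signals bounce off buildings and interfere with each other. “We do deliveries in a lot of dense areas with high-rises and underpasses and freeways, and those are the areas where GPS just never really works,” explains Rash.

“The urban canyon is the worst place in the world for GPS,” says McClendon. “If you look at that blue dot on your phone, you’ll often see it drift 50 meters, which puts you on a different block going a different direction on the wrong side of the street.” That is where Niantic Spatial comes in.

For the last few years, Niantic Spatial has been taking data collected from players of Pokémon Go and Ingress (Niantic’s earlier phone-based AR game, launched in 2013) and building a visual positioning system, technology that works out where you are from what is in view. “It turns out that getting Pikachu to realistically run around and getting Coco’s robot to safely and accurately move through the world is actually the same problem,” says John Hanke, CEO of Niantic Spatial.

“Visual positioning is not a very new technology,” says Konrad Wenzel of ESRI, a company that develops digital mapping and geospatial analysis software. “But it’s obvious that the more cameras we have out there, the better it becomes.”

Niantic Spatial has trained its model on 30 billion images captured in urban environments. The images are clustered in particular around hot spots: important in-game locations that players were encouraged to visit, such as Pokémon battle arenas. “We had a million-plus locations around the world where we can locate you precisely,” says McClendon. “We know where you’re standing within several centimeters of accuracy and, most importantly, where you’re looking.”

This means that each of those locations has thousands of images taken in roughly the same place but from different angles, at different times of day, and in different weather conditions. Each image comes with detailed metadata recording precisely where in space the phone was when it took the picture, along with its orientation, its tilt, whether it was moving, and its speed and direction.

A model trained on this data set can predict its location precisely from what it sees, even in areas beyond the million hot spots, where image data is scarce.

In addition to GPS, Coco’s robots, which are fitted with four cameras, will now use the model to work out where they are and where they are headed. The robots’ cameras sit at hip height and cover all directions, so their viewpoint differs a little from that of a Pokémon Go player, but Rash says adapting the data was straightforward.

Rivals use visual positioning systems too. Starship Technologies, a robot delivery company founded in Estonia in 2014, says its robots use their sensors to build a 3D map of their surroundings, marking the edges of buildings and the positions of streetlights.

But Rash believes Niantic Spatial’s technology will give Coco an edge. He claims it will let robots stop precisely at the designated pickup spots outside restaurants, where they do not get in pedestrians’ way, and arrive right at the customer’s door instead of a few steps off, as might have happened in the past.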

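A Cambrian explosion in robotics
Hanke says that Niantic Spatial originally developed its visual positioning system for augmented reality. “If you are wearing AR glasses and you want the world to lock in to where you’re looking, then you need some method for doing that,” he says. “But now we’re seeing a Cambrian explosion in robotics.”

Some robots need to share spaces such as construction sites and sidewalks with humans. “If robots are ever going to assimilate into that environment in a way that’s not disruptive for human beings, they’re going to have to have a similar level of spatial understanding,” says Hanke. “We can help robots find exactly where they are when they’ve been jostled and bumped.”

The partnership with Coco Robotics is only the start. Hanke says Niantic Spatial is putting in place the first pieces of what he calls a “living map”: a hyper-detailed virtual simulation that changes as the real world changes. As robots from Coco and other firms move about the world, they will keep supplying new map data, making the digital replica of the world ever more detailed.

As Hanke and McClendon see it, maps are not only becoming more detailed; they are also being used more and more by machines. That changes what maps are for. Maps have long helped people locate themselves, and through the move from 2D to 3D to 4D (for example, real-time simulations such as digital twins) the core principle has stayed the same: points on the map correspond to points in space or time.

Yet maps for machines may need to be more like guidebooks, full of information that humans take for granted. Companies such as Niantic Spatial and ESRI want to add descriptions that tell machines what they are looking at, tagging every object with a list of its properties. “This era is about building useful descriptions of the world for machines to comprehend,” says Hanke. “The data that we have is a great starting point in terms of building up an understanding of how the connective tissue of the world works.”

World models are attracting a lot of attention right now, and Niantic Spatial knows it. Large language models may seem to know everything, but they have little common sense when it comes to understanding and interacting with everyday environments. World models aim to make up for that. Companies such as Google DeepMind and World Labs are developing models that generate virtual fantasy worlds on the fly as training grounds for AI agents.

Niantic Spatial says it is approaching the field from a different angle. Push map-making far enough and it ends up capturing everything, says McClendon: “I’m very focused on trying to re-create the real world. We’re not there yet, but we want to be there.”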


English source:

How Pokémon Go is giving delivery robots an inch-perfect view of the world
Niantic's AI spinout is training a new world model using 30 billion images of urban landmarks crowdsourced from players.
Pokémon Go was the world’s first augmented-reality megahit. Released in 2016 by the Google spinout Niantic, the AR twist on the juggernaut Pokémon franchise fast became a global phenomenon. From Chicago to Oslo to Enoshima, players hit the streets in the urgent hope of catching a Jigglypuff or a Squirtle or (with a huge amount of luck) an ultra-rare Galarian Zapdos hovering just out of reach, superimposed on the everyday world.
In short, we’re talking about a huge number of people pointing their phones at a huge number of buildings. “Five hundred million people installed that app in 60 days,” says Brian McClendon, CTO at Niantic Spatial, an AI company that Niantic spun out in May last year. According to the video-game firm Scopely, which bought Pokémon Go from Niantic at the same time, the game still drew more than 100 million players in 2024, eight years after it launched.
Now Niantic Spatial is using that vast and unparalleled trove of crowdsourced data—images of urban landmarks tagged with super-accurate location markers taken from the phones of hundreds of millions of Pokémon Go players around the world—to build a kind of world model, a buzzy new technology that grounds the smarts of LLMs in real environments.
The company’s latest product is a model that it says can pinpoint your location on a map to within a few centimeters, based on a handful of snapshots of the buildings or other landmarks in view. The firm wants to use it to help robots navigate with greater precision in places where GPS is unreliable.
In the first big test of its technology, Niantic Spatial has just teamed up with Coco Robotics, a startup that deploys last-mile delivery robots in a number of cities across the US and Europe. “Everybody thought that AR was the future, that AR glasses were coming,” says McClendon. “And then robots became the audience.”
From Pikachu to pizza delivery
Coco Robotics deploys around 1,000 flight-case-size robots—built to carry up to eight extra-large pizzas or four grocery bags—in Los Angeles, Chicago, Jersey City, Miami, and Helsinki. According to CEO Zach Rash, the robots have made more than half a million deliveries to date, covering a few million miles in all weather conditions.
But to compete with human couriers, Coco’s robots, which trundle along sidewalks at around five miles per hour, must be as reliable as possible. “The best way we can do our job is by arriving exactly when we told you we were going to arrive,” says Rash. And that means not getting lost.
The problem Coco faces is that it cannot rely on GPS, which can be weak in cities because radio signals bounce off buildings and interfere with each other. “We do deliveries in a lot of dense areas with high-rises and underpasses and freeways, and those are the areas where GPS just never really works,” says Rash.
“The urban canyon is the worst place in the world for GPS,” says McClendon. “If you look at that blue dot on your phone, you’ll often see it drift 50 meters, which puts you on a different block going a different direction on the wrong side of the street.” That’s where Niantic Spatial comes in.
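To put the 50-meter figure in coordinate terms, the Python sketch below computes the great-circle distance between two latitude/longitude fixes using the haversine formula. The function name, the spherical-Earth radius, and the sample coordinates are illustrative assumptions; none of them come from Niantic Spatial or Coco.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical example: a true position and a reported fix that differ by
# 0.00045 degrees of latitude, which works out to roughly 50 meters.
true_fix = (34.0500, -118.2500)
reported_fix = (34.05045, -118.2500)
drift_m = haversine_m(*true_fix, *reported_fix)
print(f"drift: {drift_m:.1f} m")
```

A difference in the fourth decimal place of a coordinate is therefore enough to put a sidewalk robot on the wrong side of a street.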
For the last few years, Niantic Spatial has been taking the data collected from players of Pokémon Go and Ingress (Niantic’s previous phone-based AR game, launched in 2013) and building a visual positioning system, technology that tells you where you are based on what you can see. “It turns out that getting Pikachu to realistically run around and getting Coco’s robot to safely and accurately move through the world is actually the same problem,” says John Hanke, CEO of Niantic Spatial.
“Visual positioning is not a very new technology,” says Konrad Wenzel at ESRI, a company that develops digital mapping and geospatial analysis software. “But it’s obvious that the more cameras we have out there, the better it becomes.”
Niantic Spatial has trained its model on 30 billion images captured in urban environments. In particular, the images are clustered around hot spots—places that served as important locations in Niantic’s games that players were encouraged to visit, such as Pokémon battle arenas. “We had a million-plus locations around the world where we can locate you precisely,” says McClendon. “We know where you’re standing within several centimeters of accuracy and, most importantly, where you’re looking.”
The upshot is that for each of those million locations, Niantic Spatial has many thousands of images taken in more or less the same place but from different angles, at different times of day, and in different weather conditions. Each of those images comes with detailed metadata that pinpoints where in space the phone was at the time it captured the image, including which way the phone was facing, which way up it was, whether or not it was moving, how fast and in which direction, and more.
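As a rough illustration of what such a per-image record might hold, here is a minimal Python sketch. The class name, field names, units, and the 0.1 m/s motion threshold are all hypothetical; the article does not describe Niantic Spatial's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapturePose:
    """Hypothetical metadata stored alongside one crowdsourced image."""
    latitude_deg: float
    longitude_deg: float
    heading_deg: float   # which way the phone was facing (0 = north, clockwise)
    roll_deg: float      # which way up the phone was
    speed_m_s: float     # 0.0 when the phone was stationary
    course_deg: float    # direction of travel, same convention as heading

    @property
    def is_moving(self) -> bool:
        # Assumed threshold: treat anything above 0.1 m/s as motion.
        return self.speed_m_s > 0.1


def heading_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two compass headings, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


# Two hypothetical captures of the same landmark from different angles.
first = CapturePose(59.9139, 10.7522, 350.0, 0.0, 1.2, 345.0)
second = CapturePose(59.9139, 10.7523, 10.0, 90.0, 0.0, 0.0)
print(heading_difference(first.heading_deg, second.heading_deg))
```

Comparing headings with wraparound matters here because images of one location are taken from many angles, and 350 degrees and 10 degrees are only 20 degrees apart.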
The firm has used this data set to train a model to predict exactly where it is by taking into account what it is looking at—even for locations other than those million hot spots, where good sources of image and location data are scarcer.
In addition to GPS, Coco’s robots, which are fitted with four cameras, will now use this model to try to figure out where they are and where they are headed. The robots’ cameras are hip-height and point in all directions at once, so their viewpoint is a little different from a Pokémon Go player’s, but adapting the data was straightforward, says Rash.
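The article does not say how the robots combine the two sources. One textbook way to merge two independent position estimates is inverse-variance weighting, sketched below in Python. The function name, the local x/y coordinates in meters, and the uncertainty values (50 m for GPS in an urban canyon, 5 cm for the visual fix) are assumptions chosen to echo the figures quoted above, not Coco's actual method.

```python
import math


def fuse_position(gps_xy, gps_sigma_m, visual_xy, visual_sigma_m):
    """Inverse-variance weighted average of two independent 2D estimates.

    Each estimate is an (x, y) pair in meters with a standard deviation
    in meters. Returns the fused (x, y) and its standard deviation.
    """
    w_gps = 1.0 / gps_sigma_m ** 2
    w_vis = 1.0 / visual_sigma_m ** 2
    total = w_gps + w_vis
    fused = tuple(
        (w_gps * g + w_vis * v) / total for g, v in zip(gps_xy, visual_xy)
    )
    return fused, math.sqrt(1.0 / total)


# Hypothetical case: GPS has drifted tens of meters, while the visual fix
# sits at the origin with centimeter-level uncertainty.
fused_xy, fused_sigma = fuse_position((40.0, -30.0), 50.0, (0.0, 0.0), 0.05)
print(fused_xy, fused_sigma)
```

With uncertainties this lopsided, the fused estimate lands almost exactly on the visual fix, which is the behavior one would want in an urban canyon. When no visual fix is available, a system like this would have to fall back on GPS alone.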
Rival companies use visual positioning systems too. For example, Starship Technologies, a robot delivery firm founded in Estonia in 2014, says its robots use their sensors to build a 3D map of their surroundings, plotting the edges of buildings and the position of streetlights.
But Rash is betting that Niantic Spatial’s tech will give Coco an edge. He claims it will allow his robots to position themselves in the correct pickup spots outside restaurants, making sure they don’t get in anybody’s way, and stop just outside the customer’s door instead of a few steps away, which might have happened in the past.
A Cambrian explosion in robotics
When Niantic Spatial started work on its visual positioning system, the idea was to apply it to augmented reality, says Hanke. “If you are wearing AR glasses and you want the world to lock in to where you're looking, then you need some method for doing that,” he says. “But now we’re seeing a Cambrian explosion in robotics.”
Some of those robots may need to share spaces with humans—spaces such as construction sites and sidewalks. “If robots are ever going to assimilate into that environment in a way that’s not disruptive for human beings, they’re going to have to have a similar level of spatial understanding,” says Hanke. “We can help robots find exactly where they are when they’ve been jostled and bumped.”
The Coco Robotics partnership is the start. What Niantic Spatial is putting in place, says Hanke, are the first pieces of what he calls a living map: a hyper-detailed virtual simulation of the world that changes as the world changes. As robots from Coco and other firms move about the world, they will provide new sources of map data, feeding into more and more detailed digital replicas of the world.
But the way Hanke and McClendon see it, maps are not only becoming more detailed; they are being used more and more by machines. That shifts what maps are for. Maps have long been used to help people locate themselves in the world. As they moved from 2D to 3D to 4D (think of real-time simulations, such as digital twins), the basic principle hasn’t changed: Points on the map correspond to points in space or time.
And yet maps for machines may need to become more like guidebooks, full of information that humans take for granted. Companies like Niantic Spatial and ESRI want to add descriptions that tell machines what they’re actually looking at, with every object tagged with a list of its properties. “This era is about building useful descriptions of the world for machines to comprehend,” says Hanke. “The data that we have is a great starting point in terms of building up an understanding of how the connective tissue of the world works.”
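A minimal Python sketch of what "every object tagged with a list of its properties" could look like as a data structure follows. The class, the property keys, and the sample objects are invented for illustration and do not reflect any format used by Niantic Spatial or ESRI.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MapObject:
    """Hypothetical map entry: a located object plus descriptive properties."""
    object_id: str
    category: str
    position_m: Tuple[float, float, float]
    properties: Dict[str, str] = field(default_factory=dict)


def find_objects(objects: List[MapObject], **required: str) -> List[MapObject]:
    """Return the objects whose properties match every required key/value."""
    return [
        obj for obj in objects
        if all(obj.properties.get(key) == value for key, value in required.items())
    ]


# Hypothetical sidewalk scene as a delivery robot might need it described.
scene = [
    MapObject("obj-1", "curb_ramp", (2.0, 0.5, 0.0),
              {"drivable": "yes", "surface": "concrete"}),
    MapObject("obj-2", "stairs", (5.0, 1.0, 0.0),
              {"drivable": "no", "surface": "stone"}),
    MapObject("obj-3", "doorway", (8.0, 3.0, 0.0),
              {"drivable": "no", "role": "dropoff_point"}),
]
drivable = find_objects(scene, drivable="yes")
print([obj.object_id for obj in drivable])
```

The point of the guidebook idea is visible even at this scale: a person knows without being told that stairs are not drivable, whereas a machine needs that fact written down.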
There is a lot of buzz about world models right now—and Niantic Spatial knows it. LLMs may seem like know-it-alls, but they have very little common sense when it comes to interpreting and interacting with everyday environments. World models aim to fix that. Some firms, such as Google DeepMind and World Labs, are developing models that generate virtual fantasy worlds on the fly, which can then be used as training dojos for AI agents.
Niantic Spatial says it is coming at the problem from a different angle. Push map-making far enough and you’ll end up capturing everything, says McClendon: “I’m very focused on trying to re-create the real world. We’re not there yet, but we want to be there.”

MIT Technology Review
