FF's Notes

AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents

Jul 8, 2024

AutoRT leverages vision-language models (VLMs) for scene understanding and grounding, and further uses large language models (LLMs) for proposing diverse and novel instructions to be performed by a fleet of robots.
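To make the data flow concrete, here is a minimal sketch of one orchestration round. It assumes the VLM and LLM are replaced by deterministic stubs. `Robot`, `describe_scene`, `propose_tasks` and `orchestrate` are hypothetical names of my own, not AutoRT's code. Each robot's scene is described, candidate instructions are proposed from it, and each robot gets one instruction it has not done before.

```python
from dataclasses import dataclass, field


@dataclass
class Robot:
    name: str
    # Hypothetical: object names the robot's camera currently sees.
    visible_objects: list[str]
    completed: list[str] = field(default_factory=list)


def describe_scene(robot: Robot) -> list[str]:
    """Stand-in for the VLM: return object names grounded in the robot's scene."""
    return list(robot.visible_objects)


def propose_tasks(objects: list[str]) -> list[str]:
    """Stand-in for the LLM: turn scene objects into candidate instructions."""
    return [f"pick up the {obj}" for obj in objects] + [f"wipe the {obj}" for obj in objects]


def orchestrate(fleet: list[Robot], tasks_per_robot: int = 1) -> dict[str, list[str]]:
    """One round: describe each scene, propose tasks, assign unseen ones per robot."""
    assignments = {}
    for robot in fleet:
        candidates = propose_tasks(describe_scene(robot))
        # Skip instructions this robot already did, a crude nudge toward diversity.
        fresh = [t for t in candidates if t not in robot.completed]
        chosen = fresh[:tasks_per_robot]
        robot.completed.extend(chosen)
        assignments[robot.name] = chosen
    return assignments
```

Calling `orchestrate` twice on the same fleet hands each robot a different instruction the second time, because already completed instructions are filtered out.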

According to the authors, it is the first system in which LLM-controlled robots drive autonomously in real-world settings, propose their own goals, and take actions toward those goals.

For navigation, they use almost the same approach as OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics.
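As I understand it, this style of navigation stores objects seen during an initial scan together with vision-language embeddings and matches a text query against them to pick a navigation target. The sketch below shows only that lookup step. The 3-D embeddings are made-up toy vectors standing in for real CLIP-style features, `locate` and `object_map` are hypothetical names, and path planning to the returned position is omitted.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def locate(query_emb: list[float], object_map: list[dict]) -> tuple[float, float]:
    """Return the (x, y) position of the map entry most similar to the query."""
    best = max(object_map, key=lambda entry: cosine(query_emb, entry["embedding"]))
    return best["position"]


# Toy map: in a real system the embeddings would come from a vision-language model.
object_map = [
    {"label": "sponge", "position": (1.0, 2.0), "embedding": [1.0, 0.0, 0.0]},
    {"label": "can", "position": (3.5, 0.5), "embedding": [0.0, 1.0, 0.0]},
    {"label": "apple", "position": (0.0, 4.0), "embedding": [0.0, 0.0, 1.0]},
]
```

A query embedding close to the "can" vector, such as `[0.1, 0.9, 0.0]`, resolves to the can's position `(3.5, 0.5)`.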

It also proposes a set of rules called the Robot Constitution, consisting of foundational rules (based on Asimov's three laws), safety rules, and embodiment rules. I think the idea is great.
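A minimal sketch of how such a constitution could be wired in: the rules go into the task-proposal prompt, and proposed tasks are then screened. The rule texts are my own paraphrased examples, not quotes from the paper. The keyword filter (`BANNED_PHRASES`, `violates`, `filter_tasks`) is a hypothetical, much cruder stand-in for an LLM-based check.

```python
import re

# Paraphrased example rules, grouped by the three categories named above.
CONSTITUTION = {
    "foundational": ["A robot may not injure a human being."],
    "safety": [
        "Do not attempt tasks involving humans, animals, or living things.",
        "Do not interact with sharp objects such as knives.",
    ],
    "embodiment": ["The robot has one arm, so it cannot perform two-armed tasks."],
}

# Hypothetical keyword stand-ins for the rules; a real system would ask an LLM.
BANNED_PHRASES = ["person", "human", "cat", "dog", "knife", "scissors", "both hands"]


def build_prompt(scene: str) -> str:
    """Prepend every constitution rule, tagged by category, to the LLM prompt."""
    lines = ["Follow these rules when proposing tasks:"]
    for category, rules in CONSTITUTION.items():
        lines.extend(f"[{category}] {rule}" for rule in rules)
    lines.append(f"Scene: {scene}")
    lines.append("Propose tasks:")
    return "\n".join(lines)


def violates(task: str) -> bool:
    """True if the task mentions a banned phrase as a whole word or phrase."""
    text = task.lower()
    return any(re.search(rf"\b{re.escape(p)}\b", text) for p in BANNED_PHRASES)


def filter_tasks(tasks: list[str]) -> list[str]:
    """Keep only tasks that do not trip any banned phrase."""
    return [t for t in tasks if not violates(t)]
```

The word-boundary match keeps "scatter the blocks" from being rejected just because it contains "cat".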