FF's Notes

VIMA: General Robot Manipulation with Multimodal Prompts

Jan 27, 2024

This paper proposes a transformer-decoder-based network that takes multimodal prompts and the history of past interactions as inputs and predicts motor commands.
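To make the data flow concrete, here is a minimal pure-Python sketch of one decoder step. It assumes that the history tokens attend to the prompt tokens through cross-attention and to earlier history tokens through causal self-attention, and that a linear head maps the last token to an action vector. None of these details are stated in the summary above; the names `attend`, `decoder_step`, and `action_weights` are hypothetical, and learned projections, layer norms, and tokenizers are left out.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values, causal=False):
    """Scaled dot-product attention.

    With causal=True, query i only sees keys 0..i (used for self-attention).
    """
    scale = math.sqrt(len(keys[0]))
    out = []
    for i, q in enumerate(queries):
        n = i + 1 if causal else len(keys)
        weights = softmax([dot(q, k) / scale for k in keys[:n]])
        out.append([
            sum(w * v[d] for w, v in zip(weights, values[:n]))
            for d in range(len(values[0]))
        ])
    return out


def add(xs, ys):
    """Element-wise sum of two token sequences (residual connection)."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def decoder_step(prompt_tokens, history_tokens, action_weights):
    """Assumed layout: cross-attention, causal self-attention, linear head."""
    # History tokens query the multimodal prompt tokens.
    h = add(history_tokens, attend(history_tokens, prompt_tokens, prompt_tokens))
    # Each history token attends only to itself and earlier tokens.
    h = add(h, attend(h, h, h, causal=True))
    # The last token is mapped to a motor command by a linear head.
    return [dot(row, h[-1]) for row in action_weights]


# Toy usage with random embeddings standing in for real prompt/history tokens.
random.seed(0)
DIM, ACTION_DIM = 4, 2
prompt = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
history = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
action_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(ACTION_DIM)]
action = decoder_step(prompt, history, action_weights)
```

In this sketch the prompt is never mixed into the history sequence itself; it only enters through the cross-attention keys and values, so the prompt can be encoded once and reused at every step.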