EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data
EgoScale is trained in three stages:
- pre-training on 20,854 hours of in-the-wild egocentric recordings spanning diverse real-world environments, supplemented with the 829-hour EgoDex dataset, which covers 194 tasks. No modules are frozen.
- mid-training on 344 tabletop manipulation tasks, each captured in approximately 30 human trajectories and 5 robot trajectories, totaling about 50 hours of human data and only 4 hours of robot data. The VLM backbone is frozen; only the vision encoder and DiT head are updated.
- post-training on task-specific robot demonstrations.
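
The stages above can be summarized as a small schedule. The following is a minimal sketch, not the authors' code: the identifiers (`Stage`, `frozen_modules`, the module name strings) are hypothetical, and the post-training entry is left unspecified because the notes do not say which modules are updated there. The per-trajectory durations are rough averages derived only from the approximate figures quoted above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Module names mirror the notes (VLM backbone, vision encoder, DiT head);
# the identifiers themselves are hypothetical.
ALL_MODULES = frozenset({"vlm_backbone", "vision_encoder", "dit_head"})


@dataclass(frozen=True)
class Stage:
    name: str
    data: str
    # None means the notes do not state which modules are updated.
    trainable: Optional[FrozenSet[str]]


STAGES = [
    Stage("pre-training", "in-the-wild egocentric video + EgoDex", ALL_MODULES),
    Stage("mid-training", "344-task human + robot manipulation data",
          frozenset({"vision_encoder", "dit_head"})),
    Stage("post-training", "task-specific robot demonstrations", None),
]


def frozen_modules(stage: Stage) -> Optional[FrozenSet[str]]:
    """Return the modules kept frozen in a stage, or None if unspecified."""
    if stage.trainable is None:
        return None
    return ALL_MODULES - stage.trainable


# Back-of-the-envelope figures for mid-training, using the approximate
# per-task counts and total hours quoted in the notes.
MID_TASKS = 344
human_trajectories = MID_TASKS * 30   # about 10,320 trajectories
robot_trajectories = MID_TASKS * 5    # about 1,720 trajectories
human_sec_per_traj = 50 * 3600 / human_trajectories
robot_sec_per_traj = 4 * 3600 / robot_trajectories

if __name__ == "__main__":
    for stage in STAGES:
        print(stage.name, "frozen:", frozen_modules(stage))
    print(f"human: {human_trajectories} trajectories, "
          f"~{human_sec_per_traj:.1f} s each")
    print(f"robot: {robot_trajectories} trajectories, "
          f"~{robot_sec_per_traj:.1f} s each")
```

Since the source figures are themselves approximate, the derived trajectory counts and durations should be read as order-of-magnitude estimates only.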
The model architecture itself is not new.