Refer to Imitation Learning for more details.

Four possible source of demonstrations for humanoid robots:

  • policy execution on hardware (robot)

    • collecting data on physical robots (requires a laborious setup of the environment and raises significant safety concerns)

    • collecting data in simulation (sim-to-real gap)

  • Teleoperation (robot)

    Control flow for learning from teleoperated demonstrations:

    1. human operator (VR, Exoskeleton, Optical Tracking, Motion Capture, Joystick, ALOHA)

    2. motion retargeting

    3. desired robot trajectory

    Some limitations:

    • a majority of teleoperation systems capture only manipulation skills, full-body sensing, including human gaits are missing (requires IMU or exoskeletons which are expensive)

    • the teleoperation data may limited if the robot’s kinematics do not enable seamless retargeting

    • time-consuming to scale

  • motion capture from human (3D) (human)

    Captures humans interacting with various objects while moving around.

    • require heavily instrumented environments and actors

    • less outdoor activities

  • human videos from internet (2D) (human)

    Obtain rich and diverse human motion datafrom the internet.

    • lower quality, containing noise, non-physical artifacts