RL promises an effective way to learn motor skills by rewarding desirable behaviors and penalizing undesirable ones, with minimal or no supervision during training. End-to-end RL policies map raw sensory input to actuation and can run in real time.
Some problems encountered:
- meticulous design of reward functions (a reward-shaping sketch follows this list)
- reinforcement learning with sparse rewards
- the sim-to-real gap
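To make the first two items concrete, here is a minimal sketch assuming a hypothetical velocity-tracking locomotion task; all function names, terms, and weights are illustrative rather than taken from any specific system. It contrasts a hand-shaped dense reward (every term and coefficient must be designed and tuned) with a sparse goal-reaching reward that provides almost no learning signal.

```python
import numpy as np

def shaped_reward(base_vel, target_vel, joint_torques, roll, pitch):
    """Hand-tuned dense reward: every term and weight is a design choice.

    All weights below are illustrative, not values from any specific paper.
    """
    vel_term = np.exp(-4.0 * np.sum((base_vel - target_vel) ** 2))  # track commanded velocity
    energy_term = -1e-4 * np.sum(np.square(joint_torques))          # penalize actuation effort
    upright_term = -0.5 * (roll ** 2 + pitch ** 2)                  # penalize torso tilt
    return vel_term + energy_term + upright_term

def sparse_reward(distance_to_goal, threshold=0.1):
    """Sparse alternative: informative only once the goal is actually reached."""
    return 1.0 if distance_to_goal < threshold else 0.0
```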
Humanoid robots possess more DoFs and inherently unstable dynamics, with the center of mass constantly moving outside the support polygon, which makes transferring simulator-trained policies especially hard. Common techniques for narrowing the sim-to-real gap include:
- domain randomization: varying the properties of the robot model, such as mass, friction, and actuator dynamics, to train a policy that generalizes across them (sketch below)
- system identification: approximating the system's input-output behavior from real-world data (sketch below)
- domain adaptation: using real-world data directly to fine-tune a simulator-trained policy (sketch below)
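A minimal sketch of domain randomization, assuming a generic simulator handle `sim` with hypothetical setters (`set_body_mass_scale`, `set_friction`, `set_motor_strength`, `set_joint_damping`) and illustrative ranges; a real setup would use a specific simulator's API with ranges tuned around the robot's nominal values. The idea is simply to resample the physical parameters at the start of every training episode.

```python
import random

# Ranges are illustrative placeholders, not values from any specific system.
RANDOMIZATION_RANGES = {
    "base_mass_scale": (0.8, 1.2),    # +/- 20% of nominal mass
    "friction_coeff":  (0.4, 1.2),
    "motor_strength":  (0.85, 1.15),  # scale on commanded torques
    "joint_damping":   (0.5, 1.5),
}

def randomize_dynamics(sim):
    """Resample robot-model properties before each episode (hypothetical setters)."""
    params = {k: random.uniform(lo, hi) for k, (lo, hi) in RANDOMIZATION_RANGES.items()}
    sim.set_body_mass_scale(params["base_mass_scale"])
    sim.set_friction(params["friction_coeff"])
    sim.set_motor_strength(params["motor_strength"])
    sim.set_joint_damping(params["joint_damping"])
    return params  # log what the policy was exposed to

# Training-loop skeleton:
# for episode in range(num_episodes):
#     randomize_dynamics(sim)
#     obs = sim.reset()
#     ...collect rollout and update the policy...
```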
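A minimal sketch of system identification under a simplifying linear-dynamics assumption: fit next_state ≈ A·state + B·action to logged real-world transitions via least squares. Real humanoid dynamics are nonlinear, so this only illustrates the input-output fitting idea; the array shapes and the usage note are assumptions.

```python
import numpy as np

def fit_linear_dynamics(states, actions, next_states):
    """Least-squares fit of next_state ~ A @ state + B @ action from real-world logs.

    states:      (N, n) array of observed states
    actions:     (N, m) array of applied actions
    next_states: (N, n) array of resulting states
    """
    X = np.hstack([states, actions])                      # (N, n + m) regressors
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)   # (n + m, n) solution
    n = states.shape[1]
    A, B = W[:n].T, W[n:].T                               # split into A (n x n), B (n x m)
    return A, B

# Usage sketch (data would come from logged robot trajectories):
# A, B = fit_linear_dynamics(states, actions, next_states)
# residual = next_states - states @ A.T - actions @ B.T   # check model error
```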
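A minimal sketch of domain adaptation as conservative supervised fine-tuning on real-world (observation, action) pairs, assuming a PyTorch policy already trained in simulation and some supervision source for the target actions (e.g., teleoperated corrections); this is only one of several adaptation schemes, and all names and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def finetune_policy(policy: nn.Module, real_obs: torch.Tensor, real_act: torch.Tensor,
                    epochs: int = 10, lr: float = 1e-4):
    """Fine-tune a simulator-trained policy on real-world (observation, action) pairs.

    Assumes the target actions come from some supervision source on the real robot;
    a small learning rate keeps the policy close to its simulation-trained behavior.
    """
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        pred_act = policy(real_obs)          # policy maps observations to actions
        loss = loss_fn(pred_act, real_act)   # match the real-world supervision
        loss.backward()
        optimizer.step()
    return policy

# Usage sketch (file name and tensor shapes are illustrative):
# policy = torch.load("policy_trained_in_sim.pt")
# policy = finetune_policy(policy, real_obs, real_act)
```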