Reinforcement Learning for Humanoid Robots
RL promises an effective way to learn motor skills by rewarding desirable behaviors and penalizing undesirable ones, with minimal or no supervision during training. End-to-end RL policies map raw sensory input directly to actuation and can run in real time.
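As a concrete illustration of such an end-to-end policy, the sketch below maps a raw observation vector straight to actuator commands with a small MLP. The observation and action dimensions, network width, and output scaling are all illustrative assumptions, not taken from any specific robot.

```python
# A minimal sketch of an end-to-end policy: a small MLP that maps a raw
# observation vector (joint positions/velocities, IMU readings, ...) directly
# to actuator commands. All dimensions here are illustrative assumptions.
import numpy as np

OBS_DIM = 48   # e.g. joint angles + velocities + base orientation (assumed)
ACT_DIM = 23   # one command per actuated joint (assumed)

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (256, OBS_DIM))
b1 = np.zeros(256)
W2 = rng.normal(0, 0.1, (ACT_DIM, 256))
b2 = np.zeros(ACT_DIM)

def policy(obs: np.ndarray) -> np.ndarray:
    """Single forward pass; cheap enough for a real-time control loop."""
    h = np.tanh(W1 @ obs + b1)
    return np.tanh(W2 @ h + b2)  # normalized actuator commands in [-1, 1]

action = policy(rng.normal(size=OBS_DIM))
```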
Some commonly encountered problems:
- meticulous design of reward functions
- reinforcement learning with sparse rewards (the two reward regimes are contrasted in the sketch after this list)
- the sim-to-real gap
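The sketch below contrasts the first two problems for a hypothetical humanoid locomotion task: a sparse reward that fires only at the goal, versus a hand-shaped dense reward whose terms and weights must be tuned by hand. The state fields, targets, and weights are illustrative assumptions.

```python
# Hypothetical reward functions for a humanoid locomotion task, contrasting a
# sparse reward (signal only at the goal) with a hand-shaped dense reward.
# All targets and weights are illustrative assumptions.
import numpy as np

def sparse_reward(base_pos, goal_pos, tol=0.5):
    # Almost every transition returns 0, which makes exploration hard.
    return 1.0 if np.linalg.norm(base_pos - goal_pos) < tol else 0.0

def shaped_reward(forward_vel, base_height, torques,
                  target_vel=1.0, target_height=0.9):
    # Dense signal every step, but each term must be designed and weighted.
    r_vel = -abs(forward_vel - target_vel)        # track desired speed
    r_height = -abs(base_height - target_height)  # stay upright
    r_energy = -1e-3 * float(np.sum(torques**2))  # penalize effort
    return r_vel + 0.5 * r_height + r_energy
```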
Humanoid robots possess many DoFs and unstable dynamics, with a center of mass that constantly moves out of the support polygon.
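The sketch below makes that stability condition concrete: project the center of mass onto the ground plane and test whether it falls inside the support polygon, here taken to be a convex polygon of foot-contact points in counterclockwise order. The contact coordinates are illustrative assumptions.

```python
# Static-stability check: is the ground projection of the center of mass
# inside the (convex, counterclockwise) support polygon of the contacts?
import numpy as np

def com_statically_stable(com_xy: np.ndarray, support_ccw: np.ndarray) -> bool:
    """True if com_xy lies inside the convex support polygon."""
    n = len(support_ccw)
    for i in range(n):
        a, b = support_ccw[i], support_ccw[(i + 1) % n]
        edge, to_com = b - a, com_xy - a
        if edge[0] * to_com[1] - edge[1] * to_com[0] < 0:  # right of an edge
            return False
    return True

feet = np.array([[0.0, 0.0], [0.2, 0.0], [0.2, 0.3], [0.0, 0.3]])  # assumed contacts
print(com_statically_stable(np.array([0.1, 0.15]), feet))  # True: statically stable
print(com_statically_stable(np.array([0.5, 0.15]), feet))  # False: must step or fall
```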
Common approaches to bridging the sim-to-real gap (each sketched after this list):
- domain randomization: varying the properties of a robot model, such as mass, friction, and actuator dynamics, to train a generalized policy.
- system identification: approximating the system's input-output behavior from real-world data.
- domain adaptation: using real-world data directly to fine-tune a simulator-trained policy.
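First, a minimal sketch of domain randomization: before each training episode, physical properties of the robot model are resampled within hand-chosen ranges. The parameter names and ranges are illustrative assumptions and are not tied to any particular simulator API.

```python
# Domain randomization sketch: resample physics parameters per episode so the
# policy is trained across a distribution of dynamics, not a single model.
import numpy as np
from dataclasses import dataclass

@dataclass
class PhysicsParams:
    link_mass_scale: float   # multiplier on nominal link masses (assumed)
    ground_friction: float   # friction coefficient (assumed range)
    motor_gain: float        # crude stand-in for actuator dynamics (assumed)

def sample_params(rng: np.random.Generator) -> PhysicsParams:
    return PhysicsParams(
        link_mass_scale=rng.uniform(0.8, 1.2),
        ground_friction=rng.uniform(0.5, 1.5),
        motor_gain=rng.uniform(0.9, 1.1),
    )

rng = np.random.default_rng(7)
for episode in range(3):
    params = sample_params(rng)  # apply to the simulator, then roll out the policy
    print(episode, params)
```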
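Next, system identification. One simple instance is fitting an actuator model to logged real-world data by least squares; the first-order model form tau = k * command - d * velocity and the synthetic "measurements" below are assumptions for illustration.

```python
# System identification sketch: fit tau = k * command - d * joint_velocity
# to (synthetic) real-world logs via linear least squares.
import numpy as np

rng = np.random.default_rng(0)
commands = rng.uniform(-1, 1, 200)
velocities = rng.uniform(-3, 3, 200)
true_k, true_d = 40.0, 1.5  # ground truth used only to synthesize data
measured_tau = true_k * commands - true_d * velocities + rng.normal(0, 0.2, 200)

A = np.column_stack([commands, -velocities])
(k_hat, d_hat), *_ = np.linalg.lstsq(A, measured_tau, rcond=None)
print(f"identified k={k_hat:.2f}, d={d_hat:.2f}")  # plug back into the simulator
```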
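Finally, domain adaptation. Fine-tuning a simulator-trained policy on data gathered from the real robot can take many forms; the sketch below assumes the simplest one, a few supervised regression steps of the policy's final layer toward logged target actions, with synthetic stand-in data throughout.

```python
# Domain adaptation sketch: fine-tune (part of) a sim-trained policy on
# real-world (observation, action) pairs with a regression loss.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(0, 0.1, (23, 48))      # final layer of a sim-trained policy (assumed)
real_obs = rng.normal(size=(64, 48))  # logged real-world observations (synthetic)
real_act = rng.normal(size=(64, 23))  # corresponding target actions (synthetic)

lr = 1e-2
for step in range(100):
    pred = real_obs @ W.T
    # Gradient of the mean-squared error (up to a constant factor).
    grad = (pred - real_act).T @ real_obs / len(real_obs)
    W -= lr * grad
print("fine-tuned MSE:", float(np.mean((real_obs @ W.T - real_act) ** 2)))
```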