Visual Reinforcement Learning with Imagined Goals
This paper uses a VAE to map the state space to a latent representation space and then performs goal-conditioned reinforcement learning in that latent space.
What does the VAE contribute?
- a compact state representation: the encoder maps each state to a distribution over latent codes
- a reward function: the reward for the current state can be computed as the negative distance to the goal in latent space (see the sketch after this list)
- goal sampling: goals and other states live in the same latent space, so we can sample a goal state directly from the latent space
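A minimal sketch of the latent-space reward idea, where `encode` is a hypothetical stand-in for the trained VAE encoder's mean output and the reward is taken as the negative Euclidean distance between the encoded state and the latent goal:

```python
import numpy as np

LATENT_DIM = 4

def encode(state: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the trained VAE encoder's mean output mu(s)."""
    return state[:LATENT_DIM]  # placeholder projection, for illustration only

def latent_reward(state: np.ndarray, z_goal: np.ndarray) -> float:
    """Reward = negative Euclidean distance between latent state and latent goal."""
    return -float(np.linalg.norm(encode(state) - z_goal))

# A latent goal can be sampled directly from the VAE prior N(0, I):
z_goal = np.random.randn(LATENT_DIM)
state = np.random.rand(16)
print(latent_reward(state, z_goal))
```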
VAE: $s$ –encoder–> $z$ –decoder–> $\hat{s}$
We can sample a latent goal $z_g$ directly from the latent space $z$ (e.g. from the VAE prior).
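To make the encoder/decoder structure concrete, here is a minimal MLP VAE sketch in PyTorch; the observation size, layer widths, and latent dimension are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal VAE: s --encoder--> z --decoder--> s_hat."""
    def __init__(self, obs_dim: int = 48 * 48 * 3, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 400), nn.ReLU())
        self.fc_mu = nn.Linear(400, latent_dim)
        self.fc_logvar = nn.Linear(400, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 400), nn.ReLU(),
            nn.Linear(400, obs_dim), nn.Sigmoid(),
        )

    def encode(self, s):
        h = self.encoder(s)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, s):
        mu, logvar = self.encode(s)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar

    def sample_goal(self, batch_size: int = 1):
        # Latent goals can be sampled directly from the prior N(0, I).
        return torch.randn(batch_size, self.fc_mu.out_features)
```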
A brief outline of the RIG algorithm:
- collect state data $\{s\}$ and train the VAE encoder and decoder
- sample a goal $g$ from the latent space
- sample an initial state $s_0$, roll out the policy, and collect transitions $(s, a, g, r)$ into the replay buffer
- minimize the goal-conditioned (universal) value-function loss with any off-policy RL algorithm
- use the updated policy to collect more data and grow the replay buffer (a simplified loop sketch follows this list)
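Putting the steps together, a heavily simplified sketch of the RIG loop; the environment, policy, and update function below are random/placeholder stand-ins, and all names are illustrative assumptions rather than the paper's code:

```python
import numpy as np
from collections import deque

obs_dim, act_dim, latent_dim = 16, 2, 4
replay_buffer: deque = deque(maxlen=100_000)

def encode(s):              # stand-in for the trained VAE encoder mean
    return s[:latent_dim]

def sample_latent_goal():   # sample an imagined goal from the VAE prior N(0, I)
    return np.random.randn(latent_dim)

def policy(s, z_goal):      # stand-in for the learned goal-conditioned policy
    return np.random.uniform(-1, 1, act_dim)

def env_step(s, a):         # stand-in for the real environment dynamics
    return np.clip(s + 0.1 * np.random.randn(obs_dim), 0, 1)

def off_policy_update(batch):  # placeholder for an off-policy update (e.g. TD3/SAC)
    pass

for episode in range(10):
    s = np.random.rand(obs_dim)       # sample an initial state s_0
    z_goal = sample_latent_goal()     # imagined goal in latent space
    for t in range(50):
        a = policy(s, z_goal)
        s_next = env_step(s, a)
        r = -np.linalg.norm(encode(s_next) - z_goal)  # latent-distance reward
        replay_buffer.append((s, a, z_goal, r, s_next))
        s = s_next
    if len(replay_buffer) >= 256:
        idx = np.random.choice(len(replay_buffer), 256)
        off_policy_update([replay_buffer[i] for i in idx])
```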