This paper uses a VAE to map states into a latent representation space, and then performs goal-conditioned reinforcement learning directly in that latent space.
What does the VAE contribute?
- it generates a compact state representation: each state is encoded as a sample from a learned latent distribution
- it provides the reward for the current state: in RIG the reward is the negative distance between the latent encoding of the current state and the latent goal
- it lets us sample goal states: goals and ordinary states live in the same latent space, so goals can be sampled directly from the latent prior
VAE: s —encoder⇒ z —decoder⇒ ŝ (reconstruction of s)
We can sample a latent goal z_g directly from the latent space, e.g. from the VAE prior N(0, I).
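A minimal sketch of this mapping, assuming PyTorch and a flat state vector (the layer sizes, `state_dim`, and `latent_dim` are illustrative, not the paper's exact architecture): the encoder outputs the parameters of q(z | s), the decoder reconstructs the state from z, and latent goals are drawn from the prior N(0, I).

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Maps a state s to a latent z and reconstructs s from z."""
    def __init__(self, state_dim, latent_dim=16):
        super().__init__()
        # Encoder: state -> (mu, log_var) of q(z | s)
        self.encoder = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.log_var = nn.Linear(128, latent_dim)
        # Decoder: z -> reconstructed state
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, state_dim)
        )
        self.latent_dim = latent_dim

    def encode(self, s):
        h = self.encoder(s)
        return self.mu(h), self.log_var(h)

    def reparameterize(self, mu, log_var):
        # z = mu + sigma * eps, with eps ~ N(0, I)
        std = torch.exp(0.5 * log_var)
        return mu + std * torch.randn_like(std)

    def forward(self, s):
        mu, log_var = self.encode(s)
        z = self.reparameterize(mu, log_var)
        return self.decoder(z), mu, log_var

    def sample_goal(self, batch_size=1):
        # Goals live in the same latent space, so sample z_g from the prior N(0, I)
        return torch.randn(batch_size, self.latent_dim)
```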
Brief introduction to the RIG algorithm:
- collect exploration data and train the VAE encoder and decoder on it
- sample a latent goal state from the VAE prior
- sample an initial state, roll out the policy, and store transitions (s, a, g, r) in the replay buffer
- minimize the universal (goal-conditioned) loss function with any off-policy algorithm
- use the updated policy to collect more data and refresh the replay buffer (see the sketch below)
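Putting the steps above together, a rough sketch of the outer loop; here `env`, `policy`, `replay_buffer`, `off_policy_update`, and `encode_mean` are hypothetical interfaces (not the paper's code), and the reward is the negative latent distance mentioned above.

```python
import numpy as np

def latent_reward(z, z_g):
    # reward = negative distance between the current latent state and the latent goal
    return -np.linalg.norm(z - z_g)

def rig_loop(env, vae, policy, replay_buffer, off_policy_update,
             num_episodes=1000, horizon=100):
    for _ in range(num_episodes):
        z_g = vae.sample_goal()             # imagined goal sampled from the latent prior
        s = env.reset()                     # sample an initial state
        for _ in range(horizon):
            z = vae.encode_mean(s)          # encode the current state into latent space
            a = policy(z, z_g)              # goal-conditioned policy acts on (z, z_g)
            s_next, _, done, _ = env.step(a)
            z_next = vae.encode_mean(s_next)
            r = latent_reward(z_next, z_g)
            replay_buffer.add(z, a, r, z_next, z_g, done)
            s = s_next
            if done:
                break
        # minimize the universal loss with any off-policy algorithm (e.g. a TD3-style update)
        off_policy_update(policy, replay_buffer)
```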