Skew-Fit: State-Covering Self-Supervised Reinforcement Learning

The method chooses goals by maximizing the mutual information between states $S$ and goals $G$:

$$I(S; G) = H(S) - H(S \mid G) = H(G) - H(G \mid S)$$

that is, maximizing $H(G)$ while minimizing $H(G \mid S)$ (equivalently, maximizing $H(S)$ while minimizing $H(S \mid G)$).

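To minimize $H(G \mid S)$, we train a policy by maximizing the reward:

$$r = \log p(G \mid S)$$

which can be read as a negative distance between $G$ and $S$: the closer the reached state is to the goal, the higher the reward. The paper uses RIG (reinforcement learning with imagined goals) to handle this part.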
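
One way to see the distance reading, as a sketch: assume (this is an assumption added here, not something stated in these notes) that $p(G \mid S)$ is modeled as an isotropic Gaussian centered at $S$ with fixed variance $\sigma^2$. Then

$$\log p(G \mid S) = -\frac{1}{2\sigma^2} \lVert G - S \rVert^2 + \text{const}$$

so under that assumption maximizing the reward is the same as minimizing the squared Euclidean distance between the goal and the reached state.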

To maximize $H(G)$, they weight each previously visited state so that rarely visited states are more likely to be sampled as goals.
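
A minimal sketch of the reweighting idea, under simplifying assumptions: states are discrete and we only have visit counts, and each state gets a weight $p(s)^\alpha$ with $\alpha < 0$. The function names (`entropy`, `skewed_distribution`) and the example counts are hypothetical, chosen only for illustration; the real method works with a learned density model rather than counts.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # ignore zero-probability states
    return float(-(p * np.log(p)).sum())

def skewed_distribution(counts, alpha=-0.5):
    """Reweight the empirical state distribution by p(s)**alpha (alpha < 0).

    Hypothetical helper: rare states receive larger weights, so the
    resulting distribution is flatter than the visitation distribution.
    """
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()         # empirical visitation distribution
    weights = np.zeros_like(p)
    visited = p > 0
    weights[visited] = p[visited] ** alpha   # less visited -> bigger weight
    skewed = weights * p              # proportional to p(s)**(1 + alpha)
    return skewed / skewed.sum()

# Made-up visit counts: one state dominates, three are rarely visited.
counts = [90, 6, 3, 1]
p = np.asarray(counts, dtype=float) / sum(counts)
p_skewed = skewed_distribution(counts, alpha=-0.5)

print("original entropy:", entropy(p))
print("skewed entropy:  ", entropy(p_skewed))
```

With $\alpha = -1$ the skewed distribution in this sketch becomes uniform over the visited states; values between $-1$ and $0$ interpolate between the original distribution and the uniform one.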
