FF's Notes

Skew-Fit: State-Covering Self-Supervised Reinforcement Learning

Nov 21, 2022

The idea is to find subgoals by maximizing the mutual information:

$$ I(S;G) = H(S) - H(S|G) = H(G) - H(G|S) $$

That is, we maximize $H(S)$ and minimize $H(S|G)$.
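The two decompositions of $I(S;G)$ are equal; a toy joint distribution (my own illustrative numbers, not from the paper) makes this easy to check numerically:

```python
import numpy as np

# Toy joint distribution p(s, g) over 2 states x 2 goals (illustrative).
p_sg = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_s = p_sg.sum(axis=1)  # marginal p(s)
p_g = p_sg.sum(axis=0)  # marginal p(g)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

H_joint = entropy(p_sg.ravel())          # H(S, G)
H_S_given_G = H_joint - entropy(p_g)     # chain rule: H(S|G) = H(S,G) - H(G)
H_G_given_S = H_joint - entropy(p_s)     # H(G|S) = H(S,G) - H(S)

I_1 = entropy(p_s) - H_S_given_G         # H(S) - H(S|G)
I_2 = entropy(p_g) - H_G_given_S         # H(G) - H(G|S)
```

Both forms give the same mutual information, which is why the objective can be attacked from either side.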

To minimize $H(S|G)$, they train a goal-conditioned policy that maximizes the reward $$ r = \log p(G \mid S) $$ which acts as a (negative) distance between $G$ and $S$. The paper uses RIG to do this.
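In RIG the reward is computed in a VAE latent space; under a unit-variance Gaussian model of $p(g \mid s)$, $\log p(g \mid s)$ reduces to a negative squared latent distance up to a constant. A minimal sketch under that assumption (the encoder itself is omitted):

```python
import numpy as np

def rig_reward(z_s, z_g):
    # Assuming p(g | s) ~ N(z_s, I) in the VAE latent space,
    # log p(g | s) equals the negative squared distance between the
    # latent encodings, up to an additive constant.
    # z_s, z_g: latent encodings of current state and goal.
    return -0.5 * np.sum((z_g - z_s) ** 2)
```

Reaching the goal (identical latents) yields the maximal reward of zero; farther goals yield more negative rewards.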

To maximize $H(S)$, they put a weight on each visited state so that rarely visited states are sampled with higher probability when training the goal-proposing generative model.
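The skewing step can be sketched as weighting each state $s$ by $p(s)^\alpha$ with $\alpha \in [-1, 0)$, so low-density states get boosted; function and variable names below are my own, not from the paper's code:

```python
import numpy as np

def skewed_weights(log_probs, alpha=-1.0):
    """Skew-Fit style reweighting: weight each state s by p(s)**alpha,
    alpha in [-1, 0), so rarely visited (low-density) states are
    sampled more often. log_probs holds log p(s) for each buffered
    state under the current density model."""
    log_w = alpha * log_probs   # log of p(s)^alpha
    log_w -= log_w.max()        # subtract max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()          # normalized sampling weights

# Usage: the rarest state (p = 0.1) gets the largest weight.
log_p = np.log(np.array([0.7, 0.2, 0.1]))
w = skewed_weights(log_p, alpha=-1.0)
```

With $\alpha = -1$ the weights are proportional to $1/p(s)$, pushing the training distribution toward uniform over visited states, which is what increases $H(S)$.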