Try to find subgoals by maximizing the mutual information:
that is trying to maximizing and minimizing .
To minimizing , we train a policy by maximizing the reward:
which is also the distance between and . This paper use RIG to deal with it.
To maximizing , they add a weight on each state to make the probability of less visited state become bigger.