Try to find subgoals by maximizing the mutual information:

that is trying to maximizing and minimizing .

To minimizing , we train a policy by maximizing the reward:

which is also the distance between and . This paper use RIG to deal with it.

To maximizing , they add a weight on each state to make the probability of less visited state become bigger.