They try to learn goal-conditioned policy and goal achievement reward policy by measuring how similar current state is to the goal state.

But they only consider how to approach the goal state, but not take into consideration how to do more exploration, that is, how to far way from the initial state.