Train a subgoal policy with imitation learning: collect a dataset of expert demonstrations that record which subgoal the expert picks in each state, then fit the policy to reproduce those choices.
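A minimal sketch of the idea. It assumes discrete states and a finite set of subgoals, so the policy can be a lookup table. The imitation-learning method shown is behavior cloning, meaning supervised learning on (state, subgoal) pairs taken from the expert. The names `expert`, `collect_expert_data`, `fit_subgoal_policy`, and the corridor/waypoint example are all hypothetical and used only for illustration. With continuous states, the table would be replaced by a classifier or regressor trained on the same pairs.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

Demo = Tuple[Hashable, Hashable]  # (state, subgoal chosen by the expert)


def collect_expert_data(
    expert: Callable[[Hashable], Hashable], states: Iterable[Hashable]
) -> List[Demo]:
    """Query the (hypothetical) expert at each state and record its subgoal."""
    return [(state, expert(state)) for state in states]


def fit_subgoal_policy(demos: Iterable[Demo]) -> Callable[[Hashable], Optional[Hashable]]:
    """Behavior cloning for a tabular policy (an assumption of this sketch).

    For each state, return the subgoal the expert chose most often, i.e. the
    argmax of the empirical (maximum-likelihood) subgoal distribution.
    States never seen in the demonstrations map to None.
    """
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for state, subgoal in demos:
        counts[state][subgoal] += 1
    table = {state: c.most_common(1)[0][0] for state, c in counts.items()}

    def policy(state: Hashable) -> Optional[Hashable]:
        return table.get(state)

    return policy


# Toy example (hypothetical): a corridor of integer positions where the
# expert's subgoal is the next waypoint ahead of the agent.
WAYPOINTS = [3, 6, 9]


def expert(state: int) -> int:
    return next((w for w in WAYPOINTS if w > state), WAYPOINTS[-1])


demos = collect_expert_data(expert, [0, 1, 2, 4, 5, 7, 0, 1])
demos.append((0, 6))  # one noisy demonstration; the majority label still wins
policy = fit_subgoal_policy(demos)
print(policy(0), policy(4), policy(7), policy(8))  # 3 6 9 None
```

The table-based policy has no answer for states that never appear in the demonstrations (state 8 above). This gap is one reason practical setups use a function approximator that can generalize across states.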