Learning from Trajectories via Subgoal Discovery
Train a subgoal policy $\pi(s_g|s_t)$, which maps the current state $s_t$ to a subgoal $s_g$, using imitation learning: collect expert trajectories and fit the policy to them.
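A minimal sketch of what fitting $\pi(s_g|s_t)$ to expert data could look like, under assumptions that are not from the notes: each state is labeled with the state reached `horizon` steps later in the same expert trajectory, and the policy is a deterministic linear model fit by least squares (which can be read as the mean of a fixed-variance Gaussian). The helper names `make_subgoal_pairs`, `fit_subgoal_policy`, and `predict_subgoal`, and the toy 2D data, are hypothetical.

```python
import numpy as np


def make_subgoal_pairs(trajectories, horizon):
    """Build (s_t, s_g) pairs from expert trajectories.

    ASSUMPTION: the subgoal for s_t is the state `horizon` steps later in the
    same trajectory. States too close to the end to have a label are skipped.
    """
    states, subgoals = [], []
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        for t in range(len(traj) - horizon):
            states.append(traj[t])
            subgoals.append(traj[t + horizon])
    return np.stack(states), np.stack(subgoals)


def fit_subgoal_policy(states, subgoals):
    """Behavior cloning with a linear model: least-squares fit of s_g ~ [s_t, 1] @ W."""
    features = np.hstack([states, np.ones((len(states), 1))])  # append bias column
    weights, *_ = np.linalg.lstsq(features, subgoals, rcond=None)
    return weights


def predict_subgoal(weights, state):
    """Predict a subgoal for a single state with the fitted linear policy."""
    features = np.append(np.asarray(state, dtype=float), 1.0)
    return features @ weights


# Toy expert data (hypothetical): five 2D trajectories moving with constant
# velocity [1.0, 0.5] from random start points.
rng = np.random.default_rng(0)
velocity = np.array([1.0, 0.5])
trajectories = [
    start + np.outer(np.arange(20), velocity) for start in rng.normal(size=(5, 2))
]

states, subgoals = make_subgoal_pairs(trajectories, horizon=3)
weights = fit_subgoal_policy(states, subgoals)
predicted = predict_subgoal(weights, [0.0, 0.0])
```

On this toy data the expert always moves by `3 * velocity` over the horizon, so the fitted policy should predict a subgoal near `[3.0, 1.5]` from the origin. In practice the linear model would be replaced by a neural network, and the labeling rule by whatever subgoal definition the method uses.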