Goal-Conditioned Reinforcement Learning with Imagined Subgoals
This paper
- finds a subgoal $s_g$ between the current state $s_i$ and the goal $s_j$ that lies roughly midway, i.e., minimizes the larger of its distances to $s_i$ and $s_j$
- trains the policy $\pi(a|s,g)$ by minimizing the KL divergence between $\pi(a|s,g)$ and $\pi(a|s,s_g)$ (see the sketch below), and uses only $\pi(a|s,g)$ at test time
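The second bullet is the core training trick, so here is a minimal sketch of that KL term in PyTorch. The names `policy` and `target_policy`, and the assumption that they return Gaussian action distributions, are my own illustration, not the paper's exact interface; the subgoal-conditioned distribution is treated as a frozen prior.

```python
import torch
import torch.distributions as D

def kl_to_subgoal_prior(policy, target_policy, state, goal, subgoal):
    """KL(pi(.|s,g) || pi(.|s,s_g)) for a batch of transitions.

    Assumes `policy(state, cond)` and `target_policy(state, cond)` return a
    torch.distributions.Normal over actions. The subgoal-conditioned
    distribution is detached so it acts as a fixed prior.
    """
    dist_goal = policy(state, goal)                    # pi(a | s, g)
    with torch.no_grad():
        dist_subgoal = target_policy(state, subgoal)   # pi(a | s, s_g), frozen prior
    # Mean KL over the batch; added to the usual actor loss with some weight alpha:
    #   actor_loss = -Q(s, pi(a|s,g), g).mean() + alpha * kl_to_subgoal_prior(...)
    return D.kl_divergence(dist_goal, dist_subgoal).sum(-1).mean()
```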
The second part increases training time and difficulty because it has to fit two policies, but it removes the subgoal-inference step at test time. This is a trade-off, and I don't think it is worth it.
Inferring subgoals and executing them are inseparable parts of a complete task.
Besides, many subgoal-search algorithms require manually specifying the threshold (spacing) between consecutive subgoals. Is there a way to do this automatically?
Update:
- What is a subgoal here? A middle state between $s$ and $g$.
- How are subgoals generated? By sampling from the high-level policy $\pi^{H}(s_g|s,g)$.
- How to make sure a subgoal is valid? Maximize the log probability of real subgoal states sampled from experience, weighted by their advantage (see the sketch after this list).
- How to use subgoals? Minimize $D_{KL}\big(\pi(\cdot|s,g) \,\|\, \pi(\cdot|s,s_g)\big)$ to transfer knowledge from the easier subgoal-reaching behavior to the final goal.
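A minimal sketch of how the high-level policy $\pi^{H}$ could be trained with the advantage-weighted log-probability idea above. The names `pi_high` and `value_fn`, the value-as-negative-distance convention, the half-distance baseline, and the exponential weighting are assumptions about one plausible instantiation, not necessarily the paper's exact formulas.

```python
import torch

def high_level_policy_loss(pi_high, value_fn, state, goal, candidate_subgoal,
                           temperature=1.0):
    """Advantage-weighted log-likelihood loss for pi^H(s_g | s, g).

    Assumes:
      - `pi_high(state, goal)` returns a torch.distributions.Distribution
        over subgoal states,
      - `value_fn(s, g)` estimates the negative distance from s to g,
      - `candidate_subgoal` are real states sampled from the replay buffer.
    The weight favors candidates that are roughly midway: the smaller the
    larger of the two distances D(s, s_g) and D(s_g, g), the higher the weight.
    """
    with torch.no_grad():
        dist_to_sg = -value_fn(state, candidate_subgoal)     # D(s, s_g)
        dist_sg_to_g = -value_fn(candidate_subgoal, goal)    # D(s_g, g)
        cost = torch.maximum(dist_to_sg, dist_sg_to_g)       # C(s_g | s, g)
        baseline = -value_fn(state, goal) / 2.0              # half the direct distance (illustrative baseline)
        advantage = baseline - cost
        weight = torch.exp(advantage / temperature)          # AWR-style exponential weight
    log_prob = pi_high(state, goal).log_prob(candidate_subgoal)
    if log_prob.dim() > 1:                                   # sum over state dimensions if factorized
        log_prob = log_prob.sum(-1)
    return -(weight * log_prob).mean()
```

The weighting means $\pi^{H}$ only ever assigns probability mass to states that were actually visited, and prefers those that sit near the middle of the path from $s$ to $g$.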