Goal-Conditioned Reinforcement Learning with Imagined Subgoals
This paper
- finds a subgoal $s_g$ between the current state $s_i$ and the goal $s_j$ that lies roughly midway, i.e., minimizes the larger of its distances to $s_i$ and $s_j$
- trains the policy $\pi(a|s,g)$ by minimizing the KL divergence between $\pi(a|s,g)$ and $\pi(a|s,s_g)$ (see the sketch below), and uses only $\pi(a|s,g)$ at test time
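The second bullet is the core training trick, so here is a minimal sketch of that KL term in PyTorch. The names `policy` and `target_policy`, and the assumption that they return Gaussian action distributions, are my own illustration, not the paper's exact interface; the subgoal-conditioned distribution is treated as a frozen prior.

```python
import torch
import torch.distributions as D

def kl_to_subgoal_prior(policy, target_policy, state, goal, subgoal):
    """KL(pi(.|s,g) || pi(.|s,s_g)) for a batch of transitions.

    Assumes `policy(state, cond)` and `target_policy(state, cond)` return a
    torch.distributions.Normal over actions. The subgoal-conditioned
    distribution is detached so it acts as a fixed prior.
    """
    dist_goal = policy(state, goal)                    # pi(a | s, g)
    with torch.no_grad():
        dist_subgoal = target_policy(state, subgoal)   # pi(a | s, s_g), frozen prior
    # Mean KL over the batch; added to the usual actor loss with some weight alpha:
    #   actor_loss = -Q(s, pi(a|s,g), g).mean() + alpha * kl_to_subgoal_prior(...)
    return D.kl_divergence(dist_goal, dist_subgoal).sum(-1).mean()
```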
The second part increases training time and difficulty because it has to fit two policies, but it removes the subgoal-inference step at test time. This is a trade-off, and I don't think it is worth it.
Inferring subgoals and executing them are inseparable parts of a complete task.
Besides, many subgoal-search algorithms require manually specifying the threshold (spacing) between consecutive subgoals. Is there a way to do this automatically?
Update:
- What is a subgoal here? A middle state between $s$ and $g$.
- How are subgoals generated? By sampling from the high-level policy $\pi^{H}(s_g|s,g)$.
- How to make sure a subgoal is valid? Maximize the log probability of real subgoal states sampled from experience, weighted by their advantage (see the sketch after this list).
- How to use subgoals? Minimize $D_{KL}\big(\pi(\cdot|s,g) \,\|\, \pi(\cdot|s,s_g)\big)$ to transfer knowledge from the easier subgoal-reaching behavior to the final goal.
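A minimal sketch of how the high-level policy $\pi^{H}$ could be trained with the advantage-weighted log-probability idea above. The names `pi_high` and `value_fn`, the value-as-negative-distance convention, the half-distance baseline, and the exponential weighting are assumptions about one plausible instantiation, not necessarily the paper's exact formulas.

```python
import torch

def high_level_policy_loss(pi_high, value_fn, state, goal, candidate_subgoal,
                           temperature=1.0):
    """Advantage-weighted log-likelihood loss for pi^H(s_g | s, g).

    Assumes:
      - `pi_high(state, goal)` returns a torch.distributions.Distribution
        over subgoal states,
      - `value_fn(s, g)` estimates the negative distance from s to g,
      - `candidate_subgoal` are real states sampled from the replay buffer.
    The weight favors candidates that are roughly midway: the smaller the
    larger of the two distances D(s, s_g) and D(s_g, g), the higher the weight.
    """
    with torch.no_grad():
        dist_to_sg = -value_fn(state, candidate_subgoal)     # D(s, s_g)
        dist_sg_to_g = -value_fn(candidate_subgoal, goal)    # D(s_g, g)
        cost = torch.maximum(dist_to_sg, dist_sg_to_g)       # C(s_g | s, g)
        baseline = -value_fn(state, goal) / 2.0              # half the direct distance (illustrative baseline)
        advantage = baseline - cost
        weight = torch.exp(advantage / temperature)          # AWR-style exponential weight
    log_prob = pi_high(state, goal).log_prob(candidate_subgoal)
    if log_prob.dim() > 1:                                   # sum over state dimensions if factorized
        log_prob = log_prob.sum(-1)
    return -(weight * log_prob).mean()
```

The weighting means $\pi^{H}$ only ever assigns probability mass to states that were actually visited, and prefers those that sit near the middle of the path from $s$ to $g$.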