This paper

  1. finds a subgoal between the initial state and the final state that has the maximum distance to both of them, and

  2. trains the policy by minimizing the KL divergence between the two policies, and uses one of them in the test phase

The second part increases training time and difficulty because it needs to fit two policies. But it removes the subgoal-inference phase at test time. This is a trade-off, and I don’t think it is worth it.
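The exact symbols were lost in this export, but the distillation idea as I read it can be sketched as follows: a subgoal-conditioned "teacher" policy and a goal-conditioned "student" policy output action distributions, and training minimizes the KL divergence between them so that only the student is needed at test time. The names `teacher` and `student` and the toy distributions are my own illustration, not the paper's notation.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions given as lists.

    eps guards against log(0) for zero-probability entries.
    """
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

# Hypothetical action distributions over 3 discrete actions:
# a subgoal-conditioned "teacher" and a goal-conditioned "student".
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]

# Distillation loss to minimize with respect to the student's parameters.
loss = kl_divergence(teacher, student)
```

In an actual training loop this loss would be backpropagated through the student network only; the teacher would be treated as a fixed target for each batch.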

Inferring subgoals and executing them are inseparable parts of a complete task.

Besides, many subgoal-search algorithms require manually specifying the threshold between two subgoals. Is there a way to do this automatically?

Update @<2023-08-26 Sat>

  1. What is the subgoal here?

    The middle state between the initial state and the final state.

  2. How to generate subgoals?

    By using .

  3. How to make sure a subgoal is valid?

    1. Use the log probability of the real subgoal.
    2. Weight it by the advantage.
  4. How to use subgoal?

    Minimize the KL divergence. Transfer the knowledge in the subgoal to the final goal.