This paper

  1. finds a subgoal between the initial state and the final state that has the maximum distance to both of them, and

  2. trains the policy by minimizing the KL divergence between the two policies, and uses one of them in the test phase

The second part increases training time and difficulty because it needs to fit two policies. But it removes the subgoal-inference phase at test time. This is a trade-off, and I don’t think it is worth it.
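The exact symbols were lost in this export, but the distillation idea as I read it can be sketched as follows: a subgoal-conditioned "teacher" policy and a goal-conditioned "student" policy output action distributions, and training minimizes the KL divergence between them so that only the student is needed at test time. The names `teacher` and `student` and the toy distributions are my own illustration, not the paper's notation.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions given as lists.

    eps guards against log(0) for zero-probability entries.
    """
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

# Hypothetical action distributions over 3 discrete actions:
# a subgoal-conditioned "teacher" and a goal-conditioned "student".
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]

# Distillation loss to minimize with respect to the student's parameters.
loss = kl_divergence(teacher, student)
```

In an actual training loop this loss would be backpropagated through the student network only; the teacher would be treated as a fixed target for each batch.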

Inferring subgoals and executing them are inseparable parts of a complete task.

Besides, many subgoal-search algorithms require manually specifying the threshold between two subgoals. Is there a way to do this automatically?

Update @<2023-08-26 Sat>

  1. What is the subgoal here?

    The middle state between the initial state and the final state.

  2. How to generate subgoals?

    By using .

  3. How to make sure a subgoal is valid?

    1. Use the log probability of the real subgoal.
    2. Weight it by the advantage.
  4. How to use subgoal?

    Minimize the KL divergence. Transfer the knowledge in the subgoal to the final goal.