This paper is exactly what I want to do, and it was published recently. That gives me a lot of confidence to continue this work; at least this is not simple or unnecessary work. (Jeffery gave me big help with the partial-order idea!)
This paper consists of two parts. The first decomposes long-horizon tasks (which may contain ordering or sequential constraints) into goal-reaching problems, using a high-level planner to generate subgoals for a low-level policy. The second trains the low-level policy on offline data and fine-tunes it with online data on the shortened task (a subgoal-reaching reinforcement learning problem).
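To make the decomposition concrete, here is a minimal sketch of how I picture the hierarchy. The class names (`HighLevelPlanner`, `GoalConditionedPolicy`), the re-planning interval `k`, and the gym-style `env` interface are all my own placeholders, not details from the paper:

```python
import numpy as np

class HighLevelPlanner:
    """Proposes a subgoal for the low-level policy (placeholder interface)."""
    def propose_subgoal(self, state, final_goal):
        # The paper would plan over candidate subgoals here; as a stand-in,
        # just hand back the final goal.
        return final_goal

class GoalConditionedPolicy:
    """Low-level goal-conditioned policy pi(a | s, g), pretrained offline."""
    def act(self, state, subgoal):
        # Placeholder action; a real policy would be a learned network.
        return np.zeros(2)

def hierarchical_rollout(env, planner, policy, final_goal, horizon=200, k=20):
    """Re-plan a subgoal every k steps; the low-level policy acts toward it."""
    state = env.reset()
    subgoal = planner.propose_subgoal(state, final_goal)
    for t in range(horizon):
        if t % k == 0:
            subgoal = planner.propose_subgoal(state, final_goal)
        action = policy.act(state, subgoal)
        state, reward, done, _ = env.step(action)
        if done:
            break
    return state
```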
Previous methods either propose subgoals from the set of previously seen states, or directly optimize over subgoals, often by utilizing a latent variable model to obtain a concise representation of image-based states.
- Train the low-level policy on offline data. (Q: How do we ensure every goal gets covered during training? ⇒ exploration, distribution of goals; see the relabeling sketch after this list)
- Use a conditional variational autoencoder (CVAE) to sample a set of candidate subgoals, and choose the optimal candidates as the subgoals (see the CVAE sketch after this list)
- For each subgoal, use the low-level policy to generate actions, and fine-tune it with online data
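On the first bullet's question about goal coverage, one standard answer is hindsight goal relabeling (HER-style). The sketch below is my own illustration of that idea, not claimed to be the paper's mechanism, and it assumes goals live in the same space as states:

```python
import numpy as np

def relabel_goals(trajectory, num_relabeled=4, rng=None):
    """HER-style relabeling: treat states reached later in the same trajectory
    as goals, so the offline data covers a broad distribution of goals.

    `trajectory` is a list of (state, action, next_state) tuples.
    """
    if rng is None:
        rng = np.random.default_rng()
    relabeled = []
    for t, (s, a, s_next) in enumerate(trajectory):
        # Sample future states from the same trajectory as hindsight goals.
        future_idx = rng.integers(t, len(trajectory), size=num_relabeled)
        for idx in future_idx:
            goal = trajectory[idx][2]                   # a state actually reached
            reward = float(np.allclose(s_next, goal))   # sparse goal-reaching reward
            relabeled.append((s, a, goal, reward, s_next))
    return relabeled
```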
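For the subgoal proposal step (second bullet), here is a minimal sketch of what CVAE-based sampling plus selection could look like. The network sizes, the `score_subgoal` ranking function (e.g. a learned value estimate), and all names are my assumptions, not the paper's actual code:

```python
import torch
import torch.nn as nn

class SubgoalCVAE(nn.Module):
    """Conditional VAE over subgoals: decode latent z conditioned on the current state."""
    def __init__(self, state_dim, goal_dim, latent_dim=16, hidden=128):
        super().__init__()
        # Encoder q(z | s, g), used during training (not shown in sampling below).
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim))  # outputs mean and log-variance
        # Decoder p(g | s, z): proposes a subgoal from the state and a latent code.
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, goal_dim))
        self.latent_dim = latent_dim

    def sample_subgoals(self, state, num_samples=32):
        """Sample candidate subgoals by decoding z ~ N(0, I) conditioned on state."""
        z = torch.randn(num_samples, self.latent_dim)
        s = state.unsqueeze(0).expand(num_samples, -1)
        return self.decoder(torch.cat([s, z], dim=-1))

def choose_subgoal(cvae, state, score_subgoal, num_samples=32):
    """Keep the highest-scoring candidate under a scoring function such as V(s, g)."""
    candidates = cvae.sample_subgoals(state, num_samples)
    scores = torch.stack([score_subgoal(state, g) for g in candidates])
    return candidates[scores.argmax()]
```

The idea is that the decoder, conditioned on the current state, maps latent samples to plausible subgoals, and the planner keeps whichever candidate scores best.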
I recommend checking the part on how the set of subgoals is generated, which I think is the most novel part of this paper.