This work tackles long-horizon tasks by leveraging non-expert, action-free observation data. The framing is novel, although I believe similar ideas have appeared in prior work without being described in these terms.

The method trains a state-goal value function to encourage informative exploration (a technique used in prior work) and learns a high-level policy that generates reasonable subgoals. To generate subgoals, it applies Hindsight Experience Replay (HER), sampling goals either from future states within the same trajectory or from random states in the dataset. I am skeptical of the latter choice: a randomly sampled state may bear no relation to the current trajectory, which seems like weak supervision for subgoal generation.
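
To make the relabeling scheme concrete, here is a minimal sketch of what such HER-style goal sampling might look like. The function name, the 80/20 future-vs-random split, and the flat `dataset` array of pooled states are my own illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sample_relabeled_goal(trajectory, t, dataset, p_future=0.8, rng=None):
    """HER-style goal relabeling for the transition at index t.

    With probability p_future, sample the goal from a future state of the
    same trajectory; otherwise sample a random state from the dataset.
    `trajectory` is a sequence of states; `dataset` is a flat array of
    states pooled across all trajectories. The split ratio is illustrative.
    """
    rng = rng or np.random.default_rng()
    if rng.random() < p_future and t + 1 < len(trajectory):
        # Future-state goal: pick a state later in the same trajectory,
        # which is guaranteed to be reachable from state t.
        future_idx = rng.integers(t + 1, len(trajectory))
        return trajectory[future_idx]
    # Random-state goal: pick any state from the dataset. This is the
    # choice I question above, since such a state may be unrelated to
    # (or unreachable from) the current trajectory.
    return dataset[rng.integers(len(dataset))]
```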