As the title says, the proposed method is a hierarchical RL algorithm, but the main contribution focuses on the ‘high’ level, i.e., the choice of subtask options.
So the basic solution is simple:
- build the subtask graph (how is not mentioned; I guess it is built by hand)
- use an R3NN to construct the policy model, which takes the graph and the observation as input and outputs a subtask option
- use a pre-trained (non-parameterized) policy to accelerate the training of the R3NN model (this part is hard to follow)
- use GAE to refine the policy model (see the sketch after this list)
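For readers unfamiliar with the last step: GAE (generalized advantage estimation) forms the advantage as an exponentially weighted sum of TD residuals. Below is a minimal sketch for a single finite episode; the function and argument names (`gae_advantages`, `rewards`, `values`, `gamma`, `lam`) are mine for illustration, not the paper's code:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation (Schulman et al., 2016).

    rewards: shape (T,)   -- rewards r_t for one episode
    values:  shape (T+1,) -- value estimates V(s_0..s_T), where
             values[T] is the bootstrap value of the final state
    Returns advantages of shape (T,).
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Recursion: A_t = delta_t + gamma * lam * A_{t+1}
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

Here `lam` trades bias against variance: `lam=0` reduces to the one-step TD error, while `lam=1` recovers the Monte Carlo return minus the value baseline.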
The most novel idea this paper proposes is the subtask graph, which rests on several basic but useful definitions: precondition, eligibility, and completion vector. It gives a clear formulation that others can follow to attack this problem.
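To make these definitions concrete, the following toy sketch shows one way a completion vector could induce an eligibility vector through preconditions. The class `SubtaskGraph` and its AND-only preconditions are my own illustration under simplifying assumptions, not the paper's actual representation:

```python
import numpy as np

class SubtaskGraph:
    """Toy subtask graph with AND-preconditions only.

    completion:  binary vector x, x[i] = 1 if subtask i is done.
    eligibility: binary vector e, e[i] = 1 if subtask i's
                 precondition is satisfied by the completion vector.
    """
    def __init__(self, num_subtasks, preconditions):
        # preconditions[i]: subtasks that must all be completed
        # before subtask i becomes eligible (AND semantics; richer
        # logical preconditions are omitted here for brevity).
        self.n = num_subtasks
        self.preconditions = preconditions

    def eligibility(self, completion):
        e = np.zeros(self.n, dtype=int)
        for i in range(self.n):
            e[i] = int(all(completion[j] == 1
                           for j in self.preconditions[i]))
        return e

# Example: subtask 2 requires 0 and 1; only subtask 0 is done so far.
g = SubtaskGraph(3, preconditions={0: [], 1: [], 2: [0, 1]})
print(g.eligibility(np.array([1, 0, 0])))  # -> [1 1 0]
```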
Hierarchical methods are easy to come up with; is this the only kind of method we can use to solve this problem?