The policy sketches are given, which means the order (or the graph) of subtasks is known a priori. Thus, this paper mainly focuses on the ‘low’ level of this hierarchical problem, but from a global perspective.
The idea is simple:
- train a policy for each subtask, so that the policy for the whole task is the sequence of subtask policies in the order given by the sketch
- collect data and use curriculum learning to train each subpolicy
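To make the composition step concrete, here is a minimal Python sketch of running per-subtask policies in the order a sketch prescribes. Everything in it is an illustrative assumption rather than the paper's implementation: the integer state, the hand-written `make_reach` subpolicies, the subtask names, and the `STOP` signal used to hand control to the next subpolicy. Learning (data collection and the curriculum) is left out entirely.

```python
from typing import Callable, Dict, List, Tuple, Union

STOP = "STOP"  # assumed special action: the current subpolicy hands over control

# A subpolicy maps a state to an action or to STOP. The state is a plain int here.
Action = Union[int, str]
SubPolicy = Callable[[int], Action]


def make_reach(target: int) -> SubPolicy:
    """Toy hand-written subpolicy: step toward `target`, then emit STOP."""
    def policy(state: int) -> Action:
        if state == target:
            return STOP
        return 1 if state < target else -1
    return policy


def run_sketch(
    sketch: List[str],
    subpolicies: Dict[str, SubPolicy],
    state: int,
    max_steps: int = 100,
) -> Tuple[int, List[str]]:
    """Execute the subpolicies one after another, in the order given by the sketch."""
    trace: List[str] = []  # which subpolicy acted at each step
    steps = 0
    for name in sketch:
        policy = subpolicies[name]
        while steps < max_steps:
            action = policy(state)
            if action == STOP:
                break  # move on to the next subtask in the sketch
            state += action  # toy transition: the action is a signed step
            trace.append(name)
            steps += 1
    return state, trace


# Hypothetical subtasks; the same subpolicies could be shared across many sketches.
subpolicies = {"go_to_3": make_reach(3), "go_to_0": make_reach(0)}
final_state, trace = run_sketch(["go_to_3", "go_to_0"], subpolicies, state=1)
print(final_state, trace)
```

The whole-task policy never appears as a separate object: it is just the sketch plus the dictionary of subpolicies, which is why a subpolicy trained for one task can be reused in any other task whose sketch names the same subtask.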
The most novel idea of this paper is a modified MDP, more specifically, a new state transition probability that describes the relationships among tasks in the multi-task MDP.