Modular Multitask Reinforcement Learning with Policy Sketches

Given the policy sketches, the order (or the graph) of subtasks is known a priori. Thus, this paper mainly focuses on the ‘low’ level of this hierarchical problem, though from a global perspective.
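As a minimal illustration, assume a sketch is simply an ordered list of subtask names. The task and subtask names below are hypothetical, not taken from the paper. Knowing the sketches up front tells us which sub-policies each task needs and which ones can be shared across tasks:

```python
from collections import defaultdict

# Hypothetical sketches: each task is annotated with an ordered list of subtask names.
sketches = {
    "make_planks": ["get_wood", "use_workbench"],
    "make_sticks": ["get_wood", "use_toolshed"],
    "make_ladder": ["get_wood", "use_workbench", "use_factory"],
}


def tasks_per_subtask(sketches):
    """Map each subtask name to the tasks whose sketch contains it."""
    usage = defaultdict(list)
    for task, sketch in sketches.items():
        for subtask in sketch:
            if task not in usage[subtask]:
                usage[subtask].append(task)
    return dict(usage)


usage = tasks_per_subtask(sketches)
for subtask, tasks in sorted(usage.items()):
    print(subtask, "->", tasks)
```

Here `get_wood` appears in every sketch, so a single sub-policy for it would receive training signal from all three tasks.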

The idea is simple:

train a policy $\pi_{i}$ for each subtask $i$; the policy for the whole task is then the composition $\pi = [\pi_{i}]$ of these sub-policies, executed in the order given by the sketch
collect data and use curriculum learning to train each sub-policy
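The two steps above can be sketched as a toy example. This is not the paper's implementation: it assumes each sub-policy is a callable mapping an observation to either an environment action or a special `STOP` token used to hand control to the next subtask, and it models the curriculum as admitting tasks with short sketches first. All names (`run_sketch`, `curriculum_tasks`, `toy_step`, etc.) are hypothetical, and the learning update itself is omitted.

```python
from typing import Callable, Dict, List, Tuple

STOP = "STOP"  # hypothetical token a sub-policy emits when its subtask is finished

SubPolicy = Callable[[int], str]  # toy observation (int) -> action or STOP


def run_sketch(sketch: List[str], subpolicies: Dict[str, SubPolicy],
               env_step: Callable[[int, str], int], obs: int,
               max_steps: int = 100) -> Tuple[List[Tuple[str, str]], bool]:
    """Run the composite policy pi = [pi_i]: each sub-policy, in sketch order,
    acts until it emits STOP, then control passes to the next one."""
    trajectory: List[Tuple[str, str]] = []
    idx = 0
    for _ in range(max_steps):
        if idx == len(sketch):
            break
        subtask = sketch[idx]
        action = subpolicies[subtask](obs)
        if action == STOP:
            idx += 1  # hand control to the next sub-policy in the sketch
            continue
        trajectory.append((subtask, action))
        obs = env_step(obs, action)
    return trajectory, idx == len(sketch)


def curriculum_tasks(sketches: Dict[str, List[str]], max_len: int) -> List[str]:
    """Simplified curriculum step: admit only tasks whose sketch has at most
    max_len subtasks, shortest first; max_len would grow during training."""
    eligible = [t for t, s in sketches.items() if len(s) <= max_len]
    return sorted(eligible, key=lambda t: (len(sketches[t]), t))


def make_subpolicy(target: int) -> SubPolicy:
    # Toy sub-policy: increment the counter until it reaches `target`, then STOP.
    return lambda obs: STOP if obs >= target else "inc"


def toy_step(obs: int, action: str) -> int:
    # Toy environment: the observation is a counter that "inc" increases by one.
    return obs + 1 if action == "inc" else obs


subpolicies = {"reach_2": make_subpolicy(2), "reach_4": make_subpolicy(4)}
toy_sketches = {"short": ["reach_2"], "long": ["reach_2", "reach_4"]}

traj, finished = run_sketch(toy_sketches["long"], subpolicies, toy_step, obs=0)
print(traj, finished)
print(curriculum_tasks(toy_sketches, max_len=1))
```

Note that `reach_2` is the same sub-policy object in both toy tasks; that sharing is what makes the approach modular.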

The most novel idea of this paper is its formulation of the problem as a modified MDP; more specifically, a modified state transition probability that describes the relationship between subtasks in the multitask MDP.
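One way to read this, sketched under our own assumptions rather than taken from the paper: augment the environment state $s$ with the index $i$ of the currently active subtask, so the transition acts on pairs $(s, i)$. A `STOP` action advances $i$ and leaves $s$ unchanged; any other action moves $s$ according to the base dynamics and keeps $i$ fixed:

$$P\big((s', i') \mid (s, i), a\big) = \begin{cases} \mathbb{1}[s' = s]\,\mathbb{1}[i' = i + 1] & \text{if } a = \mathrm{STOP} \\ P(s' \mid s, a)\,\mathbb{1}[i' = i] & \text{otherwise} \end{cases}$$

A small tabular version (the states, actions, and probabilities are hypothetical):

```python
from typing import Dict, Tuple

STOP = "STOP"  # hypothetical subtask-termination action

State = str
AugState = Tuple[State, int]  # (environment state s, active subtask index i)


def augmented_transition(base_P: Dict[Tuple[State, str], Dict[State, float]],
                         aug_state: AugState, action: str) -> Dict[AugState, float]:
    """Return the distribution over next augmented states (s', i')."""
    s, i = aug_state
    if action == STOP:
        # STOP hands control to the next subtask; the environment does not move.
        return {(s, i + 1): 1.0}
    # Ordinary actions follow the base dynamics; the subtask index is unchanged.
    return {(s_next, i): p for s_next, p in base_P[(s, action)].items()}


# Hypothetical base dynamics: "go" from "start" reaches "goal" with probability 0.8.
base_P = {("start", "go"): {"start": 0.2, "goal": 0.8}}

print(augmented_transition(base_P, ("start", 0), "go"))
print(augmented_transition(base_P, ("goal", 0), STOP))
```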

FF's Roam Notes
