Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning
Taking inspiration from behavior-cloning algorithms, where predicting a sequence of actions enables policies to effectively approximate the noisy, multi-modal distributions of expert demonstrations, this paper learns a critic network that outputs Q-values over a sequence of actions.
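To make the idea of a critic over action sequences concrete, the following is a minimal sketch and not the paper's implementation. It assumes the critic flattens a fixed-length action sequence, concatenates it with the observation, and returns one scalar Q-value per batch element. The function names (`init_critic`, `q_value`), the two-layer MLP, the dimensions, and the random weights are all hypothetical choices made for illustration.

```python
import numpy as np


def init_critic(obs_dim, action_dim, seq_len, hidden_dim=64, seed=0):
    """Create random weights for a small MLP critic (hypothetical architecture)."""
    rng = np.random.default_rng(seed)
    # The input is the observation plus the flattened action sequence.
    in_dim = obs_dim + seq_len * action_dim
    return {
        "W1": rng.normal(0.0, 0.1, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, size=(hidden_dim, 1)),
        "b2": np.zeros(1),
    }


def q_value(params, obs, action_seq):
    """Return one Q-value per batch element for (observation, action sequence).

    obs:        array of shape (batch, obs_dim)
    action_seq: array of shape (batch, seq_len, action_dim)
    """
    batch = obs.shape[0]
    # Flatten the sequence so the critic scores all actions jointly.
    flat_actions = action_seq.reshape(batch, -1)
    x = np.concatenate([obs, flat_actions], axis=1)
    hidden = np.maximum(x @ params["W1"] + params["b1"], 0.0)  # ReLU
    return (hidden @ params["W2"] + params["b2"]).squeeze(-1)


# Assumed sizes: 10-dim observation, 4-dim action, sequence of 8 actions.
params = init_critic(obs_dim=10, action_dim=4, seq_len=8)
obs = np.zeros((2, 10))
action_seq = np.ones((2, 8, 4))
q = q_value(params, obs, action_seq)
print(q.shape)  # (2,)
```

The only difference from a standard single-action critic in this sketch is the input: the critic scores a whole sequence of actions jointly instead of one action at a time.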