This paper borrows an idea from behavior-cloning algorithms, namely that predicting a sequence of actions enables policies to effectively approximate the noisy, multi-modal distributions of expert demonstrations, and learns a critic network that outputs Q-values over a sequence of actions.
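To make the summary concrete, here is a minimal sketch of what a critic over an action sequence could look like. It is only one possible reading and not the paper's architecture: the names `init_critic` and `q_values`, the two-layer tanh network, the random weights, and the choice to emit one Q-value per step of the sequence are all assumptions made for illustration.

```python
import math
import random


def init_critic(obs_dim, act_dim, horizon, hidden=16, seed=0):
    """Build a toy critic with random weights (hypothetical, for illustration)."""
    rng = random.Random(seed)
    in_dim = obs_dim + horizon * act_dim  # observation plus flattened action sequence
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(horizon)]
    return {"w1": w1, "w2": w2, "obs_dim": obs_dim,
            "act_dim": act_dim, "horizon": horizon}


def q_values(critic, obs, action_seq):
    """Return one Q-value per step of the action sequence (an assumed output shape)."""
    if len(obs) != critic["obs_dim"]:
        raise ValueError("observation has the wrong dimension")
    if len(action_seq) != critic["horizon"]:
        raise ValueError("action sequence has the wrong length")
    if any(len(a) != critic["act_dim"] for a in action_seq):
        raise ValueError("an action has the wrong dimension")
    # Concatenate the observation with the flattened action sequence.
    x = list(obs) + [v for action in action_seq for v in action]
    # Hidden layer with tanh activation.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in critic["w1"]]
    # Linear output head: one scalar per step in the sequence.
    return [sum(w * hi for w, hi in zip(row, h)) for row in critic["w2"]]


critic = init_critic(obs_dim=3, act_dim=2, horizon=4)
qs = q_values(critic, [0.1, 0.2, 0.3], [[0.0, 0.1]] * 4)
print(len(qs))  # number of Q-values equals the sequence length
```

The point of the sketch is only the interface: the critic scores an observation together with a whole sequence of actions rather than a single action, which is the distinction the summary draws.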