Key idea
Behavioral cloning policies are often represented by explicit continuous feed-forward models that mappingg directly from input observations to output action .
In this work, they reformulate BC using the composition of argmin with an energy function to represent the policy,
Then train the model with different EBM training methods.
What you can learn?
- From explicit to implicit. If there’s a way to transfer the methods/optimization goal from explicit to implicit, there’s a way to write a paper.
Drawbacks
Not found.