Current imitation learning requires a large number of expert demonstrations. This paper proposes Latent Diffusion Planning (LDP), which uses action-free demonstrations to train the planner and sub-optimal data to train the inverse dynamics model (IDM). The method has three parts:
- Train a latent encoding of the observations with a VAE loss;
- Use a diffusion model as the planner to forecast a dense trajectory of near-future latent states;
- Train a diffusion model as the IDM to generate actions from latent states.
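
The three training objectives above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`vae_loss`, `add_noise`, `denoising_loss`), the `beta` weight on the KL term, the DDPM-style noise-prediction parametrization, and the choice to condition the IDM on a pair of consecutive latents are all assumptions made for the example. Vectors are plain lists of floats, and the "model" is a stub that always predicts zero noise.

```python
import math
import random


def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Reconstruction MSE plus beta-weighted KL(q(z|x) || N(0, I)).

    `beta` is an assumed weighting; mu/logvar parametrize a diagonal Gaussian.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon)) / len(x)
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
    return recon + beta * kl


def add_noise(clean, alpha_bar, rng):
    """Forward diffusion step: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in clean]
    noisy = [
        math.sqrt(alpha_bar) * c + math.sqrt(1.0 - alpha_bar) * e
        for c, e in zip(clean, eps)
    ]
    return noisy, eps


def denoising_loss(predict_noise, clean, cond, alpha_bar, rng):
    """MSE between the sampled noise and the model's noise prediction."""
    noisy, eps = add_noise(clean, alpha_bar, rng)
    eps_hat = predict_noise(noisy, cond, alpha_bar)
    return sum((a - b) ** 2 for a, b in zip(eps, eps_hat)) / len(eps)


def zero_predictor(noisy, cond, alpha_bar):
    """Hypothetical stand-in for a neural network: always predicts zero noise."""
    return [0.0] * len(noisy)


rng = random.Random(0)

# Planner: denoise a flattened chunk of future latents, conditioned on the
# current latent. Only observations are needed, so action-free data suffices.
z_t = [0.1, -0.2]
future_latents = [0.2, -0.1, 0.3, 0.0]  # two future latents, flattened
planner_loss = denoising_loss(zero_predictor, future_latents, z_t, 0.5, rng)

# IDM: denoise an action, conditioned on two consecutive latents (assumed).
# Needs action labels, but the data does not have to be optimal.
z_next = [0.2, -0.1]
action = [0.05, 0.4, -0.3]
idm_loss = denoising_loss(zero_predictor, action, z_t + z_next, 0.5, rng)
```

The same `denoising_loss` serves both the planner and the IDM; only the denoised target and the conditioning differ. That separation is what lets the two models be trained on different data sources.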