FF's Notes

ManiFlow: A General Robot Manipulation Policy via Consistency Flow Training

This paper proposes DiT-X, a diffusion transformer architecture with adaptive cross-attention and AdaLN-Zero conditioning that enables fine-grained feature interactions between action tokens and multi-modal observations: vision, language, and proprioceptive state.
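
To make the AdaLN-Zero idea concrete, here is a minimal NumPy sketch of one cross-attention sublayer in which action tokens attend to observation tokens. It follows the AdaLN-Zero recipe from the original DiT work (shift, scale, and gate regressed from a conditioning vector, with the regression layer zero-initialized), not necessarily the exact DiT-X block. Everything named here is an assumption for illustration: `cond` stands in for whatever conditioning signal the model uses (for example a timestep embedding), `obs` is assumed to be the vision, language, and proprioceptive tokens already projected to a shared width, and `init_params` and `adaln_zero_cross_attn` are hypothetical helpers.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each token over its feature dimension (no learned affine).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, wq, wk, wv):
    # Single-head attention: queries come from action tokens,
    # keys and values come from observation tokens.
    q, k, v = queries @ wq, context @ wk, context @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def init_params(d, d_cond, rng):
    # Hypothetical initializer. The modulation layer starts at zero,
    # which is the "Zero" in AdaLN-Zero.
    return {
        "wq": rng.normal(0.0, 0.02, (d, d)),
        "wk": rng.normal(0.0, 0.02, (d, d)),
        "wv": rng.normal(0.0, 0.02, (d, d)),
        "w_mod": np.zeros((d_cond, 3 * d)),
        "b_mod": np.zeros(3 * d),
    }

def adaln_zero_cross_attn(actions, obs, cond, params):
    # actions: (T, d) action tokens; obs: (S, d) observation tokens;
    # cond: (d_cond,) conditioning vector (assumed, e.g. timestep embedding).
    mod = cond @ params["w_mod"] + params["b_mod"]
    shift, scale, gate = np.split(mod, 3)
    # Adaptive layer norm: condition-dependent scale and shift.
    h = layer_norm(actions) * (1.0 + scale) + shift
    attn = cross_attention(h, obs, params["wq"], params["wk"], params["wv"])
    # Gated residual: with zero-initialized modulation the gate is 0,
    # so the sublayer starts out as the identity map.
    return actions + gate * attn

rng = np.random.default_rng(0)
params = init_params(d=8, d_cond=4, rng=rng)
actions = rng.normal(size=(5, 8))   # 5 action tokens
obs = rng.normal(size=(7, 8))       # 7 observation tokens
cond = rng.normal(size=4)
out = adaln_zero_cross_attn(actions, obs, cond, params)
print(out.shape, np.allclose(out, actions))
```

Because the modulation weights start at zero, the gate is zero at initialization and the sublayer passes the action tokens through unchanged; training then learns how strongly each feature channel should draw on the observations.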