#VLA
19 notes
- VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation
- MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
- Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
- GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
- HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
- CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
- Real-World Robot Applications of Foundation Models: A Review
- Improving Vision-Language-Action Model with Online Reinforcement Learning
- Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
- OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation
- $\pi_{0.6}$: A VLA That Learns From Experience
- Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression
- TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
- $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control
- Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers
- Vision-Language-Action Models
- From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control