FF's Roam Notes

❯

Build & Train VLA Models

Build & Train VLA Models

Jun 05, 20251 min read

LLM
VLA

From MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation:

A 7B VLA model running on a commercial-grade GPU like the RTX 4090 generally achieves an inference frequency of approximately 5-12 Hz.

Training Parameters

Data Collator

Core functionality:

Batch creation.
Padding. Ensuring all samples in a batch (such as variable-length text inputs) have the same dimensions.
Convert Python objects to PyTorch.
Create attention masks or add special tokens.

Graph View

Training Parameters
Data Collator

Created with Quartz v4.5.1 © 2025

GitHub
Discord Community