This paper uses an FFTAI GR1 humanoid robot to perform manipulation tasks in diverse unseen scenarios, training a diffusion policy on data collected through teleoperation.
-
To avoid the preparation work of precise camera calibration and fine-grained point cloud segmentation, they use a so-called egocentric 3D representation: the 3D points are expressed in the camera's own frame rather than in the world frame.
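
To make the camera-frame idea concrete, here is a minimal sketch of back-projecting a depth image into camera-frame points with the standard pinhole model. It is not the paper's code: the function names, the toy depth values, and the intrinsics (`fx`, `fy`, `cx`, `cy`) are all assumptions chosen for illustration. The point is that the egocentric step needs only the intrinsics, while a world-frame cloud additionally needs a calibrated camera pose.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_camera_points(
    depth: Sequence[Sequence[float]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Point]:
    """Back-project a depth image into 3D points in the camera frame.

    Uses the pinhole model; only intrinsics are required, no camera pose.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column, z: depth
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def camera_to_world(
    points: Sequence[Point],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> List[Point]:
    """For contrast: a world-frame cloud needs the camera pose (R, t),
    which is exactly what extrinsic calibration would have to provide."""
    world: List[Point] = []
    for p in points:
        world.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return world


# Toy 2x2 depth image (hypothetical values); 0.0 marks an invalid pixel.
depth = [[1.0, 0.0],
         [2.0, 2.0]]
egocentric = depth_to_camera_points(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)

# Hypothetical extrinsics, needed only for the world-frame variant.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
world = camera_to_world(egocentric, identity, [0.0, 0.0, 1.0])
```

In this sketch a policy consuming `egocentric` never touches `rotation` or `translation`, so moving the camera does not require recalibrating anything.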
-
Their tests show no significant performance difference between using end-effector poses and robot joint positions as actions.
-
Teleoperation uses Apple Vision Pro, with a latency of approximately 0.5 seconds.