This paper introduces a challenge at NeurIPS, along with two baseline methods for solving the OVMM problem.
Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location.
For the heuristic solution, they use a scripted motion planner for real-world object search.
For the RL solution, they train a policy to navigate to objects.
They use the open-vocabulary object detector Detic, as well as ground truth, to provide object segmentation.
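To make the two perception settings concrete, here is a minimal sketch, assuming a hypothetical simulator handle and a hypothetical detector wrapper (this is not the paper's code and not the real Detic API), of how the segmentation source could be swapped:

```python
from typing import Optional, Protocol
import numpy as np

class Segmenter(Protocol):
    """Returns a binary mask for the queried object category, or None if not visible."""
    def segment(self, rgb: np.ndarray, query: str) -> Optional[np.ndarray]: ...

class GroundTruthSegmenter:
    """Reads instance masks straight from the simulator (perception upper bound)."""
    def __init__(self, sim):
        self.sim = sim  # hypothetical simulator handle

    def segment(self, rgb, query):
        return self.sim.get_instance_mask(query)  # hypothetical accessor

class OpenVocabSegmenter:
    """Wraps an open-vocabulary detector such as Detic (detector interface is assumed)."""
    def __init__(self, detector):
        self.detector = detector

    def segment(self, rgb, query):
        dets = self.detector.detect(rgb, vocabulary=[query])  # assumed interface, not the real Detic API
        return dets[0].mask if dets else None
```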
The default baseline here is a state machine that calls FindObj, Gaze, Pick, FindRec, and Place in that order, where Pick is a grasping policy provided by the robot library.
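A minimal sketch of that control flow, with the skill names from the baseline but the skills themselves left as hypothetical callables (the stop-on-first-failure behaviour is my assumption for illustration):

```python
from enum import Enum, auto

class Skill(Enum):
    FIND_OBJ = auto()
    GAZE = auto()
    PICK = auto()
    FIND_REC = auto()
    PLACE = auto()

# Fixed order used by the baseline state machine.
SKILL_ORDER = [Skill.FIND_OBJ, Skill.GAZE, Skill.PICK, Skill.FIND_REC, Skill.PLACE]

def run_episode(skills: dict) -> bool:
    """Run each skill once, in order. `skills` maps Skill -> callable returning success."""
    for skill in SKILL_ORDER:
        if not skills[skill]():   # each skill drives the robot until it terminates
            return False          # no recovery: a single failure ends the episode
    return True
```

Here `skills[Skill.PICK]` would wrap the grasping policy provided by the robot library.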
Performance Results
- Results with ground-truth segmentation are better than with Detic detections
- The heuristic baseline outperforms the RL baseline
Brainstorms
- The proposed state-machine order is not optimal and could be improved. Open questions:
  - What if a step fails? How should the system replan? (see the replanning sketch after this list)
  - What if a human interrupts the robot or cooperates with it?
- Gaze is the most important part here. Its goal is to improve grasp success by moving close enough to the object, orienting the head to get a good view of it, and then generating a good grasp pose. (a gaze sketch follows after this list)
- For the low-level part (navigation), I don’t want to focus on that.
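To make the replanning idea concrete, here is a sketch of the same loop with naive failure handling: retry a failed skill a few times, and fall back to re-running FindObj when Gaze or Pick keeps failing. It reuses `Skill` and `SKILL_ORDER` from the earlier sketch; the transition logic and thresholds are my own assumptions, not the paper's.

```python
def run_episode_with_replan(skills: dict, max_retries: int = 2, budget: int = 20) -> bool:
    """Sequential skills with naive retry/fallback replanning (illustrative only)."""
    i, retries, calls = 0, 0, 0
    while i < len(SKILL_ORDER) and calls < budget:
        calls += 1
        skill = SKILL_ORDER[i]
        if skills[skill]():
            i, retries = i + 1, 0           # success: move on to the next skill
        elif retries < max_retries:
            retries += 1                    # retry the same skill in place
        elif skill in (Skill.GAZE, Skill.PICK):
            i, retries = SKILL_ORDER.index(Skill.FIND_OBJ), 0   # re-search the object, then retry
        else:
            return False                    # treat other failures as unrecoverable
    return i == len(SKILL_ORDER)
```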
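And a rough sketch of what a gaze step could compute, assuming the object centroid is already available in the robot's base frame (the frame convention, camera height, and grasp distance are made-up values for illustration):

```python
import numpy as np

def gaze_at_object(obj_xyz, grasp_dist: float = 0.5, cam_height: float = 1.2):
    """Return (forward_step, base_turn, cam_tilt) that bring the object into a graspable,
    well-centred view. obj_xyz is the object centroid in the base frame (x forward, y left, z up)."""
    obj_xyz = np.asarray(obj_xyz, dtype=float)
    dist = float(np.hypot(obj_xyz[0], obj_xyz[1]))                 # planar distance to the object
    base_turn = float(np.arctan2(obj_xyz[1], obj_xyz[0]))          # yaw needed to face the object
    forward_step = max(0.0, dist - grasp_dist)                     # approach until within grasp range
    cam_tilt = float(np.arctan2(obj_xyz[2] - cam_height, max(dist, 1e-6)))  # pitch camera toward the object
    return forward_step, base_turn, cam_tilt
```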