Explainer Video
Real-World Demonstration
(All robot videos are 1x speed)
Below are additional real-world examples of LIMP generating task and motion plans to follow expressive instructions with complex spatiotemporal constraints. Each demonstration includes two videos: the top video visualizes instruction translation, referent grounding, task-progression semantic maps, and computed motion plans; the bottom video shows a robot executing the generated plan in the real world. Please see our paper for more details on our approach.
Baseline Comparison
(All robot videos are 1x speed)
We compare LIMP with baseline implementations of an LLM task planner (NLMap-SayCan) and an LLM code-writing planner (Code-as-Policies), which represent state-of-the-art approaches for open-ended, language-conditioned robot instruction following. To ensure competitive performance, we integrate our spatial grounding module and low-level robot control into these baselines, allowing them to query our module for 3D object positions, execute manipulation options, and use our path planner. We observe that both baselines are adept at generating sequential subgoals but struggle to adhere to temporal constraints. In contrast, our approach ensures that each robot step satisfies the instruction's constraints while achieving its subgoals, as illustrated in the example below.
Instruction
"Hey, I want you to bring the plush toy on the table to the tree, make sure to avoid the trash bin when bringing the toy"
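To make the distinction concrete, the temporal constraint in this instruction can be viewed as a safety condition ("never be near the trash bin while carrying the toy") combined with a goal condition ("eventually deliver the toy to the tree"), and a plan is valid only if every step respects the former while some step achieves the latter. The following is a minimal illustrative sketch of that check, not LIMP's actual implementation; the predicate names (`holding_toy`, `near_trash_bin`, `at_tree`) are hypothetical.

```python
# Illustrative sketch: validating a candidate plan against a safety
# constraint and a goal condition, in the spirit of temporal-constraint
# adherence. Predicate names are hypothetical, not from the paper.

def violates_constraint(state):
    # Safety: never be near the trash bin while holding the toy.
    return state["holding_toy"] and state["near_trash_bin"]

def goal_reached(state):
    # Goal: the toy is held at the tree (i.e., delivered there).
    return state["at_tree"] and state["holding_toy"]

def check_plan(plan):
    """Return True iff every step satisfies the safety constraint
    and some step achieves the goal."""
    reached = False
    for state in plan:
        if violates_constraint(state):
            return False
        if goal_reached(state):
            reached = True
    return reached

# A candidate plan as a sequence of abstract states:
plan = [
    {"holding_toy": False, "near_trash_bin": False, "at_tree": False},
    {"holding_toy": True,  "near_trash_bin": False, "at_tree": False},
    {"holding_toy": True,  "near_trash_bin": False, "at_tree": True},
]
```

A sequential LLM planner that checks only subgoal completion would accept a plan whose route passes the trash bin while carrying the toy; a checker like the one above rejects it at the offending step.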
More Demonstration Videos
We ran a comprehensive evaluation on 150 natural language instructions across multiple real-world environments. See below for additional instruction-following videos.