CAREER: Learning Predictive Models for Visual Navigation and Object Interaction

Efficiently moving around and interacting with objects in novel environments requires building expectations about people (e.g., on which side an oncoming person will pass), places (e.g., where car keys are likely to be in a home), and things (e.g., which way a door will open). However, manually building such expectations into decision-making systems is challenging. At the same time, machine learning has been shown to be successful in extracting representative patterns from training datasets in many related application domains. While the use of machine learning to learn predictive models for decision-making seems promising, the design choices (data sources, forms of supervision, architectures for the predictive models, and interaction of the predictive models with decision-making) are deeply intertwined. As part of this project, investigators will identify the precise aspects in which machine learning benefits navigation and object interaction, and co-design datasets, models, and learning algorithms to build systems that realize these benefits. The project will improve the state of the art in predictive reasoning for navigation and object interaction by designing approaches that can leverage large-scale, diverse data sources for training. Models, datasets, and systems developed in this project will advance navigation and mobile manipulation capabilities. These will enable practical downstream applications (e.g., assistive robots, telepresence) and open up avenues for follow-up research (e.g., human-robot interaction). The project will contribute to the education of students and the broader community through curriculum development, engagement in research projects, and accessible dissemination of research.

The project will co-design data collection methods, learning techniques, and policy architectures to enable large-scale learning of predictive models for people, places, and things for problems involving navigation and mobile manipulation. Investigators will tackle the following three research tasks: (1) designing predictive models for people, places, and objects that are necessary for decision making; (2) identifying data sources and generating supervision to learn these predictive models at scale; and (3) designing hierarchical and modular policy architectures that effectively use the learned predictive models. Investigators will re-use existing sense-plan-control components (motion planners, feedback controllers) where applicable (e.g., motion in free space), and introduce learning in modules that require speculation (i.e., high-level decision-making modules for, e.g., identifying promising directions for exploration, predicting where an oncoming human will go next, or determining a good position from which to open a drawer). Investigators will evaluate the effectiveness of proposed methods by comparing the efficiency of systems with and without predictive reasoning.


Predicting Motion Plans for Articulating Everyday Objects
Arjun Gupta, Max Shepherd, Saurabh Gupta
International Conference on Robotics and Automation (ICRA), 2023

Abstract: Mobile manipulation tasks such as opening a door, pulling open a drawer, or lifting a toilet lid require constrained motion of the end-effector under environmental and task constraints. This, coupled with partial information in novel environments, makes it challenging to employ classical motion planning approaches at test time. Our key insight is to cast motion planning as a learning problem, leveraging past experience of solving similar planning problems to directly predict motion plans for mobile manipulation tasks in novel situations at test time. To enable this, we develop a simulator, ArtObjSim, that simulates articulated objects placed in real scenes. We then introduce SeqIK\(+\theta_0\), a fast and flexible representation for motion plans. Finally, we learn models that use SeqIK\(+\theta_0\) to quickly predict motion plans for articulating novel objects at test time. Experimental evaluation shows improved speed and accuracy at generating motion plans compared to pure search-based methods and pure learning methods.

Building Rearticulable Models for Arbitrary 3D Objects from 4D Point Clouds
Shaowei Liu, Saurabh Gupta*, Shenlong Wang*
Computer Vision and Pattern Recognition (CVPR), 2023

Abstract: We build rearticulable models for arbitrary everyday man-made objects containing an arbitrary number of parts that are connected together in arbitrary ways via 1-degree-of-freedom joints. Given point cloud videos of such everyday objects, our method identifies the distinct object parts, which parts are connected to which other parts, and the properties of the joints connecting each part pair. We do this by jointly optimizing the part segmentation, transformation, and kinematics using a novel energy minimization framework. Our inferred animatable models enable retargeting to novel poses with sparse point correspondence guidance. We test our method on a new articulating robot dataset and the Sapiens dataset with common daily objects, as well as real-world scans. Experiments show that our method outperforms two leading prior works on various metrics.

This material is based upon work supported by the National Science Foundation under Grant No. IIS-2143873 (Project Title: CAREER: Learning Predictive Models for Visual Navigation and Object Interaction, PI: Saurabh Gupta). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.