Zhanpeng He

Email: [firstname] at cs dot columbia dot edu

CV / Google Scholar / GitHub

I am a third-year Ph.D. student at Columbia University, advised by Professor Matei Ciocarlie and Professor Shuran Song. I am interested in robotics, reinforcement learning, and computer vision.

Before joining Columbia, I received a master's degree from the University of Southern California, where I worked as a research assistant at the Robotic Embedded Systems Laboratory, advised by Professor Gaurav Sukhatme and Professor Stefan Schaal. Before joining USC, I received a Bachelor of Science degree in Computer Science from Rutgers University.

Research Projects (* indicates equal contribution)

Discovering Synergies for Robot Manipulation with Multi-Task Reinforcement Learning

Zhanpeng He and Matei Ciocarlie

website / paper

2022 IEEE International Conference on Robotics and Automation (ICRA)

Abstract: Controlling robotic manipulators with high-dimensional action spaces for dexterous tasks is a challenging problem. Inspired by human manipulation, researchers have studied generating and using postural synergies for robot hands to accomplish manipulation tasks, leveraging the lower dimensional nature of synergistic action spaces. However, many of these works require pre-collected data from an existing controller in order to derive such a subspace by means of dimensionality reduction. In this paper, we present a framework that simultaneously discovers a synergy space and a multi-task policy that operates on this low-dimensional action space to accomplish diverse manipulation tasks. We demonstrate that our end-to-end method is able to perform multiple tasks using few synergies, and outperforms sequential methods that apply dimensionality reduction to independently collected data. We also show that deriving synergies using multiple tasks can lead to a subspace that enables robots to efficiently learn new manipulation tasks and interactions with new objects.

UMPNet: Universal Manipulation Policy Network for Articulated Objects

Zhenjia Xu, Zhanpeng He, and Shuran Song

website / paper

IEEE Robotics and Automation Letters (RA-L), presented at the 2022 IEEE International Conference on Robotics and Automation (ICRA)

Abstract: We introduce the Universal Manipulation Policy Network (UMPNet) -- a single image-based policy network that infers closed-loop action sequences for manipulating arbitrary articulated objects. To infer a wide range of action trajectories, the policy supports 6DoF action representation and varying trajectory length. To handle a diverse set of objects, the policy learns from objects with different articulation structures and generalizes to unseen objects or categories. The policy is trained with self-guided exploration without any human demonstrations, scripted policy, or pre-defined goal conditions. To support effective multi-step interaction, we introduce a novel Arrow-of-Time action attribute that indicates whether an action will change the object state back to the past or forward into the future. With the Arrow-of-Time inference at each interaction step, the learned policy is able to select actions that consistently lead towards or away from a given state, thereby, enabling both effective state exploration and goal-conditioned manipulation.

Hardware as Policy: Mechanical and Computational Co-Optimization using Deep Reinforcement Learning

Tianjian Chen*, Zhanpeng He*, and Matei Ciocarlie

website / paper

Conference on Robot Learning (CoRL) 2020

Abstract: Deep Reinforcement Learning (RL) has shown great success in learning complex control policies for a variety of applications in robotics. However, in most such cases, the hardware of the robot has been considered immutable, modeled as part of the environment. In this study, we explore the problem of learning hardware and control parameters together in a unified RL framework. To achieve this, we propose to model the robot body as a "hardware policy", analogous to and optimized jointly with its computational counterpart. We show that, by modeling such hardware policies as auto-differentiable computational graphs, the ensuing optimization problem can be solved efficiently by gradient-based algorithms from the Policy Optimization family. We present two such design examples: a toy mass-spring problem, and a real-world problem of designing an underactuated hand. We compare our method against traditional co-optimization approaches, and also demonstrate its effectiveness by building a physical prototype based on the learned hardware parameters.

Learning 3D Dynamic Scene Representations for Robot Manipulation

Zhenjia Xu*, Zhanpeng He*, Jiajun Wu, and Shuran Song

website / code / paper

Conference on Robot Learning (CoRL) 2020

Abstract: 3D scene representations for robotic manipulation should capture three key object properties: permanency - objects that become occluded over time continue to exist; amodal completeness - objects have 3D occupancy, even if only partial observations are available; spatiotemporal continuity - the movement of each object is continuous over space and time. In this paper, we introduce 3D Dynamic Scene Representation (DSR), a 3D volumetric scene representation that simultaneously discovers, tracks, and reconstructs objects and predicts their dynamics, while capturing all three properties. We further propose DSR-Net, which learns to aggregate visual observations over multiple interactions to gradually build and refine DSR. Our model achieves state-of-the-art performance in modeling 3D scene dynamics with DSR on both simulated and real data. Combined with model predictive control, DSR-Net enables accurate planning in downstream robotic manipulation tasks such as planar pushing.

SQUIRL: Robust and Efficient Learning from Video Demonstration of Long-Horizon Robotic Manipulation Tasks

Bohan Wu, Feng Xu, Zhanpeng He, Abhi Gupta, and Peter K. Allen

website / video / paper

International Conference on Intelligent Robots and Systems (IROS) 2020

Abstract: This paper addresses the scalability challenge of learning long-horizon robotic manipulation from demonstration with a robust, sample-efficient, and general meta-IRL algorithm, SQUIRL, that performs a new but related long-horizon task robustly given only a single video demonstration. First, this algorithm bootstraps the learning of a task encoder and a task-conditioned policy using behavioral cloning (BC). It then collects real-robot experiences and bypasses reward learning by directly recovering a Q-function from the combined robot and expert trajectories. Next, this algorithm uses the Q-function to re-evaluate all cumulative experiences collected by the robot to improve the policy quickly. In the end, the policy performs more robustly (90%+ success) than BC on new tasks while requiring no trial and error at test time. Finally, our real-robot and simulated experiments demonstrate our algorithm's generality across different state spaces, action spaces, and vision-based manipulation tasks, e.g., pick-pour-place and pick-carry-drop.

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta-Reinforcement Learning

Tianhe Yu*, Deirdre Quillen*, Zhanpeng He*, Ryan C Julian, Karol Hausman, Sergey Levine, and Chelsea Finn

website / code / paper

Conference on Robot Learning (CoRL) 2019

Abstract: In this paper, we propose an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks, with the aim of making it possible to develop algorithms that generalize to accelerate the acquisition of entirely new, held-out tasks. We evaluate 6 state-of-the-art meta-reinforcement learning and multi-task learning algorithms on these tasks. Surprisingly, while each task and its variations (e.g., with different object positions) can be learned with reasonable success, these algorithms struggle to learn with multiple tasks at the same time, even with as few as nine distinct training tasks. Our analysis and open-source environments pave the way for future research in multi-task learning and meta-learning that can enable meaningful generalization, thereby unlocking the full potential of these methods.

Simulator Predictive Control: Using Learned Task Representations and MPC for Zero-Shot Generalization and Sequencing

Zhanpeng He*, Ryan C Julian*, Eric Heiden, Hejia Zhang, Stefan Schaal, Joseph Lim, Gaurav S Sukhatme, and Karol Hausman

arXiv / code / video

Deep RL Workshop, Neural Information Processing Systems (NeurIPS) 2018

Abstract: Simulation-to-real transfer is an important strategy for making reinforcement learning practical with real robots. Successful sim-to-real transfer systems have difficulty producing policies which generalize across tasks, despite training for thousands of hours of equivalent real robot time. To address this shortcoming, we present a novel approach to efficiently learning new robotic skills directly on a real robot, based on model-predictive control (MPC) and an algorithm for learning task representations. In short, we show how to reuse the simulation from the pre-training step of sim-to-real methods as a tool for foresight, allowing the sim-to-real policy to adapt to unseen tasks.

Scaling Simulation-to-real Transfer by Learning Composable Robot Skills

Ryan C Julian*, Eric Heiden*, Zhanpeng He, Hejia Zhang, Stefan Schaal, Joseph Lim, Gaurav S Sukhatme, and Karol Hausman

arXiv / code / video / journal version

International Journal of Robotics Research (IJRR), International Symposium on Experimental Robotics (ISER) 2018

Abstract: This paper presents a novel solution to the problem of simulation-to-real transfer, which builds on recent advances in robot skill decomposition. Rather than focusing on minimizing the simulation-reality gap, we learn a set of diverse policies that are parameterized in a way that makes them easily reusable. This diversity and parameterization of low-level skills allows us to find a transferable policy that is able to use combinations and variations of different skills to solve more complex, high-level tasks.


I am a member of the rlworkgroup and take part in the development of several open-source robot-learning projects.