The key problem being solved is enabling humanoid robots to perform diverse loco-manipulation tasks (involving both locomotion and manipulation) by letting a single operator seamlessly control the robot's full body: walking, squatting to varying heights, and coordinated arm/hand motion. Existing approaches suffer from a limited operational workspace, insufficient precision, or the need for multiple operators.
The proposed solution has two core components:
1) A reinforcement learning training framework that produces robust loco-manipulation policies. It combines an upper-body pose curriculum, a height-tracking reward, and symmetry utilization to enable stable walking, squatting to any height, and dynamic arm movements. Unlike prior work, it requires no motion-capture data.
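The height-tracking term can be illustrated as a simple shaped reward that peaks when the robot's base reaches the commanded squat height. The Gaussian form and the tolerance value below are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def height_tracking_reward(base_height, commanded_height, sigma=0.1):
    """Exponential height-tracking reward: equals 1.0 when the measured
    base height matches the commanded squat height and decays with the
    tracking error. Gaussian shaping and sigma are illustrative
    assumptions, not the paper's exact reward."""
    error = base_height - commanded_height
    return float(np.exp(-(error ** 2) / (2.0 * sigma ** 2)))
```

Because the reward is dense and smooth in the height error, the policy receives a useful learning signal at every commanded height rather than only at a few discrete squat depths.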
2) A low-cost isomorphic exoskeleton hardware system for teleoperation. This includes 3D-printed exoskeleton arms matched to the robot's kinematics, motion-sensing gloves for dexterous hand control, and a foot pedal for locomotion commands. By directly mapping the operator's motions to the robot, it achieves over 200% faster and more precise control than vision-based methods.
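Because the exoskeleton is isomorphic to the robot's arms, the "direct mapping" can be as simple as a per-joint calibration from encoder readings to joint targets. The offset/sign calibration scheme below is a hypothetical sketch, not the system's actual driver code:

```python
def map_exoskeleton_to_robot(exo_angles, offsets, signs):
    """One-to-one joint-space mapping for an isomorphic exoskeleton:
    since the exoskeleton shares the robot's kinematic structure, each
    encoder angle becomes a robot joint target after a per-joint
    calibration offset and sign flip (both illustrative assumptions)."""
    return [s * (q - o) for q, o, s in zip(exo_angles, offsets, signs)]
```

This joint-space mapping avoids the inverse-kinematics and pose-estimation steps that vision-based retargeting needs, which is one plausible reason for the speed and precision advantage.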
Key findings and implications:
- The RL-trained policies allow diverse loco-manipulation capabilities like squatting while moving arms, adapting smoothly to changing poses, and balancing during locomotion. This significantly expands the robot's workspace.
- The exoskeleton system enables highly responsive teleoperation, reducing task times by nearly 50% compared to VR methods. Its low cost ($500) makes it accessible.
- Real-world experiments validate the system's robustness across environments. Demonstrations collected via teleoperation can also train autonomous policies via imitation learning.
- Incorporating the pose curriculum, height-tracking reward, and symmetry losses improves training convergence and final policy performance, as shown in ablation studies.
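Symmetry utilization is commonly implemented as an auxiliary loss that penalizes left-right asymmetric behavior: the action for a mirrored observation should be the mirror of the action for the original observation. This is a minimal sketch with the mirroring functions and squared-error form assumed, not the paper's exact loss:

```python
import numpy as np

def symmetry_loss(policy, obs, mirror_obs, mirror_act):
    """Mirror-symmetry auxiliary loss (illustrative): compares the
    policy's action under a left-right mirrored observation with the
    mirror of its action under the original observation."""
    action = policy(obs)
    action_of_mirrored = policy(mirror_obs(obs))
    return float(np.mean((mirror_act(action) - action_of_mirrored) ** 2))
```

A perfectly symmetric policy drives this term to zero; adding it to the RL objective biases training toward gaits that behave identically on both legs.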
The system supports seamless control of humanoid robots in both real and simulated settings for complex loco-manipulation tasks using a single operator. This overcomes key limitations of previous approaches and has implications for robotic teleoperation, data collection, and autonomous policy learning.
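Turning the teleoperated demonstrations into autonomous policies typically reduces to an imitation-learning objective such as behavior cloning over recorded (observation, action) pairs. This generic MSE sketch is an assumption about the learning setup, not necessarily the paper's exact pipeline:

```python
import numpy as np

def behavior_cloning_loss(policy, demos):
    """Mean-squared-error behavior cloning over teleoperated
    (observation, action) pairs -- a standard imitation-learning
    objective; the paper's actual learner may differ."""
    per_pair = [np.mean((policy(obs) - act) ** 2) for obs, act in demos]
    return float(np.mean(per_pair))
```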
Authors:
Qingwei Ben, Feiyu Jia, Jia Zeng, Junting Dong, Dahua Lin, Jiangmiao Pang
Original paper: