MoCam-VR: An Occlusion-Resilient Low-Cost Dexterous Hand-Arm Teleoperation System for Imitation Learning

Beijing Institute of Technology

Abstract

We introduce MoCam-VR (Motion Capture Camera System), a low-cost teleoperation system designed for hand-arm manipulation.

MoCam-VR integrates a dynamic, hand-mounted camera setup that maintains an optimal view of the hand. This design minimizes finger occlusion and expands the operator's range of motion. By displaying multiple camera perspectives in the VR headset, MoCam-VR also addresses environmental occlusion. With properly configured camera perspectives and visual feedback, it maintains a correct human-robot mapping without any calibration, enabling natural and intuitive teleoperation regardless of the operator's position, whether facing the robot's front, back, or side.

Experimental results show that MoCam-VR outperforms other low-cost vision-based systems in dexterity and reliability, while achieving performance comparable to high-end motion capture solutions across a variety of tasks. Built from off-the-shelf components and 3D-printed parts, MoCam-VR is affordable and easy to replicate.

Introduction

Method

System Layout

System Layout. Left: the operator wears a VR headset and gloves that capture hand motion. To mitigate environmental occlusion, the VR headset displays multi-perspective views from additional RGB cameras. Right: by supporting multiple operation modes, the system enables manipulation with multiple end-effectors.

Teleoperation Results

We evaluate MoCam-VR on all ten tasks from the Telekinesis benchmark. Each task is executed ten times consecutively, and the success rate (SR) is recorded. The results show that MoCam-VR performs best among the vision-based low-cost teleoperation systems compared. As shown in the success-rate comparison table below, MoCam-VR achieves a perfect success rate (10/10) in eight of the ten tasks and outperforms both Telekinesis and AnyTeleop in complex tasks such as Box Rotation, Cup Stack, and Open Drawer & Pickup Cup.
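Because each task is run only ten times, it can help to see how much uncertainty a single success rate carries. The following minimal Python sketch computes SR as successes over trials and adds a Wilson score interval. The interval is a standard binomial confidence interval included for illustration only; it is not reported in this evaluation, and the function names are hypothetical.

```python
import math


def success_rate(successes: int, trials: int) -> float:
    """Success rate (SR): fraction of trials that succeeded."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p = success_rate(successes, trials)
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the boundaries.
    return max(0.0, center - margin), min(1.0, center + margin)


if __name__ == "__main__":
    # Ten trials per task, as in the evaluation protocol.
    for k in (10, 9, 6):
        lo, hi = wilson_interval(k, 10)
        print(f"{k}/10: SR={success_rate(k, 10):.2f}, 95% CI=[{lo:.2f}, {hi:.2f}]")
```

The Wilson interval is used here instead of the normal approximation because the normal approximation collapses to zero width at 10/10, which would misleadingly suggest no uncertainty at this sample size.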

MoCam-VR achieves the highest success rates in tasks requiring precise dexterity and stability, such as Scissor Pickup and Two Cup Stacking, where even a minor error in hand reconstruction can lead to failure. In the Open Drawer task, MoCam-VR matches AnyTeleop's perfect score even though our protocol requires the drawer to be opened by grasping its handle, a harder condition than AnyTeleop's simpler approach.

Compared to AnyTeleop, which relies solely on fixed RGB or RGB-D cameras, MoCam-VR's hand-mounted camera maintains an optimal viewpoint, minimizing finger occlusion and enabling more accurate hand pose estimation. In tasks prone to environmental occlusion, such as Open Drawer & Pickup Cup, MoCam-VR achieves a 100% success rate because the additional camera perspectives give the operator a clearer view of the objects and enable more precise grasping.

Video

Task Description

Success Rate Comparison (successful trials out of 10 per task)

Task | Telekinesis | AnyTeleop | MoCam-VR (ours)
Pickup Dice Toy | 9/10 | 10/10 | 10/10
Pickup Fabric Toy | 9/10 | 10/10 | 10/10
Box Rotation | 6/10 | 6/10 | 10/10
Scissor Pickup | 7/10 | 8/10 | 9/10
Cup Stack | 6/10 | 9/10 | 10/10
Two Cup Stacking | 3/10 | 7/10 | 9/10
Pouring Cubes onto Plate | 7/10 | 7/10 | 10/10
Cup Into Plate | 8/10 | 10/10 | 10/10
Open Drawer | 9/10 | 10/10 | 10/10
Open Drawer & Pickup Cup | 6/10 | 9/10 | 10/10
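The counts in the table above can be aggregated to check the summary statements in the text, such as the eight perfect (10/10) tasks for MoCam-VR. The following is a minimal Python sketch. The dictionary is transcribed by hand from the table, and the pooled success rate (total successes over all trials) and the helper names `summarize` and `strictly_best` are illustrative choices, not metrics or code from the paper.

```python
# Success counts (successful trials out of 10) transcribed from the comparison table.
TRIALS_PER_TASK = 10
RESULTS = {
    "Pickup Dice Toy":          {"Telekinesis": 9, "AnyTeleop": 10, "MoCam-VR": 10},
    "Pickup Fabric Toy":        {"Telekinesis": 9, "AnyTeleop": 10, "MoCam-VR": 10},
    "Box Rotation":             {"Telekinesis": 6, "AnyTeleop": 6,  "MoCam-VR": 10},
    "Scissor Pickup":           {"Telekinesis": 7, "AnyTeleop": 8,  "MoCam-VR": 9},
    "Cup Stack":                {"Telekinesis": 6, "AnyTeleop": 9,  "MoCam-VR": 10},
    "Two Cup Stacking":         {"Telekinesis": 3, "AnyTeleop": 7,  "MoCam-VR": 9},
    "Pouring Cubes onto Plate": {"Telekinesis": 7, "AnyTeleop": 7,  "MoCam-VR": 10},
    "Cup Into Plate":           {"Telekinesis": 8, "AnyTeleop": 10, "MoCam-VR": 10},
    "Open Drawer":              {"Telekinesis": 9, "AnyTeleop": 10, "MoCam-VR": 10},
    "Open Drawer & Pickup Cup": {"Telekinesis": 6, "AnyTeleop": 9,  "MoCam-VR": 10},
}


def summarize(results: dict, trials: int = TRIALS_PER_TASK) -> dict:
    """Pooled success rate and number of perfect (all-success) tasks per method."""
    methods = next(iter(results.values())).keys()
    summary = {}
    for method in methods:
        scores = [row[method] for row in results.values()]
        summary[method] = {
            "pooled_sr": sum(scores) / (trials * len(scores)),
            "perfect_tasks": sum(1 for s in scores if s == trials),
        }
    return summary


def strictly_best(results: dict, method: str) -> list:
    """Tasks on which `method` scores strictly higher than every other method."""
    return [
        task
        for task, row in results.items()
        if all(row[method] > score for other, score in row.items() if other != method)
    ]


if __name__ == "__main__":
    for method, stats in summarize(RESULTS).items():
        print(f"{method:12s} pooled SR={stats['pooled_sr']:.2f} "
              f"perfect tasks={stats['perfect_tasks']}")
    print("MoCam-VR strictly best on:", strictly_best(RESULTS, "MoCam-VR"))
```

Because every task was run the same number of times, the pooled success rate equals the mean of the per-task success rates; with unequal trial counts the two would differ, and the per-task mean would weight each task equally.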