AAU Student Projects
A master's thesis from Aalborg University

Deep Reinforcement Learning for Robotic Grasping from Octrees: Learning Manipulation from Compact 3D Observations

Translated title

Deep Reinforcement Learning for Robotic Grasping from Octrees

Author

Term

4th semester

Education

Publication year

2021

Submitted on

Pages

69

Abstract

This thesis explores whether deep reinforcement learning can enable vision-based robotic grasping of diverse objects using compact 3D representations called octrees. A new simulation environment with photorealistic rendering and domain randomization is created to train agents with model-free, off-policy actor–critic algorithms (a common trial-and-error approach that does not rely on a predefined model). In this setting, the agent learns an end-to-end policy that maps 3D observations directly to continuous arm and gripper motions. A 3D convolutional neural network serves as a feature extractor for stacked octree inputs and is trained jointly with the actor–critic networks. A policy trained on octree observations achieves successful grasps in novel scenes with previously unseen objects, material textures, and random camera poses. Experiments indicate that 3D data representations offer advantages over commonly used 2D RGB and 2.5D RGB-D image inputs. Finally, sim-to-real transfer is demonstrated by running the simulation-trained agent on a real robot without additional retraining.
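To make the "compact 3D representation" idea concrete, the following is a minimal sketch (not the thesis code; all names such as `build_octree` are illustrative) of building a sparse octree occupancy structure from a point cloud. Only occupied octants are subdivided and stored, which is why a thin surface occupies far fewer octree nodes than cells in a dense voxel grid of the same resolution:

```python
# Illustrative sketch: sparse octree occupancy from a point cloud.
# Only occupied octants are recursed into and stored, so a thin
# surface needs far fewer nodes than a dense voxel grid would.

def build_octree(points, center, half, depth):
    """Recursively subdivide a cube of half-size `half` around `center`.
    Returns a dict of occupied children (octant index -> subtree),
    the string "leaf" at maximum depth, or None if empty."""
    if not points:
        return None
    if depth == 0:
        return "leaf"
    children = {}
    for idx in range(8):
        # Offset of this octant's center, one bit per axis of `idx`.
        off = [(1 if (idx >> d) & 1 else -1) * half / 2 for d in range(3)]
        c = [center[d] + off[d] for d in range(3)]
        sub = [p for p in points
               if all(abs(p[d] - c[d]) <= half / 2 for d in range(3))]
        node = build_octree(sub, c, half / 2, depth - 1)
        if node is not None:
            children[idx] = node
    return children

def count_nodes(node):
    """Total number of stored nodes (internal nodes plus leaves)."""
    if node == "leaf":
        return 1
    return 1 + sum(count_nodes(ch) for ch in node.values())

# A flat 16x16 surface patch inside the unit cube (e.g. a tabletop):
pts = [(x / 16.0, y / 16.0, 0.0) for x in range(16) for y in range(16)]
tree = build_octree(pts, (0.5, 0.5, 0.5), 0.5, 4)

dense_cells = (2 ** 4) ** 3   # 4096 cells in the equivalent dense grid
sparse_nodes = count_nodes(tree)
print(sparse_nodes, dense_cells)
```

In a setting like the one described above, such occupancy (plus per-node features such as color or normals) would be fed to a 3D convolutional feature extractor; the sketch only demonstrates the storage-compactness argument for octrees versus dense grids.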


[This abstract has been rewritten with the help of AI based on the project's original abstract]