DON6D: a Decoupled One-stage Network for 6D Pose Estimation

Overview

Journal Sci Rep

Specialty Science

Date 2024 Apr 10

PMID 38600244

Authors

Zheng Wang

Hangyao Tu

Yutong Qian

Yanwei Zhao

Abstract

Six-dimensional (6D) object pose estimation is a key task in robotic manipulation and grasping scenes. Many existing two-stage solutions have slow inference speed and require extra refinement to handle variations in lighting, sensor noise, object occlusion, and truncation. To address these challenges, this work proposes a decoupled one-stage network (DON6D) for 6D pose estimation that improves inference speed while maintaining accuracy. Specifically, since the RGB images are aligned with the RGB-D images, DON6D first uses a two-dimensional detection network to locate the objects of interest in the RGB-D images. Then, a feature extraction and fusion module fully extracts color and geometric features. Further, dual data augmentation is performed to enhance the generalization ability of the model. Finally, the features are fused, and an attention residual encoder-decoder is introduced to improve pose estimation performance and obtain an accurate 6D pose. DON6D is evaluated on the LINEMOD and YCB-Video datasets. The results show that it outperforms several state-of-the-art methods on the ADD(-S) and ADD(-S) AUC metrics.
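The abstract reports results in terms of ADD(-S) without spelling the metrics out. As a rough illustration, the sketch below follows the definitions commonly used in the 6D pose literature: ADD averages the distance between corresponding model points under the ground-truth and predicted poses, while ADD-S (for symmetric objects) averages the distance from each ground-truth point to its nearest predicted point. The function names, the nested-list rotation format, and the toy point set are assumptions made for this example; none of them come from the DON6D paper or its code.

```python
import math

def transform(points, R, t):
    """Apply rotation R (3x3 nested lists) and translation t (length 3) to 3D points."""
    return [
        tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding points under the two poses."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(points)

def add_s_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its closest predicted point."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(min(math.dist(a, b) for b in pred) for a in gt) / len(points)

# Toy model (hypothetical): four points that map onto themselves under a
# 90-degree rotation about the z axis, i.e. a symmetric object.
SQUARE = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ROT90_Z = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
ZERO = (0, 0, 0)

add_value = add_metric(SQUARE, IDENTITY, ZERO, ROT90_Z, ZERO)     # sqrt(2), about 1.414
add_s_value = add_s_metric(SQUARE, IDENTITY, ZERO, ROT90_Z, ZERO)  # 0.0
```

In this toy case the predicted pose differs from the ground truth only by a rotation that leaves the object's shape unchanged, so ADD penalizes it while ADD-S does not. That is the usual motivation for reporting ADD on asymmetric objects and ADD-S on symmetric ones, which is what the combined ADD(-S) notation denotes.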
