DON6D: a Decoupled One-stage Network for 6D Pose Estimation

Overview
Journal: Sci Rep
Specialty: Science
Date: 2024 Apr 10
PMID: 38600244
Abstract

Six-dimensional (6D) object pose estimation is a key task in robotic manipulation and grasping. Many existing two-stage solutions have slow inference and require extra refinement to cope with variations in lighting, sensor noise, object occlusion, and truncation. To address these challenges, this work proposes a decoupled one-stage network (DON6D) for 6D pose estimation that improves inference speed while maintaining accuracy. Specifically, because the RGB images are aligned with the RGB-D images, DON6D first uses a two-dimensional (2D) detection network to locate the objects of interest in the RGB-D images. Then, a feature extraction and fusion module is used to fully extract color and geometric features. In addition, dual data augmentation is performed to improve the generalization ability of the model. Finally, the features are fused, and an attention residual encoder-decoder is introduced to improve pose estimation performance and obtain an accurate 6D pose. DON6D is evaluated on the LINEMOD and YCB-Video datasets, and the results show that it outperforms several state-of-the-art methods in terms of the ADD(-S) and ADD(-S) AUC metrics.
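The abstract reports results in ADD(-S) and ADD(-S) AUC but does not define these metrics. The sketch below shows how they are commonly computed in the 6D pose literature. It assumes that a pose is given as a 3x3 rotation matrix plus a translation vector in meters and that the object model is a list of 3D points. This is not the authors' evaluation code. The function names, the 10%-of-diameter correctness criterion (a usual LINEMOD convention), and the 0.1 m maximum AUC threshold (a usual YCB-Video convention) are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]  # 3x3 rotation matrix, row-major


def transform(R: Mat3, t: Sequence[float], p: Sequence[float]) -> Vec3:
    """Apply a rigid transform to a 3D point: R @ p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add(points, R_gt, t_gt, R_pred, t_pred) -> float:
    """ADD: mean distance between *corresponding* model points under the
    ground-truth and predicted poses (used for non-symmetric objects)."""
    dists = [math.dist(transform(R_gt, t_gt, p), transform(R_pred, t_pred, p))
             for p in points]
    return sum(dists) / len(dists)


def add_s(points, R_gt, t_gt, R_pred, t_pred) -> float:
    """ADD-S: for each ground-truth point, distance to the *closest* predicted
    point (used for symmetric objects). Brute-force O(N^2) nearest neighbor."""
    gt = [transform(R_gt, t_gt, p) for p in points]
    pred = [transform(R_pred, t_pred, p) for p in points]
    return sum(min(math.dist(g, q) for q in pred) for g in gt) / len(gt)


def add_or_add_s(points, R_gt, t_gt, R_pred, t_pred, symmetric: bool) -> float:
    """ADD(-S): ADD-S for symmetric objects, ADD otherwise."""
    metric = add_s if symmetric else add
    return metric(points, R_gt, t_gt, R_pred, t_pred)


def diameter(points) -> float:
    """Object diameter: largest distance between any two model points."""
    return max(math.dist(a, b) for a in points for b in points)


def pose_is_correct(error: float, diam: float, ratio: float = 0.1) -> bool:
    """Assumed criterion: a pose counts as correct if its error is below
    ratio * diameter (10% of the diameter by default)."""
    return error < ratio * diam


def auc(errors: List[float], max_threshold: float = 0.1, steps: int = 1000) -> float:
    """Area under the accuracy-vs-threshold curve for thresholds in
    [0, max_threshold] (meters), normalized to [0, 1] with the trapezoidal rule."""
    if not errors:
        raise ValueError("errors must be non-empty")
    # Accuracy at each threshold: fraction of errors strictly below it.
    accs = [sum(e < max_threshold * k / steps for e in errors) / len(errors)
            for k in range(steps + 1)]
    area = sum((accs[k - 1] + accs[k]) / 2 for k in range(1, steps + 1))
    return area / steps
```

For example, a square of points rotated 90 degrees about its symmetry axis has an ADD of sqrt(2) but an ADD-S of 0, which is why symmetric objects are scored with ADD-S. The brute-force nearest-neighbor search in `add_s` keeps the sketch dependency-free; for dense object models a spatial index such as a k-d tree would normally be used instead.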