
A Manufacturing-Oriented Intelligent Vision System Based on Deep Neural Network for Object Recognition and 6D Pose Estimation

Overview
Date: 2021 Jan 25
PMID: 33488378
Citations: 6
Abstract

Intelligent robots are now widely deployed in manufacturing, across diverse workstations and assembly lines. In most manufacturing tasks, determining the category and pose of parts is important yet challenging due to complex environments. This paper presents a new two-stage intelligent vision system based on a deep neural network with RGB-D image inputs for object recognition and 6D pose estimation. A densely connected network fusing multi-scale features is first built to segment objects from the background. The 2D pixels and 3D points in the cropped object regions are then fed into a pose estimation network that predicts object poses by fusing color and geometry features. By introducing channel and position attention modules, the pose estimation network extracts features effectively, stressing important features whilst suppressing unnecessary ones. Comparative experiments with several state-of-the-art networks on two well-known benchmark datasets, YCB-Video and LineMOD, verified the effectiveness and superior performance of the proposed method. Moreover, we built a vision-guided robotic grasping system based on the proposed method, using a Kinova Jaco2 manipulator with an RGB-D camera installed. Grasping experiments proved that the robot system can effectively perform common operations such as picking up and moving objects, demonstrating its potential for a wide range of real-time manufacturing applications.
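To illustrate the channel- and position-attention idea the abstract mentions, the sketch below reweights a feature map along its channel axis and then along its spatial positions. This is a minimal, illustrative NumPy version: the gates here are fixed sigmoids of pooled activations, whereas the paper's modules use learned projections, and the function names are ours, not the authors'.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """feat: (C, H, W). Squeeze each channel to a scalar by global
    average pooling, gate it into (0, 1), and rescale that channel."""
    squeezed = feat.mean(axis=(1, 2))        # (C,)
    weights = sigmoid(squeezed)              # (C,) per-channel gates
    return feat * weights[:, None, None]

def position_attention(feat):
    """feat: (C, H, W). Score each spatial position by its mean
    activation across channels and rescale that position."""
    scores = sigmoid(feat.mean(axis=0))      # (H, W) per-position gates
    return feat * scores[None, :, :]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))        # toy (C, H, W) feature map
out = position_attention(channel_attention(feat))
print(out.shape)                             # (8, 4, 4)
```

Both gates preserve the feature-map shape, so such modules can be dropped between existing layers; learned versions simply replace the pooled-sigmoid gate with a small trainable network.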

Citing Articles

A Real-Time Approach for Assessing Rodent Engagement in a Nose-Poking Go/No-Go Behavioral Task Using ArUco Markers.

Smith T, Smith T, Faruk F, Bendea M, Kumara S, Capadona J Bio Protoc. 2024; 14(21):e5098.

PMID: 39525969 PMC: 11543608. DOI: 10.21769/BioProtoc.5098.


Dexterous Manipulation Based on Object Recognition and Accurate Pose Estimation Using RGB-D Data.

Manawadu U, Keitaro N Sensors (Basel). 2024; 24(21).

PMID: 39517721 PMC: 11548730. DOI: 10.3390/s24216823.


DON6D: a decoupled one-stage network for 6D pose estimation.

Wang Z, Tu H, Qian Y, Zhao Y Sci Rep. 2024; 14(1):8410.

PMID: 38600244 PMC: 11385229. DOI: 10.1038/s41598-024-59152-x.


Real-Time Assessment of Rodent Engagement Using ArUco Markers: A Scalable and Accessible Approach for Scoring Behavior in a Nose-Poking Go/No-Go Task.

Smith T, Smith T, Faruk F, Bendea M, Kumara S, Capadona J eNeuro. 2024; 11(3).

PMID: 38351132 PMC: 11046262. DOI: 10.1523/ENEURO.0500-23.2024.


DOPE++: 6D pose estimation algorithm for weakly textured objects based on deep neural networks.

Jin M, Li J, Zhang L PLoS One. 2022; 17(6):e0269175.

PMID: 35675352 PMC: 9176784. DOI: 10.1371/journal.pone.0269175.

