KSL-POSE: A Real-Time 2D Human Pose Estimation Method Based on Modified YOLOv8-Pose Framework

Overview

Journal: Sensors (Basel)

Publisher: MDPI

Specialty: Biotechnology

Date: 2024 Oct 16

PMID: 39409288

Authors

Tianyi Lu

Ke Cheng

Xuecheng Hua

Suning Qin

Abstract

Two-dimensional human pose estimation aims to enable computers to accurately locate human keypoints and understand their spatial context within media content. However, the accuracy of real-time human pose estimation degrades on images with occluded body parts or overlapping individuals. To address these issues, we propose a method based on the YOLO framework. We integrate the convolutional concepts of Kolmogorov-Arnold Networks (KANs) by introducing non-linear activation functions to enhance the feature extraction capabilities of the convolutional kernels. Moreover, to improve the detection of small-target keypoints, we integrate the cross-stage partial (CSP) approach and use the small object enhance pyramid (SOEP) module for feature integration. We also incorporate a layered shared convolution with batch normalization detection head (LSCB), consisting of multiple shared convolutional layers and batch normalization layers, to enable cross-stage feature fusion and address the low utilization of model parameters. Given the structure and purpose of the proposed model, we name it KSL-POSE. Compared to the baseline model YOLOv8l-POSE, KSL-POSE increases the average detection accuracy by 1.5% on the public MS COCO 2017 dataset. The model also demonstrates competitive performance on the CrowdPose dataset, supporting its generalization ability.
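
To make the KAN-inspired convolution idea more concrete, the following is a minimal one-dimensional sketch, not the authors' implementation. It assumes that each kernel tap applies its own learnable univariate function to its input, and that the results are summed. The function names (`silu`, `edge_function`, `kan_conv1d`), the parameter layout, and the use of Gaussian radial basis functions in place of the B-splines used in the original KAN formulation are all illustrative assumptions.

```python
import math


def silu(x):
    """SiLU activation, written with tanh so it stays stable for large |x|."""
    return x * 0.5 * (1.0 + math.tanh(x / 2.0))


def edge_function(x, base_weight, basis_weights, centers, width=1.0):
    """Hypothetical learnable univariate function for one kernel tap:
    phi(x) = base_weight * silu(x) + sum_k basis_weights[k] * B_k(x).
    B_k is a Gaussian bump centered at centers[k]; this is an assumption
    standing in for the B-spline basis of the original KAN formulation."""
    basis = [math.exp(-(((x - c) / width) ** 2)) for c in centers]
    return base_weight * silu(x) + sum(w * b for w, b in zip(basis_weights, basis))


def kan_conv1d(signal, kernel_params, centers):
    """'Valid' 1D convolution where each tap applies its own phi to its input.
    kernel_params is a list of (base_weight, basis_weights), one per tap.
    A standard convolution would compute sum(w_i * x_i) instead."""
    k = len(kernel_params)
    out = []
    for start in range(len(signal) - k + 1):
        window = signal[start:start + k]
        out.append(sum(
            edge_function(x, w_b, coeffs, centers)
            for x, (w_b, coeffs) in zip(window, kernel_params)
        ))
    return out


# Three taps, each with a single active bump centered at 0.
centers = [-1.0, 0.0, 1.0]
kernel = [(0.5, [0.0, 1.0, 0.0])] * 3
print(kan_conv1d([0.0, 0.0, 0.0, 0.0], kernel, centers))  # [3.0, 3.0]
```

In this sketch the non-linearity sits inside the kernel rather than after it, which is one way to read the abstract's description of non-linear activation functions enhancing the convolutional kernels. How KSL-POSE parameterizes these functions in practice is not stated in the abstract.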

Citing Articles

MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba.

Zhang J, Hou J, He Q, Yuan Z, Xue H. Sensors (Basel). 2025; 24(24).

PMID: 39771893 PMC: 11679066. DOI: 10.3390/s24248158.
