
Classification Approach for Automatic Laparoscopic Video Database Organization

Overview
Publisher Springer
Date 2015 Apr 8
PMID 25847668
Citations 6
Abstract

Purpose: One of the advantages of minimally invasive surgery (MIS) is that the underlying digitization provides invaluable information regarding the execution of procedures in various patient-specific conditions. However, such information can only be obtained conveniently if the laparoscopic video database comes with semantic annotations, which are typically provided manually by experts. Considering the growing popularity of MIS, manual annotation becomes a laborious and costly task. In this paper, we tackle the problem of laparoscopic video classification, which consists of automatically identifying the type of abdominal surgery performed in a video. In addition to performing classifications on the full recordings of the procedures, we also carry out sub-video and video clip classifications. These classifications are carried out to investigate how many frames of a video are needed to achieve good classification performance and which parts of the procedures contain the most discriminative features.

Method: Our classification pipeline is as follows. First, we reject the irrelevant frames from the videos using the color properties of the video frames. Second, we extract visual features from the relevant frames. Third, we quantize the features using several feature encoding methods, i.e., vector quantization, sparse coding (SC), and Fisher encoding. Fourth, we carry out the classification using support vector machines. While the sub-video classification is carried out by uniformly downsampling the video frames, the video clip classification is carried out by taking three parts of the videos (i.e., beginning, middle, and end) and running the classification pipeline separately for every video part. Ultimately, we build our final classification model by combining the features using a multiple kernel learning (MKL) approach.
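As an illustration only, the first and third steps of this pipeline could be sketched as follows. The color threshold, the codebook, and the function names are assumptions for the sketch, not the authors' implementation:

```python
import numpy as np

def reject_irrelevant_frames(frames, red_fraction=0.4):
    """Step 1 (sketch): keep only frames whose red channel carries at
    least `red_fraction` of the total intensity. Laparoscopic scenes are
    predominantly reddish; the exact color test is an assumption here."""
    kept = []
    for frame in frames:                      # frame: H x W x 3 array
        total = frame.sum() + 1e-8            # guard against empty frames
        if frame[..., 0].sum() / total >= red_fraction:
            kept.append(frame)
    return kept

def vq_histogram(descriptors, codebook):
    """Step 3, vector-quantization variant (sketch): assign each local
    descriptor to its nearest codebook word and return the normalized
    word histogram used as the video-level feature."""
    # squared Euclidean distance from every descriptor to every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The resulting histograms would then serve as the input representation for the SVM classification in step 4.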

Results: To carry out the experiments, we use a dataset containing 208 videos of eight different surgeries performed by 10 different surgeons. The results show that SC with K-singular value decomposition (K-SVD) yields the best classification accuracy. The results also demonstrate that the classification accuracy decreases by only 3% when just 60% of the video frames are used. Furthermore, it is shown that the end part of the procedures is the most discriminative part of the surgery. Specifically, by using only the last 20% of the video frames, a classification accuracy greater than 70% can be achieved. Finally, the combination of all features yields the best performance of 90.38% accuracy.

Conclusions: The SC with K-SVD provides the best representation of our videos, yielding the best accuracies for all features. In terms of information, the end part of the laparoscopic videos is the most discriminative compared to the other parts of the videos. In addition to their good performance individually, the features yield even better classification results when all of them are combined using the MKL approach.
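A fixed-weight version of the kernel combination behind the MKL step can be sketched as follows. Note the simplification: a true MKL solver learns the per-feature weights jointly with the SVM, whereas this sketch takes them as given; the function name and weights are assumptions:

```python
import numpy as np

def combine_kernels(kernels, weights):
    """MKL-style combination (sketch): the combined Gram matrix is a
    convex combination of the per-feature kernel matrices. Since each
    input kernel is positive semidefinite, so is the weighted sum."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()         # project onto the simplex
    combined = np.zeros_like(kernels[0], dtype=float)
    for w, gram in zip(weights, kernels):
        combined += w * gram
    return combined
```

The combined Gram matrix could then be handed to any kernel SVM that accepts a precomputed kernel.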

Citing Articles

Preserving privacy in surgical video analysis using a deep learning classifier to identify out-of-body scenes in endoscopic videos.

Lavanchy J, Vardazaryan A, Mascagni P, Mutter D, Padoy N. Sci Rep. 2023; 13(1):9235.

PMID: 37286660 PMC: 10247775. DOI: 10.1038/s41598-023-36453-1.


Multispectral Image under Tissue Classification Algorithm in Screening of Cervical Cancer.

Wang P, Wang S, Zhang Y, Duan X. J Healthc Eng. 2022; 2022:9048123.

PMID: 35035863 PMC: 8759862. DOI: 10.1155/2022/9048123.


Utilising an Accelerated Delphi Process to Develop Guidance and Protocols for Telepresence Applications in Remote Robotic Surgery Training.

Collins J, Ghazi A, Stoyanov D, Hung A, Coleman M, Cecil T. Eur Urol Open Sci. 2021; 22:23-33.

PMID: 34337475 PMC: 8317899. DOI: 10.1016/j.euros.2020.09.005.


Video content analysis of surgical procedures.

Loukas C. Surg Endosc. 2017; 32(2):553-568.

PMID: 29075965. DOI: 10.1007/s00464-017-5878-1.


Shot boundary detection in endoscopic surgery videos using a variational Bayesian framework.

Loukas C, Nikiteas N, Schizas D, Georgiou E. Int J Comput Assist Radiol Surg. 2016; 11(11):1937-1949.

PMID: 27289240. DOI: 10.1007/s11548-016-1431-2.


References
1.
Atasoy S, Mateus D, Meining A, Yang G, Navab N . Endoscopic video manifolds for targeted optical biopsy. IEEE Trans Med Imaging. 2011; 31(3):637-53. DOI: 10.1109/TMI.2011.2174252. View

2.
Lalys F, Riffaud L, Bouget D, Jannin P . A framework for the recognition of high-level surgical tasks from video images for cataract surgeries. IEEE Trans Biomed Eng. 2011; 59(4):966-76. PMC: 3432023. DOI: 10.1109/TBME.2011.2181168. View

3.
Reiter A, Allen P, Zhao T . Feature classification for tracking articulated surgical tools. Med Image Comput Comput Assist Interv. 2013; 15(Pt 2):592-600. DOI: 10.1007/978-3-642-33418-4_73. View

4.
Blum T, Feussner H, Navab N . Modeling and segmentation of surgical workflow from laparoscopic video. Med Image Comput Comput Assist Interv. 2010; 13(Pt 3):400-7. DOI: 10.1007/978-3-642-15711-0_50. View

5.
Zappella L, Bejar B, Hager G, Vidal R . Surgical gesture classification from video and kinematic data. Med Image Anal. 2013; 17(7):732-45. DOI: 10.1016/j.media.2013.04.007. View