Exploring Nonlinear Feature Space Dimension Reduction and Data Representation in Breast Cadx with Laplacian Eigenmaps and T-SNE

Overview
Journal Med Phys
Specialty Biophysics
Date 2010 Feb 24
PMID 20175497
Citations 57
Abstract

Purpose: In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: ultrasound (U.S.) with 1126 cases, dynamic contrast-enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput. 15, 1373-1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008)].
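
For orientation, the objectives optimized by the two cited methods can be summarized as below. This is a sketch taken from the cited papers rather than from this abstract, and the notation is introduced here as an assumption: x_i are the original feature vectors, y_i their low-dimensional images, W a neighborhood-graph weight matrix, and p_ij the symmetrized Gaussian affinities computed in the original feature space.

```latex
% Laplacian eigenmaps (Belkin and Niyogi, 2003)
\min_{Y}\ \sum_{i,j} W_{ij}\,\lVert y_i - y_j \rVert^{2}
\quad \text{subject to} \quad Y^{\top} D Y = I,
\qquad D_{ii} = \sum_{j} W_{ij}, \quad L = D - W,
% solved through the generalized eigenproblem, keeping the eigenvectors
% with the smallest nonzero eigenvalues
L f = \lambda D f .

% t-SNE (van der Maaten and Hinton, 2008)
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^{2}\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^{2}\right)^{-1}} .
```

Both objectives penalize placing neighboring points far apart in the embedding, which is the sense in which local structure is preserved.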

Methods: These methods attempt to map high-dimensional feature spaces to lower-dimensional spaces that are easier for humans to interpret, while preserving both local and global information. Their properties as applied to breast computer-aided diagnosis (CADx) were evaluated in terms of malignancy classification performance and by visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced-dimension mapped feature output as input to both a linear classifier, linear discriminant analysis, and a nonlinear classifier, a Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN). The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination (ARD) and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+ bootstrap validation, 95% empirical confidence intervals were computed for each classifier's AUC performance.
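
The validation step described above can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation. Several pieces are assumptions made for illustration: the toy mean-difference scorer stands in for LDA or the MCMC-BANN, the synthetic two-dimensional data stand in for a reduced feature space, the 0.632+ weighting is an AUC adaptation of Efron and Tibshirani's rule with a no-information AUC of 0.5, and the interval is a simple percentile interval over out-of-bag AUCs. All function names are hypothetical.

```python
import random


def auc(scores, labels):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(X, y):
    """Toy linear scorer (assumed stand-in for LDA / MCMC-BANN):
    project each case onto the difference of the two class means."""
    dim = len(X[0])

    def class_mean(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    w = [a - b for a, b in zip(class_mean(1), class_mean(0))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def auc_632_plus(X, y, n_boot=200, seed=0):
    """Return (0.632+ AUC estimate, lower, upper) for the toy scorer."""
    rng = random.Random(seed)
    n = len(X)
    scorer = fit_mean_difference(X, y)
    auc_resub = auc([scorer(x) for x in X], y)  # optimistic resubstitution AUC

    oob_aucs = []
    while len(oob_aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]  # out-of-bag cases
        # Skip replicates where either split lacks one of the classes.
        if len({y[i] for i in idx}) < 2 or len({y[i] for i in oob}) < 2:
            continue
        s = fit_mean_difference([X[i] for i in idx], [y[i] for i in idx])
        oob_aucs.append(auc([s(X[i]) for i in oob], [y[i] for i in oob]))

    auc_boot = sum(oob_aucs) / len(oob_aucs)
    # Relative overfitting rate r, with an assumed no-information AUC of 0.5.
    if auc_boot <= 0.5:
        r = 1.0
    elif auc_boot >= auc_resub:
        r = 0.0
    else:
        r = (auc_resub - auc_boot) / (auc_resub - 0.5)
    w = 0.632 / (1.0 - 0.368 * r)
    estimate = (1.0 - w) * auc_resub + w * max(auc_boot, 0.5)

    # Assumed interval: 2.5th and 97.5th percentiles of the out-of-bag AUCs.
    oob_aucs.sort()
    lower = oob_aucs[int(0.025 * n_boot)]
    upper = oob_aucs[int(0.975 * n_boot) - 1]
    return estimate, lower, upper


def make_toy_data(n_per_class=40, shift=1.5, seed=1):
    """Synthetic 2-D 'reduced feature space': two unit-variance Gaussian classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0), rng.gauss(label * shift, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    est, lo, hi = auc_632_plus(X, y)
    print(f"AUC0.632+ = {est:.3f}, interval [{lo:.3f}; {hi:.3f}]")
```

In practice the rows of `X` would be the low-dimensional coordinates produced by a DR method such as t-SNE or Laplacian eigenmaps, and the scorer would be replaced by the classifier under evaluation.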

Results: In the large U.S. data set, sample high-performance results include AUC0.632+ = 0.88 with 95% empirical bootstrap interval [0.787; 0.895] for 13 ARD-selected features and AUC0.632+ = 0.87 with interval [0.817; 0.906] for four LSW-selected features, compared with a 4D t-SNE mapping (from the original 81D feature space) giving AUC0.632+ = 0.90 with interval [0.847; 0.919], all using the MCMC-BANN.

Conclusions: Preliminary results suggest that the new methods can match or exceed the classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement for feature selection in CADx problems, DR techniques offer a complementary approach that can help elucidate additional properties of the data. Specifically, the new techniques were shown to have the added benefit of delivering sparse lower-dimensional representations for visual interpretation, revealing intricate structure in the feature space.
