Exploiting the Similarity of Dissimilarities for Biomedical Applications and Enhanced Machine Learning

Overview
Date 2025 Jan 24
PMID 39854337
Abstract

The "similarity of dissimilarities" is an emerging paradigm in biomedical science with significant implications for protein function prediction, machine learning (ML), and personalized medicine. In protein function prediction, considering dissimilarities alongside similarities gives a more detailed picture of evolutionary processes and directs attention to the regions that influence biological function. In ML, incorporating dissimilarity measures helps avoid misleading results caused by highly correlated or similar data, addressing confounding issues such as the Doppelgänger Effect, and so yields more accurate insight into complex biological systems. Dissimilarities matter just as much in personalized AI and precision medicine. Personalized AI builds a local model for each sample from a network of neighboring samples; if those neighbors are too similar, it becomes difficult to identify the factors critical to disease onset in that individual, which limits the effectiveness of personalized interventions or treatments. This paper discusses the "similarity of dissimilarities" concept using protein function prediction, ML, and personalized AI as key examples. Integrating this approach into an analysis supports the design of more meaningful experiments and smarter validation methods, helping ensure that models learn in a meaningful way.
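
As an illustration of the validation concern raised in the abstract, the sketch below flags training/validation sample pairs that are suspiciously similar. It is not taken from the paper: it assumes that a "doppelgänger" can be approximated as a cross-set pair of samples whose feature profiles have a very high Pearson correlation, and the function name `find_doppelganger_pairs`, the 0.95 threshold, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np


def find_doppelganger_pairs(train, valid, threshold=0.95):
    """Return (train_idx, valid_idx, r) for cross-set pairs with Pearson r >= threshold.

    Hypothetical helper for illustration only. Rows are samples, columns are
    features. Assumes no sample has constant features (zero variance).
    """
    train = np.asarray(train, dtype=float)
    valid = np.asarray(valid, dtype=float)

    def zscore(x):
        # Standardize each sample (row) across its features.
        centered = x - x.mean(axis=1, keepdims=True)
        return centered / centered.std(axis=1, keepdims=True)

    # For standardized rows, dot product / n_features equals Pearson r.
    corr = zscore(train) @ zscore(valid).T / train.shape[1]
    rows, cols = np.nonzero(corr >= threshold)
    return [(int(i), int(j), float(corr[i, j])) for i, j in zip(rows, cols)]


# Synthetic data (an assumption, not from the paper): independent samples,
# plus one validation sample planted as a noisy copy of training sample 2.
rng = np.random.default_rng(0)
train = rng.normal(size=(5, 200))
valid = rng.normal(size=(3, 200))
valid[1] = train[2] + rng.normal(scale=0.05, size=200)

pairs = find_doppelganger_pairs(train, valid)
for i, j, r in pairs:
    print(f"train[{i}] ~ valid[{j}]  r = {r:.3f}")
```

Pairs flagged this way could be removed from the validation set or reported separately, so that measured performance reflects samples that genuinely differ from the training data. The appropriate similarity measure and cutoff depend on the data type and would need to be chosen per study.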
