Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models

Overview
Journal Int J Mol Sci
Publisher MDPI
Date 2023 Nov 14
PMID 37958843
Abstract

The data explosion driven by advances in genomic research, such as high-throughput sequencing techniques, constantly challenges the conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in fields such as vision, speech, and text processing. Yet genomics poses unique challenges for deep learning, since we expect it to act as a superhuman intelligence that interprets the genome beyond the limits of our current knowledge. A powerful deep learning model should therefore make insightful use of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective, so that each task can be matched with a suitable deep learning architecture, and we remark on practical considerations in developing deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research and point out current challenges and potential research directions for future genomics applications. We believe the collaborative use of ever-growing, diverse data and the fast iteration of deep learning models will continue to contribute to the future of genomics.

Citing Articles

A review of AI-based radiogenomics in neurodegenerative disease.

Liu H, Zhang X, Liu Q. Front Big Data. 2025; 8:1515341.

PMID: 40052173 PMC: 11882605. DOI: 10.3389/fdata.2025.1515341.


Looking Forward to AI and Medicine: Where Are We, and Where Are We Going?

Wilcox A, Griffith M, Griffith O. Mo Med. 2025; 122(1):34-38.

PMID: 39958602 PMC: 11827648.


ProPr54 web server: predicting σ54 promoters and regulon with a hybrid convolutional and recurrent deep neural network.

Achterberg T, de Jong A. NAR Genom Bioinform. 2025; 7(1):lqae188.

PMID: 39781509 PMC: 11704786. DOI: 10.1093/nargab/lqae188.


SpecGMM: Integrating Spectral analysis and Gaussian Mixture Models for taxonomic classification and identification of discriminative DNA regions.

Jaiswal S, Murthy H, Narayanan M. Bioinform Adv. 2024; 4(1):vbae171.

PMID: 39659586 PMC: 11631429. DOI: 10.1093/bioadv/vbae171.


Sub-sampling graph neural networks for genomic prediction of quantitative phenotypes.

Kihlman R, Launonen I, Sillanpaa M, Waldmann P. G3 (Bethesda). 2024; 14(11).

PMID: 39250757 PMC: 11540326. DOI: 10.1093/g3journal/jkae216.
