
Generative and Interpretable Machine Learning for Aptamer Design and Analysis of in Vitro Sequence Selection

Overview
Specialty Biology
Date 2022 Sep 29
PMID 36174101
Abstract

Selection protocols such as SELEX, in which molecules are selected over multiple rounds for their ability to bind to a target of interest, are popular methods for obtaining binders for diagnostic and therapeutic purposes. We show that Restricted Boltzmann Machines (RBMs), an unsupervised two-layer neural network architecture, can be successfully trained on sequence ensembles from single rounds of SELEX experiments for thrombin aptamers. RBMs assign scores to sequences that can be directly related to their fitnesses, as estimated from experimental enrichment ratios. Hence, RBMs trained on sequence data from a given round can be used to predict the effects of selection at later rounds. Moreover, the parameters of the trained RBMs are interpretable and identify the functional features that contribute most to sequence fitness. To exploit the generative capabilities of RBMs, we introduce two different training protocols: one that takes sequence counts into account and is capable of identifying the few best binders, and another that is based on unique sequences only and generates more diverse binders. We then use the trained RBMs to generate novel aptamers with putative disruptive mutations or good binding properties, and validate the generated sequences with gel shift assay experiments. Finally, we compare the RBMs' performance with that of different supervised learning approaches, including random forests and several deep neural network architectures.
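The two quantities the abstract relates, an RBM score and an enrichment-based fitness estimate, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary (0/1) hidden units, a four-letter DNA alphabet, and a pseudocount of 1 for sequences unseen in a round, none of which is specified in the abstract. The function names `enrichment_log_ratios` and `rbm_score` and the parameter layout are hypothetical.

```python
import math

ALPHABET = "ACGT"  # assumed DNA alphabet


def enrichment_log_ratios(counts_early, counts_late, pseudocount=1.0):
    """Log enrichment log(f_late / f_early) for every sequence seen in either round.

    counts_early, counts_late: dicts mapping sequence -> read count.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    seqs = set(counts_early) | set(counts_late)
    total_early = sum(counts_early.values()) + pseudocount * len(seqs)
    total_late = sum(counts_late.values()) + pseudocount * len(seqs)
    ratios = {}
    for s in seqs:
        f_early = (counts_early.get(s, 0) + pseudocount) / total_early
        f_late = (counts_late.get(s, 0) + pseudocount) / total_late
        ratios[s] = math.log(f_late / f_early)
    return ratios


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def rbm_score(seq, fields, weights, hidden_bias):
    """Unnormalized log-probability of seq under an RBM with binary hidden units.

    fields[i][a]:      visible field for symbol a at site i
    weights[mu][i][a]: coupling between hidden unit mu and symbol a at site i
    hidden_bias[mu]:   bias of hidden unit mu
    Summing out each hidden unit contributes softplus(total input to that unit).
    """
    idx = [ALPHABET.index(c) for c in seq]
    score = sum(fields[i][a] for i, a in enumerate(idx))
    for mu, w in enumerate(weights):
        hidden_input = hidden_bias[mu] + sum(w[i][a] for i, a in enumerate(idx))
        score += softplus(hidden_input)
    return score


# Toy usage: made-up counts and hand-set parameters, for illustration only.
early = {"ACGT": 10, "TTTT": 10}
late = {"ACGT": 30, "TTTT": 2}
log_enrichment = enrichment_log_ratios(early, late)

L = 4
fields = [[0.0] * 4 for _ in range(L)]
weights = [[[0.0] * 4 for _ in range(L)]]
weights[0][0][0] = 2.0  # hidden unit 0 favours 'A' at the first site
hidden_bias = [0.0]
scores = {s: rbm_score(s, fields, weights, hidden_bias) for s in early}
```

In a real analysis the parameters would be learned from the sequences of one round, and the resulting scores would then be compared with the log enrichment ratios measured between rounds, for example by rank correlation.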

Citing Articles

Combination of Coevolutionary Information and Supervised Learning Enables Generation of Cyclic Peptide Inhibitors with Enhanced Potency from a Small Data Set.

Mazzocato Y, Frasson N, Sample M, Fregonese C, Pavan A, Caregnato A ACS Cent Sci. 2024; 10(12):2242-2252.

PMID: 39735311 PMC: 11672547. DOI: 10.1021/acscentsci.4c01428.


Machine Learning for RNA Design: LEARNA.

Runge F, Hutter F Methods Mol Biol. 2024; 2847:63-93.

PMID: 39312137 DOI: 10.1007/978-1-0716-4079-1_5.


Computational Frontiers in Aptamer-Based Nanomedicine for Precision Therapeutics: A Comprehensive Review.

Kumar S, Mohan A, Sharma N, Kumar A, Girdhar M, Malik T ACS Omega. 2024; 9(25):26838-26862.

PMID: 38947800 PMC: 11209897. DOI: 10.1021/acsomega.4c02466.


Inference of annealed protein fitness landscapes with AnnealDCA.

Sesta L, Pagnani A, Fernandez-de-Cossio-Diaz J, Uguzzoni G PLoS Comput Biol. 2024; 20(2):e1011812.

PMID: 38377054 PMC: 10878520. DOI: 10.1371/journal.pcbi.1011812.


ACIDES: on-line monitoring of forward genetic screens for protein engineering.

Nemoto T, Ocari T, Planul A, Tekinsoy M, Zin E, Dalkara D Nat Commun. 2023; 14(1):8504.

PMID: 38148337 PMC: 10751290. DOI: 10.1038/s41467-023-43967-9.

