
OLGA: Fast Computation of Generation Probabilities of B- and T-cell Receptor Amino Acid Sequences and Motifs

Overview
Journal Bioinformatics
Specialty Biology
Date 2019 Jan 19
PMID 30657870
Citations 99
Abstract

Motivation: High-throughput sequencing of large immune repertoires has enabled the development of methods to predict the probability that V(D)J recombination generates a T- or B-cell receptor with any specific nucleotide sequence. These generation probabilities are highly heterogeneous, ranging over 20 orders of magnitude in real repertoires. Since the function of a receptor depends on its protein sequence, it is important to be able to predict this probability of generation at the amino acid level. However, brute-force summation over all the nucleotide sequences with the correct amino acid translation is computationally intractable. This paper presents a solution to this problem.

Results: We use dynamic programming to construct an efficient and flexible algorithm, called OLGA (Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences), for calculating the probability of generating a given CDR3 amino acid sequence or motif, with or without V/J restriction, as a result of V(D)J recombination in B or T cells. We apply it to databases of epitope-specific T-cell receptors to evaluate the probability that a typical human subject will possess T cells responsive to specific disease-associated epitopes. The model predictions show excellent agreement with published data. We suggest that OLGA may be a useful tool to guide vaccine design.
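
To illustrate why dynamic programming helps here, the following is a minimal sketch on a toy model, not OLGA's actual algorithm or API. It assumes (purely for illustration) that nucleotides are emitted by a first-order Markov chain with hypothetical parameters P0 and T, and it omits V, D and J gene choice, deletions and insertions entirely. The function names pgen_dp and pgen_brute are likewise hypothetical. The sketch sums the probability of every nucleotide sequence that translates to a given amino acid sequence, once by explicit enumeration and once by a codon-by-codon forward recursion whose cost grows linearly with sequence length.

```python
from itertools import product

BASES = "ACGT"

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
_ORDER = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AAS[16 * i + 4 * j + k]
    for i, a in enumerate(_ORDER)
    for j, b in enumerate(_ORDER)
    for k, c in enumerate(_ORDER)
}

# Hypothetical toy parameters (assumptions, not fitted to any data):
# P0 is the distribution of the first nucleotide, T[x][y] = P(next = y | current = x).
P0 = {b: 0.25 for b in BASES}
T = {
    "A": {"A": 0.40, "C": 0.20, "G": 0.30, "T": 0.10},
    "C": {"A": 0.10, "C": 0.50, "G": 0.20, "T": 0.20},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.30, "C": 0.10, "G": 0.10, "T": 0.50},
}


def codons_for(aa):
    """Return all codons that translate to the amino acid `aa`."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def pgen_dp(aa_seq, p0, t):
    """Probability of a non-empty amino acid sequence under the toy model.

    forward[b] holds the summed probability of all nucleotide prefixes that
    encode the amino acids processed so far and end in base b.
    """
    forward = None
    for aa in aa_seq:
        new = dict.fromkeys(BASES, 0.0)
        for c in codons_for(aa):
            if forward is None:
                start = p0[c[0]]  # first nucleotide of the sequence
            else:
                start = sum(forward[b] * t[b][c[0]] for b in BASES)
            new[c[2]] += start * t[c[0]][c[1]] * t[c[1]][c[2]]
        forward = new
    return sum(forward.values())


def pgen_brute(aa_seq, p0, t):
    """Same quantity by enumerating all 4**(3n) nucleotide sequences."""
    total = 0.0
    for nts in product(BASES, repeat=3 * len(aa_seq)):
        seq = "".join(nts)
        codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
        if all(CODON_TABLE[c] == aa for c, aa in zip(codons, aa_seq)):
            p = p0[seq[0]]
            for x, y in zip(seq, seq[1:]):
                p *= t[x][y]
            total += p
    return total


if __name__ == "__main__":
    # The two methods should agree up to floating-point error.
    print(pgen_dp("CAS", P0, T), pgen_brute("CAS", P0, T))
```

The brute-force version already visits 4^9 = 262,144 nucleotide sequences for three amino acids and becomes impractical at typical CDR3 lengths, whereas the forward recursion only ever tracks four running totals, one per possible last nucleotide.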

Availability And Implementation: Source code is available at https://github.com/zsethna/OLGA.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Simulation of adaptive immune receptors and repertoires with complex immune information to guide the development and benchmarking of AIRR machine learning.

Chernigovskaya M, Pavlovic M, Kanduri C, Gielis S, Robert P, Scheffer L Nucleic Acids Res. 2025; 53(3).

PMID: 39873270 PMC: 11773363. DOI: 10.1093/nar/gkaf025.


T cell receptor-centric perspective to multimodal single-cell data analysis.

Mullan K, Ha M, Valkiers S, de Vrij N, Ogunjimi B, Laukens K Sci Adv. 2024; 10(48):eadr3196.

PMID: 39612336 PMC: 11606500. DOI: 10.1126/sciadv.adr3196.


Local and Global Variability in Developing Human T-Cell Repertoires.

Isacchini G, Quiniou V, Barennes P, Mhanna V, Vantomme H, Stys P PRX Life. 2024; 2(1).

PMID: 39582620 PMC: 11583800. DOI: 10.1103/prxlife.2.013011.


An unbiased comparison of immunoglobulin sequence aligners.

Konstantinovsky T, Peres A, Polak P, Yaari G Brief Bioinform. 2024; 25(6).

PMID: 39489605 PMC: 11531861. DOI: 10.1093/bib/bbae556.


The Type 1 Diabetes T Cell Receptor and B Cell Receptor Repository in the AIRR Data Commons: a practical guide for access, use and contributions through the Type 1 Diabetes AIRR Consortium.

Hanna S, Bonami R, Corrie B, Westley M, Posgai A, Luning Prak E Diabetologia. 2024; 68(1):186-202.

PMID: 39467874 PMC: 11663175. DOI: 10.1007/s00125-024-06298-y.

