BCR CDR3 Length Distributions Differ Between Blood and Spleen and Between Old and Young Patients, and TCR Distributions Can Be Used to Detect Myelodysplastic Syndrome
Complementarity-determining region 3 (CDR3) is the most hypervariable region in B cell receptor (BCR) and T cell receptor (TCR) genes, and the most critical structure in antigen recognition and thereby in determining the fates of developing and responding lymphocytes. There are millions of different TCR Vβ chain or BCR heavy chain CDR3 sequences in human blood. Even now that high-throughput sequencing is widely used, CDR3 length distributions (also called spectratypes) remain a much quicker and cheaper method of assessing repertoire diversity. However, the complexity of the distributions and the large amount of information per sample (e.g. 32 distributions of the TCRα chain, and 24 of TCRβ) call for the use of machine learning tools for full exploration. We have examined the ability of supervised machine learning, which uses computational models to find hidden patterns in predefined biological groups, to analyze CDR3 length distributions from various sources and to distinguish between experimental groups. We found that (a) splenic BCR CDR3 length distributions are characterized by low standard deviations and few local maxima compared to peripheral blood distributions; (b) the BCR CDR3 length distributions of healthy elderly people can be distinguished from those of the young; and (c) a machine learning model based on TCR CDR3 distribution features can detect myelodysplastic syndrome with approximately 93% accuracy. Overall, we demonstrate that supervised machine learning methods can contribute to our understanding of lymphocyte repertoire diversity.
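The two distribution features named in finding (a), standard deviation and number of local maxima, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `spectratype_features`, the input format (a mapping from CDR3 length to sequence count), the definition of a local maximum as a bin strictly higher than both neighbours, and the two toy distributions are all assumptions made for illustration only.

```python
from math import sqrt


def spectratype_features(counts):
    """Summarize one CDR3 length distribution (hypothetical helper).

    counts: dict mapping CDR3 length -> number of sequences of that length.
    Returns the mean length, the standard deviation, and the number of
    local maxima of the normalized distribution.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("distribution is empty")

    # Fill gaps so that neighbouring bins really are adjacent lengths.
    lengths = list(range(min(counts), max(counts) + 1))
    freqs = [counts.get(length, 0) / total for length in lengths]

    mean = sum(l * f for l, f in zip(lengths, freqs))
    variance = sum(f * (l - mean) ** 2 for l, f in zip(lengths, freqs))

    # Assumed definition: a bin strictly higher than both neighbours,
    # with zero frequency assumed beyond the two ends.
    padded = [0.0] + freqs + [0.0]
    n_peaks = sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

    return {"mean": mean, "std": sqrt(variance), "n_local_maxima": n_peaks}


# Invented toy data, not measurements from the study.
narrow = {12: 5, 13: 20, 14: 50, 15: 20, 16: 5}
ragged = {10: 10, 11: 5, 12: 20, 13: 10, 14: 25, 15: 5, 16: 15, 17: 10}

narrow_features = spectratype_features(narrow)
ragged_features = spectratype_features(ragged)
print(narrow_features)
print(ragged_features)
```

Feature vectors of this kind, one per distribution, are the sort of input a supervised classifier could be trained on to separate predefined groups; which features and which classifier the study actually used is not stated in this section.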