Biobank-scale Inference of Multi-individual Identity by Descent and Gene Conversion
We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.
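To make the contrast between locus-specific IBD clusters and pairwise IBD concrete, the following is a minimal illustrative sketch, not the ibd-cluster implementation. It assumes a hypothetical representation in which the clusters at one locus are a partition of haplotype labels (a list of sets), and the names `pairs_implied_by_clusters`, `cluster_output_size`, and `clusters_at_locus` are invented for illustration. It shows that a partition stores each haplotype once per locus, whereas the set of pairwise relationships it implies can grow quadratically with cluster size.

```python
from itertools import combinations


def pairs_implied_by_clusters(clusters):
    """Expand a locus-specific partition into the pairwise IBD relationships it implies.

    `clusters` is a hypothetical representation: a list of sets of haplotype labels,
    where haplotypes in the same set are treated as identical by descent at the locus.
    """
    pairs = set()
    for cluster in clusters:
        # Every unordered pair of haplotypes within one cluster is an IBD pair.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def cluster_output_size(clusters):
    """Number of haplotype entries needed to store the partition at one locus."""
    return sum(len(cluster) for cluster in clusters)


def max_pairs(n_haplotypes):
    """Upper bound on pairwise relationships among n haplotypes: n choose 2."""
    return n_haplotypes * (n_haplotypes - 1) // 2


# Made-up example locus: five haplotypes share one cluster, one is a singleton.
clusters_at_locus = [{"h1", "h2", "h3", "h4", "h5"}, {"h6"}]
pairs = pairs_implied_by_clusters(clusters_at_locus)

print(cluster_output_size(clusters_at_locus))  # 6 entries, one per haplotype
print(len(pairs))                              # 10 pairs from the 5-haplotype cluster
```

In this toy setting the cluster representation needs one entry per haplotype at each locus, which is consistent with the linear scaling described above, while an explicit pairwise listing is bounded only by `max_pairs(n)`.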