
Fast and Accurate Shared Segment Detection and Relatedness Estimation in Un-phased Genetic Data Via TRUFFLE

Overview
Journal: Am J Hum Genet
Publisher: Cell Press
Specialty: Genetics
Date: 2019 Jun 11
PMID: 31178127
Citations: 24
Abstract

Estimating relationships and detecting shared segments between individuals are important steps in disease gene mapping. Existing methods are either tailored for computational efficiency or require phasing to improve accuracy. We developed TRUFFLE, a method that integrates computational techniques and statistical principles to identify and visualize identity-by-descent (IBD) segments in un-phased data. By skipping haplotype phasing and instead relying on a simpler region-based approach, our method is computationally efficient while maintaining inferential accuracy. In addition, an error model corrects for segment break-ups caused by genotyping errors. TRUFFLE can estimate relatedness for 3.1 million pairs from the 1000 Genomes Project data in a few minutes on a typical laptop computer. As expected, we identified only three pairs related as second cousins or closer across different populations, whereas commonly used methods identified a large number of such pairs. Similarly, within populations, we identified far fewer related pairs than those methods did. Compared to methods relying on phased data, TRUFFLE has comparable accuracy but is drastically faster and produces fewer broken segments. We also identified specific local genomic regions that are commonly shared within populations, suggesting selection. When we applied TRUFFLE to pedigree data, it detected first- to fifth-degree relationships with 99.6% accuracy. As genomic datasets grow much larger, TRUFFLE's accurate IBD segment detection can enable disease gene mapping through implicitly shared haplotypes.
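The abstract describes TRUFFLE only at a high level. The Python sketch below gives a rough, general illustration of how segment detection can work without phasing. It relies on the opposite-homozygote principle. Two individuals who share at least one haplotype IBD at a marker must share an allele there, so a 0-versus-2 genotype pair rules out IBD at that site unless it is a genotyping error. Likewise, sharing both haplotypes requires identical genotypes.

The following pieces are all assumptions for illustration:

- the gap-bridging rule in `merge_runs`, which is a hypothetical stand-in for an error model;
- every function name and default parameter (`min_length`, `max_gap_errors`, `min_flank`, positions in cM);
- the degree bins, which are the kinship thresholds used by the KING software.

None of this should be read as TRUFFLE's actual algorithm.

```python
"""Toy sketch of region-based IBD detection and kinship from unphased data.

Illustrative only, NOT TRUFFLE's implementation. Genotypes are alternate-allele
counts (0, 1, 2) or None if missing; positions are assumed to be in cM.
"""
from typing import Callable, List, Optional, Sequence, Tuple

Genotype = Optional[int]
Run = Tuple[float, float]  # (start, end) positions, both inclusive


def opposite_homozygote(a: Genotype, b: Genotype) -> bool:
    """0 vs 2: incompatible with sharing any haplotype IBD, barring error."""
    return a is not None and b is not None and {a, b} == {0, 2}


def genotype_mismatch(a: Genotype, b: Genotype) -> bool:
    """Any difference: incompatible with sharing both haplotypes IBD."""
    return a is not None and b is not None and a != b


def runs_between(g1: Sequence[Genotype], g2: Sequence[Genotype],
                 pos: Sequence[float],
                 breaks: Callable[[Genotype, Genotype], bool]):
    """Split markers into maximal runs containing no breaking site.

    Returns (runs, gaps), where gaps[i] counts the breaking sites between
    runs[i] and runs[i + 1]. Missing genotypes never break a run.
    """
    runs: List[Run] = []
    gaps: List[int] = []
    start = last = None
    n_breaks = 0
    for a, b, p in zip(g1, g2, pos):
        if breaks(a, b):
            if start is not None:  # close the current run
                runs.append((start, last))
                start, n_breaks = None, 0
            n_breaks += 1
        else:
            if start is None:  # open a new run
                if runs:
                    gaps.append(n_breaks)
                start = p
            last = p
    if start is not None:
        runs.append((start, last))
    return runs, gaps


def merge_runs(runs: List[Run], gaps: List[int], max_gap_errors: int,
               min_flank: float) -> List[Run]:
    """Hypothetical error model: bridge a gap of at most max_gap_errors
    breaking sites when both neighboring runs are at least min_flank long,
    treating those sites as likely genotyping errors."""
    if not runs:
        return []
    merged = [runs[0]]
    for (s, e), n_err in zip(runs[1:], gaps):
        ms, me = merged[-1]
        if n_err <= max_gap_errors and me - ms >= min_flank and e - s >= min_flank:
            merged[-1] = (ms, e)
        else:
            merged.append((s, e))
    return merged


def detect_segments(g1, g2, pos, breaks, min_length: float = 2.0,
                    max_gap_errors: int = 1, min_flank: float = 1.0) -> List[Run]:
    """Error-bridged runs that are at least min_length long."""
    runs, gaps = runs_between(g1, g2, pos, breaks)
    merged = merge_runs(runs, gaps, max_gap_errors, min_flank)
    return [(s, e) for s, e in merged if e - s >= min_length]


def kinship(ibd_any: List[Run], ibd2: List[Run], genome_length: float) -> float:
    """phi = f1/4 + f2/2 (f = genome fraction); IBD1 taken as IBD>=1 minus IBD2."""
    l_any = sum(e - s for s, e in ibd_any)
    l2 = sum(e - s for s, e in ibd2)
    return (l_any + l2) / (4.0 * genome_length)


def degree(phi: float, max_degree: int = 5) -> Optional[int]:
    """KING-style bins: degree d if 2**-(d+1.5) <= phi < 2**-(d+0.5).
    Returns 0 for duplicates/MZ twins, None if beyond max_degree."""
    if phi >= 2 ** -1.5:
        return 0
    for d in range(1, max_degree + 1):
        if phi >= 2 ** -(d + 1.5):
            return d
    return None
```

To use the sketch on a pair of individuals, run each chromosome twice. Call `detect_segments(g1, g2, pos, opposite_homozygote)` for IBD≥1 candidates and `detect_segments(g1, g2, pos, genotype_mismatch)` for IBD2 candidates. Concatenate the segments across chromosomes, pass them with the total map length to `kinship`, and bin the result with `degree`. The rule requires long runs on both sides before a gap is bridged. This keeps the short, noisy runs typical of unrelated pairs from being chained into spurious segments.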

Citing Articles

Comparative Study of Statistical Approaches and SNP Panels to Infer Distant Relationships in Forensic Genetics.

Tillmar A, Kling D. Genes (Basel). 2025; 16(2).

PMID: 40004443 PMC: 11855180. DOI: 10.3390/genes16020114.


Scalable analysis of large multi-ancestry biobanks by leveraging sparse ancestry-adjusted sample-relatedness.

Lin X, Dey R, Li X, Li Z. Res Sq. 2024.

PMID: 39606480 PMC: 11601839. DOI: 10.21203/rs.3.rs-5343361/v1.


Evaluation of Four Forensic Investigative Genetic Genealogy Analysis Approaches with Decreased Numbers of SNPs and Increased Genotyping Errors.

Zang Y, Wu E, Li T, Liu J, Wu R, Li R. Genes (Basel). 2024; 15(10).

PMID: 39457453 PMC: 11507463. DOI: 10.3390/genes15101329.


Unraveling the genomic diversity and admixture history of captive tigers in the United States.

Armstrong E, Mooney J, Solari K, Kim B, Barsh G, Grant V. Proc Natl Acad Sci U S A. 2024; 121(39):e2402924121.

PMID: 39298482 PMC: 11441546. DOI: 10.1073/pnas.2402924121.


Biobank-scale inference of multi-individual identity by descent and gene conversion.

Browning S, Browning B. Am J Hum Genet. 2024; 111(4):691-700.

PMID: 38513668 PMC: 11023918. DOI: 10.1016/j.ajhg.2024.02.015.

