
Strategies for Imputation to Whole Genome Sequence Using a Single or Multi-breed Reference Population in Cattle

Overview
Journal BMC Genomics
Publisher Biomed Central
Specialty Genetics
Date 2014 Aug 29
PMID 25164068
Citations 68
Abstract

Background: The advent of low-cost next-generation sequencing has made it possible to sequence a large number of dairy and beef bulls, which can serve as a reference for imputing whole genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from a high-density SNP marker panel to whole genome sequence level. The data comprised 132 Holstein, 42 Jersey, 52 Nordic Red and 16 Brown Swiss bulls with whole genome sequence data; 16 Holstein, 27 Jersey and 29 Nordic Red bulls had previously been genotyped with the bovine high-density (HD) SNP panel and were used for validation. We investigated the effect on imputation accuracy of enlarging the reference population by combining data across breeds, and compared the accuracy and speed of IMPUTE2 and BEAGLE using either genotype probability reference data or pre-phased reference data. All analyses were done on bovine autosome 29 using 387,436 bi-allelic variants and 13,612 SNP markers from the bovine HD panel.

Results: A combined-breed reference population led to higher imputation accuracies than a single-breed reference. The highest imputation accuracy for all three test breeds was achieved with BEAGLE using un-phased reference data (mean genotype correlations of 0.90, 0.89 and 0.87 for Holstein, Jersey and Nordic Red, respectively), but IMPUTE2 with un-phased reference data gave similar accuracies for Holstein and Nordic Red. Pre-phasing the reference data led to only a minor decrease in imputation accuracy but gave a large improvement in computation time. Pre-phasing with BEAGLE was substantially faster than pre-phasing with SHAPEIT2 (2.5 hours vs. 52 hours for 242 individuals), and imputation with pre-phased data was faster in IMPUTE2 than in BEAGLE (5 minutes vs. 50 minutes per individual).
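Accuracy above is reported as a mean genotype correlation. The abstract does not spell out how that correlation is computed, so the sketch below assumes one common definition: a Pearson correlation, per validation animal, between the observed genotypes (coded 0/1/2) and the imputed allele dosages across all variants, averaged over animals. The function names `pearson` and `mean_genotype_correlation` and the toy data are hypothetical and for illustration only.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined when either vector has no variance.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def mean_genotype_correlation(
    true_genotypes: List[Sequence[float]],
    imputed_dosages: List[Sequence[float]],
) -> float:
    """Mean of per-animal correlations between observed genotypes
    (0/1/2 copies of one allele) and imputed allele dosages (0.0-2.0).

    Assumption: one row per validation animal, one column per variant;
    animals with an undefined correlation are skipped.
    """
    if len(true_genotypes) != len(imputed_dosages):
        raise ValueError("number of animals differs")
    per_animal = [pearson(t, d) for t, d in zip(true_genotypes, imputed_dosages)]
    defined = [r for r in per_animal if not math.isnan(r)]
    return sum(defined) / len(defined) if defined else float("nan")


if __name__ == "__main__":
    # Toy data for illustration only; not values from the study.
    observed = [[0, 1, 2, 1, 0], [2, 2, 1, 0, 1]]
    imputed = [[0.1, 0.9, 1.8, 1.2, 0.0], [1.9, 2.0, 0.8, 0.2, 1.1]]
    print(round(mean_genotype_correlation(observed, imputed), 3))
```

Using dosages rather than hard-called genotypes lets the correlation reward imputations that are uncertain but close to the truth; a per-variant correlation averaged over variants is an equally plausible alternative definition.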

Conclusion: Combining reference populations across breeds is a good option for increasing the size of the reference data, and in turn the accuracy of imputation, when only a few animals are available. Pre-phasing the reference data only slightly decreases accuracy but gives substantial improvements in speed. Using BEAGLE for pre-phasing and IMPUTE2 for imputation is a fast and accurate strategy.
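The speed argument behind the recommended BEAGLE + IMPUTE2 combination can be made concrete with the run times reported in the Results. The sketch below is a back-of-envelope estimate, not a reproduction of the study's benchmark: it assumes that one pre-phasing run of the reference is followed by sequential imputation of each target animal, and that per-animal imputation time does not depend on which tool pre-phased the reference. `pipeline_hours` is a hypothetical helper, and using the 72 validation animals as the target set is purely illustrative.

```python
# Run times reported in the abstract for bovine autosome 29.
PREPHASING_HOURS = {"BEAGLE": 2.5, "SHAPEIT2": 52.0}  # whole 242-animal reference
IMPUTATION_MINUTES_PER_ANIMAL = {"IMPUTE2": 5.0, "BEAGLE": 50.0}  # pre-phased reference


def pipeline_hours(phaser: str, imputer: str, n_targets: int) -> float:
    """Rough wall-clock estimate in hours: one pre-phasing run of the
    reference plus sequential imputation of n_targets animals.

    Assumptions: per-animal imputation time is independent of the phasing
    tool, and animals are imputed one after another (no parallelism).
    """
    if n_targets < 0:
        raise ValueError("n_targets must be non-negative")
    imputation_hours = IMPUTATION_MINUTES_PER_ANIMAL[imputer] * n_targets / 60.0
    return PREPHASING_HOURS[phaser] + imputation_hours


if __name__ == "__main__":
    n = 16 + 27 + 29  # validation animals across the three test breeds
    for phaser in PREPHASING_HOURS:
        for imputer in IMPUTATION_MINUTES_PER_ANIMAL:
            print(f"{phaser} + {imputer}: {pipeline_hours(phaser, imputer, n):.1f} h")
```

Under these assumptions, 72 animals take 2.5 + 6 = 8.5 hours with BEAGLE + IMPUTE2, versus 52 + 60 = 112 hours with SHAPEIT2 + BEAGLE. Pre-phasing is a one-off cost, so the per-animal imputation time dominates as the number of targets grows.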

Citing Articles

Empirical versus estimated accuracy of imputation: optimising filtering thresholds for sequence imputation.

Nguyen T, Bolormaa S, Reich C, Chamberlain A, Vander Jagt C, Daetwyler H Genet Sel Evol. 2024; 56(1):72.

PMID: 39548370 PMC: 11566673. DOI: 10.1186/s12711-024-00942-2.


Meta-analysis of six dairy cattle breeds reveals biologically relevant candidate genes for mastitis resistance.

Cai Z, Iso-Touru T, Sanchez M, Kadri N, Bouwman A, Chitneedi P Genet Sel Evol. 2024; 56(1):54.

PMID: 39009986 PMC: 11247842. DOI: 10.1186/s12711-024-00920-8.


Sequenced-based GWAS for linear classification traits in Belgian Blue beef cattle reveals new coding variants in genes regulating body size in mammals.

Gualdron Duarte J, Yuan C, Gori A, Moreira G, Takeda H, Coppieters W Genet Sel Evol. 2023; 55(1):83.

PMID: 38017417 PMC: 10683324. DOI: 10.1186/s12711-023-00857-4.


Multi-breed genomic evaluation for tropical beef cattle when no pedigree information is available.

Hayes B, Copley J, Dodd E, Ross E, Speight S, Fordyce G Genet Sel Evol. 2023; 55(1):71.

PMID: 37845626 PMC: 10578004. DOI: 10.1186/s12711-023-00847-6.


Imputation to whole-genome sequence and its use in genome-wide association studies for pork colour traits in crossbred and purebred pigs.

Heidaritabar M, Huisman A, Krivushin K, Stothard P, Dervishi E, Charagu P Front Genet. 2022; 13:1022681.

PMID: 36303553 PMC: 9593086. DOI: 10.3389/fgene.2022.1022681.

