Is Single-step Genomic REML with the Algorithm for Proven and Young More Computationally Efficient when Less Generations of Data Are Present?

Overview

Journal: J Anim Sci

Date: 2022 Mar 15

PMID: 35289906

Authors

Vinicius Silva Junqueira

Daniela Lourenco

Yutaka Masuda

Fernando Flores Cardoso

Paulo Savio Lopes

Fabyano Fonseca E Silva

Ignacy Misztal

Abstract

Efficient computing techniques allow the estimation of variance components for virtually any traditional dataset. When genomic information is available, variance components can be estimated using genomic REML (GREML). If only a portion of the animals have genotypes, single-step GREML (ssGREML) is the method of choice. The genomic relationship matrix (G) used in both cases is dense, so computations are limited by the number of genotyped animals. The algorithm for proven and young (APY) can be used to create a sparse inverse of G (G_APY^-1) with close to linear memory and computing requirements. In ssGREML, the inverse of the realized relationship matrix (H^-1) also includes the inverse of the pedigree relationship matrix, which can be dense with a long pedigree but is sparser with a short one. The main purpose of this study was to investigate whether the costs of ssGREML can be reduced using APY with truncated pedigree and phenotypes. We also investigated the impact of truncation on variance component estimation when different numbers of core animals are used in APY. Simulations included 150K animals from 10 generations, with selection. Phenotypes (h^2 = 0.3) were available for all animals in generations 1-9. A total of 30K animals in generations 8 and 9, and 15K validation animals in generation 10, were genotyped for 52,890 SNP. Average information REML and ssGREML with G^-1 and G_APY^-1 using 1K, 5K, 9K, and 14K core animals were compared. Variance component estimates were affected when the number of core animals in APY corresponded to the number of eigenvalues explaining only a small fraction of the total variation in G. The most time-consuming operation was the inversion of G, which accounted for more than 50% of the total time. Next, numerical factorization consumed nearly 30% of the total computing time. On average, a 7% decrease in the computing time for ordering was observed by removing each generation of data.
APY can be successfully applied to create the inverse of the genomic relationship matrix used in ssGREML for estimating variance components. To ensure reliable variance component estimation, it is important to use a core size that corresponds to the number of largest eigenvalues explaining around 98% of total variation in G. When APY is used, pedigrees can be truncated to increase the sparsity of H and slightly reduce computing time for ordering and symbolic factorization, with no impact on the estimates.
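
The two ideas in the abstract, choosing a core size from the eigenvalues of G and building the APY inverse, can be illustrated with a minimal sketch. This is not the software used in the study. The function names `core_size_for_variance` and `apy_inverse` are hypothetical, and the code assumes a small, dense, positive definite G held in a NumPy array. It uses the standard APY formulation, in which noncore animals are treated as conditionally independent given the core animals.

```python
import numpy as np


def core_size_for_variance(G, fraction=0.98):
    """Count the largest eigenvalues of G that jointly explain `fraction`
    of the total variation (sum of eigenvalues) in G."""
    eigvals = np.linalg.eigvalsh(G)[::-1]       # sort in descending order
    eigvals = np.clip(eigvals, 0.0, None)       # guard against tiny negatives
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, fraction)) + 1
    return min(k, G.shape[0])


def apy_inverse(G, core):
    """Return the APY inverse of G for the given core indices.

    Stored densely here for readability; in practice the noncore block
    is diagonal, which is what makes the inverse sparse.
    """
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                           # core x noncore

    # m_i = g_ii - g_ic Gcc^-1 g_ci for each noncore animal i
    m = G[noncore, noncore] - np.einsum("ij,ij->j", Gcn, P)

    Ginv = np.zeros((n, n))
    PM = P / m                                  # P times the inverse of diag(m)
    Ginv[np.ix_(core, core)] = Gcc_inv + PM @ P.T
    Ginv[np.ix_(core, noncore)] = -PM
    Ginv[np.ix_(noncore, core)] = -PM.T
    Ginv[noncore, noncore] = 1.0 / m            # diagonal noncore block
    return Ginv
```

A possible use would be to call `core_size_for_variance(G, 0.98)` first, pick that many core animals, and pass their indices to `apy_inverse`. With only one noncore animal the result coincides with the ordinary inverse of G, which gives a simple sanity check.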

Citing Articles

Marker effect p-values for single-step GWAS with the algorithm for proven and young in large genotyped populations.

Leite N, Bermann M, Tsuruta S, Misztal I, Lourenco D. Genet Sel Evol. 2024; 56(1):59.

PMID: 39174924 PMC: 11340074. DOI: 10.1186/s12711-024-00925-3.

HIBLUP: an integration of statistical models on the BLUP framework for efficient genetic evaluation using big genomic data.

Yin L, Zhang H, Tang Z, Yin D, Fu Y, Yuan X. Nucleic Acids Res. 2023; 51(8):3501-3512.

PMID: 36809800 PMC: 10164590. DOI: 10.1093/nar/gkad074.

Reducing computational demands of restricted maximum likelihood estimation with genomic relationship matrices.

Meyer K. Genet Sel Evol. 2023; 55(1):7.

PMID: 36698054 PMC: 9875494. DOI: 10.1186/s12711-023-00781-7.

Theoretical accuracy for indirect predictions based on SNP effects from single-step GBLUP.

Garcia A, Aguilar I, Legarra A, Tsuruta S, Misztal I, Lourenco D. Genet Sel Evol. 2022; 54(1):66.

PMID: 36162979 PMC: 9513904. DOI: 10.1186/s12711-022-00752-4.
