
Systematic Bias of Correlation Coefficient May Explain Negative Accuracy of Genomic Prediction

Overview
Journal Brief Bioinform
Specialty Biology
Date 2016 Jul 21
PMID 27436121
Citations 10
Abstract

Accuracy of genomic prediction is commonly calculated as the Pearson correlation coefficient between the predicted and observed phenotypes in the inference population by using cross-validation analysis. More frequently than expected, significant negative accuracies of genomic prediction have been reported in genomic selection studies. These negative values are surprising, given that the minimum value for prediction accuracy should hover around zero when randomly permuted data sets are analyzed. We reviewed the two common approaches for calculating the Pearson correlation and hypothesized that these negative accuracy values reflect potential bias owing to artifacts caused by the mathematical formulas used to calculate prediction accuracy. The first approach, Instant accuracy, calculates a correlation for each fold and reports prediction accuracy as the mean of the correlations across folds. The other approach, Hold accuracy, predicts all phenotypes in all folds and calculates a single correlation between the observed and predicted phenotypes at the end of the cross-validation process. Using simulated and real data, we demonstrated that our hypothesis is true. Both approaches are biased downward under certain conditions. The biases become larger when more folds are employed and when the expected accuracy is low. The bias of Instant accuracy can be corrected using a modified formula.
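
The two definitions in the abstract can be made concrete with a minimal sketch. The function names (`pearson`, `instant_accuracy`, `hold_accuracy`), the fold labels, and the toy numbers below are assumptions made for illustration; they are not taken from the paper. The sketch does not implement the paper's modified formula, which the abstract does not state.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Assumes both sequences have non-zero variance; otherwise the
    denominator is zero and the correlation is undefined.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def instant_accuracy(observed, predicted, fold_ids):
    """Instant accuracy: one correlation per fold, then the mean over folds."""
    per_fold = []
    for fold in sorted(set(fold_ids)):
        idx = [i for i, f in enumerate(fold_ids) if f == fold]
        per_fold.append(
            pearson([observed[i] for i in idx], [predicted[i] for i in idx])
        )
    return sum(per_fold) / len(per_fold)


def hold_accuracy(observed, predicted):
    """Hold accuracy: a single correlation over all held-out predictions pooled."""
    return pearson(observed, predicted)


# Toy data (hypothetical): two folds of three individuals each.
# `predicted` holds the value each individual received while its fold was held out.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
fold_ids = [0, 0, 0, 1, 1, 1]

instant = instant_accuracy(observed, predicted, fold_ids)
hold = hold_accuracy(observed, predicted)
print("Instant accuracy:", instant)
print("Hold accuracy:", hold)
```

On this toy input the within-fold ranking is perfect in both folds, but the predictions miss the difference between the fold means, so the two statistics disagree. The example only shows that the two definitions are different quantities; it does not reproduce the downward bias reported in the abstract.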

Citing Articles

Genome-Wide Association Mapping of Macronutrient Mineral Accumulation in Wheat ( L.) Grain.

Aljabri M, El-Soda M Plants (Basel). 2025; 13(24.

PMID: 39771170 PMC: 11728464. DOI: 10.3390/plants13243472.


Asymptotic Properties of Matthews Correlation Coefficient.

Itaya Y, Tamura J, Hayashi K, Yamamoto K Stat Med. 2024; 44(1-2):e10303.

PMID: 39682035 PMC: 11702320. DOI: 10.1002/sim.10303.


Genomic Prediction Strategies for Dry-Down-Related Traits in Maize.

Ni P, Anche M, Ruan Y, Dang D, Morales N, Li L Front Plant Sci. 2022; 13:930429.

PMID: 35845649 PMC: 9280646. DOI: 10.3389/fpls.2022.930429.


A comparative analysis of genomic and phenomic predictions of growth-related traits in 3-way coffee hybrids.

Mbebi A, Breitler J, Bordeaux M, Sulpice R, McHale M, Tong H G3 (Bethesda). 2022; 12(9).

PMID: 35792875 PMC: 9434219. DOI: 10.1093/g3journal/jkac170.


Genomic inbreeding and population structure of northern pike () in Xinjiang, China.

Luan P, Huo T, Ma B, Song D, Zhang X, Hu G Ecol Evol. 2021; 11(10):5657-5668.

PMID: 34026037 PMC: 8131772. DOI: 10.1002/ece3.7469.