Sensitivity Analysis of Kappa-fold Cross Validation in Prediction Error Estimation

Overview
Date 2010 Jan 16
PMID 20075479
Citations 193
Abstract

In machine learning, the performance of a classifier is usually measured by its prediction error. In most real-world problems this error cannot be calculated exactly and must be estimated, so choosing an appropriate error estimator matters. This paper analyzes the statistical properties (bias and variance) of the kappa-fold cross-validation classification error estimator (kappa-cv). Our main contribution is a novel theoretical decomposition of the variance of kappa-cv into its two sources: sensitivity to changes in the training set and sensitivity to changes in the folds. The paper also compares the bias and variance of the estimator for different values of kappa. The experimental study was performed in artificial domains because they allow the quantities involved to be computed exactly and the experimental conditions to be specified rigorously. The experiments cover two classifiers (naive Bayes and nearest neighbor), different numbers of folds, different sample sizes, and training sets drawn from assorted probability distributions. We conclude with some practical recommendations on the use of kappa-fold cross validation.
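The two sources of variance named in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method or experimental setup: it assumes a toy one-dimensional domain with two Gaussian classes, a 1-nearest-neighbor classifier, and hypothetical function names (`make_sample`, `kfold_error`, `variance_components`). It splits the variance of the cross-validation estimate using the law of total variance, which may differ in detail from the decomposition derived in the paper.

```python
import random
import statistics


def make_sample(n, rng):
    """Draw n labelled points from two 1-D Gaussians (an assumed toy domain)."""
    sample = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(0.0 if label == 0 else 2.0, 1.0)
        sample.append((x, label))
    return sample


def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def kfold_error(sample, k, rng):
    """k-fold cross-validation estimate of the classification error."""
    indices = list(range(len(sample)))
    rng.shuffle(indices)                       # random partition into folds
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    mistakes = 0
    for fold in folds:
        held_out = set(fold)
        # Train on every point outside the current fold.
        train = [p for i, p in enumerate(sample) if i not in held_out]
        mistakes += sum(
            predict_1nn(train, sample[i][0]) != sample[i][1] for i in fold
        )
    return mistakes / len(sample)


def variance_components(n=40, k=5, n_samples=30, n_partitions=20, seed=0):
    """Empirically split the variance of the k-fold estimate in two parts.

    Returns (partition_part, training_set_part):
      partition_part    = mean over samples of the variance across partitions
      training_set_part = variance over samples of the mean across partitions
    """
    rng = random.Random(seed)
    within, means = [], []
    for _ in range(n_samples):
        sample = make_sample(n, rng)           # a new training set
        estimates = [kfold_error(sample, k, rng) for _ in range(n_partitions)]
        within.append(statistics.pvariance(estimates))
        means.append(statistics.mean(estimates))
    return statistics.mean(within), statistics.pvariance(means)


if __name__ == "__main__":
    for k in (2, 5, 10):
        partition_part, training_part = variance_components(k=k)
        print(f"k={k}: partition={partition_part:.5f}, "
              f"training set={training_part:.5f}")
```

Setting `k` equal to the sample size gives leave-one-out cross-validation, for which only one partition exists, so the partition component returned by this sketch is zero and all remaining variance comes from the training set.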

Citing Articles

Prediction model for type 2 diabetes mellitus and its association with mortality using machine learning in three independent cohorts from South Korea, Japan, and the UK: a model development and validation study.

Lee H, Hwang S, Park S, Choi Y, Lee S, Park J. EClinicalMedicine. 2025; 80:103069.

PMID: 39896872 PMC: 11787438. DOI: 10.1016/j.eclinm.2025.103069.


Convolutional neural networks for accurate real-time diagnosis of oral epithelial dysplasia and oral squamous cell carcinoma using high-resolution in vivo confocal microscopy.

Ramani R, Tan I, Bussau L, O'Reilly L, Silke J, Angel C. Sci Rep. 2025; 15(1):2555.

PMID: 39833362 PMC: 11746977. DOI: 10.1038/s41598-025-86400-5.


Improving early prediction of crop yield in Spanish olive groves using satellite imagery and machine learning.

Ramos M, Cubillas J, Cordoba R, Ortega L. PLoS One. 2025; 20(1):e0311530.

PMID: 39813256 PMC: 11734994. DOI: 10.1371/journal.pone.0311530.


Blood-Based Epigenetic Biomarkers Associated With Incident Chronic Kidney Disease in Individuals With Type 2 Diabetes.

Marchiori M, Maguolo A, Perfilyev A, Maziarz M, Martinell M, Gomez M. Diabetes. 2024; 74(3):439-450.

PMID: 39715581 PMC: 11842608. DOI: 10.2337/db24-0483.


Model-Based Prioritization of Adolescent Girls and Young Women for HIV Prevention Services Based on Data From 13 Sub-Saharan African Countries.

Gutreuter S, Denhard L, Logan J, Blanton J, Cham H. J Acquir Immune Defic Syndr. 2024; 98(4):363-371.

PMID: 39705108 PMC: 11839325. DOI: 10.1097/QAI.0000000000003588.