
The Brier Score Does Not Evaluate the Clinical Utility of Diagnostic Tests or Prediction Models

Overview
Journal Diagn Progn Res
Publisher Biomed Central
Date 2019 May 17
PMID 31093548
Citations 37
Abstract

Background: A variety of statistics have been proposed as tools to help investigators assess the value of diagnostic tests or prediction models. The Brier score has been recommended on the grounds that it is a proper scoring rule that is affected by both discrimination and calibration. However, the Brier score is prevalence dependent in such a way that the rank ordering of tests or models may inappropriately vary by prevalence.
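For orientation, the Brier score is the mean squared difference between each predicted probability and the observed 0/1 outcome. The following is a minimal sketch of that definition; the function name `brier_score` and the small cohort are hypothetical and are not taken from the article.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def constant_prediction_score(n_events, n_total):
    """Brier score of a non-informative model that predicts the prevalence for everyone."""
    outcomes = [1] * n_events + [0] * (n_total - n_events)
    prevalence = n_events / n_total
    return brier_score(outcomes, [prevalence] * n_total)


# Hypothetical cohorts (illustration only): same useless model, different prevalence.
low_prevalence = constant_prediction_score(n_events=2, n_total=10)
high_prevalence = constant_prediction_score(n_events=5, n_total=10)
print(f"prevalence 0.2: {low_prevalence:.2f}")
print(f"prevalence 0.5: {high_prevalence:.2f}")
```

A constant prediction equal to the prevalence scores prevalence × (1 − prevalence), so an equally uninformative model receives a different Brier score in each cohort. This is one simple way to see that the score depends on prevalence.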

Methods: We explored four common clinical scenarios: comparison of a highly accurate binary test with a continuous prediction model of moderate predictiveness; comparison of two binary tests where the importance of sensitivity versus specificity is inversely associated with prevalence; comparison of models and tests to default strategies of assuming that all or no patients are positive; and comparison of two models with miscalibration in opposite directions.
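To make the binary-test scenarios concrete: when a binary test result is scored as a 0/1 prediction, every false negative and every false positive contributes a squared error of 1, so the Brier score reduces to the misclassification rate. The sketch below assumes that scoring convention; the helper `brier_binary_test` and the sensitivity and specificity values are hypothetical illustrations, not the article's data.

```python
def brier_binary_test(sensitivity, specificity, prevalence):
    """Expected Brier score of a binary test scored as a 0/1 prediction.

    False negatives and false positives each contribute a squared error of 1,
    so the result equals the overall misclassification rate.
    """
    false_negative_rate = prevalence * (1 - sensitivity)
    false_positive_rate = (1 - prevalence) * (1 - specificity)
    return false_negative_rate + false_positive_rate


# Hypothetical tests (values chosen for illustration only).
sensitive_test = {"sensitivity": 0.95, "specificity": 0.80}
specific_test = {"sensitivity": 0.80, "specificity": 0.95}

for prevalence in (0.1, 0.9):
    a = brier_binary_test(prevalence=prevalence, **sensitive_test)
    b = brier_binary_test(prevalence=prevalence, **specific_test)
    print(f"prevalence={prevalence}: sensitive test {a:.3f}, specific test {b:.3f}")

# Default strategies: "all positive" is sensitivity 1, specificity 0;
# "all negative" is sensitivity 0, specificity 1.
print(brier_binary_test(1.0, 0.0, 0.1), brier_binary_test(0.0, 1.0, 0.1))
```

Under these assumed values, the more specific test has the lower (better) Brier score at a prevalence of 0.1 (about 0.065 versus 0.185), and the ranking reverses at a prevalence of 0.9. The score weights the two error types equally, so the ordering is driven by prevalence alone rather than by the clinical consequences of each error.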

Results: In each case, we found that the Brier score gave an inappropriate rank ordering of the tests and models. Conversely, net benefit, a decision-analytic measure, gave results that always favored the preferable test or model.
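As a point of reference, net benefit in decision curve analysis is commonly defined as the true-positive rate minus the false-positive rate weighted by the odds of the threshold probability, both expressed as proportions of the whole cohort. The sketch below assumes that standard definition; the function name `net_benefit`, the cohort, and the threshold are hypothetical and do not reproduce the article's analyses.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold.

    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Hypothetical cohort: 2 events among 10 patients (illustration only).
outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
treat_all = [1.0] * len(outcomes)   # default strategy: assume everyone is positive
treat_none = [0.0] * len(outcomes)  # default strategy: assume no one is positive

threshold = 0.1  # assumed threshold: one true positive is worth nine false positives
print(f"treat all:  {net_benefit(outcomes, treat_all, threshold):.3f}")
print(f"treat none: {net_benefit(outcomes, treat_none, threshold):.3f}")
```

Because the threshold probability encodes the relative harm of a false positive versus a missed case, the comparison against the treat-all and treat-none defaults reflects clinical consequences explicitly, which the Brier score does not.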

Conclusions: The Brier score does not evaluate the clinical value of diagnostic tests or prediction models. As an alternative, we advocate the use of decision-analytic measures such as net benefit.

Trial Registration: Not applicable.

Citing Articles

Machine learning approaches for risk prediction after percutaneous coronary intervention: a systematic review and meta-analysis.

Zaka A, Mutahar D, Gorcilov J, Gupta A, Kovoor J, Stretton B. Eur Heart J Digit Health. 2025; 6(1):23-44.

PMID: 39846069 PMC: 11750198. DOI: 10.1093/ehjdh/ztae074.


Machine-learning versus traditional methods for prediction of all-cause mortality after transcatheter aortic valve implantation: a systematic review and meta-analysis.

Zaka A, Mustafiz C, Mutahar D, Sinhal S, Gorcilov J, Muston B. Open Heart. 2025; 12(1).

PMID: 39842939 PMC: 11784135. DOI: 10.1136/openhrt-2024-002779.


Predictive modeling and interpretative analysis of risks of instability in patients with Myasthenia Gravis requiring intensive care unit admission.

Kuo C, Su E, Yeh H, Yeh J, Chiu H, Chung C. Heliyon. 2025; 10(24):e41084.

PMID: 39759343 PMC: 11700255. DOI: 10.1016/j.heliyon.2024.e41084.


Pregnancy-Associated Plasma Protein A (PAPP-A) as a Predictor of Third Trimester Obesity: Insights from the CRIOBES Project.

Gabaldon-Rodriguez I, de Francisco-Montero C, Menendez-Moreno I, Balongo-Molina A, Gomez-Lorenzo A, Rodriguez-Garcia R. Pathophysiology. 2024; 31(4):631-642.

PMID: 39585163 PMC: 11587435. DOI: 10.3390/pathophysiology31040046.


Machine Learning Algorithms Can Be Reliably Leveraged to Identify Patients at High Risk of Prolonged Postoperative Opioid Use Following Orthopedic Surgery: A Systematic Review.

Krivicich L, Jan K, Kunze K, Rice M, Nho S. HSS J. 2024; 20(4):589-599.

PMID: 39479504 PMC: 11520020. DOI: 10.1177/15563316231164138.

