
Reliability: on the Reproducibility of Assessment Data

Overview
Journal: Med Educ
Specialty: Medical Education
Date: 2004 Aug 26
PMID: 15327684
Citations: 130
Abstract

Context: All assessment data, like other scientific experimental data, must be reproducible in order to be meaningfully interpreted.

Purpose: The purpose of this paper is to discuss applications of reliability to the most common assessment methods in medical education. Typical methods of estimating reliability are discussed intuitively and non-mathematically.

Summary: Reliability refers to the consistency of assessment outcomes. The exact type of consistency of greatest interest depends on the type of assessment, its purpose and the consequential use of the data. Written tests of cognitive achievement look to internal test consistency, using estimation methods derived from the test-retest design. Rater-based assessment data, such as ratings of clinical performance on the wards, require interrater consistency or agreement. Objective structured clinical examinations, simulated patient examinations and other performance-type assessments generally require generalisability theory analysis to account for various sources of measurement error in complex designs and to estimate the consistency of the generalisations to a universe or domain of skills.
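Two of the consistency types named in the summary can be illustrated with a short sketch. The abstract does not say which coefficients the paper uses, so the choices here are assumptions: Cronbach's alpha stands in for internal test consistency, and Cohen's kappa stands in for interrater agreement. The function names and the toy data are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def cronbach_alpha(scores):
    """Internal consistency of a written test.

    scores: one row per examinee, one column per item.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose to one tuple per item
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal proportions.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Undefined when expected == 1 (both raters use a single category).
    return (observed - expected) / (1 - expected)


# Toy data: 4 examinees answering 3 items scored right (1) or wrong (0).
item_scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
alpha = cronbach_alpha(item_scores)

# Toy data: two raters judging the same 4 clinical performances.
ratings_a = ["pass", "pass", "fail", "fail"]
ratings_b = ["pass", "fail", "fail", "fail"]
kappa = cohen_kappa(ratings_a, ratings_b)

print(round(alpha, 2), round(kappa, 2))  # prints 0.75 0.5
```

Generalisability theory, which the summary recommends for performance assessments, partitions score variance across several facets at once (cases, raters, occasions) and is not covered by either function.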

Conclusions: Reliability is a major source of validity evidence for assessments. Low reliability indicates that large variations in scores can be expected upon retesting. Inconsistent assessment scores are difficult or impossible to interpret meaningfully and thus reduce validity evidence. Reliability coefficients allow the quantification and estimation of the random errors of measurement in assessments, such that overall assessment can be improved.
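One common way a reliability coefficient quantifies random measurement error is the standard error of measurement, SEM = SD × sqrt(1 − reliability). The abstract does not state this formula, so it is offered here as an assumption from classical test theory; the function name and the numbers are hypothetical.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory SEM: the typical size of random error in one score."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


# Hypothetical test with a score standard deviation of 10 points.
sem_high = standard_error_of_measurement(10.0, 0.84)  # about 4 points
sem_low = standard_error_of_measurement(10.0, 0.36)   # about 8 points

print(round(sem_high, 2), round(sem_low, 2))
```

With the same score spread, the less reliable test carries twice the expected error per score, which matches the abstract's point that low reliability means large variations upon retesting.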
