Characterizing Sources of Uncertainty in IRT Scale Scores

Overview

Journal Educ Psychol Meas

Publisher Sage Publications

Date 2012 Oct 11

PMID 23049139

Citations 17

Authors

Ji Seung Yang

Mark Hansen

Li Cai

Abstract

Traditional estimators of item response theory (IRT) scale scores ignore uncertainty carried over from the item calibration process, which can lead to incorrect estimates of the standard errors of measurement (SEMs). Here, we review a variety of approaches that have been applied to this problem and compare them on the basis of their statistical methods and goals. We then elaborate on the particular flexibility and usefulness of a multiple imputation (MI)-based approach, which can be easily applied to tests with mixed item types and multiple underlying dimensions. The proposed method yields corrected estimates of individual scale scores, as well as their SEMs. Furthermore, the approach enables a more complete characterization of the impact of parameter uncertainty by generating confidence envelopes (intervals) for item tracelines, test information functions, conditional SEM curves, and the marginal reliability coefficient. The MI-based approach is illustrated through the analysis of an artificial data set and then applied to data from a large educational assessment. A simulation study was also conducted to examine the relative contribution of item parameter uncertainty to the variability in score estimates under various conditions. We found that the impact of item parameter uncertainty is generally quite small, though under some conditions the uncertainty carried over from item calibration contributes substantially to variability in the scores. This may be the case when the calibration sample is small relative to the number of item parameters to be estimated, or when the IRT model fit to the data is multidimensional.
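
The abstract does not spell out how the imputed results are combined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a unidimensional 2PL model, EAP scoring on a fixed quadrature grid with a standard normal prior, independent normal draws of each item parameter around its estimate, and Rubin's standard multiple-imputation combining rules. The function names `eap_score` and `mi_score`, the floor of 0.05 on drawn slopes, and all numeric values are hypothetical and chosen for illustration.

```python
import math
import random


def eap_score(responses, a, b, n_quad=61):
    """EAP score and posterior variance for one examinee under a 2PL model."""
    # Equally spaced quadrature nodes on [-6, 6]
    nodes = [-6.0 + 12.0 * k / (n_quad - 1) for k in range(n_quad)]
    weights = []
    for theta in nodes:
        log_w = -0.5 * theta * theta  # unnormalized N(0, 1) prior
        for u, a_j, b_j in zip(responses, a, b):
            p = 1.0 / (1.0 + math.exp(-a_j * (theta - b_j)))
            log_w += math.log(p) if u == 1 else math.log(1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, nodes)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, nodes)) / total
    return mean, var


def mi_score(responses, a_hat, b_hat, a_se, b_se, n_imputations=50, seed=0):
    """Combine scores over imputed item parameters using Rubin's rules."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(n_imputations):
        # Assumption: independent normal draws per parameter. A real analysis
        # would draw from the joint distribution implied by the full
        # parameter covariance matrix from calibration.
        a_draw = [max(0.05, rng.gauss(m, s)) for m, s in zip(a_hat, a_se)]
        b_draw = [rng.gauss(m, s) for m, s in zip(b_hat, b_se)]
        m_i, v_i = eap_score(responses, a_draw, b_draw)
        means.append(m_i)
        variances.append(v_i)
    n = n_imputations
    score = sum(means) / n                       # combined point estimate
    within = sum(variances) / n                  # average posterior variance
    between = sum((x - score) ** 2 for x in means) / (n - 1)
    total_var = within + (1.0 + 1.0 / n) * between
    return score, math.sqrt(total_var), within, between


if __name__ == "__main__":
    resp = [1, 0, 1, 1, 0]
    a = [1.2, 0.8, 1.5, 1.0, 0.9]
    b = [-0.5, 0.0, 0.3, -1.0, 1.2]
    score, sem, within, between = mi_score(resp, a, b, [0.2] * 5, [0.2] * 5)
    print(f"score={score:.3f}  SEM={sem:.3f}  "
          f"share of variance from calibration={between / sem ** 2:.3f}")
```

In this sketch the between-imputation component is the part of the score variance attributable to item parameter uncertainty. When the parameter standard errors are set to zero, it vanishes and the result reduces to the conventional EAP score and posterior standard deviation.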

Citing Articles

What Can We Learn from a Semiparametric Factor Analysis of Item Responses and Response Time? An Illustration with the PISA 2015 Data.

Liu Y, Wang W. Psychometrika. 2023; 89(2):386-410.

PMID: 37973773 DOI: 10.1007/s11336-023-09936-3.

A random item effects generalized partial credit model with a multiple imputation-based scoring procedure.

Huang S, Chung S, Cai L. Qual Life Res. 2023; 33(3):637-651.

PMID: 37950818 DOI: 10.1007/s11136-023-03551-6.

Measuring individual true change with PROMIS using IRT-based plausible values.

Ho E, Verkuilen J, Fischer F. Qual Life Res. 2022; 32(5):1369-1379.

PMID: 36282446 PMC: 10849110. DOI: 10.1007/s11136-022-03264-2.

Evaluating sensitivity to classification uncertainty in latent subgroup effect analyses.

Loh W, Kim J. BMC Med Res Methodol. 2022; 22(1):247.

PMID: 36153493 PMC: 9508766. DOI: 10.1186/s12874-022-01720-8.

Improving reliability estimation in cognitive diagnosis modeling.

Schames Kreitchmann R, de la Torre J, Sorrel M, Najera P, Abad F. Behav Res Methods. 2022; 55(7):3446-3460.

PMID: 36127563 PMC: 10615987. DOI: 10.3758/s13428-022-01967-5.
