Characterizing Sources of Uncertainty in IRT Scale Scores
Traditional estimators of item response theory (IRT) scale scores ignore uncertainty carried over from the item calibration process, which can lead to incorrect estimates of standard errors of measurement (SEM). Here, we review a variety of approaches that have been applied to this problem and compare them on the basis of their statistical methods and goals. We then elaborate on the particular flexibility and usefulness of a multiple imputation (MI)-based approach, which can be easily applied to tests with mixed item types and multiple underlying dimensions. This proposed method obtains corrected estimates of individual scale scores, as well as their SEM. Furthermore, this approach enables a more complete characterization of the impact of parameter uncertainty by generating confidence envelopes (intervals) for item tracelines, test information functions, conditional SEM curves, and the marginal reliability coefficient. The MI-based approach is illustrated through the analysis of an artificial data set, then applied to data from a large educational assessment. A simulation study was also conducted to examine the relative contribution of item parameter uncertainty to the variability in score estimates under various conditions. We found that the impact of item parameter uncertainty is generally quite small, though there are some conditions under which the uncertainty carried over from item calibration contributes substantially to variability in the scores. This may be the case when the calibration sample is small relative to the number of item parameters to be estimated, or when the IRT model fit to the data is multidimensional.
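The abstract does not spell out the computational steps, so the following is only a minimal sketch of how an MI-style correction is commonly set up, not the authors' implementation. It assumes a unidimensional 2PL model, EAP scoring on a fixed quadrature grid, item parameter draws from a multivariate normal approximation to the calibration estimates, and Rubin's rules for combining the draws. The function names (`eap_score`, `mi_score`), the grid, and the number of draws are all hypothetical choices made for illustration.

```python
import numpy as np

def eap_score(responses, a, b, nodes, weights):
    """EAP estimate and posterior variance for one response pattern (2PL)."""
    # Probability of a correct response at each node, shape (n_nodes, n_items)
    p = 1.0 / (1.0 + np.exp(-a * (nodes[:, None] - b)))
    # Likelihood of the observed pattern at each node
    like = np.prod(np.where(responses == 1, p, 1.0 - p), axis=1)
    post = like * weights
    post /= post.sum()
    mean = np.sum(nodes * post)
    var = np.sum((nodes - mean) ** 2 * post)
    return mean, var

def mi_score(responses, est, cov, n_draws=200, seed=0):
    """Combine EAP scores over item parameter draws using Rubin's rules.

    est : stacked parameter estimates, slopes first, then difficulties
    cov : covariance matrix of est (assumed to come from calibration)
    """
    rng = np.random.default_rng(seed)
    n_items = len(responses)
    # Standard normal prior on an evenly spaced grid (an assumption)
    nodes = np.linspace(-4.0, 4.0, 81)
    weights = np.exp(-0.5 * nodes ** 2)
    weights /= weights.sum()

    draws = rng.multivariate_normal(est, cov, size=n_draws)
    means = np.empty(n_draws)
    variances = np.empty(n_draws)
    for m, d in enumerate(draws):
        a, b = d[:n_items], d[n_items:]
        means[m], variances[m] = eap_score(responses, a, b, nodes, weights)

    within = variances.mean()      # usual measurement error variance
    between = means.var(ddof=1)    # variance due to item parameter uncertainty
    total = within + (1.0 + 1.0 / n_draws) * between
    return means.mean(), np.sqrt(total), within, between

# Usage with made-up values for a five-item test
resp = np.array([1, 0, 1, 1, 0])
est = np.concatenate([np.full(5, 1.2), np.linspace(-1.0, 1.0, 5)])
cov = np.eye(10) * 0.01
score, sem, within, between = mi_score(resp, est, cov)
print(score, sem, between / (within + between))
```

The ratio printed at the end is one rough way to express the share of score variability attributable to item parameter uncertainty. With a small calibration sample the entries of `cov` would be larger, and the between-draw component would be expected to grow accordingly.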