
Sample Size Requirements for Training to a Kappa Agreement Criterion on Clinical Dementia Ratings

Overview

Specialties: Neurology, Psychiatry
Date: 2010 May 18
PMID: 20473138
Citations: 8
Abstract

The Clinical Dementia Rating (CDR) is a valid and reliable global measure of dementia severity. Diagnosis and transition across stages hinge on its consistent administration. Reports of CDR rating reliability have been based on 1 or 2 test cases at each severity level; agreement (kappa) statistics based on so few rated cases have large error, and their confidence intervals are incorrect. Simulations varied the number of test cases and their distribution across CDR stages to derive the sample size yielding 95% confidence that the estimated kappa is at least 0.60. We found that testing raters on 5 or more patients per CDR level (total N=25) will yield the desired confidence in the estimated kappa; if the test gives greater representation to CDR stages that are harder to evaluate, at least 42 ratings are needed. Testing newly trained raters with at least 5 patients per CDR stage will provide valid estimation of rater consistency, given a point estimate for kappa of roughly 0.80; fewer test cases increase the standard error, and an unequal distribution of test cases across CDR stages will lower kappa and increase its error.
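The sampling-error point can be made concrete. The sketch below is an illustration, not the paper's simulation code: it computes Cohen's kappa and an approximate large-sample 95% confidence interval, using the simplified standard error sqrt(po(1-po)/(n(1-pe)^2)) (an approximation; the full Fleiss-Cohen variance has additional terms). The 5x5 cross-tabulation is hypothetical, chosen to mirror the recommended design of 5 test cases per CDR stage (N=25) with a kappa point estimate near 0.80.

```python
import numpy as np

def cohens_kappa_ci(confusion, z=1.96):
    """Cohen's kappa with an approximate large-sample CI.

    Uses the simplified standard error sqrt(po(1-po) / (n(1-pe)^2));
    the full Fleiss-Cohen variance formula includes extra terms.
    """
    m = np.asarray(confusion, dtype=float)
    n = m.sum()
    po = np.trace(m) / n                      # observed agreement
    pe = (m.sum(0) * m.sum(1)).sum() / n**2   # chance agreement
    kappa = (po - pe) / (1 - pe)
    se = np.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
    return kappa, (kappa - z * se, kappa + z * se)

# Hypothetical rater-vs-reference cross-tabulation over the 5 CDR
# stages, 5 test cases per stage (N = 25), 4 near-miss disagreements.
conf = np.diag([4, 4, 5, 4, 4])
conf[0, 1] = 1; conf[1, 2] = 1; conf[3, 2] = 1; conf[4, 3] = 1

k, (lo, hi) = cohens_kappa_ci(conf)
print(f"kappa = {k:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
# → kappa = 0.80, 95% CI = (0.62, 0.98)
```

Even at N=25 with kappa = 0.80, the lower confidence bound only just clears 0.60; with the 1-2 cases per stage used in earlier reliability reports, the same calculation gives an interval far too wide to certify a trained rater.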

Citing Articles

Location In Vivo of the Innervation Zone in the Human Medial Gastrocnemius Using Imposed Contractions: A Comparison of the Usefulness of the M-Wave and H-Reflex.

Guzman-Venegas R, Palma-Traro F, Valencia O, Hudson M, Pincheira P. J Funct Morphol Kinesiol. 2022; 7(4).

PMID: 36547653 PMC: 9781038. DOI: 10.3390/jfmk7040107.


Reliability and validity test of a novel three-dimensional acetabular bone defect classification system aided with additive manufacturing.

Zhang J, Hu Y, Ying H, Mao Y, Zhu Z, Li H. BMC Musculoskelet Disord. 2022; 23(1):432.

PMID: 35534887 PMC: 9082860. DOI: 10.1186/s12891-022-05365-y.


Aducanumab: Appropriate Use Recommendations.

Cummings J, Aisen P, Apostolova L, Atri A, Salloway S, Weiner M. J Prev Alzheimers Dis. 2021; 8(4):398-410.

PMID: 34585212 PMC: 8835345. DOI: 10.14283/jpad.2021.41.


Evaluation of first information reports of Delhi police for injury surveillance: Data extraction tool development & validation.

Yadav S, Edwards P, Porter J. Indian J Med Res. 2020; 152(4):410-416.

PMID: 33380706 PMC: 8061583. DOI: 10.4103/ijmr.IJMR_442_20.


Comparison of three-dimensional and two-dimensional computed tomographies in the classification of acetabular fractures.

Kanthawang T, Vaseenon T, Sripan P, Pattamapaspong N. Emerg Radiol. 2019; 27(2):157-164.

PMID: 31792749 DOI: 10.1007/s10140-019-01744-6.


References
1. Altaye M, Donner A, Klar N. Inference procedures for assessing interobserver agreement among multiple raters. Biometrics. 2001; 57(2):584-8. DOI: 10.1111/j.0006-341x.2001.00584.x.

2. Tractenberg R, Schafer K, Morris J. Interobserver disagreements on clinical dementia rating assessment: interpretation and implications for training. Alzheimer Dis Assoc Disord. 2001; 15(3):155-61. DOI: 10.1097/00002093-200107000-00007.

3. Schafer K, Tractenberg R, Sano M, Mackell J, Thomas R, Gamst A. Reliability of monitoring the clinical dementia rating in multicenter clinical trials. Alzheimer Dis Assoc Disord. 2004; 18(4):219-22. PMC: 4367865.

4. Hughes C, Berg L, Danziger W, Coben L, Martin R. A new clinical scale for the staging of dementia. Br J Psychiatry. 1982; 140:566-72. DOI: 10.1192/bjp.140.6.566.

5. Walter S, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies. Stat Med. 1998; 17(1):101-10. DOI: 10.1002/(sici)1097-0258(19980115)17:1<101::aid-sim727>3.0.co;2-e.