
Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial

Overview
Date 2012 Jul 27
PMID 22833776
Citations 934
Abstract

Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen's kappa and intra-class correlations to assess IRR.
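The paper's own examples use SPSS and R. As a language-neutral illustration of what those two statistics compute (not the paper's code), the sketch below implements unweighted Cohen's kappa for two raters and the Shrout and Fleiss ICC(2,1), a two-way random-effects, absolute-agreement, single-rater coefficient. It assumes nominal codes for kappa and a fully crossed design with no missing ratings for the ICC. The function names `cohens_kappa` and `icc_2_1` are hypothetical. The ICC example data are the 6-subject, 4-rater table widely reproduced from Shrout and Fleiss (1979).

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters coding the same subjects (hypothetical helper)."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of subjects that received the same code from both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over codes of the product of each rater's marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is rater j's score for subject i (assumes a fully crossed design).
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)              # between-subjects mean square
    jms = ss_cols / (k - 1)              # between-raters (judges) mean square
    ems = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


if __name__ == "__main__":
    print(cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 0.5
    table = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
             [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(round(icc_2_1(table), 2))  # 0.29
```

In the kappa example, observed agreement is 0.75 and chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. Other ICC forms (for example, consistency rather than absolute agreement, or averaged rather than single-rater scores) change the denominator, so the form should match the study design.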

Citing Articles

Preoperative Magnetic Resonance Imaging Measurements of Hamstring Tendons' Cross-Sectional Area May Be Used to Predict the 5-Stranded Graft Diameter in Anterior Cruciate Ligament Reconstruction.

Ayres J, Ose B, Morey T, Brown E, Mar D, Henkelman E. Arthrosc Sports Med Rehabil. 2025; 7(1):101001.

PMID: 40041840 PMC: 11873528. DOI: 10.1016/j.asmr.2024.101001.


Changes of spino-pelvic characteristics post-THA are independent of surgical approach: a prospective study.

Wagner M, Verhaegen J, Vorimore C, Innmann M, Grammatopoulos G. Arch Orthop Trauma Surg. 2025; 145(1):165.

PMID: 39960539 PMC: 11832690. DOI: 10.1007/s00402-024-05739-y.


Assessing aesthetic impressions with pictorial measures: A novel approach in empirical aesthetics.

Stojilovic I. Iperception. 2025; 16(1):20416695241309780.

PMID: 39958811 PMC: 11826878. DOI: 10.1177/20416695241309780.


Assessment of the global healthcare industry during COVID-19 pandemic: A content analysis approach.

Ladki M, Daher L, Abou Chacra R, Kassis E, Ayrout C, Moubayed H. F1000Res. 2025; 12:1310.

PMID: 39931156 PMC: 11809683. DOI: 10.12688/f1000research.132486.1.


Magnetic resonance cholangiopancreatography for suspected cholangiopathy in children and young adults: a multi-reader agreement study.

Debnath P, Ata N, Cao J, Lala S, Malik A, Riedesel E. Pediatr Radiol. 2025; 55(3):479-487.

PMID: 39903263 DOI: 10.1007/s00247-025-06173-x.

