James M. LeBreton, Purdue University, and Jenell L. Senter, Wayne State University, published “Answers to 20 Questions About Interrater Reliability and Interrater Agreement” in the October 2008 issue of Organizational Research Methods (ORM). It was ORM's most-read article for July 2011, based on usage calculations for HighWire-hosted articles. Most-read rankings are recalculated at the beginning of each month and are based on full-text and PDF views. Professor LeBreton kindly provided the following responses to questions about the article.
Who is the target audience for this article?
The primary audience is graduate students, faculty, and practitioners working in organizational psychology, human resource management, and organizational behavior. That said, I have received a number of e-mails asking about the paper from colleagues scattered across a wide array of social sciences.
Were there findings that were surprising to you?
Our particular paper was not designed to test a priori hypotheses. Instead, we sought to synthesize and integrate roughly 30 years of thinking on issues related to interrater reliability and interrater agreement. So we did not have “findings,” per se. Instead, we structured our paper as 20 key questions related to the use of interrater reliability and agreement statistics. We then did our best to answer these questions in a way that would provide others with a clear set of guidelines for using these statistics in their research and work.
How do you see this study influencing future practice?
We hope that our paper provides helpful guidelines for individuals using interrater reliability and agreement statistics in their practice. Practitioners often invoke these statistics when conducting organizational climate or culture studies, performance evaluation studies, or even when examining the quality of ratings obtained via a panel of interviewers. Our paper was written to provide important information for practitioners and researchers to help guide 1) the selection of the correct agreement or reliability statistic, 2) the correct estimation of the statistic, 3) the correct interpretation of the statistic, and 4) an understanding of how various features of their situation might influence estimates of agreement/reliability (e.g., missing data, number of items on a scale, number of raters/judges).
How does this study fit into your body of work/line of research?
I have been publishing articles that use or refine estimates of interrater agreement and reliability for roughly 10 years. This paper represents an opportunity to reflect on my thinking over these years and integrate it with the thinking of my co-author (Jenell Wittmer-Senter) to arrive at a product that we believe will be helpful to both researchers and practitioners.
How did your paper change during the review process?
The most substantive changes involved providing a more balanced treatment of the rwg coefficient. This is the coefficient that I use in my work, and thus we were probably a bit too laudatory in our evaluation. The revised paper presents both the pros and cons of rwg. While I still think it is a great way to estimate agreement, it is not without its limitations. Those are now addressed more explicitly in our final paper. We also expanded the set of agreement coefficients we discussed to include awg, AD, and SD.
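For readers new to these statistics, a minimal sketch may help make the coefficients concrete. The snippet below illustrates the conventional single-item rwg (one minus the ratio of observed variance to the variance expected under a uniform "no agreement" null) alongside the AD (average deviation) index. The ratings, scale width, and function names are hypothetical illustration values, not taken from the paper.

```python
def rwg(ratings, num_options):
    """Single-item rwg: 1 - (observed variance / null variance), where the
    null is a uniform distribution over the num_options response options."""
    n = len(ratings)
    mean = sum(ratings) / n
    obs_var = sum((x - mean) ** 2 for x in ratings) / (n - 1)  # sample variance
    null_var = (num_options ** 2 - 1) / 12  # variance of a discrete uniform
    return 1 - obs_var / null_var

def ad_index(ratings):
    """AD index: average absolute deviation of the ratings from their mean."""
    n = len(ratings)
    mean = sum(ratings) / n
    return sum(abs(x - mean) for x in ratings) / n

# Hypothetical data: five judges rating one target on a 5-point scale.
judges = [4, 5, 4, 4, 5]
print(round(rwg(judges, 5), 3))      # → 0.85
print(round(ad_index(judges), 3))    # → 0.48
```

Note that conventions vary in the literature (e.g., whether the observed variance uses n or n - 1 in the denominator, and whether AD is taken around the mean or the median), which is one reason guidance such as the paper's is needed.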
What, if anything, would you do differently if you could go back and do this study again?
As I noted above, this paper wasn’t structured as a traditional “research study.” Thus, there are no particular design or analysis issues I would like to do differently. Overall, I am quite pleased with the paper. I believe it has the potential to serve as a helpful resource for individuals wanting to estimate interrater agreement and/or interrater reliability. It was structured as a Q & A paper. We certainly didn’t address all possible questions related to agreement and reliability, but I hope we addressed some of the more pressing ones for individuals who are new to using these statistics.