Anesth Analg 2006;102:1501-1503
© 2006 International Anesthesia Research Society
doi: 10.1213/01.ane.0000200314.73035.4d
ECONOMICS, EDUCATION, AND HEALTH SYSTEMS RESEARCH
Peer Review Interrater Concordance of Scientific Abstracts: A Study of Anesthesiology Subspecialty and Component Societies
Ira Todd Cohen, MD, FAAP, and Kantilal Patel, PhD
Department of Anesthesiology and Pediatrics, Children's National Medical Center, George Washington University, Washington, DC
Address correspondence and reprint requests to Ira Todd Cohen, MD, Children's National Medical Center, 111 Michigan Avenue, NW, Washington, DC 20010. Address e-mail to icohen{at}cnmc.org.
Abstract
Abstracts presented at anesthesiology subspecialty and component society meetings are chosen by peer review. We assessed this process by examining selection criteria and determining interrater concordance. For the societies studied, the level of reviewer agreement ranged from poor to moderate, i.e., slightly better than by chance alone. We hypothesize that having clearer evaluation criteria, scoring systems with interval scales, and assessment based on quality can strengthen the peer review process.
Introduction
Abstract presentations are an integral part of scientific and medical society meetings. They create a forum for the introduction of new ideas and collaborative learning among colleagues. The selection of abstracts for presentation typically involves a peer review process. This practice has been examined with a focus on selection criteria and interrater concordance (1–5). The specialty of anesthesiology has numerous subspecialty or component organizations. Annually, these societies meet independently to review and present information of interest to their membership. The purpose of this study was to examine and compare each of these organizations' abstract acceptance criteria and to determine the interrater concordance of their peer review process.
Methods
After IRB approval, we requested the review criteria, abstract grades, and individual reviewer scoring from nine affiliated societies listed by the American Society of Anesthesiologists (6). These organizations are the American Society of Critical Care Anesthesiologists, American Society of Regional Anesthesia and Pain Medicine, Association of University Anesthesiologists, Society for Ambulatory Anesthesia, Society of Cardiovascular Anesthesiologists, Society for Education in Anesthesia, Society of Neurosurgical Anesthesia and Critical Care, Society for Obstetric Anesthesia and Perinatology, and Society for Pediatric Anesthesia. Data were collected for the 2004 calendar year. Anonymity of individual authors, reviewers, and organizations was guaranteed and maintained. Data collected included the number of abstracts submitted, abstract groupings, the number of reviewers, stated assessment criteria, and rating scales (5).
Interrater concordance was calculated for each organization or subgroup of abstracts. A two-way analysis of variance was used to calculate the between-observation (PMS), between-reviewers (RMS), and residual (EMS) mean sums of squares. Concordance (Kappa) was determined for each group by the following equation in which n = number of abstracts, k = number of reviewers (7):
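The equation itself is not reproduced in this text. As a hedged reconstruction, assuming the two-way random-effects intraclass correlation described by Fleiss (7), with n abstracts, k reviewers, and the mean squares PMS, RMS, and EMS defined above, one standard form is:

```latex
\kappa \;=\; \frac{n\,(\mathrm{PMS} - \mathrm{EMS})}
                  {n\,\mathrm{PMS} \;+\; k\,\mathrm{RMS} \;+\; (nk - n - k)\,\mathrm{EMS}}
```

In this form, systematic differences in reviewer leniency (RMS) lower the coefficient along with residual disagreement (EMS). Whether the authors used exactly this variant is an assumption.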
The strength of the agreement was defined as: <0.2 = poor, 0.21–0.40 = fair, 0.41–0.60 = moderate, 0.61–0.80 = substantial, and >0.81 = almost perfect (8).
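The calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a complete score matrix (every reviewer scores every abstract) and the Fleiss-style form n(PMS − EMS) / (n·PMS + k·RMS + (nk − n − k)·EMS). The function names `anova_mean_squares`, `concordance`, and `strength` are hypothetical.

```python
from typing import List, Tuple


def anova_mean_squares(scores: List[List[float]]) -> Tuple[float, float, float]:
    """Two-way ANOVA without replication.

    scores[i][j] is the score given to abstract i by reviewer j.
    Returns (PMS, RMS, EMS): between-abstract, between-reviewer,
    and residual mean squares.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    return ss_rows / (n - 1), ss_cols / (k - 1), ss_err / ((n - 1) * (k - 1))


def concordance(scores: List[List[float]]) -> float:
    """Concordance from the mean squares (assumed Fleiss-style form)."""
    n, k = len(scores), len(scores[0])
    pms, rms, ems = anova_mean_squares(scores)
    denom = n * pms + k * rms + (n * k - n - k) * ems
    if denom == 0:
        raise ValueError("all scores are identical; concordance is undefined")
    return n * (pms - ems) / denom


def strength(kappa: float) -> str:
    """Strength-of-agreement labels as defined in the text (8)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Illustrative data only: 4 abstracts scored by 2 reviewers, where the
# second reviewer is consistently one point more generous.
example = [[1, 2], [2, 3], [3, 4], [4, 5]]
k_hat = concordance(example)
print(round(k_hat, 3), strength(k_hat))
```

In the example the two reviewers rank the abstracts identically, yet the coefficient stays below 1 because the constant one-point offset contributes to RMS.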
Results
Eight of the nine affiliated societies supplied the requested data. In 2004, 92 reviewers assessed 663 abstracts. The number of abstracts evaluated by each reviewer ranged from 31 to 99. The number of reviewers assigned to each abstract ranged from 4 to 10. Three of the affiliated societies divided abstracts into subgroups, two arbitrarily and one by application and/or content (i.e., case reports, posters, oral presentations). In all cases the reviewers were blinded to the submitting authors and institutions.
Table 1 summarizes the abstract data, review criteria, scoring techniques, and calculated kappa coefficients. The format for abstract evaluations varied. Five societies listed established guidelines and three listed none. The following criteria, with minor variation in wording, were used by four societies: 1) originality, 2) interest or clinical relevance, 3) clarity of writing, 4) methods, 5) analysis, 6) results, and 7) discussion or conclusions. The remaining societies' criteria also included educational value and ethical conduct.
Several different rating scales were used to assign abstract grades. Five organizations used nominal scales: rejection, possible rejection, possible acceptance, or acceptance, applying values of 1, 2, 3, and 4 or 70, 73, 77, and 80, respectively. Two societies used a 1 to 5 scale. One society used a 1 to 10 scale. The 1 to 5 and 1 to 10 scales were not linked to rejection or acceptance and resulted in greater abstract score variability and a higher degree of reviewer agreement.
Discussion
The peer review process and interrater concordance among anesthesiology subspecialty societies were examined. Interrater concordance for categorical data is not held to the same standards as that for objective observation and scientific instruments. In the latter category, an acceptable kappa coefficient is 0.8 or more. Landis and Koch (8) established a method for the analysis of multivariate categorical data involving agreement between more than two observers. They concluded that tests of significance should be used in a descriptive context to identify variation as opposed to a strict numerical context. For the component societies studied, the level of interrater concordance ranged from poor to moderate. The lack of agreement is comparable to values obtained if abstract scores were randomly assigned.
A similarly low level of reviewer agreement on abstract evaluation has been reported by other medical subspecialties, including anesthesiology (5), orthopedic trauma (9), ambulatory pediatrics (10), and hepatology (11). Low interrater concordance has also been observed for the peer review process for manuscript publication (1,12,13). In addition, no differences among kappa statistics for reviewer groups have been found when reviewers were blinded or unblinded as to authors (14), did or did not apply set criteria (2), and did or did not attend instructional workshops (15). Only "positive findings" resulted in higher kappa coefficients (12). Shared expertise among reviewers also results in more agreement (16), but this was not observed for the anesthesiology subspecialty societies studied.
Review criteria lacking clear descriptors or standards allow for subjectivity and more discordance. The categories of originality, methods, and writing can achieve greater objectivity by establishing guidelines such as "demonstrates an understanding of existing research," "applies appropriate tests and measurements to obtain stated goals," and "uses effective style and organization." Categories such as interest and value can also achieve clarity with descriptors such as "appropriate for forum" and "critically evaluates importance, strengths, and weaknesses." In this review, organizations that used no set selection criteria tended to have the lowest kappa coefficients.
The kappa coefficient represents the proportion of interrater concordance obtained, corrected for that which would be expected to occur by chance. Low levels of agreement can be attributed to the lack of variability in observation scores. Narrowly defined, nominal-based scales cluster scores, reduce accuracy, and mathematically limit variability (7). Landis and Koch (8) found that linking scores to acceptance generates further clustering. They concluded that quality-based scales of greater range result in a higher level of agreement. The component societies that used such scales tended to have higher kappa coefficients.
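The link between score variability and the coefficient can be illustrated numerically. The sketch below is hypothetical and uses invented scores, not study data. It assumes the Fleiss-style form n(PMS − EMS) / (n·PMS + k·RMS + (nk − n − k)·EMS) and assumes that three reviewers disagree by the same ±1 point whether the underlying scores are spread out or clustered.

```python
def concordance(scores):
    """Concordance from a complete abstracts-by-reviewers score matrix
    (assumed Fleiss-style two-way ANOVA form)."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    pms = ss_rows / (n - 1)                                    # between abstracts
    rms = ss_cols / (k - 1)                                    # between reviewers
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1)) # residual
    return n * (pms - ems) / (n * pms + k * rms + (n * k - n - k) * ems)


# Fixed pattern of reviewer disagreement: each abstract gets one score a
# point low, one on target, and one a point high, rotated across reviewers.
DISAGREEMENT = [(-1, 0, 1), (0, 1, -1), (1, -1, 0)] * 2


def panel_scores(quality):
    """Apply the same disagreement pattern to a list of six 'true' scores."""
    return [[q + d for d in devs] for q, devs in zip(quality, DISAGREEMENT)]


# Same disagreement, different spread in the underlying scores.
wide = concordance(panel_scores([1, 3, 5, 6, 8, 10]))      # broad scale
clustered = concordance(panel_scores([2, 2, 3, 3, 3, 4]))  # narrow scale

print(f"wide range: {wide:.2f}, clustered: {clustered:.2f}")
```

With identical reviewer disagreement, the broad scale yields a coefficient of roughly 0.91 and the clustered scale roughly 0.14, because the between-abstract variance is small relative to the residual variance when scores cluster.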
A review of component anesthesiology societies demonstrated abstract reviewers' interrater concordance to be poor to moderate and comparable to that reported by other medical subspecialties. We suggest that the peer review process might be strengthened by: a) defining precise evaluation criteria, b) using scoring systems that allow for greater variance, and c) basing assessment on abstract quality rather than linking it directly to acceptance or rejection.
Footnotes
Accepted for publication December 1, 2005.
References
1. Bhandari M, Swiontkowski MF, Einhorn TA, et al. Interobserver agreement in the application of levels of evidence to scientific papers in the American volume of the Journal of Bone and Joint Surgery. J Bone Joint Surg Am 2004;86:1717–20.
2. van der Steen LP, Hage JJ, Kon M, Monstrey SJ. Validity of a structured method of selecting abstracts for a plastic surgical scientific meeting. Plast Reconstr Surg 2004;113:353–9.
3. Timmer A, Sutherland LR, Hilsden RJ. Development and evaluation of a quality score for abstracts. BMC Med Res Methodol 2003;3:2.
4. Montgomery AA, Graham A, Evans PH, Fahey T. Inter-rater agreement in the scoring of abstracts submitted to a primary care research conference. BMC Health Serv Res 2002;2:8.
5. Cohen IT, Patel K. Peer Review Interrater Reliability of Scientific Abstracts: A Study of a Subspecialty Component of the American Society of Anesthesiologists. Journal for Education in Perioperative Medicine 2005;7(2): July–Dec. http://jepmadmin.org. Accessed January 24, 2006.
6. American Society of Anesthesiologists. Purpose. http://www.asahq.org/relatedorgs/subspecofficers.htm. Accessed March 12, 2005.
7. Fleiss JL. The design and analysis of clinical experiments. John Wiley & Sons, Inc., New York, NY, 1989.
8. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
9. Bhandari M, Templeman D, Tornetta P. Interrater reliability in grading abstracts for the orthopaedic trauma association. Clin Orthop 2004;423:217–21.
10. Kemper KJ, McCarthy PL, Cicchetti D. Improving participation and interrater agreement in scoring Ambulatory Pediatric Association abstracts: how well have we succeeded? Arch Pediatr Adolesc Med 1996;150:380–3.
11. Vilstrup H, Sorensen HT. A comparative study of scientific evaluation of abstracts submitted to the 1995 European Association for the Study of the Liver Copenhagen meeting. Dan Med Bull 1998;45:317–9.
12. Callaham ML, Wears RL, Weber EJ, et al. Positive-outcome bias and other limitations in the outcome of research abstracts submitted to a scientific meeting. JAMA 1998;280:254–7.
13. Rothwell PM, Martyn CN. Reproducibility of peer review in clinical neuroscience: is agreement between reviewers any greater than would be expected by chance alone? Brain 2000;123:1964–9.
14. Smith J Jr, Nixon R, Bueschen AJ, et al. Impact of blinded versus unblinded abstract review on scientific program content. J Urol 2002;168:2123–5.
15. Callaham ML, Schriger DL. Effect of structured workshop training on subsequent performance of journal peer reviewers. Ann Emerg Med 2002;40:323–8.
16. Ernst E, Resch KL. Reviewer bias: a blinded experimental study. J Lab Clin Med 1994;124:178–82.