Departments of Medicine and Anesthesiology, Hospital for Special Surgery, Cornell University Medical College, New York, New York
Address correspondence to Pamela Williams-Russo, MD, MPH, Robert Wood Johnson Foundation, Route 1 and College Rd. E., PO Box 2316, Princeton, NJ 08543-2316. Address e-mail to prusso@rwjf.org. Reprints will not be available from the author.
| Abstract |
|---|
κ statistic, a measure of actual agreement beyond agreement by chance. When continuing checks on its operationalization and reliability are included, the modified Wilson scale provides a simple and reliable means by which to assess and monitor intraoperative sedation. IMPLICATIONS: We evaluated the interrater reliability of the Wilson scale for measuring sedation during regional anesthesia. Paired anesthesia care providers' ratings of patient sedation indicated very good interrater reliability in both the original scale and a modified version. The modified Wilson scale provides a quick noninvasive means of monitoring sedation during regional anesthesia.
| Introduction |
|---|
There are three major categories of methods for assessing levels of sedation currently in use for adults: patient-based, observer-based, and machine-based methods. Machine-based methods are generally perceived to be the most objective assessments. Current options include the bispectral index (BIS), power spectral measure, and auditory evoked potentials (AEP). AEPs predict the response to verbal stimuli during general anesthesia (1,2), but it is unclear whether AEP latency and amplitude show graded changes as anesthesia lightens. Power spectral measure shows good correlation with drug concentrations and with increases in blood pressure or movements during general anesthesia (3), but at light anesthetic levels ambiguity increases because median and spectral edge frequencies can be the same whether the patient is awake or asleep. A number of studies have demonstrated the ability of the BIS to predict loss of consciousness and response to verbal commands (4,5). However, the correlation of BIS with more subtle gradations of sedation has not yet been determined; BIS readings are affected by the use of a regional anesthetic even when no other medications have been administered (6,7). Measurements of heart rate variability and respiratory sinus arrhythmia have also been proposed for use as the basis of a sedation score (8). Although machine-based assessments, in particular the BIS, offer a quantifiable means by which to measure sedation, their use is limited by their inability to discriminate among lesser degrees of sedation.
Patient-based assessment of sedation is frequently accomplished through the use of one or several 100-mm visual analog scales whose end points represent two extremes of sedation (e.g., "wide awake" to "extremely sleepy" or "as alert as I have ever been" to "I cannot keep awake") (9,10). The patient is asked to mark the point representing his or her own perception of the degree of sedation. These scales are quick and easy to administer, but their reliability and between-patient applicability are limited, as is their feasibility at higher degrees of sedation.
A literature review of observer-based sedation scales for adult patients demonstrates a wide variety of scales currently in use. Many of the identified scales with documented validity are designed for use in intensive care units (ICUs), predominantly with mechanically ventilated patients (11–18). It is important to distinguish scales intended specifically for use in ICUs from those used to assess sedation during surgical procedures or in response to drug administration, because the primary aim of the ICU scales is to assess calmness rather than level of consciousness (15). Furthermore, certain ICU scales, such as the Glasgow Coma Scale, are distorted by the use of sedatives (19), thus further limiting their use in surgical settings.
The most frequently cited observer-based scales for assessing sedation are the Ramsay sedation scale (20) and the Observer's Assessment of Alertness/Sedation (OAA/S) (21). The six-category Ramsay scale, developed in the early 1970s, provides a simple, quick assessment of sedation; however, despite its widespread use, its reliability and validity have not been reported. A comparison of five sedation scoring systems by means of AEPs identified the Ramsay scale as having the best correlation with AEP (22); however, in 1994, Hansen-Flaschen et al. (23) identified numerous shortcomings of the Ramsay scale, including unclear definition of the sedation levels, lack of exclusivity among sedation levels, and its focus on assessing consciousness rather than sedation. The psychometric properties of the Ramsay scale have not been formally assessed.
The OAA/S is one of the few sedation scales whose reliability has been documented. However, to provide continuing measures of sedation, the OAA/S requires frequent stimulation; consequently, its usefulness is limited in surgical situations, because it could prove disruptive to both patient and surgeon (24). Although the OAA/S is reliable as a means of assessing level of alertness, it is not ideal for performing rapid, repeated assessments of a patient's degree of sedation.
In their 1990 study comparing the sedative effects of propofol and midazolam during spinal anesthesia for orthopedic surgeries, Wilson et al. (25) used a categorical scale in which an observer rated the degree of sedation (Table 1). A variation on the Ramsay scale, the Wilson scale presents a simple means for assessing intraoperative sedation; however, there are no published data regarding the reliability and validity of this scale. The purpose of this study was to test the interobserver reliability of the scale proposed by Wilson et al. and to modify the scale as necessary to maximize reliability and feasibility in assessing sedation with regional anesthesia. The goal was to develop a valid and reliable method for assessing maintenance of a specified level of sedation for use in a randomized clinical trial comparing two different levels of intraoperative sedation with regard to a variety of clinical outcomes.
| Methods |
|---|
Cases eligible for inclusion in this study were those using a regional anesthetic where two anesthesia providers were present at the same time in the operating room. More than 85% of the orthopedic surgical procedures performed at the Hospital for Special Surgery are performed with a regional anesthetic. A consecutive convenience sample was drawn from the cases in which an attending anesthesiologist had been paired with a resident, fellow, or certified registered nurse anesthetist (CRNA) or in which a second attending was available. Orthopedic surgeries using a broad range of regional anesthetics and sedatives were included. Approval was obtained from the hospital's IRB. The assessment of the original Wilson sedation scale took place during August 1998. On the basis of the results, a revised scale was then tested from January to March 2001.
The sedation level of each patient was assessed once during the case, a minimum of 10 min after the administration of a regional block. Patient sedation level was assessed by asking the two anesthesia providers to rate sedation level simultaneously but independently as the study research assistant administered a standardized oral stimulus followed by a standardized physical stimulus. As the standardized oral stimulus, the patient was addressed by the research assistant: "[Name], please open your eyes." The command was given in a normal speaking voice by the same research assistant throughout the study. For the standardized physical stimulus, if the patient did not respond to the spoken command, a quick, firm earlobe tug was applied to the right ear.
After the stimuli, the two anesthesia providers each independently rated the patient's sedation on the basis of the five-point Wilson scale (and subsequently the modified four-point scale). The raters were blinded to each other's ratings.
The data were analyzed with EpiInfo, version 6.04 (http://www.cdc.gov/epiinfo/ei6.htm; Centers for Disease Control and Prevention, Atlanta, GA). Interrater reliability was assessed by using the unweighted κ statistic, which measures the concordance beyond chance between measurements of nominal data (40,41). When measuring observer agreement, the κ statistic is preferred, because it accounts for agreement occurring by chance, is a measure of concordance rather than trend, and can account for systematic observer bias (42). The value of κ can range from -1 (complete disagreement) to 0 (chance agreement) to +1 (perfect agreement). In interpreting the magnitude of a κ value, the following guidelines have been suggested: <0 to 0.40, poor to fair; 0.41 to 0.60, moderate; 0.61 to 0.80, substantial; and 0.81 to 1.00, almost perfect (41).
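As an illustration of the chance-corrected agreement statistic described above, the following minimal Python sketch computes an unweighted kappa for two raters (commonly called Cohen's kappa) and maps it to the descriptive bands quoted in the text. It is not the EpiInfo procedure used in the study. The function names `cohen_kappa` and `interpret` and the paired ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted kappa for two raters who scored the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: fraction of cases given the same score by both raters.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement, from each rater's marginal score frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (p_o - p_e) / (1 - p_e)


def interpret(kappa):
    """Descriptive bands as quoted in the text (values up to 0.40 are 'poor to fair')."""
    if kappa <= 0.40:
        return "poor to fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical paired sedation scores on a five-point scale (NOT the study data).
rater_1 = [1, 1, 2, 3, 3, 4, 5, 2, 3, 1]
rater_2 = [1, 1, 3, 3, 3, 4, 5, 2, 2, 1]

k = cohen_kappa(rater_1, rater_2)
print(f"kappa = {k:.2f} ({interpret(k)})")
```

In this hypothetical example the raters agree on 8 of 10 cases, but part of that agreement is expected by chance, so kappa is lower than the raw percentage agreement.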
| Results |
|---|
Interrater percentage agreement on sedation scores with the original Wilson sedation scale was 79%, with a κ coefficient of 0.72 (P < 0.00001), signifying substantial agreement. The major source of disagreement was between scores of 2 (drowsy) versus 3 (eyes closed but rousable to command) (Table 2). When Categories 2 and 3 were merged to form a modified four-point Wilson sedation scale, the κ coefficient increased to 0.90, signifying excellent agreement. Analysis of interrater agreement as a function of the training of the second anesthesia care provider indicated no meaningful difference among the four possible combinations (Table 3).
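The effect of merging Categories 2 and 3 can be sketched in Python as follows. The paired scores are hypothetical and were chosen only so that most disagreements fall between scores of 2 and 3. They are not the Table 2 data, and the helper names `cohen_kappa` and `merge_2_and_3` are assumptions made for this illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted kappa for two raters who scored the same cases."""
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def merge_2_and_3(score):
    """Collapse original Categories 2 and 3 into one and renumber 4 -> 3, 5 -> 4."""
    return {1: 1, 2: 2, 3: 2, 4: 3, 5: 4}[score]


# Hypothetical five-point ratings (NOT the study data): two of the three
# disagreements are a score of 2 from one rater against a 3 from the other.
rater_1 = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 4]
rater_2 = [1, 1, 1, 3, 2, 2, 3, 3, 4, 4, 5, 5]

kappa_original = cohen_kappa(rater_1, rater_2)
kappa_merged = cohen_kappa(
    [merge_2_and_3(s) for s in rater_1],
    [merge_2_and_3(s) for s in rater_2],
)
print(f"five-point scale: kappa = {kappa_original:.2f}")
print(f"four-point scale: kappa = {kappa_merged:.2f}")
```

Recoding the same ratings removes the 2-versus-3 disagreements, so kappa rises even though the raters' underlying judgments are unchanged. This mirrors the kind of preliminary reanalysis described above, not a new data collection.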
|
|
|
κ coefficient of 0.75 (P < 0.0001), signifying substantial agreement (Table 5). Analysis of interrater agreement as a function of the training of the second anesthesia care provider suggested lower concordance between attending/resident pairs or attending/CRNA pairs than between attending/attending pairs (Table 3).
| Discussion |
|---|
This study of regional anesthesia patients documents the interrater reliability of Wilson's original sedation scale to be fairly good for assessing light sedation, with the exception of poor discrimination between Categories 2 (drowsy) and 3 (eyes closed but rousable to command). Because the descriptions for Categories 2 and 3 do not describe mutually exclusive states (i.e., one can be drowsy with or without one's eyes being closed), they do not fulfill this criterion for the construct validity of a scale (43,44).
To decrease the uncertainty associated with distinguishing between Categories 2 and 3, we modified the Wilson sedation scale by combining these two categories and by operationalizing the descriptions of each category with more specific criteria, as shown in Table 4. A preliminary statistical analysis using the original data suggested that the degree of agreement on the modified scale should be excellent. Data subsequently obtained with the modified Wilson scale, however, did not indicate an improvement in interrater agreement as measured by the κ coefficient. Several factors may explain this observation.
First, the modified scale has one fewer category, thus predisposing to an increased likelihood of agreement by chance alone, as was observed. Second, the modified-scale study had a smaller sample; thus, a smaller absolute number of disagreements had a greater effect on the ratio of observed versus expected concordance. Third, with both versions of the scale, there was one category that was never assigned simultaneously by both raters. When the original Wilson scale was used, there were no paired scores of 2 assigned by both raters. The 14 pairings of Scores 2 and 3 suggest that agreement was affected by raters' difficulty in distinguishing these two categories. In contrast, raters never both assigned a score of 3 when using the modified Wilson scale. However, the presence of only two pairings of Scores 3 and 4 suggests not a lack of clear distinction between categories, but rather that moderate sedation was uncommon among the types of surgical procedures observed.
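The first two factors can be made concrete with a small Python sketch. It rests on an idealizing assumption that does not come from the study: both raters use every category equally often, so the expected chance agreement is 1/k for a k-category scale. The observed agreement of 0.80 and the sample sizes of 100 and 40 are likewise hypothetical.

```python
def kappa_from_rates(p_observed, p_chance):
    """Kappa from observed and expected (chance) agreement proportions."""
    return (p_observed - p_chance) / (1 - p_chance)


def chance_agreement_uniform(n_categories):
    """Chance agreement if both raters used every category equally often (idealization)."""
    return 1 / n_categories


def kappa_shift_per_disagreement(n_cases, p_chance):
    """How much kappa falls when one more pair of ratings disagrees."""
    return (1 / n_cases) / (1 - p_chance)


# Factor 1: the same observed agreement yields a lower kappa with fewer categories.
kappa_five = kappa_from_rates(0.80, chance_agreement_uniform(5))
kappa_four = kappa_from_rates(0.80, chance_agreement_uniform(4))
print(f"five categories: kappa = {kappa_five:.3f}")
print(f"four categories: kappa = {kappa_four:.3f}")

# Factor 2: one extra disagreement moves kappa more in a smaller sample.
# The sample sizes below are hypothetical, not the study's.
shift_large = kappa_shift_per_disagreement(100, chance_agreement_uniform(4))
shift_small = kappa_shift_per_disagreement(40, chance_agreement_uniform(4))
print(f"n = 100: kappa falls by {shift_large:.3f} per extra disagreement")
print(f"n = 40:  kappa falls by {shift_small:.3f} per extra disagreement")
```

Real marginal distributions are rarely uniform, so these figures indicate only the direction of each effect, not its size in the study data.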
It is interesting to examine the differences in interrater agreement on the basis of observers' levels of training or experience. When the original Wilson scale was used, no meaningful difference was seen among the three categories of anesthesia care provider pairings. (The pairing of an attending with a CRNA was excluded from this comparison, because of the great variability of duration of experience among the participating CRNAs.) It must be noted, however, that well over a third of the pairings (38%) were attending/resident. In contrast, when the modified Wilson scale was used, the distribution of observer pairings (with the exception of attending/fellow) was more balanced, and the expected decrease in concordance with decreasing experience was seen. Although this would support the notion that more experience leads to better interrater reliability, the applicability of this analysis is limited by the relatively small number of cases involved.
Light sedation remains operationally the most difficult grade of sedation to evaluate, as indicated by the results of this evaluation of the interrater reliability of the Wilson sedation scale as well as by published findings on the limitations of BIS with lighter grades of sedation. Current machine-based methodologies remain unable to maintain consistently clear distinctions between levels of lighter sedation, as would be needed, for example, when comparing outcomes in patients maintained at a Wilson Level 2 versus Level 4 during regional anesthesia. The modified Wilson scale seems to offer the best current means by which to monitor intraoperative sedation.
Tools for assessing variables of physiologic function, be they observer, patient, or machine based, must in all cases be developed, checked, and rechecked systematically. Machine-based measurements are meaningless unless the mechanical tool has been calibrated according to a known standard, to ensure both validity (accuracy) and consistency (reliability). The same is true for observer-based measurements: reliability, accuracy, and precision must continually be evaluated to maintain a consistent standard of measurement. When observer-based methods are developed for use in research and clinical settings, they must include operationalized definitions, assessor training and retraining, and periodic checks on inter- and intrarater reliability (45,46) to achieve the consistency necessary for scientific measurement (47). Researchers intending to use observer-based assessment scales should include quality control training and repeated calibration within their research protocol.
The modified Wilson sedation scale has good interrater reliability, and its clarification of each scoring category allows it to provide improved discrimination between levels of light sedation. It is quick and easy to use; by defining clear sedation end points, it can be used for determining sedation during regional anesthesia as well as a reference with which to correlate measures obtained from diverse monitoring devices. When accompanied by continuing checks on its reliability and construct validity, it will be a valuable tool for establishing and evaluating the effect of specific degrees of sedation, or differing sedative regimens, on patient outcomes after regional anesthesia.