




Departments of *Anesthesiology and Pain Medicine and
Biostatistics and Applied Mathematics, The University of Texas M. D. Anderson Cancer Center, Houston;
Department of Anesthesia, The University of Iowa Roy J. and Lucille A. Carver College of Medicine, Iowa City; and
Department of Anesthesia, Children's Mercy Hospitals & Clinics, Kansas City, Missouri
Address correspondence and reprint requests to Mohamed Naguib, MB, BCh, MSc, FFARCSI, MD, Department of Anesthesiology and Pain Medicine, Unit 409, The University of Texas M. D. Anderson Cancer Center, 1400 Holcombe Blvd., Houston, TX 77030. Address e-mail to Naguib{at}mdanderson.org.
| Introduction |
|---|
Difficult tracheal intubation accounted for approximately 17% of adverse respiratory events in an American Society of Anesthesiologists closed-claims analysis (10). In 85% of these cases, the outcome was either death or brain damage (10). Increases in the incidence of morbid nonfatal events have also been noted in patients who have undergone difficult tracheal intubation (11–14). These events included desaturation, hypertension, esophageal intubation, pharyngeal trauma, dental injury, cancellation of surgery, increased hospital stay, and an increased rate of unexpected intensive care unit admission.
In most studies, difficult laryngoscopy has been defined as a view of the larynx corresponding to grade 3 or 4 in the classification of difficult intubation by Cormack and Lehane (5). The American Society of Anesthesiologists defines difficult tracheal intubation as when "proper insertion of the endotracheal tube with conventional laryngoscopy requires more than 3 attempts, or more than 10 min" (15).
Although unanticipated difficult intubation has been the subject of many studies, a puzzling feature of these studies is the wide variation in the reported sensitivity of the different models used for prediction of this problem (1,3,4,6,16–22). A test performed to predict difficult intubation should have high sensitivity so that it will identify most patients in whom intubation will truly be difficult. We are not aware of any studies that have evaluated different multivariate models in the same population of patients to determine the most sensitive model for predicting difficult intubation. Therefore, we designed and performed the double-blind, case-controlled study described herein to compare and validate the predictive performance of three multivariate clinical models described by Wilson et al. (1), Arné et al. (20), and Naguib et al. (21) in a group of patients who had confirmed unanticipated difficult intubation. The sensitivity reported for the latter two models has been the highest sensitivity for such clinical models reported in the literature. Subsequently, we developed a new model for predicting difficult intubation.
| Methods |
|---|
Postoperatively, patients who had unanticipated difficult intubation were approached by an investigator after they had fully awoken from general anesthesia. If a patient agreed to participate in the study, an investigator invited a second patient from that day's surgical schedule to participate as a control. Each control patient had undergone uneventful general anesthesia without any reported difficulties at laryngoscopy or tracheal intubation and was closely matched with a study patient with respect to age, weight, height, and sex. Difficult intubation patients and their matched controls were the only patients who consented to participate in this study. A second blinded investigator then evaluated the two patients in the postanesthesia care unit, second-stage recovery facility, or ward. To reduce measurement bias, patients were instructed by the consenting investigator not to comment on their sore throat, potential airway difficulty, or any other aspect of their anesthetic experience to the blinded investigator. The details of the laryngoscopic findings and degree of difficulty of intubation were not known by the investigator who interviewed and measured the patient pairs. All of these assessments were performed by one of three investigators.
The clinical assessment included:
|
4 or more (1), Arné model score >11 (20), and Naguib model score <0 (21). The Naguib model is based on the formula clinical prediction = 4.9504 + (thyrosternal distance x 1.1003) + (Mallampati score x 2.6076) + (thyromental distance x 0.9684) + (neck circumference x 0.3966).
|
Positive predictive value and negative predictive value were calculated based on a prevalence of difficult intubation of 5.8% (4), as reported in a recent meta-analysis.
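The prevalence adjustment can be illustrated with a minimal sketch. The text does not give the formula used, so the standard Bayes' theorem form is assumed here; the function name `predictive_values` is hypothetical. The inputs are the sensitivity and specificity reported in the Results for the new model.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) at a given prevalence using Bayes' theorem.

    Assumption: this is the standard textbook formula, not necessarily
    the exact procedure the authors ran in SAS.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity 82.5% and specificity 85.6% (new model), prevalence 5.8% (4)
ppv, npv = predictive_values(0.825, 0.856, 0.058)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 26.1%, NPV = 98.8%
```

Under this assumption the output agrees with the 26.1% and 98.8% reported for the new model. It also shows why a matched case-control sample, in which half of the patients are cases, cannot supply predictive values directly.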
Statistical analyses were performed by using the SAS software program (version 9.1; SAS Institute Inc., Cary, NC). Each model was assessed based on the entire group of patients. Thus, the Cochran Q value was computed to test the homogeneity of the patient groups. Demographic differences were determined by using the χ² test and were considered significant when P < 0.05.
We also subjected patient data (age, weight, height, sex, thyromental distance, Mallampati score, interincisor gap, and neck circumference) to a logistic regression model to identify variables that are predictors of difficult intubation. For this analysis, the Mallampati score was dichotomized such that a score of 1 or 2 was scored as 0 and a score of 3 or 4 was scored as 1.
The receiver operating characteristic (ROC) curve was used to describe the discrimination abilities and to explore the trade-offs between the sensitivity and specificity of the different models (25). The ROC area under the curve (AUC) is frequently viewed as a robust indicator of the performance of classification models. The AUC is a performance indicator equivalent to the nonparametric concordance measure, Somers D, and the difference between two ROC areas is half the difference between the corresponding Somers D values (26). The STATA software program (version 8; Stata Corp, College Station, TX) was used to assess the difference between ROC AUCs based on the χ² test developed from the generalized U-statistics theory by DeLong et al. (27).
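The relation between the AUC and Somers D can be sketched as follows. This is only an illustration, not the DeLong test itself: the scores are hypothetical, and the function names are assumptions introduced here.

```python
def auc_by_concordance(case_scores, control_scores):
    """AUC as the share of (case, control) pairs ranked correctly.

    A pair counts 1 if the case scores higher, 0.5 if the scores tie.
    """
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


def somers_d(auc):
    """Somers D for a binary outcome: D = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


# Hypothetical risk scores, for illustration only (not study data)
cases = [3, 2, 4]
controls = [1, 2, 0]
auc = auc_by_concordance(cases, controls)
d = somers_d(auc)
print(f"AUC = {auc:.3f}, Somers D = {d:.3f}")
```

Because D = 2 × AUC − 1, the difference between two AUCs is always half the difference between the corresponding D values, which is the property cited above.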
| Results |
|---|
There were no significant differences in the mean age, weight, or height between the two groups (Table 3). However, the mean interincisor gap, thyromental distance, thyrosternal distance, neck circumference, and Mallampati score differed significantly.
In the 97 patients in the difficult intubation group, tracheal intubation was achieved under direct laryngoscopy after several attempts (mean ± SD, 3.3 ± 1.1) in 40 patients and with the use of a gum-elastic bougie in another 19 patients. Direct laryngoscopic intubation was completely unsuccessful in 38 of 97 patients. In 15 of these 38 patients, fiberoptic-guided tracheal intubation was performed successfully (9 while the patient was awake [awakened after intubation failed] and 6 while the patient was asleep, including 1 who underwent fiberoptic-guided tracheal tube placement via an intubating laryngeal mask airway [LMA]). Fiberoptic-guided intubation was unsuccessful in another five patients. Of the remaining 23 of 38 patients, tracheal intubation was performed with the aid of an intubating LMA in 9 patients, an LMA was used in another 9 patients, and blind nasal intubation was performed in 4 patients. One patient in whom tracheal intubation was difficult was allowed to awaken, and a regional technique was used. This patient suffered postoperative oral trauma and swelling. No other complications were noted.
The number of patients enrolled in the study during the first 15 months from September 1999 through December 2000 was 86 and decreased thereafter to 32, 18, 30, and 28 patients in December 2001, December 2002, December 2003, and November 2004, respectively.
The highest sensitivity was achieved with the Naguib model (Table 4). Specifically, the sensitivity of this model was 81.4% (95% confidence interval [CI], 74.0%–89.0%) compared with 40.2% (95% CI, 30.0%–50.0%) for the Wilson model and 54.6% (95% CI, 45.0%–65.0%) for the Arné model. The Naguib model was significantly more sensitive than the other two models based on a pairwise comparison using the McNemar test (P < 0.0001). The Cochran Q statistic indicated that the three models differed significantly with respect to their prediction accuracy (P < 0.02). Both the Naguib and Arné models classified more intubations correctly (P = 0.01) than the Wilson model (Table 5). The McNemar test indicated that the Arné and Naguib models did not differ significantly in their prediction accuracy (P = 0.6; kappa = 0.2). However, the specificities of the Arné model (94.9% [95% CI, 90.0%–99.0%]) and Wilson model (92.8% [95% CI, 88.0%–98.0%]) were significantly higher (P < 0.0001) than that of the Naguib model (72.2% [95% CI, 63.0%–81.0%]).
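As a rough check on the confidence intervals, a normal-approximation (Wald) interval can be computed for the Naguib model's sensitivity. Two assumptions are made here: the text does not state which interval method was used, and the count of 79 correctly identified patients is inferred from 81.4% of 97 rather than reported directly. The function name `wald_interval` is hypothetical.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Normal-approximation 95% CI for a proportion (assumed method)."""
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# 79 of 97 is inferred from the reported sensitivity of 81.4%
low, high = wald_interval(79, 97)
print(f"95% CI: {low:.1%} to {high:.1%}")
```

The result, roughly 74% to 89%, is close to the reported interval, so the published limits appear consistent with a normal approximation that has been rounded.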
The ROC AUCs that measured the discriminating power of the Arné, Naguib, and Wilson models were 0.87 (95% CI, 0.82–0.92), 0.82 (95% CI, 0.76–0.88), and 0.79 (95% CI, 0.72–0.85), respectively (Fig. 1). The ROC AUC for the Arné model was significantly greater than that of the Wilson model (P = 0.001).
Logistic regression analysis identified four risk factors correlated with the prediction of difficult laryngoscopy and intubation: thyromental distance, interincisor gap, height, and Mallampati score. The prediction (l) was determined by the equation
|
|
in which the thyromental distance, interincisor gap, and height were measured in centimeters and Mallampati score was 0 or 1. Using this equation for predicting difficult intubation, the laryngoscopy and intubation would be easy if the numerical value (l) in the equation is less than zero (i.e., negative) but difficult if the numerical value (l) is more than zero (i.e., positive).
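The sign rule can be read as a probability threshold. The sketch below assumes that l is the linear predictor (log-odds) of the fitted logistic regression, which the text implies but does not state outright. It does not reproduce the model coefficients, and both function names are hypothetical.

```python
import math


def probability_from_score(l):
    """Convert a log-odds score l into a probability (logistic function)."""
    return 1.0 / (1.0 + math.exp(-l))


def classify(l):
    """Apply the sign rule; l == 0 is not covered by the text."""
    if l > 0:
        return "difficult"
    if l < 0:
        return "easy"
    return "indeterminate"  # assumption: the source does not define this case


for score in (-1.5, 0.0, 1.5):  # illustrative scores, not patient data
    print(score, classify(score), round(probability_from_score(score), 3))
```

Under this assumption, l > 0 corresponds to a predicted probability of difficult intubation above 0.5, and l < 0 to a probability below 0.5.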
The posterior probability of group membership for each patient was used to compare the model prediction with the actual outcome. This new model correctly predicted 84% (163 of 194) of the cases. The sensitivity, specificity, positive predictive value, and negative predictive value of this model were 82.5% (95% CI, 73%–89%), 85.6% (95% CI, 77%–91%), 26.1%, and 98.8%, respectively. The ROC AUC for this model was 0.90 (95% CI, 0.86–0.95) (Fig. 1).
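These figures are mutually consistent, as the following sketch shows. The four cell counts are not reported in the text. They are inferred here from the stated percentages and the 97 patients per group, so they should be treated as an assumption. The function name `summarize` is hypothetical.

```python
def summarize(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Inferred counts: 80 of 97 difficult and 83 of 97 control patients
# classified correctly (80 + 83 = 163 of 194)
sens, spec, acc = summarize(tp=80, fn=17, tn=83, fp=14)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}, accuracy = {acc:.1%}")
```

With these inferred counts, the sketch reproduces the reported 82.5% sensitivity, 85.6% specificity, and 84% overall accuracy.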
A variable correlation analysis showed that height was significantly correlated with both interincisor gap (P < 0.0001) and thyromental distance (P = 0.0007). The existence of this multicollinearity allows height, which is not significant at the univariate level (Table 5), to be a significant factor in the multivariate model (Table 6).
The total number of adult patients who underwent general anesthesia and were initially eligible for the study during the study period was 73,696. The trachea proved unexpectedly difficult to intubate in 97 patients (0.13%) and was impossible to intubate in 38 patients (0.05%).
| Discussion |
|---|
The ideal model for prediction of difficult intubation would have perfect sensitivity and specificity. Sensitivity and specificity are dependent on each other: an increase in one of them usually results in a decrease in the other. High specificity may also increase the positive predictive value despite low sensitivity, as seen with the Wilson and Arné models in this study. A more pressing question is whether sensitivity and specificity are equally important. Clinical models used to predict difficult tracheal intubation make different trade-offs between optimizing sensitivity and optimizing specificity. We believe that the purpose of any such model should be detection of as many patients with a difficult airway as possible to minimize the potentially serious consequences of unanticipated difficult tracheal intubation. To that end, a model with high sensitivity, rather than high specificity, is required. A model with high sensitivity, low specificity, and low positive predictive value (as seen with the Naguib model) would incorrectly classify some patients with a normal airway as having a difficult airway. This would probably increase the financial and emotional costs for these patients when, for example, an alternate intubation technique such as awake fiberoptic-guided intubation is used. However, these costs may be only a fraction of those that accompany the potentially serious outcome of unanticipated difficult tracheal intubation. Therefore, the sensitivity of a prediction model is more important than the specificity and should be weighted more heavily when determining which model to use.
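The trade-off can be made concrete by projecting each model onto a hypothetical cohort of 1000 patients. This sketch assumes the 5.8% prevalence used in the Methods and the sensitivities and specificities reported in the Results; the cohort size and the function name `expected_errors` are illustrative assumptions.

```python
def expected_errors(sensitivity, specificity, prevalence, n_patients=1000):
    """Expected missed difficult airways and false alarms in a cohort."""
    difficult = prevalence * n_patients
    easy = n_patients - difficult
    missed = difficult * (1.0 - sensitivity)     # false negatives
    false_alarms = easy * (1.0 - specificity)    # false positives
    return missed, false_alarms


# (sensitivity, specificity) as reported in the Results
models = {
    "Wilson": (0.402, 0.928),
    "Arné": (0.546, 0.949),
    "Naguib": (0.814, 0.722),
}
for name, (sens, spec) in models.items():
    missed, false_alarms = expected_errors(sens, spec, prevalence=0.058)
    print(f"{name}: about {missed:.0f} missed, {false_alarms:.0f} false alarms")
```

Under these assumptions the Naguib model would miss about 11 of the 58 expected difficult airways, compared with about 35 for the Wilson model, at the cost of roughly 262 false alarms rather than 68.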
The value of the simplified risk index used in this study (>11) was the value recommended by Arné et al. (20). In their original description of their model, Naguib et al. (21) reported a sensitivity and specificity of 95% and 91%, respectively, whereas Arné et al. (20) reported a sensitivity and specificity of 94% and 93%, respectively. Of note is that in the present study both models had a lower sensitivity than previously demonstrated.
Oates et al. (17) evaluated the Wilson risk sum (score ≥2) in 675 cases. They reported a positive predictive value of 8.9% with a low sensitivity (42%) and high specificity (92%). Using the same threshold, Yamamoto et al. (28) reported that the Wilson risk sum yielded a low positive predictive value (5.9%), low sensitivity (55.4%), and high specificity (86.1%). Similarly, Siddiqi and Kazi (22) reported that both Wilson risk sum (score ≥2) and Mallampati classification have a similar sensitivity of 42% but different positive predictive values of 11% and 5%, respectively. A higher threshold is preferred for two reasons. First, the prevalence of unanticipated difficult tracheal intubation is small. Second, a false-positive result increases the potential for the serious consequence of failed tracheal intubation.
In the present study, univariate differences between the difficult intubation and control groups in the interincisor gap, thyromental distance, thyrosternal distance, neck circumference, and Mallampati score were noted. The most popular clinical test for predicting the ease of tracheal intubation is the Mallampati test (16). Because difficult laryngoscopy is a multifactorial problem, clearly no simple predictive test can be used alone. Simple bedside tests such as the Mallampati test (16,29), thyromental distance measurement (18), and sternomental distance measurement (30,31) have been found to be of limited use in predicting difficult laryngoscopy when performed alone. Effective prediction requires a combination of tests (4,32). A recent meta-analysis found the combination of the Mallampati test and thyromental distance to be the most accurate predictor of difficult intubation; however, this combination has a very low sensitivity of 36% (95% CI, 14%–59%) (4).
We developed a new clinical prediction model that considers the thyromental distance, Mallampati score, interincisor gap, and height. This model is 82.5% sensitive and 85.6% specific with an AUC of 0.90. Height was found to be significantly correlated with both interincisor gap (P < 0.0001) and thyromental distance (P = 0.0007). The significance of height as a predictor of difficult intubation was addressed previously by Schmitt et al. (33). They reported that the ratio of height to thyromental distance was a more sensitive indicator of difficult intubation than the thyromental distance alone (33). We considered a model that included the ratio of height to thyromental distance, which yielded identical results to the new model. For the sake of parsimony, we chose to include only height in our model instead of the ratio of height to thyromental distance, as suggested by Schmitt et al. (33). The new model must be prospectively validated.
The incidence of unanticipated difficult tracheal intubation in this study (0.13%) is less frequent than the range of 1%–18% reported by others (13). Also, the incidence of impossible tracheal intubation in our study (0.05%) is at the low end of the range of 0.05%–0.35% reported previously (5,6). The number of patients enrolled in the first 15 months of our study was 44% of the total number of patients (86 of 194). However, the number of patients enrolled decreased dramatically over approximately the next four years, suggesting a possible Hawthorne effect. The authors feel that this study may have increased practitioner awareness of difficult airways in patients presenting for surgery and prompted more aggressive use of alternate airway-management techniques, leading to a decrease in the incidence of unanticipated difficult intubation. A Hawthorne effect (identified observer effect) is defined as the tendency of individuals to improve their behaviors or performance when they know that they are under observation (34,35). The authors also realize they were dependent upon many individual practitioners' assessment and self-reporting of difficult airways. This is a known limitation of voluntary reporting techniques for critical incidents in health care (36).
A potential limitation of a matched case-controlled study (such as this study) is the possibility that some segments of population may not be adequately represented in the study participants. Another limitation of our study was that it was not a truly prospective study. It can be best described as a quasi-prospective evaluation of three models for the prediction of unanticipated difficult intubation, because patients were identified, recruited, and examined after attempted intubation. However, we do not believe that this would have a significant impact on our results.
In conclusion, our study is the first to provide an evidence-based foundation for selection of the most sensitive model for prediction of unanticipated difficult tracheal intubation. We confirmed the high sensitivity of the Naguib model but failed to do so for the Arné and Wilson models. We also created a new model for predicting unanticipated difficult intubation, although it has not yet been prospectively tested. This model may be more sensitive and specific than any of the models currently used to predict difficult intubation.
| Footnotes |
|---|
Accepted for publication October 17, 2005.
| References |
|---|