The original and modified Mallampati tests are commonly used to predict the difficult airway, but there is controversy regarding their accuracy. We searched MEDLINE and other databases for prospective studies of patients undergoing general anesthesia in which the results of a preoperative Mallampati test were compared with the subsequent rate of difficult airway (difficult laryngoscopy, difficult intubation, or difficult ventilation as reference tests). Forty-two studies enrolling 34,513 patients were included. The definitions of the reference tests varied widely. For predicting difficult laryngoscopy, both versions of the Mallampati test had good accuracy (area under the summary receiver operating characteristic (sROC) curve = 0.89 ± 0.05 and 0.78 ± 0.05, respectively). For predicting difficult intubation, the modified Mallampati test had good accuracy (area under the sROC curve = 0.83 ± 0.03) whereas the original Mallampati test was poor (area under the sROC curve = 0.58 ± 0.12). The Mallampati tests were poor at identifying difficult mask ventilation. Publication bias was not detected. Used alone, the Mallampati tests have limited accuracy for predicting the difficult airway and thus are not useful screening tests.
Difficult laryngoscopy and difficult tracheal intubation occur in 1.5% to 8% of general anesthetics (1). Of the available methods, the original (2) and modified (3,4) Mallampati tests are used as preoperative bedside tests to predict a difficult airway (5). However, the usefulness of these tests is unclear, as published studies have produced variable estimates of diagnostic test accuracy. The original Mallampati test (2) identified difficult intubations with a high degree of accuracy, with sensitivity of 50% and specificity of 100%. However, subsequent larger studies have shown only modest degrees of accuracy using the original (6) and modified (7–9) versions of the test. Furthermore, the accuracy of the Mallampati test may vary according to patients' ethnic group and sex and whether they are pregnant (10). For example, in Asian patients it may be more difficult to intubate the trachea than in Caucasians (11,12). The objective of this systematic review was to determine the accuracy of the Mallampati test for predicting the difficult airway. For the purposes of this review, the definition of a difficult airway included difficult laryngoscopy, difficult tracheal intubation, and difficult ventilation. The null hypothesis tested was that all versions of the Mallampati test had poor accuracy for identifying difficult airway. We also explored sources of heterogeneity to increase the clinical relevance of the results.
This systematic review and meta-analysis followed guidelines on conducting systematic reviews of diagnostic studies (13,14). We included all prospective observational studies of patients undergoing general anesthesia who had preoperative Mallampati test assessments and a subsequent assessment of difficult laryngoscopy, difficult tracheal intubation, or difficult mask ventilation. Difficult airway was defined by a grade III score in the original Mallampati test (2) or a grade III or IV in the modified Mallampati test (3,4) (Table 1).
The studies we assessed included patients with no known risk factors for difficult tracheal intubation as well as patients with upper airway pathology, diabetes, obesity, and patients who were pregnant. All types of surgery were considered. Patients undergoing indirect laryngoscopy were excluded (8,15). Retrospective studies (3,16) and case-control studies (17–19) were excluded because these would overestimate the diagnostic test accuracy compared with studies using a prospective clinical population. The reference tests for difficult airway with which the Mallampati test was compared included difficult laryngoscopy (as defined by the four-grade Cormack and Lehane scoring system (20) or the modified five-grade Cormack and Lehane classification (21)) and difficult intubation and difficult ventilation (as defined by the authors).
Search Strategy
The methodological quality of eligible studies was assessed independently under open conditions. Methods of recruitment and blinding between test and reference test results among anesthesiologists were recorded. The patient population, type of surgery, and details of test and reference tests were also collected. Data were obtained from studies independently by two or more investigators using a standardized data extraction form; disagreements were resolved by consensus. The primary author was contacted by letter or email for relevant data that were not presented in the original publication.
Outcome Measures
Statistical Analysis
Heterogeneity was described using the I² statistic (29) for pooling sensitivity, specificity, and positive and negative likelihood ratios. Sensitivity analyses were performed to evaluate the robustness of results according to blinding of test results among anesthesiologists (blinding versus unclear/no blinding) for the primary outcomes. Publication bias was assessed using Egger's weighted regression method (30) with precision (1/standard error) and log odds ratio plotted. The intercept value in Egger's regression method provides an estimate of asymmetry of the funnel plot, with positive values indicating a trend towards higher levels of test accuracy in studies with smaller sample sizes. The threshold of significance was set at P < 0.10 for this method as this test has low power (31). All statistical analyses were performed using Stata version 8.0 (Stata Corp, College Station, TX) and MetaDiSc version 1.1.1 (Zamora J, Muriel A, Abraira V, Madrid, Spain).
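The regression behind Egger's method can be illustrated with a minimal Python sketch. It assumes the commonly described form of the test, in which each study's standard normal deviate (log odds ratio divided by its standard error) is regressed on precision (1/standard error). The function name and the four data points are hypothetical, the significance test on the intercept is omitted, and this is not the Stata routine used for the analyses reported here.

```python
def egger_regression(log_odds_ratios, standard_errors):
    """Return (intercept, slope) from regressing log OR / SE on 1 / SE.

    The intercept estimates funnel plot asymmetry; the slope estimates
    the underlying log odds ratio.
    """
    precision = [1.0 / se for se in standard_errors]
    deviate = [lor / se for lor, se in zip(log_odds_ratios, standard_errors)]

    n = len(precision)
    mean_x = sum(precision) / n
    mean_y = sum(deviate) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, deviate))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical studies: the same log odds ratio regardless of study size,
# so no asymmetry is expected and the intercept should be close to zero.
example_log_or = [2.0, 2.0, 2.0, 2.0]
example_se = [0.2, 0.4, 0.6, 0.8]
intercept, slope = egger_regression(example_log_or, example_se)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

In real data, an intercept well above zero would correspond to the pattern described above, with smaller studies reporting higher test accuracy.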
Our literature search identified 42 studies that enrolled 34,513 patients (2,4,6–9,11,12,32–65). One study was excluded because of inconsistencies in the presented data (58). Two studies were excluded in which the modified Mallampati test was assessed as part of a more comprehensive risk score (53,57) but data for the modified Mallampati test component were unavailable. The characteristics of the included studies are summarized in Tables 2 and 3. There were no studies in children.
The quality of the studies was assessed according to the method of patient recruitment and blinding of the Mallampati test and reference test results among anesthesiologists. Patients were recruited consecutively in 19 studies (2,4,6,8,34,35,40–44,46,47,49,50,52,59,60,62). Constantikes (39) took a convenience sample of 30 patients. Blinding was reported in 10 studies (7,11,35,38,46,48–50,52,53). Both consecutive patient recruitment and blinding occurred in 5 studies (35,46,49,50,52). There was poor documentation about how the Mallampati tests were done with regard to body and head positions and the use of phonation. Nine studies had adequate description of all three aspects (4,7,43,51,53,56,57,64,65). Phonation during the Mallampati test was described in six studies (33,39,42,44,53,65). The modified Mallampati test was performed in Asian patients in eight studies (8,11,12,46,55,56,59,64). The modified Mallampati test was used in obstetric patients in five studies (7,11,43,46,59). In two studies (11,59) separate data of test characteristics were given for obstetric and gynecological patients. As there was a discrepancy between the abstract and text in the proportion of obstetric and non-obstetric patients in one study (52), it was excluded from meta-regression analyses.
Difficult Laryngoscopy
The original Mallampati test was used in 9 studies with 14,438 patients (Table 2). The prevalence of difficult laryngoscopy ranged from 6% to 27%. There was a high prevalence of difficult laryngoscopy in patients with cervical disease (36) and in patients with diabetes (40). In one study, the authors attributed the high prevalence of difficult laryngoscopy to the use of the McCoy laryngoscope (37). The sensitivity and specificity of the individual studies ranged from 0.05 to 1.00 and 0.65 to 0.98, respectively (Table 2). The positive and negative likelihood ratios of the 9 studies ranged from 1.71 to 32.08 and 0.14 to 0.97, respectively (Table 2). As there was an apparent relationship between sensitivity and specificity (Spearman's r = 0.45), a sROC curve was constructed (Fig. 1). The area (± se) under the symmetrical sROC curve was 0.89 (0.05). Considering the threshold effect, the diagnostic odds ratio (DOR) was 19.57 (95% confidence interval [CI], 5.02 to 76.27). The summary estimate for sensitivity was 0.71, derived from equation (2) for the sROC curve at the summary specificity of 0.89. Hence, the summary positive and negative likelihood ratios were 6.45 and 0.33, respectively. Blinding (relative DOR [rDOR] 0.49; 95% CI, 0.02 to 12.72) and phonation (rDOR 0.11; 95% CI, 0.00 to 5.47) did not change the diagnostic performance of the original Mallampati test for predicting difficult laryngoscopy. There was no evidence of publication bias in this meta-analysis (t = 1.19, P = 0.27).
The modified Mallampati tests (3,4) were used in 19 studies with 10,579 patients (Table 3). There was wide variability in the prevalence of difficult laryngoscopy, which ranged from 2% to 26% (Table 2). The highest prevalence was in acromegaly patients (51). The sensitivity and specificity of the individual studies ranged from 0.12 to 1.00 and 0.44 to 0.98, respectively (Table 2). There was an association between sensitivity and specificity (Spearman's r = 0.32). The area (± se) under the symmetrical sROC curve (Fig. 2) was 0.78 (0.05). Considering the threshold effect, the DOR was 6.45 (95% CI, 2.73 to 15.22). The summary estimate for sensitivity was 0.55 and the summary specificity was 0.84. The summary positive and negative likelihood ratios were 3.44 and 0.54, respectively. There was no significant difference in the areas under the sROC curve between the original and modified versions of the Mallampati test (z = 1.56; P = 0.12). Publication bias was not evident in the meta-analysis of 19 studies for difficult laryngoscopy (t = 0.57; P = 0.57).
Blinding (rDOR 1.49; 95% CI, 0.22 to 10.31) and phonation (rDOR 0.86; 95% CI, 0.13 to 5.76) did not change the diagnostic performance of the modified Mallampati test for predicting difficult laryngoscopy. Also, there were no differences in diagnostic test performance between studies of Asian and Caucasian patients (rDOR 1.09; 95% CI, 0.21 to 5.64). However, on meta-regression, the modified Mallampati test was 5.08 (95% CI, 1.26 to 20.58; P = 0.03) times more accurate in studies of obstetric patients than in studies in surgical patients.
Difficult Tracheal Intubation
The original Mallampati test (2) was used to predict difficult tracheal intubation in 5 studies enrolling 12,351 patients (Table 2). The prevalence of difficult tracheal intubation ranged from 6% to 13%. There were low sensitivities (0.34 to 0.66) and varying specificities (0.65 to 1.00). The positive likelihood ratios varied from 1.87 to 91.0; negative likelihood ratios varied from 0.50 to 0.73. There was an association between sensitivity and specificity (Spearman's r = 0.30). As the DOR was not constant across the threshold (b = −0.71; 95% CI, −1.21 to −0.21), the sROC curve was asymmetrical (Fig. 3). The area (± se) under the asymmetrical sROC curve was 0.58 (0.12). The summary estimate for the sensitivity was 0.50, derived from equation (3) for the sROC curve at the summary specificity of 0.89. Phonation during the test (rDOR 0.27; 95% CI, 0.00 to 15.55) and blinding (rDOR 15.65; 95% CI, 0.14 to 1784.76) did not appear to affect the overall accuracy of the test. There was no evidence of publication bias in the studies pooled (t = 0.39, P = 0.72).
Twenty studies enrolling 13,957 patients (Table 3) examined the use of the modified Mallampati (3,4) test for predicting difficult tracheal intubation. The prevalence of difficult tracheal intubation ranged from 2% to 30% (Table 3). The high prevalence of difficult tracheal intubation (30%) occurred in patients with pharyngolaryngeal disease (50). The sensitivities ranged from 0 to 0.88 and the specificities ranged from 0.53 to 0.98 (Table 3). The positive likelihood ratios varied from 1.43 to 27.19; negative likelihood ratios varied from 0.13 to 0.97 (Table 3). There was a correlation between sensitivity and specificity (Spearman's r = 0.47). The area (± se) under the symmetrical sROC curve (Fig. 4) was 0.83 (0.03). Considering the threshold effect, the DOR was 10.43 (95% CI, 5.32 to 20.48). The summary estimate for sensitivity was 0.76 when the summary specificity was 0.77. Hence, the summary positive and negative likelihood ratios were 3.30 and 0.31, respectively. The modified Mallampati test was better at identifying difficult tracheal intubation than the original Mallampati test (z = 2.02; P = 0.04). There was no evidence of publication bias in the 20 studies pooled for difficult tracheal intubation (t = 0.89; P = 0.39).
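The summary likelihood ratios quoted in these results follow from the summary sensitivity and specificity through the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The short Python sketch below recomputes them from the figures reported in the text; the function name and dictionary labels are illustrative only.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a binary diagnostic test."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Summary (sensitivity, specificity) pairs as reported in the text.
reported = {
    "original test, difficult laryngoscopy": (0.71, 0.89),
    "modified test, difficult laryngoscopy": (0.55, 0.84),
    "modified test, difficult intubation": (0.76, 0.77),
}

for label, (sens, spec) in reported.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

To two decimal places these reproduce the reported pairs (6.45 and 0.33, 3.44 and 0.54, 3.30 and 0.31).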
Blinding (rDOR 0.52; 95% CI, 0.17 to 1.65), phonation (rDOR 0.94; 95% CI, 0.13 to 6.69), and studies of Asian patients (rDOR 1.65; 95% CI, 0.44 to 6.24) did not change the diagnostic performance of the modified Mallampati test for predicting difficult tracheal intubation. The difference between obstetric patients and non-obstetric patients in the accuracy of the modified test for predicting difficult tracheal intubation was not significant (rDOR 2.69; 95% CI, 0.81 to 8.88; P = 0.10). No failed intubation occurred in 12 studies (4,12,31,34,39–41,46–48,50,62). When failed intubation occurred, the prevalence varied from 0.1% (7) to 3.8% (57).
Difficult Ventilation
To put the results of this systematic review in a clinical context, readers can estimate the post-test probability of a difficult airway after an examination of the airway using the modified Mallampati test according to the prevalence of difficult airway in their population (Fig. 5). The range of pre-test probabilities reflects the range of prevalence reported in this systematic review. If the pre-test probability of difficult airway is 10%, a positive test generates a post-test probability of difficult laryngoscopy of 28% and difficult intubation of 27%; a negative test generates a post-test probability of difficult laryngoscopy of 6% and difficult intubation of 3%.
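The post-test probabilities above can be reproduced by converting the pre-test probability to odds, multiplying by the relevant likelihood ratio, and converting back to a probability. The Python sketch below does this for a 10% pre-test probability with the summary likelihood ratios reported for the modified Mallampati test; the function name is illustrative, and readers would substitute the prevalence in their own population.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability (odds form of Bayes' rule)."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


pre_test = 0.10  # pre-test probability used in the worked example in the text

# Summary likelihood ratios for the modified Mallampati test, from the text.
scenarios = {
    "difficult laryngoscopy, positive test": 3.44,
    "difficult laryngoscopy, negative test": 0.54,
    "difficult intubation, positive test": 3.30,
    "difficult intubation, negative test": 0.31,
}

for label, lr in scenarios.items():
    probability = post_test_probability(pre_test, lr)
    print(f"{label}: {100 * probability:.0f}%")
```

Rounded to whole percentages, the four values are 28%, 6%, 27%, and 3%, matching the figures quoted above.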
This systematic review of the literature identified many studies describing the performance of the original (2) and modified (3,4) Mallampati tests to predict the difficult airway. There was substantial variability in the reported sensitivity and specificity among the studies and in definitions of the reference tests. Unlike meta-analysis of interventions, which produces one answer, the performance of diagnostic tests is affected by changes in sensitivity and specificity as reflected in the sROC curves (Figs. 1 to 4). Thus, there is no unique joint summary estimate of sensitivity and specificity; it is only possible to obtain a summary estimate of one value conditional on the value of the other (66). Overall, the accuracy of the Mallampati tests was poor to good, depending on the version of the test and reference test used. Our results are not directly comparable to a recent meta-analysis of bedside screening tests for predicting difficult intubation (67). In Shiga et al.'s meta-analysis (67), there was no distinction made between the various versions of the Mallampati test or between difficult laryngoscopy and difficult intubation, a major limitation of their study. Nevertheless, they concluded that the clinical value of the Mallampati test as a bedside screening test was limited, as it had poor to moderate discriminative power when used alone. Our results concur with this view. Both versions of the Mallampati test had good accuracy for identifying difficult laryngoscopy as assessed according to the original and modified Cormack and Lehane grading system. This system is widely used in clinical practice to describe the best view obtained by direct laryngoscopy with or without manipulation of the larynx. However, there is considerable uncertainty and inaccuracy in this grading system, especially between grade 2 and grade 3 (68). The incidence of difficult laryngoscopy may be underestimated, as most of the studies used the original Cormack and Lehane grading system.
Approximately 3% (55) to 7% (21) of patients graded 2b, who would otherwise have been rated grade 2 in the original system, will have a high risk of difficult laryngoscopy. Such misclassification may affect the overall test performance of the Mallampati tests. Many studies used the same Cormack and Lehane grading system to define both difficult laryngoscopy and difficult intubation. Although difficult intubation is the end result of difficult laryngoscopy, the former also depends on the operator's experience, patient characteristics, and clinical setting. The recommended best way to perform the Mallampati test for predicting difficult laryngoscopy is putting the patient in a sitting position, with the head in full extension, the tongue out, and with phonation (53). However, many studies did not specifically document the way the Mallampati test was performed. Therefore, variations in the conduct of Mallampati tests may contribute to some of the heterogeneity of results seen in this systematic review. Unexpectedly, phonation did not influence the overall accuracy of the Mallampati tests. There was a large variation among studies in the definition of difficult tracheal intubation. There is no current consensus on the definition of difficult tracheal intubation. Therefore, we used the definition from each study to establish an operational reference standard reflecting current clinical practice. The different definitions of difficult tracheal intubation may explain, in part, the heterogeneity of results in the sROC curves. For predicting difficult tracheal intubation, the original Mallampati test had very poor accuracy. Four of the five studies had sensitivities <50%. Small increases in sensitivity led to large sacrifices in specificity. The asymmetrical sROC curve suggests that accuracy was dependent on threshold. The lowest accuracy occurred when the threshold was high. This may be related to the quality of study.
The lowest accuracy occurred in a study with the least amount of reviewer and patient selection bias (35). In contrast, the modified Mallampati test had good accuracy for predicting difficult tracheal intubation and was significantly better than the original test. The discrepancy in results between the two versions of the Mallampati test may be related to the definition of difficult tracheal intubation used and difference in the study populations. The accuracy of the modified Mallampati test for predicting difficult laryngoscopy was five times higher in obstetric patients than in non-obstetric patients, although for predicting difficult tracheal intubation, the difference was not significant. This is consistent with studies that showed that pregnancy caused a 34% increase in Mallampati grade 4 (10) and that the risk of difficult intubation in obstetric patients was approximately 8 times more than in surgical patients (3). More difficult laryngoscopy in obstetric patients most likely occurs because of facial and pharyngeal edema secondary to hormonally induced fluid retention (69). These results suggest that the Mallampati tests are probably better at predicting difficult laryngoscopy associated with soft tissue changes compared with other anatomical factors. We found little evidence of ethnic differences in the accuracy of modified Mallampati tests for difficult laryngoscopy and difficult intubation, despite known cephalometric differences among ethnic populations (70). In a recent editorial, Murphy et al. (71) suggested that we should focus on "ventilatability" rather than "intubatability." The accuracy of the Mallampati tests for predicting difficult mask ventilation was poor, but this was based on only three studies. Therefore, these results should be interpreted with caution. 
For predicting difficult mask ventilation, the presence of 2 of 5 factors (age older than 55 years, body mass index >26 kg/m², lack of teeth, presence of beard, and history of snoring) was associated with good accuracy (area under the curve 0.76 ± 0.11) (65). As expected, there was a strong association between difficult tracheal intubation and difficult mask ventilation (65). Systematic review and meta-analysis are considered to provide the least biased estimates of effect, but if the "raw material" is flawed, then the conclusions of systematic reviews will be compromised and invalid (66). The quality of reporting varied among studies; only a few studies described study methodology and Mallampati test assessments in adequate detail. We assumed that the quality of the study was inadequate if it was clearly stated that there were deficiencies in design and conduct. Omission of reporting specific details of a study was associated with systematic differences in results (72). Of the 42 studies included in this systematic review, only 5 studies recruited patients consecutively with test results blinded among anesthesiologists. This suggests that the majority of studies included in this systematic review may have less than adequate study methodology. Future studies of tests for identifying difficult airway should adopt the Standards for Reporting of Diagnostic Accuracy guidelines (73). This would allow readers to assess the potential for bias in the study and to evaluate the generalizability of study results. Interpreting the reference test with knowledge of the results of the test under study can lead to an overestimation of a test's accuracy (72). This is known as review bias. Unblinded studies tend to overestimate the diagnostic test accuracy by 1.3 times (95% CI, 1.0 to 1.9) (72). However, we did not find a significant effect of blinding on the accuracy of the Mallampati tests.
We also minimized spectrum bias (study sample does not include the complete spectrum of patient characteristics) by excluding case-control studies from the systematic review. Diagnostic accuracy can be overestimated by 3 times (95% CI, 2.0 to 4.5) if the test is evaluated in a group of patients already known to have the disease and a separate group of normal patients, as in case-control studies (72). Publication bias in meta-analyses of test accuracy is highly prevalent (22). This type of bias is a threat to the validity of meta-analysis as it can lead to inappropriate decision making and health care policies. We undertook a comprehensive literature search using several electronic databases. Although we restricted our systematic review to include English language studies, the inclusion of non-English language studies would only increase the precision without affecting the overall accuracy estimates. A previous study showed no relationship between publication bias and language restriction in reviews (22). We believe that our results are robust, as publication bias was not present. As there were no pediatric studies, the results of our systematic review are applicable only to adults. There was a wide range of difficult airway prevalence, reflecting various patient characteristics, including pregnancy (7,11,43,46,59), pharyngolaryngeal disease (50), acromegaly (51), and obesity (44,61). As post-test probability depends on the disease prevalence, knowledge of the prevalence of difficult laryngoscopy and difficult intubation at any individual hospital will aid in the application of our results (Fig. 5). The decision to perform additional radiographic evaluation, consultation with other specialists or use special techniques/equipment to manage difficult airways will depend on how high the post-test probability is and, consequently, at what level the treatment threshold is set by the individual anesthesiologist. 
The results of our systematic review question the routine use of the Mallampati tests. Given the poor to moderate inter-observer reliability of the modified Mallampati test (74,75) and the poor to good accuracy of the Mallampati tests, should we abandon their use? To decide this, anesthesiologists should balance the cost of failing to predict a difficult airway when there is a false negative result versus the possibility of unnecessary treatment when there is a false positive result. Used alone, the Mallampati tests are insufficient to confidently predict the presence or absence of a difficult airway; we believe they should form only a limited part of the overall assessment of the airway. As recommended by the American Society of Anesthesiologists Task Force on the management of the difficult airway (23), dentition, thyromental distance, and neck extension are other parts of the airway examination that also need to be examined.
The authors thank the authors of the original studies who responded to our requests for unpublished and additional data.
The sROC curve method considers heterogeneity across studies attributable to differences in the threshold values used. Even if the same threshold has been used, inter-observer differences in the Mallampati grades (74,75) may lead to inherent variations in the positive results cutoff. To confirm that there was a threshold effect, the true-positive rate (TPR) and false-positive rate (FPR) of each study were plotted against each other, and the Spearman correlation coefficient was calculated. In creating the sROC, the TPR and FPR were converted to their logits, and the sum and differences of the logits were estimated. Equally weighted (unweighted) least squares linear regression of the following model was performed:
$$D = a + bS \qquad (1)$$
where D = logit TPR − logit FPR, S = logit TPR + logit FPR, a = intercept term, and b = regression coefficient for S. D is equivalent to the natural logarithm of the diagnostic odds ratio (DOR), which conveys the test's accuracy in discriminating diseased subjects from nondiseased subjects (76). S can be interpreted as a measure of the diagnostic test threshold, with high values corresponding to liberal inclusion criteria for diseased subjects (76). The regression coefficient b represents the dependence of the test accuracy on threshold. If b
The equation (26) of the corresponding asymmetrical sROC curve is given by:
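The typeset equation is missing at this point. As a hedged reconstruction rather than a quotation of the original, the curve can be derived by assuming the linear model D = a + bS with D = logit TPR − logit FPR and S = logit TPR + logit FPR, and solving for TPR:

```latex
\[
\operatorname{logit}(\mathrm{TPR})
  = \frac{a}{1-b} + \frac{1+b}{1-b}\,\operatorname{logit}(\mathrm{FPR})
\quad\Longrightarrow\quad
\mathrm{TPR}
  = \left[\, 1 + e^{-a/(1-b)}
      \left( \frac{1-\mathrm{FPR}}{\mathrm{FPR}} \right)^{(1+b)/(1-b)}
    \right]^{-1}
\]
```

Setting b = 0 in this expression gives the symmetrical curve, TPR = [1 + e^{−a}(1 − FPR)/FPR]^{−1}, in which e^{a} is the constant DOR.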
Meta-regression was used to explore possible reasons for heterogeneity with a priori subgroups, including type of patient population (coded 1 = Asians, 0 = Caucasians; 1 = obstetrics, 0 = non-obstetrics) and phonation (coded 1 = yes, 0 = no) during the Mallampati tests. This was done by extending the sROC model introduced above (equation 1) to include a covariate (27). The resulting parameter estimates of the covariate can be interpreted, after antilogarithm transformation, as the relative DOR (rDOR) (72) and reflects the differences in threshold choice at different levels of the covariate. Fitting a covariate to the model does not result in a separate sROC curve for each level of the covariate, as the relationship between TPR and FPR is reflected only in a (78).
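The sROC regression described in this appendix can be sketched in a few lines of Python. The sketch assumes the linear model D = a + bS fitted by unweighted least squares, and evaluates the resulting curve to obtain a summary sensitivity at a chosen specificity. The function names are hypothetical, the covariate extension used for meta-regression is not implemented, and the code is not the MetaDiSc or Stata procedure used for the reported analyses.

```python
import math


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def fit_sroc(tpr_values, fpr_values):
    """Unweighted least squares fit of D = a + b*S; returns (a, b)."""
    d = [logit(t) - logit(f) for t, f in zip(tpr_values, fpr_values)]
    s = [logit(t) + logit(f) for t, f in zip(tpr_values, fpr_values)]
    n = len(d)
    mean_s = sum(s) / n
    mean_d = sum(d) / n
    sxx = sum((x - mean_s) ** 2 for x in s)
    sxy = sum((x - mean_s) * (y - mean_d) for x, y in zip(s, d))
    b = sxy / sxx
    a = mean_d - b * mean_s
    return a, b


def sroc_tpr(fpr, a, b=0.0):
    """TPR on the sROC curve at a given FPR; b = 0 gives the symmetrical curve."""
    logit_tpr = (a + (1.0 + b) * logit(fpr)) / (1.0 - b)
    return 1.0 / (1.0 + math.exp(-logit_tpr))


# Symmetrical curve with the DOR of 19.57 reported for the original test and
# difficult laryngoscopy, evaluated at the summary specificity of 0.89.
summary_sensitivity = sroc_tpr(1.0 - 0.89, math.log(19.57))
print(f"summary sensitivity = {summary_sensitivity:.2f}")
```

With these inputs the symmetrical curve gives a sensitivity of about 0.71, in line with the value reported for the original Mallampati test and difficult laryngoscopy.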
Accepted for publication February 6, 2006. Supported solely from departmental and institutional funding.