Department of Anaesthesia and Intensive Care, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, NT, Hong Kong
Address correspondence to Anna Lee, PhD, MPH, Department of Anaesthesia and Intensive Care, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, NT, Hong Kong. Address e-mail to annalee{at}cuhk.edu.hk.
| Abstract |
|---|
| Introduction |
|---|
The objective of this systematic review was to determine the accuracy of the Mallampati test for predicting a difficult airway. For the purposes of this review, a difficult airway included difficult laryngoscopy, difficult tracheal intubation, and difficult ventilation. The null hypothesis tested was that all versions of the Mallampati test had poor accuracy for identifying a difficult airway. We also explored sources of heterogeneity to increase the clinical relevance of the results.
| Methods |
|---|
The studies we assessed included patients with no known risk factors for difficult tracheal intubation as well as patients with upper airway pathology, diabetes, or obesity, and pregnant patients. All types of surgery were considered. Patients undergoing indirect laryngoscopy were excluded (8,15). Retrospective studies (3,16) and case-control studies (17–19) were excluded because these designs overestimate diagnostic test accuracy compared with studies in a prospective clinical population. The reference tests for a difficult airway against which the Mallampati test was compared were difficult laryngoscopy (as defined by the four-grade Cormack and Lehane scoring system (20) or the modified five-grade Cormack and Lehane classification (21)), difficult intubation, and difficult ventilation (both as defined by the study authors).
Search Strategy
A systematic search of all relevant prospective observational studies was conducted. Relevant studies were identified from electronic databases (MEDLINE, EMBASE, Science Citation Index, The Cochrane Library) from January 1985 to December 2004, and from the reference lists of relevant studies and reviews in major anesthesia journals. Articles were restricted to those published in English, as there is no evidence of a strong association between language restriction and publication bias in systematic reviews of diagnostic tests (22). We used four databases to ensure that relevant articles were identified, as publication bias is more likely if only one or two databases are used in systematic reviews of diagnostic tests (22). The following subject headings and text words, and their combinations, were included in the electronic database search strategy: sensitivity, specificity, screening, false positive, false negative, predictive value of tests, reference values, roc analyses, roc area, roc characteristics, roc curve, endotracheal intubation, intratracheal intubation, laryngoscopy, difficult laryngoscopy, difficult intubation, Mallampati and Cormack and Lehane.
The methodological quality of eligible studies was assessed independently under open conditions. We recorded the method of patient recruitment and whether the anesthesiologists assessed the test and reference test results blind to each other. The patient population, type of surgery, and details of the test and reference tests were also collected. Two or more investigators independently extracted data from each study using a standardized data extraction form; disagreements were resolved by consensus. The primary author of a study was contacted by letter or e-mail for relevant data that were not presented in the original publication.
Outcome Measures
The primary outcomes were 1) difficult laryngoscopy (20,21) (Cormack and Lehane Grades 2b, 3 and 4) (Table 1) and 2) difficult tracheal intubation (as there is no standard definition for difficult intubation, we accepted the definition used by authors from each study). The secondary outcome was difficult ventilation, as defined by authors from each study. The primary and secondary outcomes were chosen because they are related to consensus guidelines (23) and are clinically important measures of difficult airway.
Statistical Analysis
The sensitivity, specificity, positive likelihood ratio, and negative likelihood ratio were determined for each included study. The accuracy of the test was judged by the magnitude of the positive and negative likelihood ratios (how much a given diagnostic test result increases or decreases the pre-test probability of the target disorder) using the guide by Jaeschke et al. (24). The problems posed by sensitivities and specificities of 0% and 100% were handled by adding 0.5 to all cells of the diagnostic 2 × 2 table (13). The DerSimonian-Laird method (random effects model) was used to incorporate variation among studies when pooling sensitivity, specificity, positive likelihood ratio, and negative likelihood ratio. However, when there was an association between sensitivity and specificity across studies (threshold effect), we did not report the individual weighted averages for sensitivity, specificity, positive likelihood ratio, and negative likelihood ratio, as these would underestimate diagnostic test performance (25,26). Instead, a summary receiver operating characteristic (sROC) curve of all the studies was constructed (27), as this is a better summary of the study results than a single joint summary estimate of sensitivity and specificity (see Appendix for details about the construction and interpretation of the sROC curve). We used the area under the sROC curve to judge the degree of accuracy of the tests according to published guidelines (28) (≥0.97 = excellent, 0.93 to 0.96 = very good, 0.75 to 0.92 = good, 0.50 to 0.75 = poor). We computed a weighted average of the specificity from all studies using the random effects model; the summary sensitivity was then calculated from the sROC curve equations. Positive and negative likelihood ratios were derived from the summary sensitivity and specificity.
Heterogeneity was described using the I² statistic (29) when pooling sensitivity, specificity, and positive and negative likelihood ratios. Sensitivity analyses were performed for the primary outcomes to evaluate the robustness of the results according to whether test results were blinded among anesthesiologists (blinding versus unclear/no blinding).
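The DerSimonian-Laird random-effects pooling and the I² statistic can be sketched in plain Python. This is the standard DerSimonian-Laird moment estimator applied to generic study estimates and their within-study variances; pooling specificity on the logit scale (variance 1/TN + 1/FP) is an assumption here, because the section does not state which transformation the software used. The helper `logit_specificity` and the counts are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate, its SE, tau^2 and I^2 (%)."""
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0       # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se, tau2, i2


def logit_specificity(tn, fp):
    """Logit specificity and its approximate variance (assumed scale)."""
    return math.log(tn / fp), 1.0 / tn + 1.0 / fp


# Hypothetical (TN, FP) counts from three studies
studies = [(180, 20), (90, 30), (400, 25)]
ests, vars_ = zip(*(logit_specificity(tn, fp) for tn, fp in studies))
pooled, se, tau2, i2 = dersimonian_laird(list(ests), list(vars_))
pooled_specificity = 1 / (1 + math.exp(-pooled))   # back-transform to a proportion
```

I² expresses the share of total variation across studies attributable to heterogeneity rather than chance, which is why it is reported alongside each pooled estimate.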
Publication bias was assessed using Egger's weighted regression method (30), plotting the log odds ratio against precision (1/standard error). The intercept of Egger's regression provides an estimate of funnel plot asymmetry, with positive values indicating a trend toward higher test accuracy in studies with smaller sample sizes. Because this test has low power, the threshold of significance was set at P < 0.10 (31). All statistical analyses were performed using Stata version 8.0 (Stata Corp, College Station, TX) and MetaDiSc version 1.1.1 (Zamora J, Muriel A, Abraira V, Madrid, Spain).
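Egger's test is commonly implemented as an ordinary least-squares regression of the standardized effect (ln DOR divided by its standard error) on precision (1/standard error), with the intercept measuring funnel plot asymmetry. The sketch below follows that common form, which we assume corresponds to the weighted method cited; the function name and input values are hypothetical. A two-sided P value can then be obtained from a t distribution with n − 2 degrees of freedom, for example with `scipy.stats.t.sf`.

```python
import math


def egger_test(log_dor, se):
    """Egger's regression: SND = lnDOR/SE regressed on precision = 1/SE.

    Returns (intercept, t statistic for the intercept, degrees of freedom).
    """
    y = [d / s for d, s in zip(log_dor, se)]   # standard normal deviates
    x = [1.0 / s for s in se]                  # precision
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r ** 2 for r in resid) / (n - 2)          # residual variance
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / s_xx))
    return intercept, intercept / se_intercept, n - 2


# Hypothetical ln(DOR) values and standard errors from four studies
intercept, t_stat, df = egger_test([1.0, 1.5, 2 / 3, 1.0], [1.0, 0.5, 1 / 3, 0.25])
```

An intercept close to zero, as here, gives a small t statistic and no evidence of asymmetry.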
| Results |
|---|
Documentation of how the Mallampati tests were performed was poor with regard to body position, head position, and the use of phonation. Nine studies adequately described all three aspects (4,7,43,51,53,56,57,64,65). Phonation during the Mallampati test was described in six studies (33,39,42,44,53,65). The modified Mallampati test was performed in Asian patients in eight studies (8,11,12,46,55,56,59,64) and in obstetric patients in five studies (7,11,43,46,59). Two studies (11,59) reported test characteristics separately for obstetric and gynecological patients. One study (52) was excluded from the meta-regression analyses because its abstract and text disagreed on the proportion of obstetric and non-obstetric patients.
Difficult Laryngoscopy
All studies used Cormack and Lehane's original classification to define difficult laryngoscopy, except one study (55) that used the modified five-grade score (21).
The original Mallampati test was used in 9 studies with 14,438 patients (Table 2). The prevalence of difficult laryngoscopy ranged from 6% to 27%, with a high prevalence in patients with cervical disease (36) and in patients with diabetes (40). In one study, the authors attributed the high prevalence of difficult laryngoscopy to the use of the McCoy laryngoscope (37). The sensitivity and specificity of the individual studies ranged from 0.05 to 1.00 and from 0.65 to 0.98, respectively, and the positive and negative likelihood ratios ranged from 1.71 to 32.08 and from 0.14 to 0.97, respectively (Table 2). As there was an apparent relationship between sensitivity and specificity (Spearman r = 0.45), an sROC curve was constructed (Fig. 1). The area (± se) under the symmetrical sROC curve was 0.89 (0.05). Accounting for the threshold effect, the diagnostic odds ratio (DOR) was 19.57 (95% confidence interval [CI], 5.02 to 76.27). At the summary specificity of 0.89, equation (2) for the sROC curve gave a summary sensitivity of 0.71; hence, the summary positive and negative likelihood ratios were 6.45 and 0.33, respectively. Neither blinding (relative DOR [rDOR] 0.49; 95% CI, 0.02 to 12.72) nor phonation (rDOR 0.11; 95% CI, 0.00 to 5.47) changed the diagnostic performance of the original Mallampati test for predicting difficult laryngoscopy. There was no evidence of publication bias in this meta-analysis (t = 1.19, P = 0.27).
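The summary sensitivity in this paragraph can be reproduced from the reported DOR and summary specificity if the symmetrical sROC curve is written, as in the Appendix, with a = ln(DOR), so that logit(sensitivity) = ln(DOR) + logit(1 − specificity). The Python sketch below is our illustration of that arithmetic, not the MetaDiSc code, and the function name is hypothetical. The small difference in the positive likelihood ratio (about 6.43 versus 6.45) arises because the published value uses the sensitivity rounded to 0.71.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def summary_from_symmetric_sroc(dor, specificity):
    """Sensitivity and likelihood ratios on a symmetrical sROC curve.

    Uses logit(TPR) = ln(DOR) + logit(FPR), i.e. the b = 0 case with a = ln(DOR).
    """
    fpr = 1 - specificity
    logit_tpr = math.log(dor) + logit(fpr)
    sensitivity = 1 / (1 + math.exp(-logit_tpr))
    lr_pos = sensitivity / fpr
    lr_neg = (1 - sensitivity) / specificity
    return sensitivity, lr_pos, lr_neg


# Original Mallampati test, difficult laryngoscopy: DOR 19.57, specificity 0.89
sens, lr_pos, lr_neg = summary_from_symmetric_sroc(19.57, 0.89)
print(f"sensitivity={sens:.2f} LR+={lr_pos:.2f} LR-={lr_neg:.2f}")
# prints: sensitivity=0.71 LR+=6.43 LR-=0.33
```

The same function applied to the modified test (DOR 6.45 at specificity 0.84; DOR 10.43 at specificity 0.77) returns sensitivities of about 0.55 and 0.76.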
The modified Mallampati tests (3,4) were used in 19 studies with 10,579 patients (Table 3). The prevalence of difficult laryngoscopy varied widely, from 2% to 26% (Table 3), and was highest in patients with acromegaly (51). The sensitivity and specificity of the individual studies ranged from 0.12 to 1.00 and from 0.44 to 0.98, respectively (Table 3). There was an association between sensitivity and specificity (Spearman r = 0.32). The area (± se) under the symmetrical sROC curve (Fig. 2) was 0.78 (0.05). Accounting for the threshold effect, the DOR was 6.45 (95% CI, 2.73 to 15.22). The summary sensitivity was 0.55 at a summary specificity of 0.84, giving summary positive and negative likelihood ratios of 3.44 and 0.54, respectively. The areas under the sROC curve did not differ significantly between the original and modified versions of the Mallampati test (z = 1.56; P = 0.12). Publication bias was not evident in the meta-analysis of 19 studies for difficult laryngoscopy (t = 0.57; P = 0.57).
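For a symmetrical sROC curve the area has a closed form, obtained by integrating TPR = DOR·FPR / (1 + (DOR − 1)·FPR) over FPR from 0 to 1. The Python sketch below applies it to the reported DORs and compares two areas with a plain two-sample z statistic, z = (AUC1 − AUC2) / √(se1² + se2²). We assume this simple z test approximates the Hasselblad and Hedges method named in the Appendix; it is an illustration with hypothetical function names, although it reproduces the z = 1.56 and P = 0.12 reported above.

```python
import math


def symmetric_sroc_auc(dor):
    """Area under TPR = DOR*FPR / (1 + (DOR - 1)*FPR) for FPR in [0, 1]."""
    if abs(dor - 1.0) < 1e-12:
        return 0.5  # uninformative test: the curve is the diagonal
    k = dor - 1.0
    return dor / k ** 2 * (k - math.log(dor))


def compare_auc(auc1, se1, auc2, se2):
    """z statistic and two-sided normal P value for two independent areas."""
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


print(f"{symmetric_sroc_auc(19.57):.3f}")   # original test, reported as 0.89
print(f"{symmetric_sroc_auc(6.45):.3f}")    # modified test, reported as 0.78
z, p = compare_auc(0.89, 0.05, 0.78, 0.05)
print(f"z={z:.2f} P={p:.2f}")
```

With the tracheal intubation areas (0.83 ± 0.03 versus 0.58 ± 0.12), the same comparison gives z ≈ 2.02.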
Blinding (rDOR 1.49; 95% CI, 0.22 to 10.31) and phonation (rDOR 0.86; 95% CI, 0.13 to 5.76) did not change the diagnostic performance of the modified Mallampati test for predicting difficult laryngoscopy. There were also no differences in diagnostic test performance between studies of Asian and Caucasian patients (rDOR 1.09; 95% CI, 0.21 to 5.64). However, on meta-regression, the modified Mallampati test was 5.08 (95% CI, 1.26 to 20.58; P = 0.03) times more accurate in studies of obstetric patients than in studies of surgical patients.
Difficult Tracheal Intubation
There was wide variation in the definition of difficult tracheal intubation. Many studies (6,8,11,35,43,49,56,59) used the original Cormack and Lehane definition (20). Three studies (41,42,63) defined difficult tracheal intubation as a score >5 on the Intubation Difficulty Score described by Adnet et al. (41). This scoring system incorporates the number of attempts, number of additional operators, number of alternative intubation techniques, Cormack and Lehane grade, lifting force, laryngeal pressure, and vocal cord mobility. Two studies (7,46) in obstetric patients used Rocke et al.'s classification (7) to define difficult tracheal intubation.
The original Mallampati test (2) was used to predict difficult tracheal intubation in 5 studies enrolling 12,351 patients (Table 2). The prevalence of difficult tracheal intubation ranged from 6% to 13%. Sensitivities were low (0.34 to 0.66) and specificities varied (0.65 to 1.00). Positive likelihood ratios ranged from 1.87 to 91.0 and negative likelihood ratios from 0.50 to 0.73. There was an association between sensitivity and specificity (Spearman r = 0.30). Because the DOR was not constant across thresholds (b = −0.71; 95% CI, −1.21 to −0.21), the sROC curve was asymmetrical (Fig. 3). The area (± se) under the asymmetrical sROC curve was 0.58 (0.12). At the summary specificity of 0.89, equation (3) for the sROC curve gave a summary sensitivity of 0.50. Neither phonation during the test (rDOR 0.27; 95% CI, 0.00 to 15.55) nor blinding (rDOR 15.65; 95% CI, 0.14 to 1784.76) appeared to affect the overall accuracy of the test. There was no evidence of publication bias in the studies pooled (t = 0.39, P = 0.72).
Twenty studies enrolling 13,957 patients (Table 3) examined the use of the modified Mallampati test (3,4) for predicting difficult tracheal intubation. The prevalence of difficult tracheal intubation ranged from 2% to 30% (Table 3); the highest prevalence (30%) occurred in patients with pharyngolaryngeal disease (50). Sensitivities ranged from 0 to 0.88 and specificities from 0.53 to 0.98, and positive and negative likelihood ratios ranged from 1.43 to 27.19 and from 0.13 to 0.97, respectively (Table 3). There was a correlation between sensitivity and specificity (Spearman r = 0.47). The area (± se) under the symmetrical sROC curve (Fig. 4) was 0.83 (0.03). Accounting for the threshold effect, the DOR was 10.43 (95% CI, 5.32 to 20.48). The summary sensitivity was 0.76 at a summary specificity of 0.77; hence, the summary positive and negative likelihood ratios were 3.30 and 0.31, respectively. The modified Mallampati test was significantly better at identifying difficult tracheal intubation than the original Mallampati test (z = 2.02; P = 0.04). There was no evidence of publication bias in the 20 studies pooled for difficult tracheal intubation (t = 0.89; P = 0.39).
Blinding (rDOR 0.52; 95% CI, 0.17 to 1.65), phonation (rDOR 0.94; 95% CI, 0.13 to 6.69), and studies of Asian patients (rDOR 1.65; 95% CI, 0.44 to 6.24) did not change the diagnostic performance of the modified Mallampati test for predicting difficult tracheal intubation. The difference in the accuracy of the modified test for predicting difficult tracheal intubation between obstetric and non-obstetric patients was not significant (rDOR 2.69; 95% CI, 0.81 to 8.88; P = 0.10).
No failed intubations occurred in 12 studies (4,12,31,34,39–41,46–48,50,62). Where failed intubation occurred, its prevalence varied from 0.1% (7) to 3.8% (57).
Difficult Ventilation
Three studies recorded difficult ventilation (6,9,65), and their definitions varied. In one study (6), difficult ventilation was defined as inability to obtain chest excursion sufficient to maintain a clinically acceptable capnogram waveform despite optimal head and neck positioning, use of muscle paralysis, use of an oral airway, and optimal application of a facemask. In another study (65), mask ventilation was considered difficult only when the anesthesiologist judged the difficulty to be clinically relevant and likely to cause problems if mask ventilation had to be maintained for a longer time. In the third study (9), bag-mask ventilation was considered difficult if one or more of the following were present: inability to maintain an adequate seal; inability to obtain chest excursion, obtain a good capnograph tracing, or maintain oxygen saturation above 90% despite good muscle relaxation; the need for a Guedel oral airway; or the need for two-person bag-mask ventilation. The sensitivity, specificity, and positive and negative likelihood ratios for this outcome are shown in Tables 2 and 3. The pooled sensitivity and specificity of the modified Mallampati test (3) in two studies (9,65) were 0.26 (95% CI, 0.19 to 0.35) and 0.89 (95% CI, 0.88 to 0.90), respectively. There was little heterogeneity between the two studies for sensitivity (I² = 21%) but substantial heterogeneity for specificity (I² = 90%). The positive and negative likelihood ratios were 2.42 (95% CI, 1.25 to 4.66) and 0.83 (95% CI, 0.71 to 0.98), respectively, suggesting poor accuracy. There were moderate amounts of heterogeneity among studies for the positive and negative likelihood ratios (I² = 78% and 52%, respectively).
To put the results of this systematic review in a clinical context, readers can estimate the post-test probability of a difficult airway after examination with the modified Mallampati test according to the prevalence of difficult airway in their own population (Fig. 5). The range of pre-test probabilities reflects the range of prevalence reported in this systematic review. If the pre-test probability of a difficult airway is 10%, a positive test generates a post-test probability of 28% for difficult laryngoscopy and 27% for difficult intubation; a negative test generates a post-test probability of 6% for difficult laryngoscopy and 3% for difficult intubation.
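The post-test probabilities in this paragraph follow from the odds form of Bayes' theorem: post-test odds = pre-test odds × likelihood ratio. A minimal Python sketch, using the summary likelihood ratios reported in the Results for the modified Mallampati test (3.44 and 0.54 for difficult laryngoscopy; 3.30 and 0.31 for difficult intubation) and an assumed pre-test probability of 10%; the function name is hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Summary likelihood ratios (LR+, LR-) for the modified Mallampati test
summary_lrs = {
    "difficult laryngoscopy": (3.44, 0.54),
    "difficult intubation": (3.30, 0.31),
}
results = {}
for outcome, (lr_pos, lr_neg) in summary_lrs.items():
    positive = post_test_probability(0.10, lr_pos)
    negative = post_test_probability(0.10, lr_neg)
    results[outcome] = (round(positive, 2), round(negative, 2))
    print(f"{outcome}: positive test {positive:.0%}, negative test {negative:.0%}")
```

Substituting the local prevalence for 0.10 gives the corresponding post-test probabilities for another population.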
| Discussion |
|---|
Both versions of the Mallampati test had good accuracy for identifying difficult laryngoscopy as assessed according to the original and modified Cormack and Lehane grading systems. This grading is widely used in clinical practice to describe the best view obtained by direct laryngoscopy with or without manipulation of the larynx. However, there is considerable uncertainty and inaccuracy in this grading system, especially between grade 2 and grade 3 (68). The incidence of difficult laryngoscopy may be underestimated, as most of the studies used the original Cormack and Lehane grading system. Approximately 3% (55) to 7% (21) of patients graded 2b, who would otherwise have been rated grade 2 in the original system, will have a high risk of difficult laryngoscopy. Such misclassification may affect the overall test performance of the Mallampati tests. Many studies used the same Cormack and Lehane grading system to define both difficult laryngoscopy and difficult intubation. Although difficult intubation often results from difficult laryngoscopy, it also depends on the operator's experience, patient characteristics, and clinical setting.
The recommended way to perform the Mallampati test for predicting difficult laryngoscopy is with the patient sitting, the head in full extension, the tongue protruded, and with phonation (53). However, many studies did not document how the Mallampati test was performed, so variations in its conduct may contribute to some of the heterogeneity of results seen in this systematic review. Unexpectedly, phonation did not influence the overall accuracy of the Mallampati tests.
There was a large variation among studies in the definition of difficult tracheal intubation, and there is currently no consensus definition. Therefore, we used the definition from each study to establish an operational reference standard reflecting current clinical practice. The different definitions of difficult tracheal intubation may partly explain the heterogeneity of results in the sROC curves. For predicting difficult tracheal intubation, the original Mallampati test had very poor accuracy. Four of the five studies had sensitivities <50%, and small increases in sensitivity came at the cost of large losses in specificity. The asymmetrical sROC curve suggests that accuracy depended on threshold, with the lowest accuracy occurring when the threshold was high. This may be related to study quality: the lowest accuracy occurred in the study with the least reviewer and patient selection bias (35). In contrast, the modified Mallampati test had good accuracy for predicting difficult tracheal intubation and was significantly better than the original test. The discrepancy between the two versions of the Mallampati test may be related to the definitions of difficult tracheal intubation used and to differences in the study populations.
The accuracy of the modified Mallampati test for predicting difficult laryngoscopy was five times higher in obstetric patients than in non-obstetric patients, although the corresponding difference for predicting difficult tracheal intubation was not significant. This is consistent with studies showing that pregnancy caused a 34% increase in Mallampati grade 4 (10) and that the risk of difficult intubation in obstetric patients was approximately 8 times that in surgical patients (3). More difficult laryngoscopy in obstetric patients most likely occurs because of facial and pharyngeal edema secondary to hormonally induced fluid retention (69). These results suggest that the Mallampati tests are probably better at predicting difficult laryngoscopy caused by soft tissue changes than difficult laryngoscopy caused by other anatomical factors.
We found little evidence of ethnic differences in the accuracy of modified Mallampati tests for difficult laryngoscopy and difficult intubation, despite known cephalometric differences among ethnic populations (70).
In a recent editorial, Murphy et al. (71) suggested that we should focus on "ventilatability" rather than "intubatability." The accuracy of the Mallampati tests for predicting difficult mask ventilation was poor, but this was based on only three studies. Therefore, these results should be interpreted with caution. For predicting difficult mask ventilation, the presence of 2 of 5 factors (age older than 55 years, body mass index >26 kg/m2, lack of teeth, presence of beard, and history of snoring) was associated with good accuracy (area under the curve 0.76 ± 0.11) (65). As expected, there was a strong association between difficult tracheal intubation and difficult mask ventilation (65).
Systematic reviews and meta-analyses are considered to provide the least biased estimates of effect, but if the "raw material" is flawed, their conclusions will be compromised and invalid (66). The quality of reporting varied among studies; only a few described study methodology and Mallampati test assessment in adequate detail. We assumed that the quality of a study was inadequate if it was clearly stated that there were deficiencies in design and conduct. Failure to report specific details of a study has been associated with systematic differences in results (72). Of the 42 studies included in this systematic review, only 5 recruited patients consecutively with test results blinded among anesthesiologists, suggesting that most of the included studies may have had less than adequate methodology. Future studies of tests for identifying a difficult airway should adopt the Standards for Reporting of Diagnostic Accuracy guidelines (73), which would allow readers to assess the potential for bias in a study and to evaluate the generalizability of its results.
Interpreting the reference test with knowledge of the results of the test under study can lead to overestimation of a test's accuracy (72); this is known as review bias. Unblinded studies tend to overestimate diagnostic test accuracy by 1.3 times (95% CI, 1.0 to 1.9) (72). However, we did not find a significant effect of blinding on the accuracy of the Mallampati tests. We also minimized spectrum bias (when the study sample does not include the complete spectrum of patient characteristics) by excluding case-control studies from the systematic review. Diagnostic accuracy can be overestimated by 3 times (95% CI, 2.0 to 4.5) if the test is evaluated in a group of patients already known to have the disease and a separate group of normal patients, as in case-control studies (72).
Publication bias in meta-analyses of test accuracy is highly prevalent (22). This type of bias threatens the validity of meta-analysis because it can lead to inappropriate decision making and health care policies. We undertook a comprehensive literature search using several electronic databases. Although we restricted our systematic review to English language studies, including non-English language studies would probably have increased precision without materially affecting the overall accuracy estimates; a previous study showed no relationship between publication bias and language restriction in reviews (22). We believe that our results are robust, as we found no evidence of publication bias.
As there were no pediatric studies, the results of our systematic review are applicable only to adults. There was a wide range of difficult airway prevalence, reflecting various patient characteristics, including pregnancy (7,11,43,46,59), pharyngolaryngeal disease (50), acromegaly (51), and obesity (44,61). As post-test probability depends on disease prevalence, knowledge of the prevalence of difficult laryngoscopy and difficult intubation at an individual hospital will aid in the application of our results (Fig. 5). The decision to perform additional radiographic evaluation, to consult other specialists, or to use special techniques or equipment to manage a difficult airway will depend on how high the post-test probability is and, consequently, on the treatment threshold set by the individual anesthesiologist.
The results of our systematic review question the routine use of the Mallampati tests. Given the poor to moderate inter-observer reliability of the modified Mallampati test (74,75) and the poor to good accuracy of the Mallampati tests, should we abandon their use? To decide this, anesthesiologists should balance the cost of failing to predict a difficult airway when there is a false negative result versus the possibility of unnecessary treatment when there is a false positive result. Used alone, the Mallampati tests are insufficient to confidently predict the presence or absence of a difficult airway; we believe they should form only a limited part of the overall assessment of the airway. As recommended by the American Society of Anesthesiologists Task Force on the management of the difficult airway (23), dentition, thyromental distance, and neck extension are other parts of the airway examination that also need to be examined.
The authors thank the authors of the original studies who responded to our requests for unpublished and additional data.
| Appendix |
|---|
$$D = a + bS \qquad (1)$$
where $D = \operatorname{logit}(\mathrm{TPR}) - \operatorname{logit}(\mathrm{FPR})$, $S = \operatorname{logit}(\mathrm{TPR}) + \operatorname{logit}(\mathrm{FPR})$, $a$ is the intercept term, and $b$ is the regression coefficient for $S$; TPR is the true-positive rate (sensitivity) and FPR is the false-positive rate (1 − specificity). D is the natural logarithm of the diagnostic odds ratio (DOR), which conveys the test's accuracy in discriminating diseased from nondiseased subjects (76). S can be interpreted as a measure of the diagnostic test threshold, with high values corresponding to liberal inclusion criteria for diseased subjects (76). The regression coefficient b represents the dependence of test accuracy on threshold. If $b \approx 0$, the studies are homogeneous and can be summarized by an overall DOR, noting that $a = \ln(\mathrm{DOR})$ (76); this gives a symmetrical sROC curve. If $b \neq 0$, the studies are heterogeneous with respect to the DOR (76), and the sROC curve is asymmetrical. The DOR is related to the area under the sROC curve: a DOR of 1 corresponds to an area under the sROC curve of 50%, and the larger the DOR, the larger the area. Ninety-five percent confidence intervals (95% CI) were estimated around the DOR. The areas under the sROC curves of the original and modified Mallampati tests were compared using the method outlined by Hasselblad and Hedges (77). Equations 2 and 3 below give the logit form of the sROC curve, from which a pooled estimate of TPR and FPR can be obtained. The corresponding symmetrical sROC curve (66) is given by:
$$\operatorname{logit}(\mathrm{TPR}) = a + \operatorname{logit}(\mathrm{FPR}) \qquad (2)$$
The corresponding asymmetrical sROC curve (26) is given by:
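$$\operatorname{logit}(\mathrm{TPR}) = \frac{a}{1-b} + \frac{1+b}{1-b}\,\operatorname{logit}(\mathrm{FPR}) \qquad (3)$$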
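The sROC model can be illustrated with a short Python sketch: it computes D and S for each study, fits D = a + bS by unweighted ordinary least squares, and evaluates the curve as logit(TPR) = a/(1 − b) + (1 + b)/(1 − b)·logit(FPR), which reduces to logit(TPR) = a + logit(FPR) when b = 0. Whether the published analysis weighted this regression is not stated here, so the unweighted fit is an assumption, and the function names are hypothetical. The studies in the usage lines are synthetic points placed exactly on a curve with a = 1.0 and b = −0.5, so the fit should recover those values.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def fit_moses_littenberg(tpr, fpr):
    """Unweighted least-squares fit of D = a + b*S across studies."""
    d = [logit(t) - logit(f) for t, f in zip(tpr, fpr)]  # log diagnostic odds ratio
    s = [logit(t) + logit(f) for t, f in zip(tpr, fpr)]  # threshold measure
    n = len(d)
    s_bar, d_bar = sum(s) / n, sum(d) / n
    s_xx = sum((si - s_bar) ** 2 for si in s)
    b = sum((si - s_bar) * (di - d_bar) for si, di in zip(s, d)) / s_xx
    a = d_bar - b * s_bar
    return a, b


def sroc_tpr(fpr, a, b):
    """sROC curve: logit(TPR) = a/(1-b) + (1+b)/(1-b) * logit(FPR)."""
    logit_tpr = a / (1 - b) + (1 + b) / (1 - b) * logit(fpr)
    return 1 / (1 + math.exp(-logit_tpr))


# Synthetic studies lying exactly on the curve with a = 1.0, b = -0.5
fprs = [0.05, 0.10, 0.20, 0.40]
tprs = [sroc_tpr(f, 1.0, -0.5) for f in fprs]
a_hat, b_hat = fit_moses_littenberg(tprs, fprs)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

With b = 0 and a = ln(19.57), `sroc_tpr(0.11, a, 0.0)` returns about 0.71, matching the summary sensitivity reported for the original test and difficult laryngoscopy.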
Meta-regression was used to explore possible reasons for heterogeneity in a priori subgroups, including type of patient population (coded 1 = Asians, 0 = Caucasians; 1 = obstetrics, 0 = non-obstetrics) and phonation (coded 1 = yes, 0 = no) during the Mallampati tests. This was done by extending the sROC model introduced above (equation 1) to include a covariate (27). After antilogarithm transformation, the parameter estimate for the covariate can be interpreted as the relative DOR (rDOR) (72), which reflects the difference in diagnostic accuracy between levels of the covariate after allowing for threshold. Fitting a covariate to the model does not produce a separate sROC curve for each level of the covariate, as the relationship between TPR and FPR is reflected only in a (78).
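Extending the least-squares fit with a 0/1 covariate X gives D = a + bS + cX, and exp(c) is the relative DOR. The pure-Python sketch below solves the 3 × 3 normal equations directly so that it has no dependencies; the unweighted fit and all function names are assumptions for illustration, not the software used in the review. The studies are synthetic, generated with a true rDOR of 5 so that the output can be checked.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    m = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def moses_meta_regression(tpr, fpr, covariate):
    """Unweighted fit of D = a + b*S + c*X; returns a, b, c and rDOR = exp(c)."""
    rows = [[1.0, logit(t) + logit(f), float(x)] for t, f, x in zip(tpr, fpr, covariate)]
    d = [logit(t) - logit(f) for t, f in zip(tpr, fpr)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * di for r, di in zip(rows, d)) for i in range(3)]
    a, b, c = solve_linear(xtx, xty)
    return a, b, c, math.exp(c)


def expit(z):
    return 1 / (1 + math.exp(-z))


# Synthetic studies: X = 1 (e.g. obstetric) has a DOR five times that of X = 0
true_a, true_b, true_c = 1.0, -0.3, math.log(5.0)
fprs = [0.05, 0.10, 0.20, 0.05, 0.10, 0.20]
xs = [0, 0, 0, 1, 1, 1]
tprs = [
    expit((true_a + true_c * x) / (1 - true_b) + (1 + true_b) / (1 - true_b) * logit(f))
    for f, x in zip(fprs, xs)
]
a_hat, b_hat, c_hat, rdor = moses_meta_regression(tprs, fprs, xs)
print(f"rDOR = {rdor:.2f}")
```

Because the covariate enters only as a shift in the intercept, both subgroups share the same slope b, which is the sense in which the covariate acts through a alone.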
| Footnotes |
|---|
Supported solely by departmental and institutional funding.
| References |
|---|