We investigated the validity of several statistical methods to monitor the cancellation of electively scheduled cases on the day of surgery: χ² test, Fisher's exact test, Rao and Scott test, Student's t-test, Clopper-Pearson confidence intervals, and the Chen and Tipping modification of the Clopper-Pearson confidence intervals. Discrete-event computer simulation over many years was used to represent surgical suites with an unchanging cancellation rate. Because the true cancellation rate was fixed, the accuracy of the statistical methods could be determined. Cancellations caused by medical events, rare events, cases lasting longer than scheduled, and full postanesthesia or intensive care unit beds were modeled. We found that applying Student's two-sample t-test to the transformation of the numbers of cases and canceled cases from each of six 4-wk periods was valid for most conditions. We recommend that clinicians and managers use this method in their quality monitoring reports. The other methods gave inaccurate results. For example, using the χ² test or Fisher's exact test, hospitals may erroneously determine that cancellation rates have increased when they really are unchanged. Conversely, if inappropriate statistical methods are used, administrators may claim success at reducing cancellation rates when, in fact, the problem remains unresolved, affecting patients and clinicians.
Case cancellations on the day of surgery are generally undesirable. For hospitals in the United States of America (U.S.) not on a fixed annual budget, the lost revenue from each canceled case averages $1430 to $1700 USD per operating room (OR) hour plus the variable cost of performing the case (1,2). For non-U.S. hospitals and U.S. hospitals with a fixed annual budget (e.g., Veterans Affairs), canceling a case and performing it on another day increases costs to the physicians, hospital, patient, and society, even if overtime would have been required to perform the case on the day it was originally scheduled (3). For example, more than half of family members of pediatric patients miss at least 1 day of work when cases are canceled (4). Similarly, the person accompanying an adult patient often gives up a day of work. Finally, the appropriate managerial response to a high cancellation rate on the day of surgery is to have patients arrive earlier on the day of surgery (5). This strategy allows moving up start times to avoid gaps in the OR schedule, should a preceding case be canceled. However, this strategy increases average patient waiting times on the day of surgery (5), which may decrease patient satisfaction. There have been many research studies evaluating causes of cancellations on the day of surgery (e.g., 6–10). However, actually monitoring case cancellation rates and determining change over time or differences among specialties is difficult. For example, when the intensive care unit (ICU) fills, there are many cancellations both for services whose patients require postoperative ICU care and for services using the postanesthesia care unit (PACU), the ICU overflow site. If the ICU fills once or twice a month, and cancellations are being compared from one month to the next, the cancellation rate may seem to vary markedly from month to month, leading to poor management decisions.
In research studies, when statistical methods are used to compare cancellations among groups, often the [...] In the reality of clinicians' and managers' surgical suites, many cancellations result from nonmedical causes (e.g., full ICU, full PACU, surgeon unavailable, bad weather, or urgent cases). Whenever one of these nonmedical causes occurs, more than one case can be canceled. For example, at a university hospital with outpatient preoperative evaluation, when adults had their surgery canceled on the day of surgery, nonmedical causes were responsible for 80% of cancellations (9). At a Veterans Affairs Hospital, nonmedical reasons accounted for 67% of cancellations before the introduction of outpatient preoperative evaluations (6) and 81% of cancellations after a year of experience with this process (7). Among inpatients, 43% of cancellations were caused by nonmedical factors (11). Among all patients at a tertiary teaching hospital, 68% of cancellations had nonmedical causes (12). The issue likely is less relevant to pediatrics because percentages for nonmedical causes of case cancellation are lower than for adults: 15% in one study (10) and 33% in another (4). We studied several statistical methods for analyzing case cancellations to determine which methods can be used accurately for clinicians' and managers' routine monitoring needs. In the Discussion, we include a worked example demonstrating the recommended method so that readers can easily implement the findings of the study.
Type I and Type II Error Rates
Type I errors occur when there are no true differences between groups, and yet statistically significant differences are detected. If the nominal chance of a type I error (α) is set equal to 0.05 (i.e., P < 0.05 is significant), a test should not achieve significance more often than on 5% of occasions unless there are true differences between groups. Because decisions based on faulty analysis often result in the implementation of processes that waste everyone's time (e.g., additional paperwork, phone calls, and laboratory and diagnostic testing), these type I errors can have a detrimental effect. Similarly, a type I error may lead some administrators to claim success at reducing cancellation rates when, in fact, there has been no change. Type II errors occur when significant differences are not detected, even though there are true differences between groups. Statistical power is high when type II error rates are low. For example, type II errors occur when some services suffer from full ICUs, but statistical tests show those cancellations do not differ significantly from those of other services. Evaluation of type II errors is relevant provided statistical methods have appropriate type I error rates.
Descriptions of Statistical Methods
Fisher's exact test will have an appropriate type I error rate (i.e., equal to its nominal, correct value) when comparing rates of cancellations from medical events between groups (e.g., between services or between 6-mo periods). If P < 0.05 is considered significant, then 5% of comparisons should demonstrate a statistical change purely based on chance. The [...] Statistical methods to analyze nonmedical causes of cancellations can consider variations in cancellation rates within and among short periods (14). The principal determinant of OR workload by subspecialty is the day of the week (15,16). Vacations, meetings, variations in clinics, etc., are often 2 weeks long. Consequently, we considered 4 weeks the shortest data collection period that would be used without considering variation by day of the week (17–20). Our choice of 4 weeks was similar to previously published periods of multiples of months: 1 mo (8), 3 mo (7,9,10), 4 mo (12), and 6 mo (6,11). The statistical methods estimate the variance in cancellation rates among different 4-week periods and add it to the estimate of the variance in cancellation rates among cases within the same period. The Rao and Scott method (21) has the highest statistical power among competing methods for comparing two groups, without exceeding nominal rates (21–23). Chen and Tipping (24) described an analogous method for modifying Clopper-Pearson confidence intervals. We used sets of six 4-week periods for our comparisons. A sample size of six is small statistically, but even that duration is the longest period pooled in practice when studying cancellations (6–12). Alternatively, the uncertainty in the true percentage cancellation rate within each of the 4-week periods can be ignored (25), and Student's two-sample t-test with unequal variances applied to 2 samples of 6 numbers each. Confidence intervals for the means of single sets of six 4-week periods are calculated with the Student t distribution (Appendix).
This approach has been used widely for the statistical analysis of other OR management data, including staffing costs (17,18), ORs in use at different times of the day (19), and OR workload for purposes of OR allocation (20). However, those values are not percentage cancellation rates with values that can be close to zero. The method may work poorly when percentages are nearly equal to zero. Consequently, we followed Shirley and Hickling (25) in using Student's t-test after transforming the percentages (26), using Equation (1) of the Appendix.
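The comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the transformation is written in one commonly cited Freeman-Tukey form (an assumption about Equation (1)), the two sets of six (cancellations, scheduled cases) counts are hypothetical, and the P value is obtained by numerically integrating the Student t density rather than from a statistics package.

```python
import math
from statistics import mean, variance

def freeman_tukey(c, n):
    """Double arcsine transform of c cancellations among n scheduled cases
    (commonly cited form; assumed here, not quoted from the paper)."""
    return math.asin(math.sqrt(c / (n + 1))) + math.asin(math.sqrt((c + 1) / (n + 1)))

def welch_t(x, y):
    """Unequal-variance two-sample t statistic and Welch-Satterthwaite df."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def two_sided_p(t, df, steps=2000):
    """Two-sided P value via Simpson's rule on the Student t density."""
    a = abs(t)
    if a == 0:
        return 1.0
    k = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    f = lambda u: k * (1 + u * u / df) ** (-(df + 1) / 2)
    h = a / steps
    s = f(0) + f(a) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, steps))
    return max(0.0, 1 - 2 * s * h / 3)

# Hypothetical counts: (canceled cases, scheduled cases) for six 4-week periods.
before = [(31, 410), (25, 398), (36, 421), (28, 405), (40, 415), (33, 402)]
after = [(22, 408), (19, 400), (27, 417), (24, 399), (18, 411), (26, 406)]

t, df = welch_t([freeman_tukey(c, n) for c, n in before],
                [freeman_tukey(c, n) for c, n in after])
print(f"t = {t:.3f}, df = {df:.2f}, two-sided P = {two_sided_p(t, df):.4f}")
```

Each 4-week period contributes a single transformed number, so the test sees only variation among periods, which is the source of uncertainty that matters when nonmedical events cancel several cases at once.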
Testing Statistical Methods
We used the above-referenced research (3,4,6–9,12) and other papers to ensure that the conditions simulated were realistic (Appendix). Computer simulation provided known, correct answers to which the results of the statistical methods could be compared. Simulated data to test the statistical methods were obtained using ARENA version 7.01 (Rockwell Software, Sewickley, PA). For each of eight different combinations of parameter values (Appendix; Tables 2–7), simulation output was counts of canceled and noncanceled cases for 65,000 4-week periods of 20 workdays. Because the cancellation rate was fixed over these 5,200 years of data, we could evaluate whether statistical tests would have a type I error rate exceeding 5% (the expected value) with a P < 0.05 criterion.
Visual Basic for Excel 2003 (Microsoft, Redmond, WA) was used for statistical analysis of the output of the ARENA simulations. Cancellations because of medical events alone were used to confirm our computer code because we knew all methods would perform well for these simulations. The Discussion includes limitations of the simulations and an example using real OR data.
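The logic of testing a method against a fixed, known cancellation rate can be illustrated with a much smaller simulation than the ARENA model. The sketch below is an assumption-laden toy, not a reproduction of the study: the block size of 8 cases, the 7.5% daily event probability, 20 cases per day, and the use of an uncorrected 2×2 χ² test on pooled counts are all hypothetical choices made only to show how clustered cancellations can inflate the type I error rate.

```python
import math
import random

def chi2_p(a, b, c, d):
    """P value of the 2x2 chi-square test (1 df, no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

def canceled_in_group(rng, clustered, days=120, cases_per_day=20):
    """Cancellations over six 4-week periods (120 workdays).
    Both modes have an overall cancellation rate of roughly 6%."""
    total = 0
    for _ in range(days):
        if clustered:
            # Hypothetical rare event: a block of 8 cases is canceled together.
            block = 8 if rng.random() < 0.075 else 0
            p_single = 0.03
        else:
            block, p_single = 0, 0.06
        total += block + sum(rng.random() < p_single
                             for _ in range(cases_per_day - block))
    return total

def type_i_rate(clustered, reps=300, seed=1):
    """Fraction of comparisons with P < 0.05 when the true rate never changes."""
    rng = random.Random(seed)
    n = 120 * 20  # scheduled cases per group
    hits = 0
    for _ in range(reps):
        x = canceled_in_group(rng, clustered)
        y = canceled_in_group(rng, clustered)
        hits += chi2_p(x, n - x, y, n - y) < 0.05
    return hits / reps

independent_rate = type_i_rate(clustered=False)
clustered_rate = type_i_rate(clustered=True)
print(f"independent cancellations: {independent_rate:.3f}")
print(f"clustered cancellations:   {clustered_rate:.3f}")
```

Because both groups in every comparison are drawn from the same process, any significant result is a type I error by construction. The same harness could be pointed at other tests by swapping out `chi2_p`.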
We simulated 5 OR and 15 OR surgical suites to represent small and large facilities, respectively. Cancellation rates were 5.7% for the 5 OR surgical suites and 6.4% for the 15 OR surgical suites (Table 2). Rare events caused the cancellation of 1.0% of scheduled cases by causing events on 7.9% of days at the 5 OR surgical suites and 23% of days at the 15 OR surgical suites.
When testing for differences from one 4-week period to the next, both the Rao and Scott (21) and Chen and Tipping (24) methods were more accurate than the [...] Student's t-test and analogous methods were generally accurate (Tables 4–6). The same finding was obtained when the counts were first transformed. The latter method had the smallest absolute difference from the expected 5% type I error rate for confidence intervals in the simulation of the five OR surgical suites with cancellations caused by rare events only (Table 6). In that circumstance, 19% of the 4-week periods had no observed cancellations, and 60% had 4 or fewer (see Discussion). Table 7 studies type II errors, as described in the first section of Methods. Statistical power to detect differences in cancellation rates between services was significantly higher for Student's t-test applied to transformed counts than without transformation.
Statistical methods used in research of medical causes of events (e.g., Fisher's exact test) should not be used by clinicians and managers in their routine monitoring of case cancellations because of their high type I error rates. Internal and benchmarking quality reports should use Student's t-test and analogous methods applied to cancellation rates from 4-week periods after transforming the counts. Table 8 provides an example of the method using real data from an academic medical center. Table 8 also shows the usefulness of the method. The method can be implemented in a few lines of computer code and a spreadsheet (e.g., Excel). A manager can test the answer provided to him or her by computer software using small amounts of data (e.g., that in Table 8). Finally, the method is based simply on the numbers of canceled versus performed cases. Although we studied effects of different types of cancellations in this paper, the usefulness of the method is unaffected by the ability of a facility to track and categorize the reason for each of its case cancellations.
Different Statistical Methods
We do not recommend using Fisher's exact test or similar methods to compare cancellation rates, even when review of the data suggests that few of the observed cancellations were caused by rare events. A 1% cancellation rate attributable to rare events (Table 2) was sufficient to affect statistical methods markedly (Tables 3–6). Some surgical suites will have an incidence of cancellations caused by rare events of <1%. Yet, they are unlikely to know their true incidence because the upper bound on the incidence of cancellations from rare events cannot be estimated accurately using methods appropriate for medical events (Table 6). Thus, we recommend simply using Student's t-test applied to transformed data for OR cancellations. The Rao and Scott and Chen and Tipping methods performed worse than we expected (21–24). Our results probably differed from those previously reported because the previous papers used sample sizes applicable to toxicology studies, not case cancellations. First, we studied only six 4-week periods versus toxicology studies with 30 or so litters of pups, for which those methods perform well. Our sample size of six was probably too small for accurate estimation of the variances in cancellation rates among 4-week periods. Second, we had hundreds of scheduled cases within each 4-week period versus toxicology with litters of 2–12 pups. Consequently, there was relatively little uncertainty in cancellation rates within 4-week periods, just uncertainty among periods. This pattern explains why Student's t-test and analogous methods performed quite well. We did not study nonparametric methods such as Mann-Whitney-Wilcoxon (23) because parametric methods like Student's t-test have higher statistical power, and, for our application, performed well after data transformation (Tables 4 and 5).
Limitations
Cancellation rates likely vary among facilities, depending partly on the types of patients receiving care. For example, some published cancellation rates (including those on the day before surgery) include 4.6% for outpatients (9), 6.6% for outpatients (6), 9% for outpatients (11), 10% among outpatients (12), 10% among pediatric outpatients (10), 12% among plastic surgery patients (8), 13% overall (7), 17% among inpatients (11), 19% among inpatients (6), and 30% among inpatients (12). We studied cancellation rates on the day of surgery between 0.8% and 6.4% (Table 2). We recommend that our results not be applied by facilities lacking at least one observed cancellation in each of the 6 studied 4-week periods. We repeated the simulations with just rare events, using only two ORs and only the service with two-hour average case durations. Confidence intervals were created using Student's t distribution with the transformation applied to six 4-week periods, as in Table 6. The 95% confidence intervals failed to contain the true cancellation rate for 11.9% ± 0.3% of comparisons. This unacceptably high type I error rate occurred because 57% of 4-week periods had no cancellations. We doubt that our inability to consider fewer than one cancellation every four weeks is a major limitation, because when cancellations are so infrequent, most clinicians and managers would be uninterested in quantifying them.
Summary
Computer Simulation
Discrete-event computer simulation (27) was used to represent the random flow of patients from ORs through the PACU. Each workday was simulated independently of all other workdays. Simulation was performed for 5 OR and 15 OR surgical suites. Scheduled case durations were described using different log-normal distributions for each of three services. Each service had a mean scheduled duration of 1.0, 2.0, or 3.0 h, with a common standard deviation of the logarithm of case duration in hours equal to 0.725 (28). After calculation, the scheduled durations were bounded between 0.3 and 1.9 h for the 1-h service, between 0.6 and 3.9 h for the 2-h service, and between 0.9 and 5.9 h for the 3-h service. The actual case durations were calculated using the method described by Kennedy (29) to include the differences between scheduled and actual case durations that were measured by Goldman et al. (30). Specifically, actual case durations were set equal to the scheduled case duration multiplied by a normally distributed random number with a mean of 1.00 and sd of 0.25 (31). Each turnover time ("patient out" to "patient in") was assigned a time duration generated randomly from a log-normal distribution with mean ± sd = 0.30 ± 0.20 h, bounded between 0.17 and 1.50 h.
Each OR in the surgical suite had two surgeons. The first surgeon completed his or her cases, followed by the second surgeon. The cases were divided randomly, with equal probability, between the two surgeons. Often this resulted in an unequal number of cases performed by the two surgeons in each OR. For the 5 OR and 15 OR surgical suites, 2 ORs and 5 ORs were allocated for the service with a mean duration of 1.0 h, respectively. Cases were scheduled sequentially using an 8-h workday. Adjusted use (OR time plus turnovers) was 83.7% ± 0.1% (se). For the 5 OR and 15 OR surgical suites, 2 ORs and 5 ORs were allocated for the service with a mean duration of 2.0 h. Adjusted use was 77.6% ± 0.1%. For the 5 OR and 15 OR surgical suites, 1 OR and 5 ORs were allocated for the service with a mean duration of 3.0 h. Adjusted use was 71.2% ± 0.1%.
Cancellations caused by rare events were represented by the unexpected absence of a surgeon. Whether a surgeon was unavailable was determined by a Bernoulli distributed random number. If the surgeon was unavailable, all of that surgeon's cases for the day, from the preceding paragraph, were canceled. The achieved risk of any one case being canceled from this cause was 1.0% (Table 2).
Cancellations caused by medical causes were simulated by generating a Bernoulli distributed random number with a 0.8% probability. Such cancellations occurred equally frequently for the three services, unlike cancellations caused by other causes.
Cancellations caused by cases running late were used to represent cancellations from any cause providing correlation in risks within services. If a case was expected, from its scheduled duration, to finish more than 0.5 h after the end of the 8-h workday, the case was canceled.
Cancellations caused by a full PACU were used to represent cancellations from any cause providing correlation in risks among services. Ten PACU beds were planned for the 5 OR surgical suite and 30 PACU beds for the 15 OR suite. Each patient's time in the PACU was generated from a log-normal statistical distribution with a mean of 1.0 h and sd of 1.2 h, bounded between 0.5 and 3.0 h. If the PACU was full, discharges from ORs into the PACU were delayed in original sequence. A case was canceled if the patient was expected to enter the PACU more than 1.5 h after the end of the 8-h workday.
Whether a case was canceled was determined in the sequence of medical cause, rare event, cases running late, and then full PACU.
Freeman-Tukey Double Arcsin Transformation
The Freeman-Tukey double arcsin transformation (26) equals
where c is the number of cancellations and n is the number of scheduled cases during a 4-week period. Table 8 gives an example of applying the transformation.
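For reference, one commonly cited form of the Freeman-Tukey double arcsine transformation is sketched below in LaTeX notation. This form is an assumption for illustration, not a quotation of Equation (1); some references include a leading factor of 1/2, which rescales the transformed values but leaves a t statistic computed from them unchanged.

```latex
y = \arcsin\sqrt{\frac{c}{n+1}} + \arcsin\sqrt{\frac{c+1}{n+1}}
```

Under this form, a period with c = 0 cancellations among n = 99 scheduled cases would give y = arcsin(0) + arcsin(√(1/100)) = arcsin(0.1) ≈ 0.100, so periods without any cancellations still map to a finite, nonzero value.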
The inverse of the transformation is required only when calculating 95% confidence intervals for the cancellation rate, as in Table 6. Calculate the sample mean
where t is the inverse of the Student t-distribution with p-1 degrees of freedom. Report the value of
To calculate the inverse of the transformation, we use the bisection method in our Visual Basic for Excel code (32). For convenience, we show the steps for
Readers can check their implementation of the steps by using the transformed and nontransformed cancellation rates in Table 8.
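The back-transformation by bisection can be sketched in Python as follows. This is a minimal illustration under several assumptions rather than the Visual Basic code referred to above: the transformation uses a commonly cited Freeman-Tukey form, the inverse is evaluated at the mean number of scheduled cases per period (a hypothetical choice), the six (cancellations, scheduled cases) counts are invented, and the constant 2.570582 is the 97.5th percentile of Student's t distribution with 5 degrees of freedom, so the function applies only to six periods.

```python
import math
from statistics import mean, stdev

T_975_DF5 = 2.570582  # 97.5th percentile of Student t with 5 df (six periods)

def freeman_tukey(c, n):
    """Double arcsine transform (commonly cited form; an assumption here)."""
    return math.asin(math.sqrt(c / (n + 1))) + math.asin(math.sqrt((c + 1) / (n + 1)))

def inverse_freeman_tukey(y, n, tol=1e-10):
    """Find the rate p in [0, 1] with freeman_tukey(p * n, n) == y by bisection.
    The transform increases with p, so the bracket can be halved each step;
    targets outside the attainable range converge to 0 or 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if freeman_tukey(mid * n, n) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def cancellation_rate_ci(periods):
    """Return (lower, estimate, upper) for the cancellation rate, 95% level."""
    if len(periods) != 6:
        raise ValueError("T_975_DF5 assumes exactly six 4-week periods")
    ys = [freeman_tukey(c, n) for c, n in periods]
    n_bar = mean(n for _, n in periods)  # assumed n for the back-transformation
    half_width = T_975_DF5 * stdev(ys) / math.sqrt(len(ys))
    center = mean(ys)
    return tuple(inverse_freeman_tukey(v, n_bar)
                 for v in (center - half_width, center, center + half_width))

# Hypothetical counts: (canceled cases, scheduled cases) per 4-week period.
periods = [(31, 410), (25, 398), (36, 421), (28, 405), (40, 415), (33, 402)]
lower, estimate, upper = cancellation_rate_ci(periods)
print(f"rate = {estimate:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Bisection is used here because the transformation has no convenient closed-form inverse for arbitrary n, and its monotonicity guarantees convergence without needing derivatives.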
Accepted for publication December 13, 2004.