Washington University Clinical Simulation Center, Washington University School of Medicine, St. Louis, Missouri
| Introduction |
|---|
In earlier studies, we developed a scoring method for a set of exercises that were designed to assess the ability of trainees to accomplish key diagnostic and therapeutic actions during a brief, directed simulation encounter (1,9,10). This test-battery, multiple-encounter approach is similar in structure to the standardized patient assessments often used in medical schools and for testing clinical skills as part of licensure examinations (13–16).
Our goals for this study were (a) to evaluate scenario content by determining trainee performance on individual exercises, (b) to provide further validation of a simulation-based acute care assessment, and (c) to compare the acute care skills of anesthesia trainees.
| Methods |
|---|
The student nurse anesthetists (n = 15) were recruited from two specialty-training programs. These individuals had completed their clinical education and were in the final days of their respective programs. The residents (n = 28) were recruited from a single residency-training program and were individually evaluated during a 2-mo period close to the end of their respective training year (CA-1, CA-2, or CA-3). The residents were categorized into two groups based on clinical experience. The junior residents (CA-1; n = 12) had completed 2 yr of graduate training. The senior residents (CA-2 and CA-3; n = 16) had completed at least 3 yr of training that included additional, more advanced anesthesia subspecialty experiences in major vascular and transplant anesthesia, surgical and cardiovascular intensive care, obstetric anesthesia, pediatric anesthesia, and pain management.
All of the participants had more than 8 h of simulation laboratory experience at our training center. These experiences were primarily in small group (<10 participants) training sessions. Before beginning the individual session, each participant provided informed written consent for videotaped recording and subsequent analyses of their performances. The simulation exercises for each participant were conducted in a single 75- to 90-min individual training session that was supervised by a nurse or physician educator. The six exercises were presented in identical order to each participant. After every two simulation encounters, the supervising faculty member discussed the case management for the preceding exercises.
This study was conducted in a simulation laboratory that contains a sensorized life-size electromechanical patient mannequin developed by MEDSIM-EAGLE® (MedSim USA Inc., Ft. Lauderdale, FL).
Each participant's performance was videotaped and recorded on a four-quadrant screen that included two separate video views of the provider and the mannequin. Two microphones were suspended from the ceiling to capture audio during the scenarios. The third quadrant of the video recording was a simultaneous full display of patient vital signs (electrocardiogram [ECG], pulse oximetry, inspired and expired gas monitoring, arterial blood pressure [BP], and central venous pressures). In the lower right quadrant of the screen, preceptors typed identifying information such as the date, participant, and scenario number. This quadrant of the screen could also be used to add information to clarify participant actions.
The general approach to scoring the scenarios included two analytic methods (checklist and key action) and a single global rating scale. For the analytic scoring, three raters scored each participant's performances using a detailed checklist of diagnostic and therapeutic actions that was individually developed for each of the six scenarios (Table 1). An additional three raters used an abbreviated checklist system that consisted of three key actions for each scenario. The key action raters also provided a single global rating of the performance using a visual analog scale. The checklist scoring system included 11–16 possible actions for each scenario (Table 1), and each was weighted based on its importance with respect to overall patient care. Faculty experts assigned item weights after a review of existing patient care practice standards and subsequent deliberations concerning the potential positive (or negative) impact of performing (or not performing) an action in the time allotted. The raters were asked to indicate whether or not a specific action described on the checklist had been achieved by the participant. Before scoring the exercises, the raters met to develop consistent end-points for each of the checklist and key actions. The highest cumulative weighted score defined the best possible performance in this scoring system. The maximum possible weighted score on the scenarios ranged from 14 to 22 points. To compare participant performances across scenarios, this score was converted to a percentage value based on the maximum number of attainable points.
The abbreviated or key action scoring system included three performance items for each exercise. The raters recorded whether participants performed each of the key actions. If the key action did not occur within 5 min, it was assigned a value of 0. The key action score for any given scenario could range from 0 to 3.
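The two analytic scores described above can be illustrated with a minimal sketch. The item names and weights below are hypothetical and are not taken from Table 1; the function names are also assumptions introduced only for illustration. The sketch shows a weighted checklist score converted to a percentage of the maximum attainable points, and a key action score that counts only actions completed within the 5-min limit.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    """One checklist action with an expert-assigned weight (hypothetical values)."""
    name: str
    weight: float


def checklist_percent(items, achieved):
    """Weighted checklist score as a percentage of the maximum attainable points."""
    max_points = sum(item.weight for item in items)
    earned = sum(item.weight for item in items if item.name in achieved)
    return 100.0 * earned / max_points


def key_action_score(completion_times_min, limit_min=5.0):
    """Count key actions completed within the time limit (0 to 3 per scenario).

    completion_times_min has one entry per key action: the minutes elapsed
    when the action occurred, or None if it never occurred.
    """
    return sum(1 for t in completion_times_min if t is not None and t <= limit_min)


# Hypothetical items and weights, for illustration only (not the study's Table 1).
items = [
    ChecklistItem("administer oxygen", 2),
    ChecklistItem("state diagnosis", 3),
    ChecklistItem("give definitive drug therapy", 4),
    ChecklistItem("call for help", 1),
]
achieved = {"administer oxygen", "state diagnosis"}

print(checklist_percent(items, achieved))   # 50.0
print(key_action_score([1.5, 4.0, None]))   # 2
```

Expressing the checklist score as a percentage is what allows scenarios with different maximum weighted scores (14 to 22 points) to be compared on a common scale.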
The three raters who used the abbreviated checklist (key actions) also provided a single global rating of each participant's performance after each exercise. The raters were instructed to make a mark on a 10-cm horizontal line to indicate the overall level of performance. The rating system was anchored by the lowest value 0 (unsatisfactory) and the highest value 10 (outstanding). In advance, the raters agreed that a score of 7 or more would be considered a standard expected for a provider assuming independent responsibility for patient care.
Five anesthesiologists and one nurse clinician, divided into groups of three, independently rated the participants' performances using the three scoring methods (checklist, key action, and global rating). Two faculty members and the nurse clinician scored the performances using the checklist; three faculty anesthesiologists used the key action and global scoring system. All of the raters independently observed and scored the residents' performances from the videotaped recordings. The psychometric properties of the scores obtained from the analytic and global methods outlined above are provided in a previous set of studies of graduate physicians and anesthesia residents (1,10,11).
For each of the scoring systems (checklist, key action, and global rating), a two-way analysis of variance (ANOVA) was conducted to test the null hypothesis that there were no differences in performance among the senior residents, junior residents, and nurse anesthetists. For the three analyses, the independent variables were group (senior resident, junior resident, and nurse anesthetist) and case (1–6). The dependent variable was the score (i.e., weighted checklist, key action, and global rating). Descriptive statistics (mean and standard deviation) were calculated, by study group, for each of the six scenarios and for each scoring modality. In addition, for the key action scoring, the percentage of each participant group who completed all three key actions in the 5-min period was calculated.
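The descriptive summary described above might be computed along the following lines. This is a sketch only: the records are invented placeholder values, not study data, and the function name `summarize_by_group` is an assumption. It reports, for each group, the mean and standard deviation of the key action score and the percentage of performances in which all three key actions were completed.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(records):
    """Return {group: (mean, sd, percent completing all three key actions)}.

    Each record is a (group, key_action_score) pair for one performance,
    with key_action_score between 0 and 3.
    """
    by_group = defaultdict(list)
    for group, score in records:
        by_group[group].append(score)

    summary = {}
    for group, scores in by_group.items():
        pct_all_three = 100.0 * sum(1 for s in scores if s == 3) / len(scores)
        summary[group] = (mean(scores), stdev(scores), pct_all_three)
    return summary


# Placeholder scores for illustration only; these are not results from the study.
records = [
    ("senior", 3), ("senior", 3), ("senior", 2), ("senior", 3),
    ("junior", 2), ("junior", 3), ("junior", 1), ("junior", 2),
]
summary = summarize_by_group(records)
for group, (avg, sd, pct) in summary.items():
    print(f"{group}: mean={avg:.2f} sd={sd:.2f} all three={pct:.0f}%")
```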
The reproducibility of the scoring was investigated using generalizability analysis (17). A trainee's score can be influenced not only by their ability, but also by specific rater effects (stringency or bias in scoring), familiarity with scenario content, and scoring method. Generalizability analysis provides a method to evaluate rater and task (scenario) effects, including their interactions, and determine the magnitude of potential sources of variability in participant scores. These variance components can then be used to determine the reliability of participant scores as a function of the number of raters and number of scenarios.
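How variance components translate into reliability can be sketched with the standard generalizability coefficient for relative decisions. The sketch assumes a fully crossed participant × scenario × rater design; the article does not state the formula it used, and the variance components below are hypothetical values chosen only to show the pattern, not estimates from Table 3.

```python
def g_coefficient(var_p, var_ps, var_pr, var_psr_e, n_scenarios, n_raters):
    """Generalizability coefficient for relative decisions.

    Assumes a fully crossed participant x scenario x rater design.
    var_p      participant (true score) variance
    var_ps     participant x scenario interaction variance
    var_pr     participant x rater interaction variance
    var_psr_e  three-way interaction confounded with residual error
    """
    relative_error = (
        var_ps / n_scenarios
        + var_pr / n_raters
        + var_psr_e / (n_scenarios * n_raters)
    )
    return var_p / (var_p + relative_error)


# Hypothetical variance components in which participant x scenario dominates.
components = dict(var_p=1.0, var_ps=3.0, var_pr=0.1, var_psr_e=1.0)

baseline = g_coefficient(**components, n_scenarios=6, n_raters=3)
more_raters = g_coefficient(**components, n_scenarios=6, n_raters=6)
more_scenarios = g_coefficient(**components, n_scenarios=12, n_raters=3)

print(f"6 scenarios, 3 raters:  {baseline:.2f}")
print(f"6 scenarios, 6 raters:  {more_raters:.2f}")
print(f"12 scenarios, 3 raters: {more_scenarios:.2f}")
```

Under these assumed components, doubling the number of scenarios raises the coefficient far more than doubling the number of raters, because the participant × scenario term is the largest contributor to the error variance.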
| Results |
|---|
A variety of performance patterns were recognized for each of the events. In the initial scenario, only one half of the participants (21 of 42) recognized that the condition was anaphylaxis (heart rate [HR], 140 bpm; BP, 75/40; bronchospasm; O2 saturation, 85%) during the initial 3 min. After participants received a verbal prompt from the recovery room nurse indicating the patient had a rash (3 min), the majority of them (38 of 42) were able to diagnose anaphylaxis. Only 27 of 42 participants treated the condition with epinephrine.
The myocardial ischemia exercise required trainees to recognize that tachycardia (HR max, 130 bpm) and hypertension (BP max, 180/120) were associated with ST increase in the ECG (lead II ST-T wave increase, 2.7 mm). The diagnosis of myocardial ischemia was established by 29 of the 42 participants during the exercise. Almost all of the senior residents (15 of 16) were able to identify the presence of myocardial ischemia, but fewer than half of the student nurse anesthetists (7 of 15) and CA-1 residents (5 of 12) were able to recognize myocardial ischemia. Despite failure to recognize the diagnosis, the majority of trainees initiated therapy to treat the tachycardia and hypertension.
In the atelectasis scenario, participants were expected to recognize that a definitive step to improve oxygenation was either to suction or to provide larger tidal volumes. The intubated mannequin had an O2 saturation of 88%, decreased lung compliance, and reduced tidal volumes. Whereas more than half of the senior residents performed these actions (11 of 16), only 7 of 27 student nurse anesthetists and junior residents accomplished either of these definitive steps.
The stroke scenario required participants to recognize an intracerebral event in a postoperative patient. Eighteen of the 42 participants did not recognize that a cerebral vascular event had occurred in this bradycardic, hypertensive, and unresponsive simulated patient who had a dilated pupil. Trainees who recognized the diagnosis were also more likely to indicate the need for consultation and securing the airway.
In the final scenario, most of the trainees (34 of 42) recognized the need for reintubation after examining a tachypneic postoperative patient in respiratory failure. The mannequin was receiving 100% O2 with a nonrebreathing mask and had a respiratory rate of 28 breaths/min and O2 saturation of 75%. All 16 senior residents, 10 of 15 student nurse anesthetists, and 9 of 12 CA-1 residents reintubated the mannequin during the 5-min exercise.
In terms of overall performance, fewer than 20% of all participants were able to complete all three key actions for the stroke scenario. In contrast, the ventricular tachycardia scenario was managed effectively; more than 75% of all participants were able to complete the key actions, often in much less than the prescribed 5-min time period. Across all six scenarios, a larger percentage of the senior residents were able to complete all three key actions than either of the other two study groups.
ANOVA was used to test for specific differences in performance among groups and across scenarios. For the analysis based on the weighted checklist, the case by group interaction was not significant. This indicates that the relative performance of the individuals in each group did not vary as a function of the case and suggests that, whereas there are group differences, all groups found that the cases were similar in terms of individual case difficulty. However, there was a significant main effect attributable to group (F = 11.2; P < 0.01). This result reveals that, averaged over the six cases, there was a significant difference in mean scores among the senior residents, junior residents, and nurse anesthetists. A post hoc analysis (Scheffé test for multiple comparisons) revealed that the senior residents out-performed the nurses (mean difference = 11.2; P < 0.05). Although the junior residents also out-performed the nurses (mean difference = 5.0), this difference was not statistically significant. Finally, the senior residents out-performed the junior residents (mean difference = 6.2), but this effect was not statistically significant. There was also a significant main effect attributable to case (F = 17.5; P < 0.01). This indicates that, averaged over all study participants, the scenarios were not of equal difficulty.
The results for the other two scoring systems (global rating and key action) were similar to those for the weighted checklists. For these two analyses, there was no significant group-by-case interaction. This indicates that the group differences in performance were consistent across cases. The group main effects were both significant (global rating: F = 16.8, P < 0.01; key action: F = 16.6, P < 0.01), revealing differential mean performance by group. Based on post hoc analysis of the global scores, the senior residents significantly (P < 0.05) out-performed the junior residents (mean difference = 1.0) and the nurse anesthetists (mean difference = 1.7). The differences observed between junior residents and nurse anesthetists (mean difference = 0.7) were not statistically significant. Similar results were found for the key action scores. That is, senior residents significantly out-performed junior residents and nurse anesthetists, but there were no statistically significant differences in scores between junior residents and nurse anesthetists. Similar to the weighted checklist analysis, there were significant main effects attributable to case (global rating: F = 8.8, P < 0.01; key action: F = 13.6, P < 0.01). Averaged over study participants, the cases were not of equal difficulty.
Generalizability analysis was used to evaluate the sources of variance in scores and, in particular, to determine how reliable raters were in assigning scores and how consistent participants were in managing the scenarios. In this study, the variance attributable to the raters, and associated interactions, was relatively small. This indicates that the raters identified comparable scoring end-points for each event and were reasonably consistent in their assignment of scores for each exercise (Table 3). Although participants' abilities varied depending on the content of the exercise, raters rank-ordered trainee performances in a near identical manner for each scenario. These rater variances were similar, and relatively small, whether analyzed across the entire participant group or within groups of participants (student nurse anesthetists, junior residents, and senior residents) (Table 3). This indicates that a trainee's score is unlikely to vary as a function of the number of raters or the scoring method used to quantify the performance (Table 3). The largest variance component for the checklist, key action, and global scoring methods was related to the content of the exercises (trainee × scenario) (Table 3). Therefore, the reliability of the participants' overall scores will be more dependent on the number of scenarios in the assessment than on the number of raters for a given scenario. Overall, whereas the use of six encounters resulted in moderately reliable scores, additional performance samples would be needed to obtain more precise ability estimates.
| Discussion |
|---|
The scores obtained from individual scenarios provide a way to evaluate how well trainees perform in various types of encounters and to make some inferences about trainee skill in specific domains of practice. For example, most trainees, regardless of group, successfully managed the ventricular tachycardia scenario. Trainees in all three participant groups were able to recognize and effectively administer the prescribed treatment in less than five minutes from the onset of the arrhythmia and frequently in less than two minutes. Although most of our participants had never encountered a patient in ventricular tachycardia in the operating room environment, it would seem that their previous training in advanced cardiac life support prepared them to manage this condition in an intraoperative environment. The algorithms and arrhythmia recognition skills acquired in advanced cardiac life support training likely translated to enhanced performance in a simulation laboratory.
The comparable performance of the three groups on the ventricular tachycardia scenario also provides some evidence to support fairness in scoring models and, at least for this scenario, the content of the exercise. All three groups effectively managed this exercise, and many participants received the highest possible score. The requisite skills to obtain the maximum possible score were achieved by nurse and physician groups. If scenarios were designed to test in-depth knowledge as well as clinical skill, then group comparisons would be expected to favor the residents who have more extensive knowledge. Our goal in developing the evaluation was to evaluate requisite skills in acute care management, rather than measure in-depth knowledge of pathophysiology of disease process.
Unlike the ventricular tachycardia scenario, participants did not effectively manage some of the exercises. Two postoperative scenarios, stroke and anaphylaxis, were more difficult for all participants, regardless of previous training. Clinical findings in these two simulations (stroke = increased BP, bradycardia, and unresponsiveness in addition to a dilated left pupil; anaphylaxis = bronchospasm, tachycardia, and hypotension) were not subtle but seemed to be more difficult for participants to identify and subsequently manage. These results indicate that simulation-based assessment might be helpful to identify deficits in skill acquisition during training. If training strategies using simulation were available, then participants could potentially manage these conditions as well as the ventricular tachycardia scenario. The complexity of many conditions and non-uniform treatment algorithms may make training strategies more difficult to develop for these events when compared with more straightforward scenarios such as the ventricular tachycardia exercise. An alternative explanation for the performance deficits might be that these two conditions were simply modeled in a manner that made it more difficult for all providers to recognize and treat. More study of these exercises, and additional scenarios of related content, are required to determine if the results are generalizable to other acute care postoperative conditions or if similar performance deficiencies are found in graduates of other training programs.
There were several limitations to our study that warrant discussion. Trainees managed the scenarios in the same order and received feedback about their performance during the study period. If a simulation-based assessment is to be used as a summative evaluation method, then steps to enhance the security of the exercises and standardize feedback during the evaluation would need to be implemented in future studies. The majority of our raters (five of the six raters) were recruited from the faculty at the training site used by the residents and student nurse anesthetists. Trainee scores may be subject to rater bias or "halo" effects, particularly when raters are aware of the training level of participants. This bias might be manifest by variances in scores recorded by blinded and unblinded raters. The more variation that there is among raters in scoring actions, the greater the potential is for this type of bias. However, if variances among raters are minimal, then a small number of raters can be used to establish a reliable score. Fortunately, in this study, as in our previous studies, the variance among raters' scores was small, indicating that regardless of whether we used a single rater (blinded or unblinded) or the mean ratings of multiple raters, the trainee's overall assessment score would be similar. Simple, unambiguous scoring systems with defined end-points for performance may be important to decreasing the potential for rater bias.
A simulation-based assessment may be a valuable tool for understanding the relationship between a specialist's training and clinical experiences in developing and maintaining skill. The senior residents with more training and experience performed better than the junior residents and nurse anesthetists. If experience were a key requirement to developing requisite skills, then additional practice experience beyond clinical training would potentially narrow the differences in performance among groups. If training were an essential requirement to develop requisite skill, then differences between the senior residents and nurse anesthetist group would be expected to persist beyond training.
The skills required to manage these modeled situations are relevant for both nurse anesthetists and anesthesiologists, but the content domain of acute care is certainly more expansive than represented by the six scenarios modeled in this study. Therefore, replicating this study with additional scenarios would be valuable. The resident and nurse participants represent trainees from a small number of training programs. As a result, it is unclear whether the results of our investigation will generalize to trainees in other programs. By increasing the number of performance samples for each participant as well as the number of trainees, a more detailed analysis of the content and nature of a simulation-based assessment could be provided. This information could be used to assess skill acquisition during training and to develop training and assessment strategies using life-sized mannequins.
At present, there are few, if any, methods available to determine whether a professional has the skills required to manage complex, high-acuity events (18–24). A simulation-based assessment strategy could be developed for critical events, but additional studies that explore content domain and fidelity of the exercises are required. A key goal of future investigations will be to explore the relationship between a provider's skill in managing simulated patients and associated measures of clinical performance.
| Footnotes |
|---|
Supported, in part, by the Foundation for Anesthesia Education and Research: Education Grant.
Address correspondence and reprint requests to David Murray, MD, Washington University Clinical Simulation Center, Washington University School of Medicine, Box 8054, 660 South Euclid, St. Louis, MO 63110. Address e-mail to murrayd{at}notes.wustl.edu.