A New Methodology for Evaluating Radiologist Error Rates: Factoring in the Complexity of Study and Potential for Significant Pathology

By Frank E. Seidelmann, D.O., and Douglas C. Ward

The time required, the multitude of anatomic structures demonstrated, and the potential for pathology are not equal across imaging studies. The inherent information content varies significantly between modalities, from extremely basic to extremely complex, based upon the technology involved in producing the images. The training and experience of radiologists also vary significantly, and certain advanced imaging modalities and specific body parts can only be interpreted by subspecialty radiologists.

The 2010 Radisphere error rates were analyzed, evaluated, summarized in detail, and presented to Radisphere professional and medical leadership during the winter and spring of 2011 in a presentation entitled Radisphere Clinical Error Analysis: 2010 QA / Peer Review. Errors: How, What, Who, and What To Do? As part of this in-depth evaluation, a new model was developed for comparing the error rates of radiologists that is based neither on the volume of cases interpreted nor on RVUs. This new model provides a methodology for assigning a Complexity Rating for the interpretation of a study, integrating the potential for making an error of omission of Significant Pathology (CRISP).

This paper describes the complexities of interpretation, explaining why all errors cannot be rated on the same basis; the process the authors undertook in creating a new rating system for errors; and a method for rating radiologists so that all radiologists are compared equally, regardless of the type or volume of studies they individually read.

Failure of Observation

The Checklist Manifesto by Atul Gawande, M.D.[i] provided insight into how to deal with complex issues. The following review by The Independent clearly demonstrates the benefit of a logical review of complex subjects: "Avoidable failures continue to plague us in healthcare, government, the law, the financial industry, in almost every realm of organized activity. And the reason is simple: the volume and complexity of knowledge today has exceeded our ability as individuals to properly deliver it to people consistently, correctly, safely."

Dr. Trafton Drew, a psychologist at Harvard Medical School, modified the classic 1999 Simons and Chabris gorilla experiment[ii] and adapted it to a CT chest study interpreted by skilled observers (radiologists).[iii] Dr. Drew superimposed a matchbox-sized gorilla on one slice of five CT chest images. The radiologists were asked to identify pulmonary nodules, lesions which are suspicious for cancer. 80% of the radiologists, and 100% of non-radiologists, completely missed the gorilla. During the experiment, an eye-tracking device monitored the movement of the radiologists' eyes, confirming that they had looked at the gorilla but failed to recognize what they had observed. Dr. Drew's explanation of the radiologists' performance: "Part of the reason that radiologists are so good at what they do is that they are very good at narrowly focusing their attention on these lung nodules. And the cost of that is that they are subject to missing other things, even really obvious large things like a gorilla."
Professor Simons, who authored the original study, explained that this is not unique to radiologists: "We're aware of only a small subset of our visual world at any time. We focus attention on those aspects of the world that we want to see. By focusing attention, we can filter out distractions. But in limiting our attention to just those aspects of our world we are trying to see, we tend not to notice unexpected objects or events."

As radiologists, we have known about this phenomenon throughout our professional careers; radiologists refer to it as tunnel vision. As residents performing barium enemas (colon examinations requiring rectal introduction of liquid contrast material), we looked closely for colon pathologies (polyps, cancer, etc.) but often missed large, destructive cancerous processes in the bones because of our intense focus on finding colon pathology. Similarly, when a referring clinician provides a clinical diagnosis or asks the radiologist to rule out a specific pathology, the clinician has inadvertently directed the radiologist's focus. Seasoned radiologists have learned that once they have identified a specific pathology, they should set it aside and look for other findings. But this alone does not guarantee that the radiologist will find the problem.

In 2010, Atul Gawande, M.D. gave a special lecture at RSNA entitled Real Reform: Facing the Complexity of Health Care. He lectured on two types of errors: failures of ineptitude and failures of ignorance. Failures of ineptitude are often failures of observation, secondary to not following a structured format or checklist. The Radiological Society of North America has endorsed structured reporting to achieve consistency of information content and, more importantly, to avoid errors of ineptitude. Simply stated: structured reporting helps avoid failures of observation.

Radisphere has adopted and developed structured reporting (lexicons) as a standard of practice to avoid errors of ineptitude, that is, the failure to systematically review a set of images. Radisphere's lexicons have been developed over a 10-year period, with over 10,000 man-hours invested in their evolution. The purpose of the lexicons is to provide both consistent reporting across radiologists and, more importantly, systematic guidance for the radiologist's examination. The lexicon functions as a checklist, verifying that every pertinent anatomic structure is reviewed and scrutinized, so that the gorilla is not missed. Radisphere has produced 248 lexicons.

Radisphere's lexicons are composed of fields containing statements of normal anatomy. Each field is essentially an entry in a checklist: the radiologist must go through each field and confirm that its statement of normal anatomy is correct. The lexicon forces the radiologist to evaluate each study systematically and consistently. The number of fields is a proxy for the complexity of interpreting a study: the more fields in the lexicon, the greater the complexity. Counting the number of fields in each lexicon provided a Complexity Rating. A CT Abdomen examination has 21 fields, or a complexity rating of 21, while an X-ray Abdomen examination has 6 fields, or a complexity rating of 6.
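As a minimal sketch of how a lexicon's field count yields a Complexity Rating (the field wording below is hypothetical illustration, not Radisphere's actual lexicon content; only the counting rule comes from the text):

```python
# Sketch: a lexicon modeled as an ordered checklist of fields, each a
# statement of normal anatomy. Field wording here is hypothetical; the
# rule (Complexity Rating = number of fields) is from the text above.

ct_abdomen_lexicon = [
    "Liver: normal in size and attenuation; no focal lesion.",
    "Gallbladder: no calculi; wall is normal.",
    "Spleen: normal.",
    # ... continues to 21 fields for a CT Abdomen examination
]

xray_abdomen_lexicon = [
    "Bowel gas pattern: nonobstructive.",
    "No free intraperitoneal air.",
    # ... continues to 6 fields for an X-ray Abdomen examination
]

def complexity_rating(lexicon):
    """The Complexity Rating is simply the number of checklist fields."""
    return len(lexicon)
```

Counted in full, the 21 fields of the CT Abdomen lexicon and the 6 fields of the X-ray Abdomen lexicon reproduce the complexity ratings of 21 and 6 cited above.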
A modifier integrating the potential for omission of serious pathology was then developed. Certain anatomic regions have a low risk of significant pathology, while others have the potential for serious, life-threatening pathologies. For example, a hand examination, whatever the modality used to image the hand, has a very low risk of serious pathology. This can be generalized to most extremity imaging; however, the more proximal to the torso (i.e., the closer to the body), the greater the risk of serious pathology. Very significant, life-threatening pathology, by contrast, is often present in examinations of the brain, chest, abdomen, and pelvis.
A rating scale for significant pathology was developed. It was the authors' belief that the increasing risk of significant pathology and the risk of omission are not linear functions; rather, the risk of omission worsens exponentially with increasingly significant pathologies, putting the patient at greater risk of morbidity and mortality. The following scale was agreed upon by the investigators:

1 - Very Low: No risk of morbidity.

2 - Low: Trauma or infection (treatable medically or by superficial I&D) not anticipated to result in long-term disability or morbidity.

4 - Medium: Trauma or infection (non-critical anatomy, which may undergo medical treatment, I.R. drainage, or open surgical drainage), degenerative disease, inflammatory disease, or benign tumors; not likely to cause morbidity, but may result in minor disability.

12 - High: Vascular events (hemorrhage, infarction, thrombosis, dissection, high-grade stenosis) of organs that are unlikely to result in immediate death (lung, liver, spleen, kidney, mesentery, extremities); life-threatening infections of anatomy with the potential for spread to critical organs, requiring surgery; malignant tumors; and life-threatening trauma, with the potential for long-term disability or eventual mortality (within weeks to months).

20 - Very High: Vascular events (hemorrhage, infarction, thrombosis, dissection, high-grade stenosis) involving critical, life-threatening organs (brain, heart) that are likely to result in immediate death; infections of critical anatomy requiring immediate surgery (spine, brain); malignant tumors involving vital organs, producing life-threatening compromise; and life-threatening trauma which, if undetected, would have the potential for short-term mortality (within hours to days).

Multiplying the Complexity Rating by the Significant Pathology modifier produced a CRISP rating for the CPT codes of 248 types of procedures. A sample of the CRISP ratings is presented in Table 1 in the Appendix, and a computational sketch follows below.
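A minimal computational sketch of the CRISP rating, assuming only the scale above and the multiplication rule; the example values are taken from Table 1 in the Appendix:

```python
# Sketch of the CRISP computation: Complexity Rating (lexicon field
# count) multiplied by the Significant Pathology modifier from the
# 1 / 2 / 4 / 12 / 20 scale above.

PATHOLOGY_MODIFIER = {
    "very low": 1,
    "low": 2,
    "medium": 4,
    "high": 12,
    "very high": 20,
}

def crisp_rating(complexity: int, pathology_level: str) -> int:
    """CRISP index = Complexity Rating x Significant Pathology modifier."""
    return complexity * PATHOLOGY_MODIFIER[pathology_level]

# Example values from Table 1 in the Appendix:
print(crisp_rating(21, "very high"))  # CT Abdomen With: 21 x 20 = 420
print(crisp_rating(11, "medium"))     # CT Knee Without: 11 x 4 = 44
```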
The 2010 Radisphere Clinical Error Analysis was performed with the goal of fully understanding errors and assessing changes that could be made to lower error rates. The specific goals of the 2010 analysis were to determine how errors were made, which studies carry the greatest risk of error, who was making the errors, and what could be done to lower the error rates. Radisphere's rating of errors is a modification of the ACR's RadPeer rating system:

2B - Subjective variance or difficult diagnosis, not ordinarily expected to be made; could possibly be clinically significant.

3A - Diagnosis should be made most of the time; unlikely to be clinically significant.

3B - Diagnosis should be made most of the time; could possibly be clinically significant.

4A - Diagnosis should be made almost every time; a misinterpretation of findings is unlikely to be clinically significant.

4B - Diagnosis should be made almost every time; a misinterpretation of findings could possibly be clinically significant.

In 2010 there were 491 errors rated 2B, 3B, or 4B out of 1.1 million total studies performed. A database was developed for analysis. The chart review included the original interpretation (the report), the QA committee letter to the radiologist, and addendum reports where provided. Obtaining complete information was not possible for all errors, due to the systems in place at that time. Complete data was available on 336 cases, representing 68% of all errors rated 2B or higher.
Understanding Where Errors Come From

Atul Gawande, M.D. postulated that 80% of errors are errors of ineptitude (mistakes we make because we do not make proper use of what we know) and 20% are errors of ignorance (mistakes we make because we do not know enough). Our study results revealed that 70% of errors were errors of ineptitude and 30% were errors of ignorance. Further, 65% of the errors of ineptitude were due to the radiologist not using the lexicon as a checklist, but rather as a template of normal statements. Only 7.4% of the errors would not have been caught by using the checklist.

The Radisphere analysis demonstrated that the studies with the highest error rates included OB ultrasound, CT abdomen/pelvis, CT chest, and CT brain. CT errors by body part were: 51% abdomen, 22% brain, 11% spine, and 7.6% chest.

The CRISP rating system was applied to the radiologists to level the playing field of performance, so that all radiologists' errors could be evaluated against an equal number of cases weighted for complexity and potential for significant pathology. Based on the then-existing lexicons, the average CRISP index rating of the studies read per radiologist varied from a low of 82 to a high of 192. The average CRISP index rating per radiologist was multiplied by the total number of studies read for the year to obtain a total CRISP volume for the year. Total yearly CRISP volumes allow comparison of radiologists reading studies of varying complexity and time requirements. Essentially, it is equivalent to saying that a ton of feathers and a ton of bricks both weigh a ton.

Radisphere Clinical Error Analysis: 2010 QA / Peer Review. Errors: How, What, Who, and What To Do? provided an in-depth analysis of errors, going far beyond a simple tabulation of the number of errors per radiologist. The analysis underscored what radiologists have always known: more complex studies are harder to interpret, take longer, and carry a greater risk of a mistake. During the analysis of our errors, the question was asked: is it possible to predict quantitatively which studies pose the greatest risk of a radiologist making a significant error?

CMS has attempted to quantify the time and skill required to interpret studies, and to reimburse accordingly, with the Relative Value Unit (RVU) methodology. RVUs do not accurately reflect the complexity of reading a study or the potential for omission of serious pathology. A different methodology was needed for assessing the complexity of reading a study and for predicting the potential for a serious error. Quantifying complexity was made easy by the development of lexicons, or structured reports: the checklists could serve as a proxy for complexity. A modifier was then needed to further distinguish interpretations of complex studies with potential for serious pathology. A modifier that increases exponentially with increasing potential for serious pathology was created, resulting in a semi-quantitative rating for each study.

Our use of the CRISP method for evaluating studies produced much greater variance between studies than RVUs. For example, a CT Soft Tissue Neck without contrast has an RVU of 1.3, while a CT Maxillofacial Sinuses also has an RVU of 1.3. By comparison, the CRISP index rating in our study was 564 for the CT Soft Tissue Neck, while the CRISP index rating for the CT Maxillofacial Sinuses was 36.
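As an illustration of how total CRISP volume levels the comparison between radiologists, here is a minimal sketch with hypothetical data. The average CRISP ratings of 82 and 192 are the actual low and high from the 2010 analysis; the study volumes and error counts are invented for illustration:

```python
# Hypothetical comparison of two radiologists on CRISP-weighted volume.
# Average CRISP ratings of 82 and 192 are the low and high observed in
# the 2010 analysis; study volumes and error counts are invented.

radiologists = {
    # name: (average CRISP rating, studies read per year, errors)
    "Radiologist A": (82, 12_000, 5),   # simpler case mix, higher volume
    "Radiologist B": (192, 5_000, 5),   # complex case mix, lower volume
}

for name, (avg_crisp, studies, errors) in radiologists.items():
    crisp_volume = avg_crisp * studies        # total yearly CRISP volume
    rate = errors / crisp_volume * 100_000    # errors per 100,000 CRISP units
    print(f"{name}: CRISP volume {crisp_volume:,}; "
          f"{rate:.2f} errors per 100,000 CRISP units")
```

On raw study counts, Radiologist B's five errors in 5,000 studies look far worse than Radiologist A's five in 12,000; weighted by CRISP volume (984,000 versus 960,000 units), the two perform almost identically.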
We believe that the CRISP methodology of semi-quantitative analysis of studies provides two very significant benefits for avoiding errors and assessing quality within a radiology group: predictive risk assessment and uniform quality assessment of radiologists, both of which can improve a radiologist's performance and reduce errors.
Predictive Risk Assessment

Using the CRISP method of evaluating the risk of all studies, a risk assessment can be assigned to every CPT code, and the 25 riskiest studies can be identified. CRISP risk assessment would have predicted the actual error rates observed in the 2010 analysis. Using this information, Radisphere undertook two initiatives: Risk Alerts, and proactive Selective Monitoring of cases by a second radiologist.

Risk Alerts: Radisphere developed a proactive program alerting radiologists to the fact that they are reading a risky study. Risk alerts with "must not miss" guidance were hard-wired into the radiologist information system (radii).

Selective Monitoring: Radisphere has piloted a program in which the riskiest cases are viewed in real time by a second radiologist, who excludes or notes only the most significant pathology that must not be missed. The program is entitled Selective Monitoring of Accuracy with Reporting Timeliness (SMART). A concurrent reading radiologist is asked to review a study for the exclusion of predetermined significant pathologies which should never be missed. This was piloted for CT Brain examinations, where the concurrence interpreter was instructed to exclude (or identify) the following: No Calvarial Fracture; No Acute CVA; No Intracranial Hemorrhage. A checklist for each exclusion was provided, with a box for a free-form note from the concurrence radiologist if a positive finding was noted. The concurrence radiologist served as a second set of eyes but did not provide a final report. The SMART review was provided to the interpreting radiologist, who then issued the final report with the added benefit of that second set of eyes.
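A minimal sketch of what a SMART concurrence checklist for CT Brain might look like in code, using the three exclusions named above; the data structure and workflow details are assumptions for illustration, not the actual radii implementation:

```python
# Hypothetical SMART concurrence checklist for CT Brain. The three
# exclusions come from the pilot described above; the structure is an
# illustrative assumption, not the actual radii workflow.

from dataclasses import dataclass, field

CT_BRAIN_EXCLUSIONS = [
    "No Calvarial Fracture",
    "No Acute CVA",
    "No Intracranial Hemorrhage",
]

@dataclass
class SmartReview:
    """Second set of eyes; does not produce the final report."""
    study_id: str
    excluded: dict = field(default_factory=dict)  # exclusion -> True if confirmed absent
    notes: dict = field(default_factory=dict)     # free-form note for positive findings

    def record(self, exclusion, is_excluded, note=""):
        self.excluded[exclusion] = is_excluded
        if not is_excluded:                       # positive finding: attach the note
            self.notes[exclusion] = note

review = SmartReview(study_id="CT-BRAIN-0001")
review.record("No Calvarial Fracture", True)
review.record("No Acute CVA", True)
review.record("No Intracranial Hemorrhage", False,
              note="Small right frontal subdural hematoma")
# The completed review is returned to the interpreting radiologist,
# who issues the final report.
```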
Uniform Quality Assessment of Radiologists

All complex endeavors in which participants engage with different skill levels, or perform tasks of varying complexity, have a handicap system to level the field of comparison. This is usually seen in competitive sports such as golf, Olympic diving, or sailboat racing, where boats of different length and speed are given a handicap rating so that they may race each other competitively. CRISP, in essence, is a radiologist performance handicapping system.

Based upon the results of evaluating radiologists using the CRISP model, a program was designed to improve the performance of radiologists in the lower tier of performers.

Radiologist Profile Adjustment: Radiologists were individually evaluated for their performance on risky examinations. Individual radiologists' reading profiles were modified to remove the riskiest studies. Radiologists were informed of the types of studies removed from their reading profiles and were instructed to undertake additional study or CME training. Radiologists who did undergo additional training or study had their profiles reinstated to include some of the riskier studies; they were also placed on ongoing focused professional review to confirm improvement and a reduction in errors. Other radiologists have had a permanent reduction in the complexity of the studies they interpret and have become successful, productive radiologists with a lowered CRISP index rating. Lowering the individual CRISP index rating for the lower-tier performers decreased their error rates and improved their professional satisfaction, making them valued consultants to referring physicians.

Summary

A method for evaluating radiologists that adjusts for the complexity of the types of studies they interpret is necessary. The CRISP model has demonstrated value in providing important information that can reduce errors, fairly assess the quality of a radiologist's work, and help radiologists continually improve. Radisphere has integrated the CRISP model and the SMART program into its quality assessment and improvement efforts as works in progress and is committed to their continual refinement.

About Radiology Quality Institute

Founded by Radisphere, RQI is a collaborative research organization dedicated to the identification and promotion of radiology quality standards and process improvements. With access to Radisphere's extensive quality data, analytics, and outcomes, the Institute is focused on developing performance benchmarks and sharing relevant information to deliver measurable improvements in radiology quality for unparalleled levels of patient care. As the leading provider of standards-based radiology delivery solutions for more than 100 clients in 28 states, Radisphere is transforming the practice of radiology at health systems by establishing measurable performance standards and accountability for diagnostic accuracy, appropriate utilization, service-level excellence, and patient care.

About the Authors

Frank E. Seidelmann, D.O. is a diplomate of the American Board of Radiology, with a C.A.Q. in Neuroradiology and 36 years of academic and private practice experience. He is also Co-founder, Chairman of Radiology, and Chief Medical Officer of Radisphere. Douglas C. Ward is Radisphere's Director of Professional Innovation and co-investigator of the Radisphere Clinical Error Analysis: 2010 QA / Peer Review.
Appendix

Table 1 - Sample CRISP Index (CT). Average CRISP, CT: 152.7.

Study | Modality | Complexity Rating | Potential for Serious Pathology | CRISP Index
CT Abdomen With | CT | 21 | 20 | 420
CT Abdomen With & Without | CT | 21 | 20 | 420
CT Abdomen Without | CT | 21 | 20 | 420
CT Abd/Pel With | CT | 20 | 20 | 400
CT Abd/Pel With & Without | CT | 20 | 20 | 400
CT Abd/Pel Without | CT | 20 | 20 | 400
CTA Brain | CT | 17 | 20 | 340
CTA Abd Arteries, Lower Ext | CT | 27 | 12 | 324
CTA Chest | CT | 15 | 20 | 300
CTA Neck With | CT | 15 | 20 | 300
CT Soft Tissue Neck With | CT | 21 | 12 | 252
CT Soft Tissue Neck With & Without | CT | 21 | 12 | 252
CT Soft Tissue Neck Without | CT | 20 | 12 | 240
CT Chest With | CT | 11 | 20 | 220
CT Chest With & Without | CT | 11 | 20 | 220
CT Brain With | CT | 10 | 20 | 200
CT Brain With & Without | CT | 10 | 20 | 200
CT Brain Without | CT | 10 | 20 | 200
CT Cervical Myelogram | CT | 12 | 12 | 144
CT Cspine With | CT | 12 | 12 | 144
CT Cspine With & Without | CT | 12 | 12 | 144
CT Cspine Without | CT | 12 | 12 | 144
CT Tspine With | CT | 12 | 12 | 144
CT Tspine With & Without | CT | 12 | 12 | 144
CT Tspine Without | CT | 12 | 12 | 144
CT Chest Without | CT | 11 | 12 | 144
CT Pelvis With | CT | 11 | 12 | 132
CT Pelvis With & Without | CT | 11 | 12 | 132
CT Temporal Bones With | CT | 31 | 4 | 124
CT Temporal Bones With & Without | CT | 31 | 4 | 124
CT Temporal Bones Without | CT | 30 | 4 | 120
CT Pelvis Without | CT | 9 | 12 | 108
CT Temporal Bones With & Without (IACs) | CT | 18 | 4 | 72
CT Temporal Bones With (IACs) | CT | 18 | 4 | 72
CT Temporal Bones Without (IACs) | CT | 16 | 4 | 64
CT Lumbar Myelogram | CT | 14 | 4 | 56
CT Knee With | CT | 12 | 4 | 48
CT Knee With & Without | CT | 12 | 4 | 48
CT Lspine With | CT | 12 | 4 | 48
CT Lspine With & Without | CT | 12 | 4 | 48
CT Knee Without | CT | 11 | 4 | 44
CT Lspine Without | CT | 11 | 4 | 44
CT Foot | CT | 21 | 2 | 42
CT Shoulder | CT | 9 | 4 | 36
CT Ankle | CT | 8 | 4 | 32
CT Orbits With | CT | 15 | 2 | 30
CT Orbits With & Without | CT | 15 | 2 | 30
CT Orbits Without | CT | 14 | 2 | 28
CT Facial Bones With | CT | 6 | 4 | 24
CT Facial Bones With & Without | CT | 6 | 4 | 24
CT Facial Bones Without | CT | 6 | 4 | 24
CT Maxillofacial Sinuses | CT | 12 | 2 | 24
CT Calcium Scoring | CT | 7 | 2 | 14
CT Orbits MRI Clearance | CT | 4 | 2 | 8
Endnotes

i. Atul Gawande, M.D. is a general and endocrine surgeon at Brigham and Women's Hospital in Boston, Massachusetts, and associate director of its Center for Surgery and Public Health. He is also an associate professor at the Harvard School of Public Health and an associate professor of surgery at Harvard Medical School. He has written extensively on medicine and public health for The New Yorker and Slate, and is the author of the books Complications, Better, and The Checklist Manifesto.

ii. http://www.theinvisiblegorilla.com/

iii. http://search.bwh.harvard.edu/new/presentations/psychonomics2012_drew_vo.pdf