How does one evaluate clinical image quality with statistical methods?


 Rudolf Cannon
SK course, Medical Radiation Physics, 8 Oct 2013
Örjan Smedby, Radiology, IMH/CMIV, Linköping University

Context
Examination, choice of treatment, treatment effect, health effect

Efficacy of Diagnostic Methods
Level 1: Technical efficacy. Technical quality, resolution, noise...
Level 2: Diagnostic accuracy efficacy. How often is the diagnosis correct?
Level 3: Diagnostic thinking efficacy. How is the referring physician's diagnostic thinking affected?
Level 4: Therapeutic efficacy. How is the choice of treatment affected?
Level 5: Patient outcome efficacy. How is the patient's health affected?
Level 6: Societal efficacy. Benefits and costs for society.
(Fryback DG, Thornbury JR. Med Decis Making 1991)

Image quality vs. diagnostic accuracy
Physical parameters: physical measuring tools, classical statistical tools
Entire diagnostic process: reliable ground truth, ROC study

Receiver operating characteristics
Generalization of sensitivity and specificity
How are sensitivity and specificity affected as the threshold is changed?

A diagnostic test
            Pos test   Neg test   Total
Diseased        25          5       30
Healthy         15        105      120
Total           40        110      150
What is the chance that a diseased person is classified correctly?
Sensitivity 25/30 = 83%
A diagnostic test
            Pos test   Neg test   Total
Diseased        25          5       30
Healthy         15        105      120
Total           40        110      150
What is the chance that a healthy person is classified correctly?
Specificity 105/120 = 88%
What is the probability that a patient with a positive test really is diseased?
Positive predictive value 25/40 = 63%
What is the probability that a patient with a negative test really is healthy?
Negative predictive value 105/110 = 95%

Threshold level
Higher threshold for pathology: sensitivity decreases, specificity increases
Lower threshold for pathology: sensitivity increases, specificity decreases

ROC curve
Receiver operating characteristics: sensitivity plotted against 1 − specificity
Area under ROC curve (AUROC): 1 = perfect, 0.5 = worthless
Generalization of sensitivity and specificity
How are sensitivity and specificity affected as the threshold is changed?
Requires gold standard
Requires large material
Much work, large costs
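The four measures on these slides all come from the same 2x2 table. A minimal Python sketch of the arithmetic follows; the function name `diagnostic_metrics` and its argument names are illustrative and not part of the lecture.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, PPV and NPV from a 2x2 table.

    tp: diseased with positive test, fn: diseased with negative test,
    fp: healthy with positive test,  tn: healthy with negative test.
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased classified correctly
        "specificity": tn / (tn + fp),  # healthy classified correctly
        "ppv": tp / (tp + fp),          # positive test really diseased
        "npv": tn / (tn + fn),          # negative test really healthy
    }


# Counts taken from the table on the slide.
metrics = diagnostic_metrics(tp=25, fn=5, fp=15, tn=105)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Unlike sensitivity and specificity, the two predictive values depend on how common the disease is in the sample (here 30 of 150), so they do not carry over to a population with a different prevalence.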
Image quality vs. diagnostic accuracy
Physical parameters: physical measuring tools, classical statistical tools
Visual image quality concept: visual grading experiment?
Entire diagnostic process: reliable ground truth, ROC study

Study types
Single images: Rate image A on a scale from 1 to 5
Image pairs: Rate the difference between image A and B on a scale from −2 to +2

Criteria & rating scale
Typical: visibility of an anatomical structure
Example: Visually sharp reproduction of the thoracic aorta
1. Criterion is fulfilled
2. Criterion is probably fulfilled
3. Indecisive
4. Criterion is probably not fulfilled
5. Criterion is not fulfilled

Situation
Patient: P1, P2, P3, P4...
Imaging: Im1, Im2
Postprocessing: PP1, PP2, PP3
Observer
Result: rating score

Types of data
Interval: numerical, continuous (e.g. a measurement)
Ordinal: ordered categories (e.g. a rating score)
Nominal: individual categories, no order (e.g. persons A, B, C, D)
Visual grading characteristics (VGC)
(Båth & Månsson, BJR 2007)
For each quality level: what proportion fulfils the requirement with method A and with method B, respectively?
[Figure: VGC curve, method A vs. method B] Figure 2. The visual grading characteristic (VGC) curve from the data presented in Tables 1 and 2. The boxes represent the operating points corresponding to the observer's interpretation of the scale steps of the rating scale.

Excerpt shown on the slides (M Båth and L G Månsson, The British Journal of Radiology, March 2007, p. 174):

Discussion

IC is a visual grading method for which valid statistical methods have been used most often previously. The dissatisfaction from the fact that the observer can only use a two-step rating scale in IC (criterion fulfilled/criterion not fulfilled) often leads to the use of VGA, enabling the use of multiple scale steps, although invalid statistical methods are often used. The use of VGC analysis can hopefully satisfy the needs for both a valid statistical method and freedom for the observer. Furthermore, VGC analysis can be used directly on the image criteria defined by the European Commission giving statements of the needed levels of reproduction for certain anatomical landmarks without the need for extracting the relevant structures from the criteria and grading the visibility of these structures. This has the potential of leading to an increased validity in the use of the image criteria in multiple-choice grading studies. However, VGC analysis is not limited to the use of European criteria. Modifications of the original criteria have been proposed for chest radiography [16], lumbar spine radiography [33] and mammography [23, 34] and these modified criteria as well as other relevant criteria may meritoriously be used. Furthermore, the grading task is not limited to normal anatomy. If applicable, grading of image criteria based on pathology may also be used.

VGC analysis consists of elements from both IC and relative and absolute VGA as well as from ROC analysis. The concept of VGC analysis can be interpreted as "IC meets ROC" with the VGC curve presenting the ICS_B (the proportion of images rated as fulfilling a criterion for modality B) as a function of the ICS_A (the proportion of images rated as fulfilling a criterion for modality A) for a grading task, just like the ROC curve describes the TPF (the proportion of images rated as containing a signal for the positive images) as a function of the FPF (the proportion of images rated as containing a signal for the negative images) for a detection task. (One important difference between the two curves being that the ROC curve describes an observer's ability to separate the signal and noise distributions belonging to one modality from each other, whereas the VGC curve describes the observer's opinion about the separation of the image quality distributions from two modalities.) For the observer, the resulting study is similar to absolute VGA with the use of a multi-step scale for grading the image quality. The resulting measure of image quality, AUC_VGC, is finally, like in relative VGA, a relative measure of image quality, describing the image quality for modality B in comparison with modality A. Using the statistical methods of ROC analysis, VGC analysis presents a solution to the need of non-parametric rank-invariant statistical methods for analysing the data from visual grading studies. The use of the ROC technique for comparing data from studies other than detection tasks has been proposed previously. Sonn and Svensson [25] studied changes in activities of daily living (ADL) measured by a 10-level scale, the Staircase of ADL, in rehabilitation medicine and used the ROC curve to analyse the systematic change in ADL levels between two age groups. The use of the ROC technique, enabling a statistically valid analysis of data, can probably be applied to many other rating tasks

Strengths of VGC

Weaknesses of VGC

The value of the AUC_VGC can be criticized for the same reason as the A_z can be questioned in ROC analysis. The index A_z is useful in most cases because it reflects accuracy in general through a range of possible operating points [35]. However, doubts have been expressed by some investigators concerning the fact that a large part of the area comes from the rightmost part of the curve and thereby include false positive fractions of limited or no clinical relevance. Also, crossing curves can cause confusion; one curve may have higher TPFs than another in the region of relevant FPFs, but if the curves cross for higher FPF values, the superiority for the first curve may be lost or even reversed if the area under each curve is used as an index of accuracy [27, 36]. In the same way a large part of the area of the VGC curve comes from a part of the curve which corresponds to a very low threshold of the observer for judging a criterion of being fulfilled, possibly corresponding to an unacceptable image quality.

The VGC curve

It is important to realise that a VGC curve is completely determined by the two underlying distributions of the modalities being studied (in the same way as ...

Statistical model
Patient, imaging system settings, postprocessing and observer are combined by regression into the rating score.

Logistic regression
Logit function: $\operatorname{logit}(p) = \log\big(p/(1-p)\big)$
Regression equation: $\operatorname{logit}(p) = ax + b$, which gives $p = 1/\big(1 + \exp(-(ax + b))\big)$

Ordinal regression
The same logit function and regression equation are applied to the probability of a rating score at or below each scale step.

VGR model
$\operatorname{logit}\big(P(y \le n)\big) = a_1\,\mathrm{Im1} + a_2\,\mathrm{Im2} + b_1\,\mathrm{PP1} + b_2\,\mathrm{PP2} + b_3\,\mathrm{PP3} + D_P + E_O - C_n$
Factors: imaging (Im1, Im2, Im3), postprocessing (PP1, PP2, PP3), patient, observer
(Smedby & Fredrikson, British Journal of Radiology 2010)
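The VGC comparison described in the excerpt can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of Båth & Månsson: the ratings are hypothetical, an image counts as "fulfilling the criterion" at threshold t when its rating is t or lower (since 1 means "fulfilled" on the scale used here), and the area is taken with the plain trapezoidal rule through the empirical operating points.

```python
def vgc_points(ratings_a, ratings_b, levels=(1, 2, 3, 4, 5)):
    """Operating points of an empirical VGC curve.

    For each threshold t, x is the proportion of modality A images and
    y the proportion of modality B images with a rating <= t.
    """
    points = [(0.0, 0.0)]
    for t in levels:
        x = sum(r <= t for r in ratings_a) / len(ratings_a)
        y = sum(r <= t for r in ratings_b) / len(ratings_b)
        points.append((x, y))
    return points


def trapezoid_area(points):
    """Area under a piecewise linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical ratings (1 = criterion fulfilled ... 5 = not fulfilled).
ratings_a = [1, 2, 2, 3, 3, 4, 4, 5]
ratings_b = [1, 1, 2, 2, 2, 3, 3, 4]

auc_ab = trapezoid_area(vgc_points(ratings_a, ratings_b))
auc_aa = trapezoid_area(vgc_points(ratings_a, ratings_a))
print(f"AUC_VGC (B vs. A): {auc_ab:.3f}")
print(f"AUC_VGC (A vs. A): {auc_aa:.3f}")
```

An area above 0.5 means modality B tends to receive better ratings than modality A, and two identical rating distributions give 0.5. A real study would also need an uncertainty estimate that accounts for multiple observers and repeated patients, which this sketch leaves out.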
Statistical model
Patient: random effect
Imaging system settings (Im1, Im2, Im3): fixed effect
Postprocessing (PP1, PP2): fixed effect
Observer: random effect
Combined by regression into the rating score

Empirical data (Jakob De Geer)
Coronary CTA
24 patients (P1 to P24)
Standard dose (310 mAs ref) and reduced dose (62 mAs ref)
Reduced-dose images postprocessed with 2D adaptive filter (Sharpview)
Filtered and unfiltered reduced-dose images viewed by 9 radiologists (R1 to R9)

Criteria
Criterion 1: Visually sharp reproduction of the thoracic aorta.
Criterion 2: Visually sharp reproduction of the wall of the thoracic aorta.
Criterion 3: Visually sharp reproduction of the heart.
Criterion 4: Visually sharp reproduction of the left main coronary artery (LMA).
Criterion 5: The image noise in relevant regions is sufficiently low for diagnosis.

Rating scale
1. Criterion is fulfilled
2. Criterion is probably fulfilled
3. Indecisive
4. Criterion is probably not fulfilled
5. Criterion is not fulfilled

Statistical model
Patient, postprocessing (unfiltered, filtered) and observer, combined by ordinal regression (GLLAMM) into the rating score

Results: filter effect
[Table: ordinal regression coefficient and p value for the filter effect on criteria 1 to 5; most values are not preserved. Surviving entry: criterion 5 (noise sufficiently low for diagnosis), regression coefficient 0.96.]
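To make the ordinal regression behind these results more concrete, the following Python sketch turns a linear predictor into probabilities for the five rating steps with a generic cumulative logit model. Everything numeric here is hypothetical: the cut-points, the size of the effect, and the sign convention (logit P(y ≤ n) = cut-point + predictor) are assumptions chosen for illustration, and software such as GLLAMM may parameterize the model differently. Random effects for patient and observer are left out.

```python
import math


def sigmoid(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(eta, cutpoints):
    """P(y = k) for k = 1..K, assuming logit P(y <= n) = cutpoints[n-1] + eta.

    cutpoints must be increasing; K = len(cutpoints) + 1.
    """
    cumulative = [sigmoid(c + eta) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1]
              for k in range(1, len(cumulative))]
    return probs


# Hypothetical cut-points for a five-step scale (1 = criterion fulfilled).
cutpoints = [-1.5, -0.5, 0.5, 1.5]

baseline = category_probabilities(0.0, cutpoints)  # reference condition
shifted = category_probabilities(1.0, cutpoints)   # hypothetical effect of +1.0

for k, (p0, p1) in enumerate(zip(baseline, shifted), start=1):
    print(f"score {k}: baseline {p0:.2f}, with effect {p1:.2f}")
```

Under this convention a positive predictor moves probability mass toward the low (better) scores without assuming that the scale steps are equally spaced, which is the reason for using ordinal rather than linear regression on rating data.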
Including mAs effect
Both standard-dose and reduced-dose images were viewed, the reduced-dose images with and without filtering.

Statistical model with mAs
Patient
Imaging: log mAs setting
Postprocessing: unfiltered, filtered
Observer
Combined by regression into the rating score

Statistical model with mAs etc.
Further factors shown: dose reduction, weight, criterion, education
Regression (GLLAMM) gives the rating score

[Figure: probability of a score of 1 or ... plotted against mAs ref setting, with separate curves for unfiltered and filtered images]

Results with mAs
[Table: regression coefficients for log(mAs) and adaptive filter, and estimated mAs reduction (%), for criteria 1 to 5; numeric values not preserved on these slides]
Results with mAs
Regression coefficients (95% confidence limits) and estimated mAs reduction

Criterion                                               log(mAs)               Adaptive filter        Estimated mAs reduction
1: Visually sharp reproduction of the thoracic aorta    -2.52 (-2.88; -2.16)   -0.45 (-0.78; -0.11)   16% (6%; 27%)
2: Visually sharp reproduction of the aortic wall       -2.53 (-2.82; -2.24)   -0.75 (-1.07; -0.44)   26% (17%; 34%)
3: Visually sharp reproduction of the heart             -2.54 (-2.91; -2.18)   -0.74 (-1.12; -0.36)   25% (15%; 36%)
4: Visually sharp reproduction of the LMA               -2.52 (-2.81; -2.24)   -0.61 (-0.91; -0.30)   21% (13%; 30%)
5: Noise sufficiently low for diagnosis                 -2.74 (-3.04; -2.44)   -0.77 (-1.07; -0.46)   24% (17%; 32%)

Conclusion
For analyzing diagnostic accuracy, ROC studies are superior, but costly and cumbersome.
Visual grading experiments describe visual image quality.
Simple comparisons can be made with VGC.
Ordinal regression (VGR) makes it possible to obtain direct numeric estimates of the potential for dose reduction.
Particularly useful when testing and optimising acquisition/postprocessing protocols.
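The slides do not state how the estimated mAs reduction is derived from the two coefficients. One plausible reading, sketched below in Python, is that the filter is treated as equivalent to a change in log(mAs): if the filter coefficient is b and the log(mAs) coefficient is a, the same rating is reached at a fraction exp(−b/a) of the original mAs. That formula and the use of the natural logarithm are assumptions; they are used here because they reproduce the percentages in the table to within rounding. The confidence limits of the reduction are not reconstructed.

```python
import math

# Absolute values of the coefficients in the table: (log(mAs), adaptive filter).
coefficients = {
    "1: thoracic aorta": (2.52, 0.45),
    "2: aortic wall": (2.53, 0.75),
    "3: heart": (2.54, 0.74),
    "4: LMA": (2.52, 0.61),
    "5: noise": (2.74, 0.77),
}


def estimated_mas_reduction(coef_log_mas, coef_filter):
    """Fraction by which mAs could be lowered with the filter applied.

    Assumes the filter effect equals a shift of coef_filter / coef_log_mas
    in natural-log mAs (both coefficients must have the same sign).
    """
    return 1.0 - math.exp(-coef_filter / coef_log_mas)


reductions = {name: estimated_mas_reduction(a, b)
              for name, (a, b) in coefficients.items()}
for name, r in reductions.items():
    print(f"Criterion {name}: estimated mAs reduction {r:.0%}")
```

Because only the ratio of the two coefficients enters, the estimate does not depend on the arbitrary scale of the logit, which is what allows it to be read directly as a dose figure.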