Evaluation of Diagnostic Tests


Biostatistics for Health Care Researchers: A Short Course
Evaluation of Diagnostic Tests
Presented by: Siu L. Hui, Ph.D., Department of Medicine, Division of Biostatistics, Indiana University School of Medicine

Objectives Calculate and interpret sensitivity, specificity, predictive value positive, and predictive value negative. Understand the principle behind ROC curves. Understand the use of kappa and intraclass correlation coefficients to measure agreement.

Examples
Screening:
- Lab (BMP, lipids)
- Imaging (bone density, mammogram)
- Questionnaires (depression, dementia, QOL)
Diagnostic:
- Blood tests for infection
- Imaging (X-ray, CT, MRI)
- Histology
All tests have errors. How accurate are they?

Outline
Accuracy of a diagnostic test:
- Binary data: sensitivity and specificity; predictive value positive and negative
- Ordinal or continuous data: ROC curve
Measure of the agreement of two tests:
- Nominal or ordinal data: kappa
- Continuous data: intraclass correlation coefficient

Example 1: Staging of Prostate Cancer with MRI
Tempany et al. (1994) studied the accuracy of conventional MRI in detecting advanced stage prostate cancer.
Disease: advanced stage prostate cancer.
Test: conventional MRI.
The true disease status was established by surgery.
Question: How accurate is conventional MRI in detecting advanced stage prostate cancer?
Reference: Tempany, Zhou, Zerhouni, Rifkin, Quint, Piccoli, Ellis, and McNeil (1994). Staging of Prostate Cancer: Results of Radiology Diagnostic Oncology Group Project Comparison of Three MR Imaging Techniques. Radiology, 192:47-54.

Example 1 (continued)

Disease\MRI    T+    T-   total
D+             70    45     115
D-             32    53      85
total         102    98     200

Accuracy of a Test
True disease status: disease (D+) and non-disease (D-) by gold standard. {Advanced stage vs. early stage by surgery}
Test result: positive (T+) and negative (T-) results from a test of interest. {Advanced stage vs. early stage as assessed by MRI}


Overall Accuracy N = total # of cases A = # of correctly diagnosed cases Overall accuracy = A/N

Example 1 (continued)
Overall accuracy = (70 + 53)/200 = 0.615 ≈ 62%

Sensitivity and Specificity Sensitivity (Sens) - the ability of a test to give a positive finding when the person tested truly has the disease under study. Specificity (Spec) - the ability of a test to give a negative finding when the person tested truly is free of the disease under study.

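Sensitivity and Specificity

$$\text{Sens} = P(T^{+} \mid D^{+}) = \frac{\text{number of } T^{+} \text{ and } D^{+}}{\text{number of } D^{+}}$$

$$\text{Spec} = P(T^{-} \mid D^{-}) = \frac{\text{number of } T^{-} \text{ and } D^{-}}{\text{number of } D^{-}}$$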

Sensitivity and Specificity
Sensitivity: true positive rate (TPR)
Specificity: true negative rate (1 - FPR), where FPR = false positive rate

Example 1 (continued)
Sens = 70/115 = 61%
Spec = 53/85 = 62%
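The three Example 1 calculations can be checked with a few lines of Python. This is only an illustrative sketch: the function name `two_by_two_metrics` and the variable names are made up for this example and are not part of the course material.

```python
# 2x2 counts from Example 1 (rows: true disease status, columns: MRI result)
tp, fn = 70, 45   # D+ with T+, D+ with T-
fp, tn = 32, 53   # D- with T+, D- with T-


def two_by_two_metrics(tp, fn, fp, tn):
    """Return overall accuracy, sensitivity, and specificity for a 2x2 table."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # correctly diagnosed / all cases
        "sensitivity": tp / (tp + fn),      # T+ among D+
        "specificity": tn / (tn + fp),      # T- among D-
    }


metrics = two_by_two_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Rounded to whole percentages, the values agree with the 62%, 61%, and 62% shown for Example 1.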

In summary, sensitivity and specificity are two intrinsic properties of a test. BUT clinical providers have to infer a patient's disease status from test results. How well can a given test result predict the disease status of the patient? {How likely is it that a patient with a positive MRI result actually has advanced stage prostate cancer?}

Predictive Values
Predictive value positive (PV+) is the probability that a patient with a positive test result actually has the disease:

$$\text{PV}^{+} = \frac{\text{number of diseased patients with a positive test}}{\text{number of patients with a positive test}}$$

Predictive value negative (PV-) is the probability that a patient with a negative test does not have the disease:

$$\text{PV}^{-} = \frac{\text{number of non-diseased patients with a negative test}}{\text{number of patients with a negative test}}$$

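PV+ and PV-
Both PV+ and PV- depend on the sensitivity, the specificity, and the disease prevalence (Prev).

$$\text{PV}^{+} = P(D^{+} \mid T^{+}) = \frac{\text{Sens} \times \text{Prev}}{\text{Sens} \times \text{Prev} + (1 - \text{Spec}) \times (1 - \text{Prev})}$$

$$\text{PV}^{-} = P(D^{-} \mid T^{-}) = \frac{\text{Spec} \times (1 - \text{Prev})}{\text{Spec} \times (1 - \text{Prev}) + (1 - \text{Sens}) \times \text{Prev}}$$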

Example 2: A Screening Test for a Rare Disease

Disease\Test    T+     T-   total
D+              19      1      20
D-              99   1881    1980
total          118   1882    2000

Disease prevalence = 20/2000 = 1%.
Sens = 19/20 = 95%, Spec = 1881/1980 = 95%.
PV+ = 19/118 = 16.1%, PV- = 1881/1882 = 99.9%.

Example 2 (continued)

Prev (%)   PV+ (%)   PV- (%)
   1        16.1      99.9
   5        50.0      99.7
  20        82.6      98.7
  50        95.0      95.0
  75        98.3      86.4

Sens = Spec = 95%. Note how the prevalence affects PV+ and PV-: at low prevalence most positive results are false positives, even with a sensitive and specific test. When does it not make sense to screen?
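The dependence of the predictive values on prevalence follows directly from Bayes' theorem. The sketch below evaluates both at the prevalences listed above, with Sens = Spec = 95%. The function name `predictive_values` is an assumption made for this illustration.

```python
def predictive_values(sens, spec, prev):
    """Return (PV+, PV-) from sensitivity, specificity, and prevalence."""
    # PV+ = P(D+ | T+): true positives over all positives
    pv_pos = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    # PV- = P(D- | T-): true negatives over all negatives
    pv_neg = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return pv_pos, pv_neg


sens = spec = 0.95
print("Prev(%)  PV+(%)  PV-(%)")
for prev in (0.01, 0.05, 0.20, 0.50, 0.75):
    pv_pos, pv_neg = predictive_values(sens, spec, prev)
    print(f"{prev * 100:7.0f}  {pv_pos * 100:6.1f}  {pv_neg * 100:6.1f}")
```

Writing the calculation as a function of prevalence makes it easy to explore the closing question: one can lower `prev` until PV+ becomes too small for a positive screen to be useful.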

What if your test gives a continuous reading rather than +/-?

Example 3: Blood Test for Disease

                 Test result at cutoff
ID    D     T      1      2      3      4
 1    -    0.2     -      -      -      -
 2    -    0.7     -      -      -      -
 3    -    1.8     +      -      -      -
 4    -    2.0     +      -      -      -
 5    -    3.1     +      +      -      -
 6    -    3.3     +      +      +      -
 7    +    1.5     +      -      -      -
 8    +    2.4     +      +      -      -
 9    +    3.0     +      +      +      -
10    +    3.1     +      +      +      -
11    +    3.8     +      +      +      -
12    +    4.0     +      +      +      -

Sens             1.00   0.83   0.67   0.00
Spec             0.33   0.67   0.83   1.00
FPR              0.67   0.33   0.17   0.00
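The Sens, Spec, and FPR rows can be recomputed from the +/- calls in each cutoff column. The sketch below works only from those calls, as transcribed from the table, and does not re-derive them from the T values. The names `calls`, `sens_spec`, and `roc_points` are made up for this illustration.

```python
# True disease status for IDs 1-12 (first six non-diseased, last six diseased)
disease = [False] * 6 + [True] * 6

# Test calls at cutoffs 1-4 for each ID, one character per cutoff column
calls = [
    "----", "----", "+---", "+---", "++--", "+++-",   # IDs 1-6  (D-)
    "+---", "++--", "+++-", "+++-", "+++-", "+++-",   # IDs 7-12 (D+)
]


def sens_spec(disease, positives):
    """Sensitivity and specificity of one set of +/- calls."""
    n_dis = sum(disease)
    n_non = len(disease) - n_dis
    true_pos = sum(d and p for d, p in zip(disease, positives))
    true_neg = sum((not d) and (not p) for d, p in zip(disease, positives))
    return true_pos / n_dis, true_neg / n_non


roc_points = []  # (FPR, Sens) pairs, one per cutoff
for j in range(4):
    positives = [c[j] == "+" for c in calls]
    sens, spec = sens_spec(disease, positives)
    roc_points.append((round(1 - spec, 2), round(sens, 2)))
    print(f"cutoff {j + 1}: Sens={sens:.2f}  Spec={spec:.2f}  FPR={1 - spec:.2f}")
```

Each (FPR, Sens) pair in `roc_points` is one point on the ROC curve for this test.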

ROC Curve (Receiver Operating Characteristic Curve)
When is it applied?
- Test results are continuous and there is more than one possible cutoff point, or
- There are multiple degrees of suspicion for a given test (ordinal response, e.g. definitely no disease, probably no disease, probably disease, and definitely disease).
How is it plotted?
- FPR on the horizontal axis and TPR on the vertical axis.
Recall:
- True positive rate (TPR) = Sens
- False positive rate (FPR) = 1 - Spec


Example 3 (continued)
[Figure: ROC curve for Example 3. Horizontal axis: FPR = 1 - Spec; vertical axis: Sens.]

ROC Curves
Why do we need the ROC curve?
- It displays Sens (benefit) and FPR (cost) under different thresholds, so decision makers can choose the appropriate threshold for their situation.
- It provides the ability to compare two or more diagnostic tests.
How to choose a threshold in practice?
- Based on how the test is used, e.g. screening vs. diagnostic.

ROC Curve Comparison
[Figure: ROC curves of the tests being compared. Horizontal axis: FPR = 1 - Spec; vertical axis: Sens.]

Comments
We can compare the accuracies of two tests if we know the gold standard. What if we don't have the gold standard?

Reliability
When the gold standard is not available, a test is considered reliable if it agrees with another reference test. A test is also considered reliable if it provides higher inter-rater and intra-rater (or inter-assay and intra-assay) agreement.
- Kappa: used for nominal or ordinal data agreement
- ICC: used for continuous data agreement

Example 4: Biphasic Radiography vs. Fiberoptic Endoscopy in Gastric Ulcers

Endo\Radio    No Ulcer   Ulcer   total
No Ulcer         351        4      355
Ulcer              7       12       19
total            358       16      374

Reference: Shaw, van Romunde, Griffioen, Janssens, Kreuning, Eilers (1987). Peptic Ulcers and Gastric Carcinoma: Diagnosis with Biphasic Radiography Compared with Fiberoptic Endoscopy. Radiology, 163:39-42.

Kappa (κ)
P_o = observed agreement.
P_e = agreement expected by chance.
P_o - P_e = agreement beyond chance.
1 - P_e = the maximum agreement possible beyond chance.

$$\kappa = \frac{P_o - P_e}{1 - P_e}$$

An Interpretation of Kappa

Kappa        Strength of Agreement
<0.00        Poor
0.00-0.20    Slightly poor
0.21-0.40    Fair
0.41-0.60    Moderate
0.61-0.80    Substantial
0.81-1.00    Almost perfect

Example 4 (continued)
Total observed agreement = (351 + 12)/374 = 97%.
κ = 0.67 (Substantial).
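The kappa value for Example 4 can be reproduced from the 2x2 agreement table. The sketch assumes that chance agreement P_e is computed from the row and column totals, as in the usual (Cohen's) kappa. The slides do not show that step, so treat it as an assumption. The function name `kappa_from_table` is made up for this example.

```python
# Agreement table from Example 4 (rows: endoscopy, columns: radiography)
table = [
    [351, 4],   # endoscopy: no ulcer
    [7, 12],    # endoscopy: ulcer
]


def kappa_from_table(table):
    """Return (P_o, P_e, kappa) for a square agreement table."""
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal
    p_o = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement (assumed): product of matching row and column marginals
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


p_o, p_e, kappa = kappa_from_table(table)
print(f"P_o = {p_o:.3f}, P_e = {p_e:.3f}, kappa = {kappa:.2f}")
```

Observed agreement is high (about 97%) largely because most patients have no ulcer on either test; kappa discounts that chance agreement, which is why it is noticeably lower.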

Intraclass Correlation
ICC compares the variability of a trait between subjects to the total variation across all ratings and all subjects. As the variability of a trait between subjects increases relative to the total variability, ICC moves closer to 1.

Example 5: Faculty Ratings of Residents' Performances

Resident   Faculty1   Faculty2
   1          77         81
   2          80         79
   3          55         60
   4          91         88
   5          60         62
   6          80         84

ICC = 0.96
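The slides do not say which form of the ICC was used. As a rough check, the sketch below computes a one-way random-effects ICC from the between-resident and within-resident mean squares. That choice is an assumption; with these data it rounds to the 0.96 shown. The function name `icc_oneway` is made up for this illustration.

```python
# Ratings from Example 5: one (Faculty1, Faculty2) pair per resident
ratings = [(77, 81), (80, 79), (55, 60), (91, 88), (60, 62), (80, 84)]


def icc_oneway(ratings):
    """One-way random-effects ICC (assumed form; not stated in the slides)."""
    n = len(ratings)        # number of subjects (residents)
    k = len(ratings[0])     # number of ratings per subject
    grand_mean = sum(sum(r) for r in ratings) / (n * k)
    subject_means = [sum(r) / k for r in ratings]
    # Between-subject mean square
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(ratings, subject_means) for x in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


icc = icc_oneway(ratings)
print(f"ICC = {icc:.2f}")
```

Because the residents differ far more from one another than the two faculty raters differ on any one resident, the between-subject variance dominates and the ICC is close to 1.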

Summary
Measures of accuracy with a gold standard:
- Sens, Spec, PV+, PV-
- ROC curve
Measures of the agreement of two tests in the absence of a gold standard:
- Kappa
- ICC