Big- Data- Driven Medicine



Similar documents
Crea%ng Value from Data Governance, Ethics, and Appropriate Use

Learning from observational databases: Lessons from OMOP and OHDSI

Randomized trials versus observational studies

The FDA s Mini- Sen*nel Program and the Learning Health System

Using 'Big Data' to Estimate Benefits and Harms of Healthcare Interventions

RWE and Pharmaceuticals Challenges and Approach. Christian Reich PRISME 16-October-2013

Big Data and Health Insurance Product Selec6on (and a few other applica6on) Jonathan Kolstad UC Berkeley and NBER

New Anticoagulants and GI bleeding

OBSERVATIONAL MEDICAL OUTCOMES PARTNERSHIP

Missing Data. Katyn & Elena

Big data size isn t enough! Irene Petersen, PhD Primary Care & Population Health

Electronic health records to study population health: opportunities and challenges

Retail Pharmacy Clinical Services: Influence of ACOs & Healthcare Financing Models

Boehringer Ingelheim- sponsored Satellite Symposium. HCV Beyond the Liver

Articles Presented. Journal Presentation. Dr Albert Lo. Dr Albert Lo

With Big Data Comes Big Responsibility

Data Mining. Supervised Methods. Ciro Donalek Ay/Bi 199ab: Methods of Sciences hcp://esci101.blogspot.

Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Performance Measurement for the Medicare and Medicaid Eligible (MME) Population in Connecticut Survey Analysis

Guide to Biostatistics

Demonstrating Meaningful Use Stage 1 Requirements for Eligible Providers Using Certified EMR Technology

Introduction to Post marketing Drug Safety Surveillance: Pharmacovigilance in FDA/CDER

Hormones and cardiovascular disease, what the Danish Nurse Cohort learned us

Confounding in health research

Atrial Fibrillation 2014 How to Treat How to Anticoagulate. Allan Anderson, MD, FACC, FAHA Division of Cardiology

Does referral from an emergency department to an. alcohol treatment center reduce subsequent. emergency room visits in patients with alcohol

HEdis Code Quick Reference Guide Disease Management Services

Nodes, Ties and Influence

STROKE PREVENTION IN ATRIAL FIBRILLATION. TARGET AUDIENCE: All Canadian health care professionals. OBJECTIVE: ABBREVIATIONS: BACKGROUND:

Building patient-level predictive models Martijn J. Schuemie, Marc A. Suchard and Patrick Ryan

4/9/2015. Risk Stratify Our Patients. Stroke Risk in AF: CHADS2 Scoring system JAMA 2001; 285:

The PROTECT project. Introduction. Xavier Kurz Pharmacovigilance and Risk management Patient Health Protection Unit European Medicines Agency

Sample Size Planning, Calculation, and Justification

Diabetes Prevention in Latinos

patient-centered SCAlable National Network for Effectiveness Research

Medical management of CHF: A New Class of Medication. Al Timothy, M.D. Cardiovascular Institute of the South

Inhibit terminal acid secretion from parietal cells by blocking H + /K + - ATPase pump

Technical Issues in Aggregating and Analyzing Data from Heterogeneous EHR Systems

Psoriasis Co-morbidities: Changing Clinical Practice. Theresa Schroeder Devere, MD Assistant Professor, OHSU Dermatology. Psoriatic Arthritis

Psychiatrists and Reporting on Meaningful Use Stage 1. August 6, 2012

SOUTH EAST WALES CARDIAC NETWORK INTEGRATED CARE PATHWAY CARDIAC REHABILITATION MAY 2005

HEDIS Code Quick Reference Guide Preventive/Ambulatory Services

New Oral Anticoagulants. How safe are they outside the trials?

New Treatments for Stroke Prevention in Atrial Fibrillation. John C. Andrefsky, MD, FAHA NEOMED Internal Medicine Review course May 5 th, 2013

Antiplatelet and Antithrombotic Therapy. Dr Curry Grant Stroke Prevention Clinic Quinte Health Care

Research Skills for Non-Researchers: Using Electronic Health Data and Other Existing Data Resources

Medicare & Medicaid EHR Incentive Program Meaningful Use Stage 1 Requirements Summary.

Na#onal Asbestos Forum 2013: Advance in Medical Research on Asbestos- Related Diseases

Overview of FDA s active surveillance programs and epidemiologic studies for vaccines

FOR THE PREVENTION OF ATRIAL FIBRILLATION RELATED STROKE

Committee Approval Date: September 12, 2014 Next Review Date: September 2015

Apixaban Plus Mono vs. Dual Antiplatelet Therapy in Acute Coronary Syndromes: Insights from the APPRAISE-2 Trial

Use of Observa,onal Data to Make Causal Inferences About Treatment Decisions in Mul,ple Sclerosis. Brian Healy, PhD

Stage 1 Meaningful Use for Specialists. NYC REACH Primary Care Information Project NYC Department of Health & Mental Hygiene

Biostatistics and Epidemiology within the Paradigm of Public Health. Sukon Kanchanaraksa, PhD Marie Diener-West, PhD Johns Hopkins University

Clinical Quality Measure Crosswalk: HEDIS, Meaningful Use, PQRS, PCMH, Beacon, 10 SOW

Analyzing Clinical Trial Findings of the Efficacy and Safety Profiles of Novel Anticoagulants for Stroke Prevention in Atrial Fibrillation

DISCLOSURES RISK ASSESSMENT. Stroke and Heart Disease -Is there a Link Beyond Risk Factors? Daniel Lackland, MD

Standard Medicare Part D* checklist

Understanding Diseases and Treatments with Canadian Real-world Evidence

Parisa Vatanka, PharmD Pharmacy Strategic Alliances Manager Assistant Clinical Professor, UCSF. Stan Leung, PharmD Clinical Division Lead

BIG DATA SCIENTIFIC AND COMMERCIAL APPLICATIONS (ITNPD4) LECTURE: DATA SCIENCE IN MEDICINE

Big Data for Population Health and Personalised Medicine through EMR Linkages

Health risks and benefits from calcium and vitamin D supplementation: Women's Health Initiative clinical trial and cohort study

Introduction to Statistics and Quantitative Research Methods

Case-control studies. Alfredo Morabia

If several different trials are mentioned in one publication, the data of each should be extracted in a separate data extraction form.

Basic Study Designs in Analytical Epidemiology For Observational Studies

Health Care and Life Sciences

Hormone therapy and breast cancer: conflicting evidence. Cindy Farquhar Cochrane Menstrual Disorders and Subfertility Group

Radiology Business Management Association Technology Task Force. Sample Request for Proposal

NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia. Produced by: National Cardiovascular Intelligence Network (NCVIN)

Electronic Health Record (EHR) Data Analysis Capabilities

William B. Smith, MD President. Patrick R. Ayd, RN, MBA Chief Operating Officer. SNBL-CPC Baltimore, Maryland

S1. Which of the following age categories do you fall into? Please select one answer only years of age years of age years of age

Duration of Dual Antiplatelet Therapy After Coronary Stenting

Telehealth care Closing the Gap to Specialty Care. Dietra Watson, MSN, RN Clinical Informa7cs

Conserva)ve Treatment of PE/ DVT

Antiplatelet and Antithrombotics From clinical trials to guidelines

Barriers to Healthcare Services for People with Mental Disorders. Cardiovascular disorders and diabetes in people with severe mental illness

Centers for Medicare & Medicaid Services 7500 Security Boulevard Baltimore, MD 21244

A Population Based Risk Algorithm for the Development of Type 2 Diabetes: in the United States

Transcription:

Big- Data- Driven Medicine Can big data save medicine? David Madigan Department of Sta9s9cs Columbia University

Evidence- Based Medicine as against???

How does the combina9on of Clinical Judgment and Evidence- Based Medicine work in prac9ce?

John went to see a world- class cardiologist Should John have an angiogram?

Should John have an 48 years old HDL = 59 CRP normal LDL = 70 angiogram? triglycerides = 106 father died of heart disease (47) university Clinical professor calcium judgment? score in 2003 = 19 calcium score in 2008 = 42 non- smoker calcium score in 2010 = 70 BMI = 21.6 stress test normal in 2007 no diabetes mother died of cancer (83) EKG unusual in 2009 Who are we kidding? aspirin exercise lipitor red wine arrhythmia in 2008 genotyping normal heart ultrasound (2008)

Data- Driven Medicine Observa=onal Medical Outcomes Partnership Mul=ple years of medical records for 200+ million people Largest collec=on of medical records in the world 32,430 pa=ents just like John

Many Challenges Sta9s9cal/Epidemiological Computa9onal Drug Safety

Safety in Lifecycle of a Drug/Biologic product

Drug Safety Pre-Approval: High quality data (but small) No "data mining" Post- Approval: Low quality data (but lots of it) Extensive use of "data mining"

Problems with Spontaneous Reports Under-reporting Duplicate reports No temporal information No denominator Sentinel EU-ADR OMOP Vioxx Baycol etc. IOM FDAAA Active surveillance in large, longitudinal databases

BMJ 2010; 341:c4444 15

BMJ study design choices Data source: General Prac9ce Research Database Study design: Nested case- control Inclusion criteria: Age > 40 Case: cancer diagnosis between 1995-2005 with 12- months of follow- up pre- diagnosis 5 controls per case Matched on age at index date, sex, prac9ce, observa9on period prior to index Exposure defini9on: >=1 prescrip9on during observa9on period RR es9mated with condi9onal logis9c regression Covariates: smoking, alcohol, BMI before outcome index date Sensi9vity analyses: exposure = 2+ prescrip9ons covariates not missing 9me- at- risk = >1 yr post- exposure Subgroup analyses: Short vs. long exposure dura9on Age, Sex, smoking, alcohol, BMI Osteoporosis or osteopenia Fracture pre- exposure Prior diagnosis of Upper GI dx pre- exposure NSAID, cor9costeroid, H2blocker, PPI u9liza9on pre- exposure 16

BMJ Results BMJ 2010; 341:c4444 17

JAMA 2010; 304(6): 657-663 18

JAMA study design choices Data source: General Prac9ce Research Database Study design: Cohort Inclusion criteria: Age > 40 Exclusion criteria: Cancer diagnosis in 3 years before index date Exposed cohort: Pa9ents with >=1 prescrip9on between 1996-2006 Unexposed cohort: 1- to- 1 match with exposed cohort 1995-2005 in BMJ Matched on year of birth, sex, prac9ce Match exposure vs. HR es9mated with Cox propor9onal hazards model Not observa9on length outcome status; not 5- to- 1 Time- at- risk: >6mo from index date Covariates: Smoking, alcohol, BMI before exposure index date Time- at- risk is between two defini9ons used in Hormone therapy, NSAIDs, H2blockers, PPIs BMJ: All 9me post- exposure and >1yr aker index Sensi9vity analyses: Different index date Excluding people that were in both exposed and unexposed cohorts BMJ didn t stra9fy by hormone therapy Exclude pa9ents with missing confounders (not reported) Subgroup analyses: Low vs. medium vs. high use, based on defined daily dose Alendronate vs. nitrogen- containing bisphosphonates vs. non- nitrogen- contraining bisphosphonates 19

JAMA Results JAMA 2010; 304(6): 657-663 20

OMOP Research Experiment Open- source Standards- based Common Data Model OMOP Methods Library Incep9on cohort Case control Logis9c regression 10 data sources Claims and EHRs 200M+ lives 14 methods Epidemiology designs Sta9s9cal approaches adapted for longitudinal data Outcome Angioedema Aplastic Anemia Acute Liver Injury Bleeding Hip Fracture Hospitalization Myocardial Inf arction Mortality af ter MI Renal Failure GI Ulcer Hospitalization Drug ACE Inhibitors Amphotericin B Benzodiazepines Antibiotics: erythromycins, sulfonamides, tetracyclines Antiepileptics: carbamazepine, phenytoin Beta blockers Bisphosphonates: alendronate Tricyclic antidepressants Typical antipsychotics Warfarin 23 Legend True positive' benefit Total 2

What do the data look like? M44 pa9ent 1 MI! AVANDIA! CELECOXIB! ] ] ] ] ] ] M78 pa9ent 2 FLUTICASONE! ROFECOXIB! MI! ROFECOXIB! ] ] ] ] ] ] ] ] F24 pa9ent 3 ROFECOXIB! ROFECOXIB! MI! MI! ] ] ] ] ] ] ] ] ] ] QUETIAPINE! OLANZAPINE! Computa(onal considera(ons require few passes through the data

OBSERVATIONAL,, MEDICAL, OUTCOMES, PARTNERSHIP, An4bio4cs, ini4a4on, Acute,renal, failure,event,

Typical scenario: Es9mate the effect of one drug on one outcome using one method against one database Drug: ACE inhibitor Outcome: Angioedema Method: High- dimensional propensity score (HDPS) Database: Thomson MarketScan Commercial Claims and Encounters (CCAE) True TP + Data source Typically focus on magnitude of the effect: rela9ve risk (RR) and sta4s4cal significance: lower and upper bound of confidence interval (CI) If this had been an randomized trial, we would know the CI has 95% coverage of the true effect size. Because this is an observa9onal study with the poten9al for bias, the opera9onal characteris9cs are uncertain: Is the es9mated associa9on consistent with the direc9onality of the true causal rela9onship? How oken does the CI actually contain the truth? Rela9ve risk 26

Systema9c sensi9vity analysis: Es9mate the effect using mul9ple methods across the network of databases Data source Data sources in OMOP network: CCAE: Thomson MarketScan Commerical Claims and Encounters MDCR: Thomson Medicare Supplemental MDCD: Thomson Mul9state Medicaid MSLR: Thomson Lab Supplement GE: GE Centricity EHR HUM: Humana PHCS: Partners Healthcare System RI: Regenstrief Ins9tute SDI_MID: SDI Health VA: Department of Veteran s Affairs MedSAFE False FN - True TP + Methods in OMOP network: CCO: Case crossover DP: Dispropor9onality analysis HDPS: High- dimensional propensity score ICTPD: Temporal papern discovery USCCS: Univariate self- controlled case series ACE Inhibitors are believed to have a causal rela9onship with Angioedema Essen9ally all methods and databases correctly es9mate a posi9ve associa9on direc9onally consistent with prior beliefs Rela9ve risk 27

Consistent false posi9ve observed for nega9ve control of An9bio9cs and Acute Renal Failure TN True - FP False + Data source An9bio9cs are observed to have a significant, posi9ve associa9on with acute renal failure across mul9ple methods and databases. This false posi9ve may be due to protopathic bias, but several methods that employ analy9cal strategies to address that issue failed to control for it. Rela9ve risk 28

Measuring method performance Drug- condi9on associa9on status Y true associa9on, N nega9ve control Y N Method predic9on: Drug- condi9on pair met a specific threshold Y N True posi=ves False nega=ves False posi=ves True nega=ves Ques9on: For any method applied to any data source, what are the expected opera9ng characteris9cs? 29

Method predic9on: Drug- condi9on pair met a specific threshold: (LB 95% CI > 1) Y N Measuring method performance example: Random- effect meta- analysis of es9mates from High- dimensional propensity score Drug- condi9on associa9on status Y true associa9on, N nega9ve control Y N True posi9ves: 5 False nega9ves: 4 False posi9ves: 8 True nega9ves: 36 Posi9ve predic9ve value = precision = TP / (TP+FP) = 5 / (5+8) = 0.38 Nega9ve predic9ve value = TN / (FN+TN) = 36 / (4+36) = 0.90 Sensi9vity = Recall = TP / (TP+FN) = 5 / (5+4) = 0.56 Specificity = TN / (FP+TN) = 36 /(8+36) = 0.82 False posi9ve rate = 1 0.82 = 0.18 Accuracy = (TP+TN) / (TP+TN+FP +FN) =(5+36)/(9+44) = 0.77 30

ROC curves of random- effects meta- analysis es9ma9ons for all methods True - False - False + True + NS p<.05 Sensi9vity At 50% sensi9vity, false posi9ve rate ranges 16%- 30% At 10% false posi9ve rate, sensi9vity ranges 9%- 33% False posi9ve rate (1- Specificity)

AUC performance by method AUC Method

Heterogeneity

Range of es9mates across high- dimensional propensity score incep9on cohort (HDPS) parameter seungs Rela9ve risk True - False - False + True + Parameter sexngs explored in OMOP: Washout period (1): 180d Surveillance window (3): 30 days from exposure start; exposure + 30d ; all 9me Each row represents a drug- from exposure start outcome pair. Covariate The horizontal eligibility span window reflects (3): the 30 days range prior of to point exposure, es9mates 180, observed all- 9me pre- exposure across the parameter seungs. # of confounders (2): 100, 500 Ex. Benzodiazepine- Aplas9c covariates used to es9mate propensity anemia: HDPS parameters vary in score es9mates from RR= 0.76 and 2.70 Propensity strata (2): 5, 20 strata Analysis strategy (3): Mantel- Haenszel stra9fica9on (MH), propensity score adjusted (PS), propensity strata adjusted (PS2) Comparator cohort (2): drugs with same indica9on, not in same class; most prevalent drug with same indica9on, not in same class 34

Range of es9mates across univariate self- controlled case series (USCCS) parameter seungs USCCS Parameter sexngs explored in OMOP: Condi=on type (2): first occurrence or all For Bisphosphonates- GI Ulcer hospitaliza9on, occurrences of outcome USCCS using incident events, excluding the first day Defining exposure =me- at- risk: of exposure, and using large prior of 2: Days from exposure start (2): should we When surveillance window = length of include the drug start index date in the exposure, no associa9on is observed period at risk? Adding 30d of 9me- at- risk to the end of Surveillance window (4): exposure increased to a significant RR=1.14 30 d from exposure start Dura9on of exposure (drug era start through drug era end) Dura9on of exposure + 30 d Dura9on of exposure + 60 d Precision of Normal prior (4): 0.5, 0.8, 1, 2 True - False - False + True + Rela9ve risk 35

Fix everything except the database

Cohort

SCCS

Interpre9ng Biased Es9mates

p- values are tail probabili9es null distribu9on for nega9ve controls observed RR density p- value X 2 0.9 1.0 1.1 1.2 1.3 1.4 es9mated RR

We also have posi9ve controls null distribu9on for nega9ve controls observed RR distribu9on for posi9ve controls density much more likely to have come blue than red 0.9 1.0 1.1 1.2 1.3 1.4 es9mated RR moderate p- value, small Pr(posi9ve control)

But if AUC is small null distribu9on for nega9ve controls observed RR distribu9on for posi9ve controls distribu9on for posi9ve controls density than blue much more likely to have come from red 0.9 1.0 1.1 1.2 1.3 1.4 es9mated RR despite the moderate p- value, Pr(posi9ve control) is large! the null distribu9on doesn t tell the whole story

Revising prior expecta9ons in light of new evidence from an ac9ve surveillance system If you observe a RR = 2.0 (1.78 2.25), then your posterior probability depends on your prior expecta9ons With moderate variance (SElogRR = 0.06), observing RR<2.0 is only modestly informa9ve Prior: p=0.9 p=0.5 p=0.1

Revising prior expecta9ons in light of new evidence from an ac9ve surveillance system: Impact of precision of observed es9mates but the same effect size with large uncertainty does not significantly revise prior expecta9ons Prior: p=0.9 p=0.5 p=0.1 A large and highly significant associa9on can be highly informa9ve Scenarios: You observe RR=2.0 with confidence intervals based on standard error (SE): Large SE: (1.01 3.97) Medium SE: (1.78 2.25) Small SE: (1.96 2.04)

Exploring clopidogrel and upper gastrointes9nal bleeding 45

Clopidogrel GI Bleed Method: CC 2000314, Source: CCAE, HOI: GI Bleed Probability 0.0 0.1 0.5 0.9 1.0 RR:1.86 (1.79, 1.93) SE: 0.02 Prior: p=0.9 p=0.5 p=0.1 β 1 = 0.00 β 2 = 0.14 0.25 0.5 1 2 5 10 Relative risk

Exploring isoniazid and acute liver injury 47

Isoniazid - Acute Liver Injury Method: CM 21000211, Source: MDCR, HOI: Acute Liver Injury Probability 0.0 0.1 0.5 0.9 1.0 RR:4.03 (2.69, 6.03) SE: 0.21 Prior: p=0.9 p=0.5 p=0.1 β 1 = 0.79 β 2 = 0.02 0.25 0.5 1 2 5 10 Relative risk

Exploring ibuprofen and acute kidney injury 49

Ibuprofen Acute Kidney Injury Method: CC 2000241, Source: MDCD, HOI: Acute Kidney Injury Probability 0.0 0.1 0.5 0.9 1.0 RR:1.26 (1.16, 1.36) SE: 0.04 Prior: p=0.9 p=0.5 p=0.1 β 1 = 0.00 β 2 = 0.07 0.25 0.5 1 2 5 10 Relative risk

Exploring indomethacin and acute myocardial infarc9on 51

Indomethacin Acute Myocardial Infarc9on Method: CC 2000241, Source: GE, HOI: Acute Myocardial Infarction Probability 0.0 0.1 0.5 0.9 1.0 RR:1.44 (1.20, 1.72) SE: 0.09 Prior: p=0.9 p=0.5 p=0.1 β 1 = 0.00 β 2 = 0.00 0.25 0.5 1 2 5 10 Relative risk

Indomethacin Acute Myocardial Infarc9on Method: SCCS 1953010, Source: GE, HOI: Acute Myocardial Infarction Probability 0.0 0.1 0.5 0.9 1.0 RR:1.16 (1.06, 1.72) SE: 0.05 Prior: p=0.9 p=0.5 p=0.1 β 1 = 4.97 β 2 = 0.37 0.25 0.5 1 2 5 10 Relative risk

Predic9ve Modeling

Discussion Current approaches to repor9ng observa9onal studies are broken Big data can transform medicine but we have to stop kidding ourselves about the role of clinical judgment in a data- rich world, and the reliability of observa9onal studies

Three Pieces of the Solu9on 1. Sensi9vity Analysis 2. Establish Opera9ng Characteris9cs - How well do observa9onal studies work? - How big is the bias in prac9ce? - How oken do 95% intervals contain the true RR? - What does this mean?

Three Pieces of the Solu9on 3. Predic9on of Future Observables 4. Avoid reliance on human art

Collaborators Patrick Ryan, J&J Mar9jn Schuemie, Erasmus University Marc Suchard, UCLA Bill DuMouchel, Oracle Marc Overhage, Siemens Paul Stang, J&J Ivan Zorych, Columbia Tyler McCormick, U. Washington Cynthia Rudin, MIT Ben Latham, MIT 60

Let s not do this!