How Big Data and Causal Inference Work Together in Health Policy Lisa Lix, University of Manitoba Thematic Program on Statistical Inference, Learning, and Models for Big Data June 12, 2015
Presentation Overview Background, purpose and objectives of the Health Policy Workshop Speakers and topics Comments from participants: What was most valuable? Conclusions Research and training opportunities
Major Inferential Challenges Assessment of sampling biases Inference about tails Resampling inference Change point detection Reproducibility of analyses Causal inference for observational data Efficient inference for temporal streams
Manitoba Centre for Health Policy Data Repository CancerCare Registry Family Services Income Assistance Social Housing Education Healthy Child MB Family First Healthy Baby EDI Hospital Pharmaceuticals Home Care Population- Based Health Registry Immunization Medical Services Lab Health Links Nursing Home Census Data at DA/EA Level ER Vital Statistics Provider Clinical ICU FASD Pediatric Diabetes
Characteristics of Big Data in Health Policy Diversity of linked databases Unstructured data Physician notes, laboratory test results Genomics, imaging, longitudinal data Population-based data vs. clinical registry data (wide/broad versus deep/narrow) Privacy-conscious analysis environments
Workshop Rationale A massive numbers of variable associations can be explored in big healthcare databases Causality is hard to establish and is a vague and poorly specified construct Traditional approaches for causal inference, such as regression adjustment and stratification, have limitations in big data environments
Workshop Context Theoretical frameworks, study designs and analytic methods that allow causality to be inferred from large, observational data. What is challenging and unique about exploring causality in big data settings? Examples of large-scale investigations in which causal inferences are being explored.
Workshop Objectives To provide a forum for health policy analysts/program staff to engage with statisticians to discuss research challenges and opportunities; To serve as a catalyst for exploring research collaborations; To expose participants to innovations in study design techniques and methods for health policy research
Overview of Topics Day 1: Opening panel session; data quality; graphical methods for causal inference Day 2: Pragmatic trials; causality in comparative effectiveness research; healthcare system applications; microsimulation modeling Day 3: Propensity score models; drug safety and effectiveness; diagnostic testing Day 4: marginal structural models; external validity of observational studies and clinical trials; models to test for periodicity and trend effects Day 5: medical device safety and effectiveness; cautions when working in big data environments
Opening Panel Speakers Michael Schull, Institute for Clinical Evaluative Sciences (ICES) Arlene Ash, University of Massachusetts Medical School Mark Smith, Manitoba Centre for Health Policy (MCHP) Therese Stukel, University of Toronto & ICES Overview of data repositories at ICES, MCHP, and in other jurisdictions
Opening Panel Challenges Data privacy & confidentiality Differences in data structures (even for common sources) Siloes: government, academia, industry Opportunities Emphasis on evidence to inform decision making Skill development amongst researchers & analysts Increased emphasis on the value of cross-disciplinary research
Session on Data Quality Speakers Mark Smith, Manitoba Centre for Health Policy Mahmoud Azimaee, Institute for Clinical Evaluative Sciences Topics Data quality frameworks Automated graphical, and inferential techniques for data quality evaluation Data scale issues
Workshops Peter Austin, University of Toronto Propensity score methods for estimating treatment effects using observational data The propensity score is the probability of treatment assignment conditional on observed baseline covariates Four different methods of using the propensity score were discussed: matching, weighting, stratification, and covariate adjustment Emphasized the role of the propensity score as a balancing score
Workshops Erica Moodie, McGill University Marginal structural models Produce semi-parametric estimates that adjust for timedependent confounding in observational studies about the effect of time-varying treatments on binary outcomes Assumptions required for model identification were discussed Three approaches to estimation were presented: inverse probability weighting, g-computation, and g-estimation
Other Speaker Topics Jonas Peters, Max Planck Institute for Intelligent Systems Three Ideas for Causal Inference Ideas that aim at solving the problem of causal inference in observational data: additive noise models: assume that the involved functions are of a particularly simple form constraint-based methods: relate conditional independences in the distribution with a graphical criterion called d-separation invariant prediction: makes use of observing the data generating process in different "environments"
Other Speaker Topics Patrick Heagerty, University of Washington Pragmatic Trials and the Learning Health Care System Stepped wedge cluster randomised trial increasingly being used in the evaluation of service delivery type interventions design involves random and sequential crossover of clusters from control to intervention until all clusters are exposed. use is on the increase: HIV, cancer treatment, healthcare associated infections, healthcare treatments well suited to evaluations that do not rely on individual patient recruitment
Common Trial Designs
Stepped Wedge Design
Other Speaker Topics Elizabeth Stuart, Johns Hopkins University Using big data to estimate population treatment effects Analysis methods for improved external validity Goal: to make statements about the likely effects of a treatment in the target population Assessing and enhancing external validity with respect to the characteristics of trial and population subjects Some approaches: meta-analysis, cross-design synthesis, reweighting (reweight trial members to full population using inverse probability of participation weights)
Applications Xiochun Li, Indiana University EMR² : Evidence Mining Research in Electronic Medical Records Towards Better Patient Care Indianapolis Network for Patient Care Created in 1995 Houses clinical data from over 80 hospitals, public health departments, local laboratories and imaging centers, surgical centers, and a few large-group practices closely tied to hospital systems, for approximately 13.4 million unique patients Data are being used for comparative effectiveness and pharmaco-epidemiology research
Applications Danica Marinac-Dabic, Center for Devices and Radiological Health, Food and Drug Administration, USA MDEpiNet: Strengthening Medical Device Ecosystem for Surveillance and Innovation Public-private partnership that provides leadership in innovative data source development and analytic methodologies for implementation of medical device research and surveillance to enhance patient- centered outcomes
National Medical Device Postmarket Surveillance Plan 24
Applications Dan Chateau, University of Manitoba Implementing a research program using data from multiple jurisdictions: The Canadian Network for Observational Drug Effect Studies (CNODES)
Canadian Network for Observational Drug Effect Studies (CNODES) Network of over 60 Canadian pharmacoepidemiologists, biostatisticians, clinicians, clinical pharmacologists, pharmacists, IT professionals, data analysts, and students using linked administrative data in 7 provinces plus CPRD and US data. Timely responses to queries from Canadian public stakeholders regarding drug safety and effectiveness Funded by the Canadian Institutes of Health Research (CIHR) through the government s Drug Safety and Effectiveness Network (DSEN) program to create CNODES, in March 2011 for five years.
Typical CNODES Study Query (potential safety signal) from Health Canada Review and feasibility studies Distributed network» Up to 9 sites (7 provinces + GPRD + US MarketScan) Common protocol Different data structures/availability Typically > 500 potential (but non-specific) confounders Meta-analysis combines results across studies Methods team: provides statistical/epidemiological expertise across projects
2-1 -
C. Blais, S. Jean, C. Sirois, L. Rochette, C. Plante, I. Larocque, M. Doucet, G. Ruel, M. Simard, P. Gamache, D. Hamel, D. St-Laurent, V. E mond, Quebec Integrated Chronic Disease Surveillance System (QICDSS), an innovative approach. Chronic Diseases and Injuries in Canada, 2014:34(4):226-235. Quebec Integrated Chronic Disease Surveillance System
Participant Comments The challenges of working with big data requires an interdisciplinary and collaborative approach Significant investments of time, expertise, and resources are needed to work with large datasets Important to have a grounding in statistical theory: e.g., structural models Uncertainty: in data acquisition, data quality, and modeling Big data can be blended or combined in creative ways to address policy-relevant problems
Conclusions: An Era of Data-Centric Science Requires new paradigms that address how data are captured, processed, discovered, exchanged, distributed, analyzed Traditional methods have largely focused on analysts being able to develop and analyze data within their own computing environment(s) The distributed and heterogeneous nature of large databases provides substantial challenges
Conclusions: Training Opportunities Real-world simulations Observational Medical Outcomes Partnership/Observational Health Data Sciences Distributed data models Increased emphasis on standardized protocols (e.g., common data model) Model development, selection, evaluation
Thanks to Planning Committee Constantine Gatsonis Thérèse Stukel Sharon-Lise Normand Special thanks to Nancy Reid for advice and guidance
Contact Information Lisa Lix, PhD P.Stat. e-mail: lisa.lix@umanitoba.ca website: www.chimb.ca