Data Intensive Science and the Transformation of Knowledge




Data Intensive Science and the Transformation of Knowledge
ISMPP Conference, April 30, 2013
Carol J. McCall, FSA, MAAA
Chief Strategy Officer, GNS Healthcare
@CarolMcCall

The Coming Era of Value-Based Healthcare
Reform is creating the most comprehensive set of changes in US healthcare since Medicare in 1965
- Re-design healthcare to reward value over volume and outcomes over activity
- Driving unprecedented innovation
- Creating entirely new notions of value
- A fundamental shift, indeed a new paradigm, whose scope cannot be overstated

Reform is Tough Medicine
Bringing unprecedented challenges
- Exposing fundamental gaps in capabilities and knowledge
- Threatening long-standing areas of competitive advantage
- Re-writing underlying business models
- Shifting the balance of power and creating entirely new players
- Holds the potential to redraw the entire competitive landscape
Need to aggressively adapt or risk long-term viability
"In ten years, the pharma industry will be paid on outcomes and we have no idea how to get there." CEO, Pharmaceutical Company

On the Eve of Crisis
USA Inc.
Crushing economics that threaten our entire economy
- $2.7T annually (~18% of GDP and growing)
- 30+% of care doesn't create better outcomes
- Entering the boomer wave (8k people per day turn age 65)
A Wanamaker Problem
We lack the detailed evidence we need for value-based healthcare

An Example of Staggering Differences
"U.S. health care costs for the aged are sky high," December 13, 2009, by Mark Roth, Pittsburgh Post-Gazette
- It's a startling graph.
- [Figure: Annual Per Capita Costs by Age in Different Countries]
- "We are not seeing dramatically higher survival rates at any age in the U.S., notwithstanding much greater expenditures."
- [Figure: Life Expectancy and Costs in Different Countries]
- The US has no well-defined strategy for how to deal with this and that often leads to a lot of unnecessary care.
* Similar per capita expenditures as Germany or UK would reduce total US healthcare costs by 40%

In the Race for Evidence, Knowing Things is Hard
Retractions are on the rise
"Mistakes in Scientific Studies Surge," WSJ, August 2011
- When a study is retracted, it can be hard to make its effects go away.
- In a sign of the times, a blog called "Retraction Watch" has popped up to monitor the flow
Theories suggested on why the backpedaling?
- Journals better at detecting errors
- Easier to uncover plagiarism
- Competition / temptation for fraud

In the Race for Evidence, Knowing Things is Hard
We Turn Out to Be Just Plain Wrong
"Studies of Studies Show We Get Things Wrong," The Guardian, July 2011
Two recent studies analyzed landmark research on clinical effectiveness
- Only ~50% have stood the test of time
- Remainder of them have been
  - Reversed outright
  - Supported, but to a lesser degree
  - Inconclusive (or still unchallenged)
"Half of what you'll learn in medical school will be shown to be either dead wrong or out of date within five years of graduation." Dr. David Sackett
1. Prasad V, Gall V, Cifu A. The Frequency of Medical Reversal. Arch Intern Med. 2011;171(18):1675-1676.
2. Ioannidis JP. Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. JAMA. 2005;294(2):218-228.

Mark Twain was Right
These findings suggest that
- There's NEVER an excuse to stop monitoring outcomes
- Such medical reversals, if we pursued them, could be common
To do that, we need to:
- Create ways to find what we're NOT actually looking for
- Get better at Being Wrong
"It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so." - Mark Twain

Preparing for Surprise
- A fascinating tour of human fallibility and a new way of looking at wrongness
- Schulz sees our capacity to err as inseparable from our imagination
- She links error to human creativity, and in particular, to how we generate and revise our beliefs about the world
- With new ways to do this, we can get better at Being Wrong and, just perhaps, unleash our creativity in healthcare

The New Gold Rush: Big Data
- Can Big Data fix healthcare?
- Is our system too broken, or does it need something different?
- Can it help us find what we weren't looking for?

The New Gold Rush: Big Data
Big Data is a Hot Topic
- Massive data generation
- Maturing technologies and plummeting costs
- Hot topic in business
- Called "The Next Frontier" for innovation and competition

Data generation and storage is no longer the issue. The bottleneck is now the analytics to turn healthcare data into actionable knowledge to match health interventions to patients. - Participant @ Strata Rx conference October 17, 2012

A Cautionary Note: Correlation vs. Causation
Getting it wrong: "Overweight In Dogs Related To Overweight Owners," Public Health Nutrition, June 2009
Correlation: Answers the question "What happens when I see?"
- Traditional statistics as well as data mining & pattern matching fall in this category
- Valuable for many things, but can be misleading
Causation: Answers the question "What happens when I do?"
- Healthcare demands we know causation (i.e., actions, events or processes that bring about specific effects)
- Predominantly established through RCTs
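The gap between "seeing" and "doing" can be sketched with a small simulation. This is an illustrative toy, not anything from the talk: the variable names, the probabilities, and the hidden common cause `z` are all assumptions chosen so that X never affects Y. Observing X still appears to predict Y, while assigning X at random (as an RCT would) shows no effect.

```python
import random

def simulate(n, intervene, seed=0):
    """Return P(Y=1 | X=1) - P(Y=1 | X=0) in a toy world where X never affects Y."""
    rng = random.Random(seed)
    counts = {0: [0, 0], 1: [0, 0]}  # x -> [rows seen, rows with y == 1]
    for _ in range(n):
        z = rng.random() < 0.5                      # hidden common cause (assumed)
        if intervene:
            x = rng.random() < 0.5                  # "do": X assigned by coin flip
        else:
            x = rng.random() < (0.8 if z else 0.2)  # "see": X is driven by Z
        y = rng.random() < (0.7 if z else 0.2)      # Y depends on Z only, never on X
        counts[int(x)][0] += 1
        counts[int(x)][1] += int(y)

    def rate(x):
        return counts[x][1] / counts[x][0]

    return rate(1) - rate(0)

seeing = simulate(100_000, intervene=False)  # observational difference
doing = simulate(100_000, intervene=True)    # interventional difference
print(f"seeing: {seeing:.3f}, doing: {doing:.3f}")
```

Under these assumed probabilities the observational difference comes out near 0.30 even though the interventional difference is near zero, which is the sense in which correlation "can be misleading".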

A Cautionary Note in the Gold Rush
"these [lead] to a third change: a move away from the age-old search for causality"
"There is a treasure hunt underway, driven by insights to be extracted [and] the dormant value that can be unleashed by a shift from causation to correlation."

A Cautionary Note in the Gold Rush
- Only incidentally about potential side effects of a treatment
- Real target was the observational study, and whether it could be trusted
- Issue is of paramount importance
  - Becoming more common
  - Fast becoming a toweringly important type of investigation
Big Data actually makes spurious correlations more common (not less)
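Why more data can mean more spurious correlations can be sketched as a multiple-testing effect. The example below is a hedged illustration using pure noise: every variable is independent of the outcome by construction, the sample size of 50 is arbitrary, and the Fisher z-transform is used here as a simple approximate significance test (an assumption of this sketch, not a method named in the talk).

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def count_spurious(n_variables, n_samples=50, seed=0):
    """Count noise variables that look 'significantly' correlated with a noise outcome."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_variables):
        noise = [rng.gauss(0, 1) for _ in range(n_samples)]
        r = pearson_r(noise, outcome)
        # Fisher z-transform: roughly standard normal when the true correlation is 0
        z = math.atanh(r) * math.sqrt(n_samples - 3)
        if abs(z) > 1.96:  # two-sided test at p = .05
            hits += 1
    return hits

for k in (100, 1_000, 10_000):
    print(k, "variables ->", count_spurious(k), "spurious hits")
```

Roughly 5% of the noise variables pass the test regardless of how many are screened, so the absolute number of false leads grows in step with the number of variables examined.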

A New Paradigm in Analytics
Re-inventing the Science of Evidence
Dr. Pearl was recently awarded for his body of work to develop and synthesize two branches of calculus
#1: The de facto standard for reasoning under uncertainty (used everywhere, from voice recognition to self-driving cars)
#2: A calculus for determining cause-and-effect relationships directly from data
- A mathematical language for expressing concepts explicitly
- Precision and computational benefits of a formal logic
- Ability to transfer knowledge reliably (and computationally)
Judea Pearl, 2011 Turing Award

A New Paradigm in Analytics
Re-inventing the Science of Evidence
- Number theory & RSA encryption algorithms
- Causal mathematics
- World Wide Web
- Rapid-learning directly from data

GNS Healthcare
Hypothesis-free discovery of cause-and-effect relationships directly and at scale from observational data
Big Data Causal Mathematics Machine Learning Models

An Example of Discovery @ Scale
Planning for Surprise
The Setting: Innovative Healthcare Company
- National research reputation, a portfolio of publications and rich data assets
- Recently published on an important drug-drug interaction
The Goal: Expand Their Ability to Discover Important Results
- Frustrated by time required; concerned about questions they weren't asking
- Test GNS approach: reproduce their finding and explore evidence of other (unasked) impacts
The Data: 3 Years of Detailed Claims Data
- Details with ICD-9, CPT-4 and NDC codes
- Patients relevant to their earlier finding
GNS Challenge: Reproduce Their Finding (while blindfolded)
- Identify causal links between drugs and outcomes
- Data completely blinded (all codes were dummies)

Big Data?
# Patients                  111,641
# Transaction Records       58,181,059
# Diagnosis Codes           12,241
# Procedure Codes           11,174
# Drug Codes (NDC level)    24,447

Big Data!
# Patients                                    111,641
# Transaction Records                         58,181,059
# Diagnosis Codes                             12,241
# Procedure Codes                             11,174
# Drug Codes (NDC level)                      24,447
# Hypotheses with Biasing Driver Variables    44,690,959,998,504,000 (~45 quadrillion hypotheses)

A Penny for Your Hypothesis

The Hypothesis Space
[Image: 1 quadrillion pennies]
You need 44 more of these
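The penny comparison is simple arithmetic on the hypothesis count quoted earlier in the talk. A minimal sketch, assuming one penny per hypothesis and reading "these" as piles of one quadrillion (10^15) pennies:

```python
TOTAL_HYPOTHESES = 44_690_959_998_504_000  # figure quoted on the slide
PENNIES_PER_PILE = 10**15                  # one quadrillion

# About 45 piles in total once rounded, so 44 beyond the one pictured.
piles_needed = round(TOTAL_HYPOTHESES / PENNIES_PER_PILE)
extra_piles = piles_needed - 1

# Value in whole dollars at one cent per hypothesis.
dollars = TOTAL_HYPOTHESES // 100

print(piles_needed, extra_piles, f"${dollars:,}")
```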

Challenges
The Approach
- Exhaustive search of hypotheses
- Modeled time-ordering & interplay of events and exposures
- Automatically identified causal drivers and adjusted for bias
- Preserved uncertainty (probabilistic causality)
- Distributed computational load for fast results (hours)

The Results
Hypotheses (45x)                    Adverse Effects    Beneficial Effects
# Total Hypotheses                  44,690,959,998,504,000
# Detected Correlations*            31,481,043         42,471,231
# Detected Causal Relationships*    248                151
* Statistically significant at p=.05
- Reduced the space to the meaningful few
- Reproduced the finding
- Found things we weren't looking for, including a notable surprise:
  - Possible adverse effect for a commonly prescribed drug
  - Initially replicated in (2) out-of-sample datasets
  - Pursuing additional validation (no blindfolds this time)
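How sharply the search narrowed can be read off the reported counts. The sketch below only does arithmetic on the numbers quoted in the results; the variable names are illustrative and nothing here reflects how GNS computed them.

```python
total_hypotheses = 44_690_959_998_504_000
correlations = {"adverse": 31_481_043, "beneficial": 42_471_231}
causal = {"adverse": 248, "beneficial": 151}

all_corr = sum(correlations.values())  # detected correlations, both directions
all_causal = sum(causal.values())      # detected causal relationships, both directions

# How many hypotheses sit behind each detected correlation,
# and how many correlations sit behind each causal relationship.
hypotheses_per_correlation = total_hypotheses / all_corr
correlations_per_causal = all_corr / all_causal

print(f"{all_corr:,} correlations, {all_causal} causal relationships")
print(f"~{hypotheses_per_correlation:,.0f} hypotheses per detected correlation")
print(f"~{correlations_per_causal:,.0f} correlations per causal relationship")
for kind in correlations:
    print(kind, f"~{correlations[kind] / causal[kind]:,.0f} correlations per causal link")
```

Read this way, only a few hundred of the tens of millions of detected correlations survive as candidate causal relationships, which is what "the meaningful few" refers to.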

The Beginning of the Fourth Paradigm
Data-Intensive Science and the Scientific Record
The scientific record
- Communicates findings
- Organizes and collects related works
- Documents and manages controversies
- Establishes precedence
- Ensures confidence and trust
- Supports reproducibility
How does this change as we enter data-intensive discovery?

The Evolution of the Scientific Record
Today (the 3rd Paradigm)
Much more complicated and technology-mediated
- Data is no longer fully documented, only summarized
- The link between evidence and writings is more complex
- Computation (and software) integral to reproducibility
- Reproducibility itself extends beyond data access and understanding methods
- Literature has become huge (tools to handle sheer scale)
- Affordances of a scientific record based on print and physical artifacts offer small relief

The Paradigm of Data-Intensive Science
We Have Reached a Janus Moment
"With the arrival of the data-intensive computing paradigm, the scientific record and the supporting system of communication and publication has reached a Janus moment, where we are both looking forward and backward." Clifford Lynch
Janus Moments: the moment where a new norm is established. They may be planned or unplanned, predictable or not, good or bad; but they affect what is considered normal in society, technology, economics, and politics on a personal and macro level

The Scientific Record in the Fourth Paradigm
Data and Software
- First-class objects
- Need systematic management and curation in their own right
Scientific Journals
- Bid (slow) farewell to storage & delivery that are essentially images of the printed page
- Papers will become computational windows to actively understand, reproduce and extend results
Reference Data Collections
- Will become an integral part (computed upon rather than read)
- When updated, will trigger new computations, lead to new or reassessed results
Scientific Record
- Will become a major object of ongoing computation itself
- THE central reference collection

Engaging the Data-Intensive Scientific Record
In the Small
- Go beyond the paper, with computational tools that engage underlying science and data
- Move between papers and reference data with great ease and flexibility
- Integrate with collaborative environments with tools for annotation, authoring, simulation, and analysis
In the Large
- As a large corpus of text and interlinked data resources using a wide range of computational tools
- Will identify relevant papers of interest, suggest hypotheses that can be tested elsewhere, or allow production of new data or results

Implications for Publication
Data-intensive science will ultimately transform both scientific culture and publishing practice, including
- Views on open access
- Applications of markups and choice of authoring tools
- Disciplinary norms about data curation, data sharing and overall data lifecycle
"In the practice of data-intensive science, one set of data will, over time, figure prominently, persistently, and ubiquitously in scientific work: the scientific record itself." Clifford Lynch
I urge you to take on the mantle of stewardship for helping make data-intensive science a reality

Thank you
Carol J. McCall, FSA, MAAA
Chief Strategy Officer, GNS Healthcare
@CarolMcCall