Data Intensive Science and the Transformation of Knowledge
ISMPP Conference, April 30, 2013
Carol J. McCall, FSA, MAAA
Chief Strategy Officer, GNS Healthcare
@CarolMcCall
The Coming Era of Value-Based Healthcare
Reform is creating the most comprehensive set of changes in US healthcare since Medicare in 1965
- Redesigning healthcare to reward value over volume and outcomes over activity
- Driving unprecedented innovation
- Creating entirely new notions of value
- A fundamental shift, indeed a new paradigm, whose scope cannot be overstated
Reform is Tough Medicine
Bringing unprecedented challenges
- Exposing fundamental gaps in capabilities and knowledge
- Threatening long-standing areas of competitive advantage
- Re-writing underlying business models
- Shifting the balance of power and creating entirely new players
- Holds the potential to redraw the entire competitive landscape
Need to aggressively adapt or risk long-term viability
"In ten years, the pharma industry will be paid on outcomes and we have no idea how to get there" - CEO, Pharmaceutical Company
On the Eve of Crisis
USA Inc.
Crushing economics that threaten our entire economy
- $2.7T annually (~18% of GDP and growing)
- 30+% of care doesn't create better outcomes
- Entering the boomer wave (8k people per day turn age 65)
A Wanamaker Problem
We lack the detailed evidence we need for value-based healthcare
An Example of Staggering Differences
U.S. health care costs for the aged are sky high
December 13, 2009, by Mark Roth / Pittsburgh Post-Gazette
It's a startling graph.
[Figure: Annual Per Capita Costs by Age in Different Countries]
"We are not seeing dramatically higher survival rates at any age in the U.S., notwithstanding much greater expenditures."
[Figure: Life Expectancy and Costs in Different Countries]
The US has no well-defined strategy for how to deal with this and that often leads to a lot of unnecessary care.
* Similar per capita expenditures as Germany or UK would reduce total US healthcare costs by 40%
In the Race for Evidence, Knowing Things is Hard
Retractions are on the rise
Mistakes in Scientific Studies Surge (WSJ, August 2011)
- When a study is retracted, it can be hard to make its effects go away.
- In a sign of the times, a blog called "Retraction Watch" has popped up to monitor the flow
Theories suggested for the backpedaling:
- Journals better at detecting errors
- Easier to uncover plagiarism
- Competition / temptation for fraud
In the Race for Evidence, Knowing Things is Hard
We Turn Out to Be Just Plain Wrong
Studies of Studies Show We Get Things Wrong (The Guardian, July 2011)
Two recent studies analyzed landmark research on clinical effectiveness
- Only ~50% have stood the test of time
- Remainder of them have been
  - Reversed outright
  - Supported, but to a lesser degree
  - Inconclusive (or still unchallenged)
"Half of what you'll learn in medical school will be shown to be either dead wrong or out of date within five years of graduation." - Dr. David Sackett
1. Prasad V, Gall V, Cifu A. The Frequency of Medical Reversal. Arch Intern Med. 2011;171(18):1675-1676.
2. Ioannidis JP. Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. JAMA. 2005;294(2):218-228.
Mark Twain was Right
These findings suggest that
- There's NEVER an excuse to stop monitoring outcomes
- Such medical reversals, if we pursued them, could be common
To do that, we need to:
- Create ways to find what we're NOT actually looking for
- Get better at Being Wrong
"It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so." - Mark Twain
Preparing for Surprise
- A fascinating tour of human fallibility and a new way of looking at wrongness
- Schulz sees our capacity to err as inseparable from our imagination
- She links error to human creativity, and in particular, to how we generate and revise our beliefs about the world
With new ways to do this, we can get better at Being Wrong and, just perhaps, unleash our creativity in healthcare
The New Gold Rush: Big Data
Can Big Data fix healthcare?
- Is our system too broken, or does it need something different?
- Can it help us find what we weren't looking for?
The New Gold Rush: Big Data
Big Data is a Hot Topic
- Massive data generation
- Maturing technologies and plummeting costs
- Hot topic in business
- Called "The Next Frontier for innovation and competition"
"Data generation and storage is no longer the issue. The bottleneck is now the analytics to turn healthcare data into actionable knowledge to match health interventions to patients."
- Participant @ Strata Rx conference, October 17, 2012
A Cautionary Note: Correlation vs. Causation
Getting it wrong: "Overweight In Dogs Related To Overweight Owners" (Public Health Nutrition; June 2009)
Correlation: Answers the question "What happens when I see?"
- Traditional statistics as well as data mining & pattern matching fall in this category
- Valuable for many things, but can be misleading
Causation: Answers the question "What happens when I do?"
- Healthcare demands we know causation (i.e., actions, events or processes that bring about specific effects)
- Predominantly established through RCTs
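The gap between "seeing" and "doing" can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis of the cited study: a hypothetical hidden factor (`household_lifestyle`) drives both owner weight and dog weight, and neither weight affects the other. The variable names, the Gaussian noise model, and the `simulate` helper are all illustrative.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, seed=0):
    """Return (correlation when seeing, correlation when doing)."""
    rng = random.Random(seed)
    # Hypothetical hidden common cause; an assumption for illustration only.
    household_lifestyle = [rng.gauss(0, 1) for _ in range(n)]

    # "Seeing": both weights are observed as they occur; each depends on the
    # hidden factor plus independent noise, and neither causes the other.
    owner_seen = [z + rng.gauss(0, 1) for z in household_lifestyle]
    dog_seen = [z + rng.gauss(0, 1) for z in household_lifestyle]
    seeing = pearson(owner_seen, dog_seen)

    # "Doing": owner weight is assigned at random (as in an RCT), which cuts
    # its link to the hidden factor; dog weight is generated as before.
    owner_set = [rng.gauss(0, 1) for _ in range(n)]
    dog_after = [z + rng.gauss(0, 1) for z in household_lifestyle]
    doing = pearson(owner_set, dog_after)
    return seeing, doing


if __name__ == "__main__":
    seeing, doing = simulate()
    print(f"correlation when seeing: {seeing:.2f}")
    print(f"correlation when doing:  {doing:.2f}")
```

In this toy setup the observed correlation is sizeable while the correlation under random assignment is close to zero, which is the sense in which correlation "can be misleading" when the question is about what an action will do.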
A Cautionary Note in the Gold Rush
"these [lead] to a third change: a move away from the age-old search for causality."
"There is a treasure hunt underway, driven by insights to be extracted [and] the dormant value that can be unleashed by a shift from causation to correlation."
A Cautionary Note in the Gold Rush
- Only incidentally about potential side effects of a treatment
- Real target was the observational study - and whether it could be trusted
Issue is of paramount importance
- Becoming more common
- Fast becoming a toweringly important type of investigation
Big Data actually makes spurious correlations more common (not less)
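The claim that more data can mean more spurious correlations can be sketched with pure noise. The example below is an illustration under assumptions, not the analysis behind the slide: every column is independent random noise, so any "significant" pairwise correlation is spurious by construction. The `count_spurious` helper is hypothetical, and the 0.2 threshold is used as a rough stand-in for a two-sided p=.05 cutoff at 100 rows.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_spurious(num_variables, num_rows=100, threshold=0.2, seed=1):
    """Count variable pairs whose |correlation| exceeds the threshold.

    All columns are independent noise, so every pair counted here is a
    false discovery.
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0, 1) for _ in range(num_rows)]
        for _ in range(num_variables)
    ]
    return sum(
        1
        for a, b in combinations(columns, 2)
        if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    for m in (10, 20, 40):
        pairs = m * (m - 1) // 2
        print(f"{m} variables, {pairs} pairs: "
              f"{count_spurious(m)} spurious 'findings'")
```

Because the number of pairs grows roughly with the square of the number of variables, the count of chance findings grows with it even though the false-positive rate per test stays about the same.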
A New Paradigm in Analytics
Re-inventing the Science of Evidence
Dr. Pearl was recently honored for his body of work developing and synthesizing two branches of calculus
#1: The de facto standard for reasoning under uncertainty (used everywhere, from voice recognition to self-driving cars)
#2: A calculus for determining cause-and-effect relationships directly from data
- A mathematical language for expressing concepts explicitly
- Precision and computational benefits of a formal logic
- Ability to transfer knowledge reliably (and computationally)
Judea Pearl, 2011 Turing Award
A New Paradigm in Analytics
Re-inventing the Science of Evidence
- Number theory & RSA encryption algorithms -> World Wide Web
- Causal mathematics -> Rapid learning directly from data
GNS Healthcare
Hypothesis-free discovery of cause-and-effect relationships directly and at scale from observational data
Big Data Causal Mathematics Machine Learning Models
An Example of Discovery @ Scale
Planning for Surprise
The Setting: Innovative Healthcare Company
- National research reputation, a portfolio of publications and rich data assets
- Recently published on an important drug-drug interaction
The Goal: Expand Their Ability to Discover Important Results
- Frustrated by time required; concerned about questions they weren't asking
- Test GNS approach: reproduce their finding and explore evidence of other (unasked) impacts
The Data: 3 Years of Detailed Claims Data
- Details with ICD-9, CPT-4 and NDC codes
- Patients relevant to their earlier finding
GNS Challenge: Reproduce Their Finding (while blindfolded)
- Identify causal links between drugs and outcomes
- Data completely blinded (all codes were dummies)
Big Data?
# Patients: 111,641
# Transaction Records: 58,181,059
# Diagnosis Codes: 12,241
# Procedure Codes: 11,174
# Drug Codes (NDC level): 24,447
Big Data!
# Patients: 111,641
# Transaction Records: 58,181,059
# Diagnosis Codes: 12,241
# Procedure Codes: 11,174
# Drug Codes (NDC level): 24,447
# Hypotheses with Biasing Driver Variables: 44,690,959,998,504,000 (~45 quadrillion hypotheses)
A Penny for Your Hypothesis
The Hypothesis Space
You need 44 more of these
[Image: 1 quadrillion pennies]
Challenges
The Approach
- Exhaustive search of hypotheses
- Modeled time-ordering & interplay of events and exposures
- Automatically identified causal drivers and adjusted for bias
- Preserved uncertainty (probabilistic causality)
- Distributed computational load for fast results (hours)
The Results
# Total Hypotheses: 44,690,959,998,504,000
# Detected Correlations*: 31,481,043 adverse effects; 42,471,231 beneficial effects
# Detected Causal Relationships*: 248 adverse effects; 151 beneficial effects
* Statistically significant at p=.05
[Graphic: Hypotheses (45x), Correlations, Causal Relationships]
- Reduced the space to the meaningful few
- Reproduced the finding
- Found things we weren't looking for, including a notable surprise:
  - Possible adverse effect for a commonly prescribed drug
  - Initially replicated in (2) out-of-sample datasets
  - Pursuing additional validation (no blindfolds this time)
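The scale of the reduction reported above can be checked with simple arithmetic. The sketch below uses only the counts given on the slide; combining adverse and beneficial effects into single totals is an assumption made here for illustration, and the `summarize` helper and its dictionary keys are hypothetical names.

```python
# Counts as reported on the slide.
TOTAL_HYPOTHESES = 44_690_959_998_504_000
DETECTED_CORRELATIONS = {"adverse": 31_481_043, "beneficial": 42_471_231}
DETECTED_CAUSAL = {"adverse": 248, "beneficial": 151}


def summarize():
    """Return a few ratios describing how far the search space narrowed."""
    # Assumption: adverse and beneficial counts are simply added together.
    total_correlations = sum(DETECTED_CORRELATIONS.values())
    total_causal = sum(DETECTED_CAUSAL.values())
    return {
        # Size of the hypothesis space in quadrillions (10**15).
        "quadrillions_of_hypotheses": TOTAL_HYPOTHESES / 10**15,
        "total_correlations": total_correlations,
        "total_causal": total_causal,
        # Detected correlations per detected causal relationship.
        "correlations_per_causal": total_correlations / total_causal,
        # Hypotheses searched per detected causal relationship.
        "hypotheses_per_causal": TOTAL_HYPOTHESES / total_causal,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.2f}")
```

Read this way, the hypothesis space is a little under 45 quadrillion, and on the order of 185,000 statistically significant correlations were detected for every causal relationship that survived.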
The Beginning of the Fourth Paradigm
Data-Intensive Science and the Scientific Record
The scientific record
- Communicates findings
- Organizes and collects related works
- Documents and manages controversies
- Establishes precedence
- Ensures confidence and trust
- Supports reproducibility
How does this change as we enter data-intensive discovery?
The Evolution of the Scientific Record
Today (the 3rd Paradigm)
Much more complicated and technology-mediated
- Data is no longer fully documented, only summarized
- The link between evidence and writings is more complex
- Computation (and software) integral to reproducibility
- Reproducibility itself extends beyond data access and understanding methods
- Literature has become huge (tools to handle sheer scale)
- Affordances of a scientific record based on print and physical artifacts offer small relief
The Paradigm of Data-Intensive Science
We Have Reached a Janus Moment
"With the arrival of the data-intensive computing paradigm, the scientific record and the supporting system of communication and publication has reached a Janus moment, where we are both looking forward and backward." - Clifford Lynch
Janus Moments: the moment where a new norm is established. They may be planned or unplanned, predictable or not, good or bad; but they affect what is considered normal in society, technology, economics, and politics on a personal and macro level.
The Scientific Record in the Fourth Paradigm
Data and Software
- First-class objects
- Need systematic management and curation in their own right
Scientific Journals
- Bid (slow) farewell to storage & delivery that are essentially images of the printed page
- Papers will become computational windows to actively understand, reproduce and extend results
Reference Data Collections
- Will become an integral part (computed upon rather than read)
- When updated, will trigger new computations, lead to new or reassessed results
Scientific Record
- Will become a major object of ongoing computation itself
- THE central reference collection
Engaging the Data-Intensive Scientific Record
In the Small
- Go beyond the paper, with computational tools that engage underlying science and data
- Move between papers and reference data with great ease and flexibility
- Integrate with collaborative environments with tools for annotation, authoring, simulation, and analysis
In the Large
- As a large corpus of text and interlinked data resources using a wide range of computational tools
- Will identify relevant papers of interest, suggest hypotheses that can be tested elsewhere, or allow production of new data or results
Implications for Publication
Data-intensive science will ultimately transform both scientific culture and publishing practice, including
- Views on open access
- Applications of markups and choice of authoring tools
- Disciplinary norms about data curation, data sharing and overall data lifecycle
"In the practice of data-intensive science, one set of data will, over time, figure prominently, persistently, and ubiquitously in scientific work: the scientific record itself" - Clifford Lynch
I urge you to take on the mantle of stewardship for helping make data-intensive science a reality
Thank you
Carol J. McCall, FSA, MAAA
Chief Strategy Officer, GNS Healthcare
@CarolMcCall