How can you unlock the value in real-world data? A novel approach to predictive analytics could make the difference.

What if you could diagnose patients sooner, start treatment earlier, and prevent symptoms from worsening? The secrets to developing more effective treatments and enabling better outcomes likely reside in the volumes of data captured from sources such as patient registries, administrative claims databases, patient and provider surveys, and electronic medical records. This is what the life sciences industry terms real-world data: data used for decision making that are not collected in conventional randomized controlled trials (RCTs). [1] Many companies, though, struggle to derive clear and useful insight from what is effectively a massive chopped salad of information that resides in disparate locations and formats.

In our experience, two things can accelerate analysis of complex real-world data and development of more effective treatment approaches:

1. Analytic tools that have the necessary power to identify the complex variable interactions that are predictive of diagnosis and treatment efficacy
2. A discovery-driven research approach that uses data analytics to reveal answers to challenging questions, as opposed to traditional hypothesis-based approaches that test pre-defined theories

This article describes, at a high level, how this is possible.

[1] Using Real World Data for Coverage and Payment Decisions: The ISPOR Real World Data Task Force Report, 2007.

Getting from real-world data to actionable insight

Maximizing a therapy's potential requires answers to many questions: What predictors or markers could speed accurate diagnoses and avoid the possibility of misdiagnosis and years of improper treatment? How much earlier could diagnosis occur? Which patients are most likely to require drug treatment? Which patients are most likely to respond to a treatment? Which patients are at risk of a negative reaction to a drug treatment? Why do patients go off therapy?

Typically, the answers are far from straightforward. They depend on complex and interacting combinations of variables, some obvious and many not so obvious. To find answers, life sciences organizations frequently turn to real-world data (see sidebar) available through sources such as Truven Health Analytics MarketScan Databases or Symphony Health Solutions Integrated Dataverse™. Despite an increasing focus on real-world data, many life sciences organizations still struggle to derive clear and useful conclusions (known as real-world evidence), limiting their ability to answer questions such as those above.

Why is this the case? The biggest challenge is often identifying predictive information within the massive amount of collected and stored real-world data. Determining the best way to organize and analyze years of data from doctor visits, hospital stays, walk-in clinics, lab results, insurance claims, and pharmacies to find meaningful combinations of variables is a major roadblock for most companies. Studies that do incorporate real-world data often consider hundreds or thousands of potential indicators across tens of thousands of patients, necessitating powerful data management and cleansing processes prior to analysis. Moreover, real-world data collection frequently is not well organized, includes many instances of missing data, and is not readily processed with standard techniques. Quite simply, working with real-world data is messy and time consuming.

How can you navigate the path from a messy mix of real-world data to real-world evidence and insights? One way is by rethinking your approach to analyzing real-world data.

What is real-world data?

Most organizations' definitions of real-world data center on the premise that it includes data captured without the biases traditionally involved in clinical trials. For example, an ISPOR (International Society for Pharmacoeconomics and Outcomes Research) task force defined real-world data as data used for decision making that are not collected in conventional randomized controlled trials (RCTs). [1] Rather, these data come from patient registries, administrative claims databases, patient and provider surveys, and electronic medical records, as well as large simple trials and supplementary data collected alongside randomized clinical trials.

[1] ISPOR task force: Using Real-World Data for Coverage and Payment Decisions: The ISPOR Real-World Data Task Force Report; 2007
[2] ABPI: The Vision for Real World Data: Harnessing the Opportunities in the UK; 2011

Use the most effective predictive analytics tools

Effective analysis of real-world data requires sophisticated analytic tools and predictive modeling techniques that can overcome common data challenges and identify complex interactions among a myriad of potential variables. Although many techniques exist to perform these analyses, many predictive models cannot easily identify the hidden relationships among the numerous variables whose interactions frequently underlie disease risk, the onset of an exacerbation event, or the likelihood of a positive treatment response. Even when these techniques are able to identify important relationships among predictor variables, it is often difficult to bridge the gap between the results of these analyses and actionable business or medical insights. The result is suboptimal use of real-world data.

HyperCube™, a unique analytic tool (see sidebar), overcomes many of these challenges to reveal important relationships among large numbers of predictor variables and generate insights in a human-readable format that allows for easy interpretation by both technical experts and business users. We have seen HyperCube work particularly well in the life sciences environment.

While HyperCube's predictive capabilities are quite powerful, it still faces one of the biggest challenges in predictive modeling: the need to organize, prepare, and cleanse datasets prior to analysis. As with any predictive modeling of real-world data, a substantial amount of data cleansing and transformation is necessary to prepare data for proper analytic processing. Thus, before we analyze data with HyperCube, we employ in-depth data cleaning and preparation processes, including SQL integration work, ETL (extract-transform-load) processes, and big data frameworks such as Apache™ Hadoop.
These data preparation steps can account for a significant portion of the work involved in complex real-world data analysis, but they are essential to success.

HyperCube

HyperCube is a proprietary predictive modeling algorithm and software package that can predict disease, disorder, and treatment outcomes from analysis of data sets that include hundreds to tens of thousands of potential predictor variables. HyperCube has predictive capabilities similar to those of many standard modeling techniques, including regression modeling and random forest analysis, but it also offers features and capabilities that go beyond them, including rule mining, variable selection, a bagging algorithm, cross-validation, and visualization.

HyperCube is able to take large and complex datasets and derive a group of predictive business rules that identify rare cases in a highly targeted fashion. These rules are human readable and can be understood by both business and technical users. For example, a typical output from HyperCube might read like this: "If the patient is male, has blood sugar levels between X and Y, and is positive for biomarker 123c, then the likelihood of developing Disease A is 20 times greater than average."

Importantly, analysts can optimize HyperCube to generate more precise sets of predictive rules (e.g., for risky and expensive treatments) or wider sets of predictive rules that cover broad events (e.g., for general screening of patients likely to need a flu vaccine).
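To make the idea of a human-readable rule concrete, the following is a minimal Python sketch of how one might check a rule of the kind quoted above against a patient cohort. It is not HyperCube's algorithm or API, which is proprietary and not described here. The `Patient` fields, the `rule_matches` and `rule_lift` helpers, and the toy cohort are all hypothetical. The sketch computes the rule's coverage (how many patients it matches) and its lift (the disease rate among matched patients divided by the rate in the whole cohort), which roughly corresponds to a statement such as "N times greater than average."

```python
from dataclasses import dataclass
import math


@dataclass
class Patient:
    # Hypothetical fields chosen to mirror the example rule in the text.
    sex: str
    blood_sugar: float
    biomarker_positive: bool
    has_disease: bool


def rule_matches(p: Patient, low: float, high: float) -> bool:
    """If/then rule condition: male, blood sugar in [low, high], biomarker positive."""
    return p.sex == "male" and low <= p.blood_sugar <= high and p.biomarker_positive


def rule_lift(patients, low, high):
    """Return (coverage, lift) for the rule over the given cohort.

    coverage: number of patients the rule matches.
    lift: disease rate among matched patients / disease rate in the cohort.
    lift is NaN when the rule matches nobody or the cohort has no cases.
    """
    matched = [p for p in patients if rule_matches(p, low, high)]
    if not matched:
        return 0, math.nan
    base_rate = sum(p.has_disease for p in patients) / len(patients)
    if base_rate == 0:
        return len(matched), math.nan
    rule_rate = sum(p.has_disease for p in matched) / len(matched)
    return len(matched), rule_rate / base_rate


# Toy cohort (fabricated purely for illustration, not real patient data).
cohort = (
    [Patient("male", 6.0, True, True)] * 2
    + [Patient("male", 6.0, False, False)] * 3
    + [Patient("female", 6.0, True, False)] * 5
)
coverage, lift = rule_lift(cohort, 5.0, 7.0)
print(f"rule covers {coverage} patients, lift = {lift:.1f}")
```

A narrow rule with high lift and low coverage suits targeted decisions, while a broader rule trades lift for coverage, which is the precision-versus-breadth trade-off described above.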

Rather than test hypotheses, let data drive discovery

The scientific method guides standard approaches to data analysis. These approaches begin with hypotheses formulated from existing research results, intuition, and judgment. The hypotheses are very specific predictions of an answer to the research question, and analyses then test whether or not that single answer is accurate. For example, if our goal is to understand the characteristics of patients who respond better to a certain drug treatment for a chronic condition, we might hypothesize that some patients respond better because they 1) are more likely to take the medication as prescribed and 2) attended a seminar that explained the importance of adherence to therapy. So we test that specific hypothesis by collecting and analyzing relevant data about drug use by individuals who did and did not attend seminars. We use the findings to draw a conclusion, from which we refine the hypothesis or develop a new one (see visual at right), and then repeat the analysis cycle to refine our understanding or test other possible explanatory hypotheses. While this process is rigorous and logically sound, it can be particularly time consuming and frequently produces results that are not easy to interpret or act on.

We have seen life sciences companies accelerate data analysis by taking a different approach, one that aims to discover variables for further testing by letting real-world data act as the guide. Rather than employing a sequence of projects to test and refine a single hypothesis, this discovery-driven approach (see visual at right) uses predictive analytic capabilities to reveal unique characteristics of a group under study: for example, the distinctive combination of variables common to people likely to choose one drug versus its competitor. This method removes the dependence on pre-determined hypotheses and allows the data to drive an understanding of predictor variables.

Why is this approach beneficial?

In biological applications, such as rare diseases, cause is typically not determined by a discrete combination of factors identified in a formulated hypothesis (for example, A+B+C causes rare disease X). Rather, it is more nuanced; for example, for someone of a particular age and ethnicity, a certain combination of biomarkers in conjunction with environmental exposure can predict the likelihood of contracting a disease or reacting to a particular therapy. When viewed this way, it becomes clear why traditional hypothesis-based approaches for identifying predictor variables would have been much slower to identify those combinations, if they could identify them at all. Used early in a research project, a discovery-driven approach can pinpoint variables and research avenues that researchers may not previously have considered. It also can reinforce and expand upon existing hypotheses in meaningful ways.

HyperCube and a discovery-based approach in action

In one case, we used HyperCube to support clinical evaluation of patients with a well-known chronic disease. The study's goal was to identify indicators of patients who were likely to experience a deleterious and dangerous disease-state event, as opposed to patients whose conditions remained stable. In this way, it would be possible to advise physicians to screen for high-risk patients and take preventative action. Rather than starting with a hypothesis about the factors that caused some patients to be unstable over a period of years, our analysis with HyperCube examined 100+ variables across populations of both stable and unstable patients to allow the data to drive our understanding of important predictor variables. By doing so, the research team quickly isolated a combination of five variables among the many studied that were associated with patient disease events during therapy.
This discovery-based analysis led to identification of previously unconsidered variable combinations and a faster conclusion.

In a different case, we used HyperCube to identify a combination of characteristics indicative of an often-undiagnosed rare disease, enabling the pharmaceutical company to educate physicians to test for this rare disease when they encountered that specific combination of characteristics. This study included a massive amount of data: more than 10 million rows of patient data representing five years of insurance claims data, and more than 12,000 columns of potential predictor variables representing all ICD-9 diagnoses occurring before those patients were diagnosed with the rare disease. The analysis considered both single diagnosis codes and aggregate variables representing certain combinations of ICD-9 codes as defined by previously published studies. Through this analysis, the research team was able to confirm the predictive strength of the previously published variable set and also identify additional variables that increased predictive coverage and strength.
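As an illustration of the kind of data preparation the rare-disease study implies, the following Python sketch turns claim records into per-patient indicator columns: one column per diagnosis code seen before the patient's diagnosis date, plus aggregate columns for predefined code combinations. This is an assumption-laden sketch, not the pipeline used in the study. The `build_features` function, the record layout, the column-naming scheme, the group names, and the choice to treat an aggregate as "all codes in the group are present" are hypothetical, and the codes and dates are illustrative placeholders.

```python
from datetime import date


def build_features(claims, index_dates, code_groups):
    """Build binary predictor columns per patient.

    claims: iterable of (patient_id, icd9_code, service_date) tuples.
    index_dates: {patient_id: date of the rare-disease diagnosis}. Claims on or
        after that date are excluded; patients without an index date keep all
        of their claims.
    code_groups: {group_name: [icd9_code, ...]}. Assumption: an aggregate
        variable is 1 only when every code in the group is present.
    """
    features = {}
    for patient_id, code, service_date in claims:
        cutoff = index_dates.get(patient_id)
        if cutoff is not None and service_date >= cutoff:
            continue  # keep only diagnoses recorded before the index diagnosis
        features.setdefault(patient_id, {})[f"dx_{code}"] = 1

    # Add aggregate columns on top of the single-code columns.
    for row in features.values():
        for name, codes in code_groups.items():
            row[f"grp_{name}"] = int(all(f"dx_{c}" in row for c in codes))
    return features


# Illustrative records only (not real patient data).
claims = [
    ("p1", "250.00", date(2010, 1, 1)),
    ("p1", "401.9", date(2010, 6, 1)),
    ("p1", "272.4", date(2011, 3, 1)),  # after p1's index date, so excluded
    ("p2", "272.4", date(2010, 2, 1)),
]
index_dates = {"p1": date(2011, 1, 1)}
code_groups = {"group_a": ["250.00", "401.9"], "group_b": ["250.00", "272.4"]}

features = build_features(claims, index_dates, code_groups)
for patient_id, row in sorted(features.items()):
    print(patient_id, row)
```

At the scale described above (millions of rows and thousands of columns), a dictionary-of-dictionaries would likely be replaced by a sparse matrix or a database-side transformation; the sketch only shows the shape of the resulting variables.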

In both cases, HyperCube's unique discovery capabilities were important, but so, too, was its human-readable prescriptive output, which provides direction for taking action. For the studies described above, HyperCube produced specific business rules: if/then statements that allow experts such as doctors and researchers to use the results to advance studies and act on insights gained.

Better clinical outcomes, greater commercial success

A holistic and robust data-science platform with the right tools, approaches, and expertise is more critical than ever for problem solving and successful therapy. As outlined above, a discovery-driven approach can help improve and focus research and analysis, particularly when used early in a project and combined with the right analytic capabilities for dealing with the complexity and volume of real-world data.

Of course, regardless of the sophistication of the analytics at hand, analysis does not replace the need for critical thought. Analysts and researchers should always apply a common-sense filter when looking at the results. But taking a discovery-driven approach that challenges common sense can help speed up analysis, and speed can make a big difference. The ability to home in quickly on the populations most affected by a particular therapy can create significant value, both in clinical outcomes and in commercial success.

For more information about applying HyperCube to unlock the potential of real-world data, please contact Jim Bedford, jbedford@westmonroepartners.com or [312.980.9393].