Creating signal from noise: Applying Big Data and advanced analytics
Presented by: Stacy Coggeshall, Tufts Health Plan; Chris Coloian, CEO, Predilytics, Inc.
October 23-25, 2013, Scottsdale, AZ
Discussion objectives:
Discuss how computer science and the Big Data revolution are entering health care
Introduce a machine learning approach
Demo machine learning technology
Present practical applications of Big Data and an advanced analytic approach
Review a case study
Computer Science and Big Data
Hype or a new way of business...
Several factors are driving the need...
45% annual growth in consumer and healthcare data
Resource constraints: healthcare funding, caregivers
Aging population
Explosion of healthcare mobility solutions
Advanced Analytics...
Healthcare data analytics that generate insight from big data to focus limited resources to: improve quality of care, coordinate care, retain membership, increase reimbursement, and manage costs.
Achieved by using the latest machine learning technology and computer science techniques to identify opportunities at both the population and member level.
This approach and technology have been proven in the financial and advertising industries, but have not yet been applied to healthcare.
Analytic solutions:
Maximize financial performance
Manage limited resources
Prioritize appropriate interventions
Illuminate performance issues
Utilize non-traditional data sources
All at the member and provider level
Big data approach
How does it work and why is it different?
Big Data is, in essence, the collecting and storing of data, both structured and unstructured, from as many different sources as are readily available.
Big Data comes in the form of clinical data, claims, Rx, lab results, visit notes, messages, updates, images, social networks, sensors, GPS signals, cell phones, demographic data, financial data, etc.
Examples include EMRs, free text, monitoring devices (wellness tracking, scales, blood pressure readings, glucometer readings, etc.), Facebook, Twitter, smartphones, government census, and more.
A different approach to healthcare analysis and prediction
State-of-the-art machine learning algorithms: Genetic Algorithms
Consistently generates exceptional results:
Identifies correlative, causal and data flow relationships (e.g. emerging relationships, triggers for actions, early identification)
Provides microsegmentation down to the level of individual healthcare consumers
Iteratively explores both structured and unstructured data, internal and external
Diagram labels: Client & Private Data; Client Data Updated; Database of All Available Data; Analytics Engine
Expansive use of data: Demographic, administrative, operational, clinical, etc.
Incorporates data in any format, structured or unstructured
Sales / Clinical / Service Ops etc.
Approach maximizes data intake to drive highest order prospective models
Illustrative external data sources: Public, consumer, financial, social media, etc.
Data Type: Source
Public: Healthcare Medicare, Medicaid; Population Stats; Healthcare Providers, Cost, Quality; AHRQ, NIH, CDC
Consumer: Consumer Behavior; Ethnicity; Social Security / Death Records; Voter Registration
Financial: Consumer spending; Credit risk; Public records
Social Media: Facebook Activity; Foursquare Check-in; Twitter Activity; Google Services, etc.
Analysis methodology
Data and Analytic Design:
Data intake and QA
Analytics approach design, including definition of dependent variable, target population, and reporting requirements
Data development and enhancement
Model Development:
Model development using machine-learning technology
Model tuning with initial review by customer
Model review and acceptance by customer
Ongoing feedback for continuous improvement
Background on the technology
How does it work and why is it different?
Machine learning is a technology in which software evaluates a data set and combinations of data sets millions of times.
Predictive patterns in the data are discovered and retained.
The software builds on previous learnings, and highly predictive equations evolve.
Genetic Algorithms (GAs) are a form of machine learning that is highly effective in spotting subtle patterns in data sets.
GA modeling technology and its output are transparent and more actionable.
Machine learning is capable of exploring more data, faster and more thoroughly, than traditional statistical techniques.
Traditional modeling relies on statistical analyses of data, in particular various forms of regression, which carry certain limitations that are not found in iterative learning models.
The results are more predictive models or more equations.
Genetic Algorithms (GA)
Diagram: Generation One (Models 1-6), Generation Two (Models 7-12), Generation (n) (Models 13-18 through Model (n)), placed on a fitness accuracy scale from low to high.
125 models per generation in 10 seconds
10,000 generations performed
1.25 million equations evaluated, with learning passed to the next generation
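The generational search on this slide can be illustrated with a toy genetic algorithm. This is a minimal sketch and not the presenters' technology: the linear model form, the negative-mean-squared-error fitness, the selection, crossover, and mutation rules, the synthetic data, and every name in the code are assumptions made for illustration. Only the population size of 125 and the arithmetic 125 x 10,000 = 1.25 million come from the slide; the demo runs 40 generations to stay fast.

```python
import random


def fitness(weights, rows):
    """Negative mean squared error of a linear model: higher is better."""
    err = 0.0
    for features, target in rows:
        pred = sum(w * x for w, x in zip(weights, features))
        err += (pred - target) ** 2
    return -err / len(rows)


def evolve(rows, n_features, pop_size=125, generations=40, seed=0):
    """Evolve weight vectors; returns (best weights, best-fitness history, models evaluated)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    evaluated = 0
    history = []
    for _ in range(generations):
        # Rank this generation's candidate models by fitness.
        scored = sorted(population, key=lambda w: fitness(w, rows), reverse=True)
        evaluated += len(population)
        history.append(fitness(scored[0], rows))
        parents = scored[: pop_size // 5]   # keep the top 20% as parents
        children = [scored[0][:]]           # elitism: carry the best model forward
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(n_features)
            child[i] += rng.gauss(0, 0.1)                     # small random mutation
            children.append(child)
        population = children
    return scored[0], history, evaluated


# Synthetic data (hypothetical): target = 2*x0 - 1*x1.
data_rng = random.Random(1)
rows = []
for _ in range(50):
    x = [data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)]
    rows.append((x, 2.0 * x[0] - 1.0 * x[1]))

best, history, evaluated = evolve(rows, n_features=2)
print("models evaluated in demo:", evaluated)
print("models evaluated at slide scale:", 125 * 10_000)
```

Because the best model of each generation is copied into the next, the best fitness never decreases from one generation to the next, which is one simple way of passing learning forward.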
The genetic algorithm advantage
Machine learning technology is optimized for Big Data predictive analytics
Traditional analytics: structure the predictive modeling task as X = f(A, B, C, D, E) + e
GA enhances: linear regression, logistic regression, time series, survival analysis, segmentation, data valuation, variable reduction
Machine learning: optimize prediction of X; start with random walks; learns quickly and transparently
Automation saves analyst time for more value-added tasks
GA automatic features: descriptive summary, train / test samples, univariate graphs, variable transformation, missing data, candidate model development, lift chart / ROC curve, scoring code
Unstructured data mining and linguistic analysis can provide more accurate and predictive model results
Claims and membership data often represent the majority of model input data.
However, specific words and word pairs in the comment fields can increase the predictive lift of the models (natural language engine).
Examples of data with free text that can be mined: HRA data, clinical visits, sales force notes, call center notes.
EXAMPLE: Presence of the words SON or DAUGHTER maps to the concept of family involvement and a changing situation.
ANALYSIS: When a son (or daughter or other family member) becomes involved, it may be an early indication that the parent is experiencing health issues. It can also be an early flag for disenrollment or for exploring health plan changes.
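The SON / DAUGHTER example can be sketched as a simple keyword-to-concept feature. This is only an illustration under stated assumptions: the slide does not describe how the natural language engine works, so the whole-word matching rule, the `CONCEPT_TERMS` dictionary, and the function name are hypothetical.

```python
import re

# Hypothetical concept dictionary; only SON and DAUGHTER come from the slide.
CONCEPT_TERMS = {
    "family_involvement": {"son", "daughter"},
}


def concept_flags(note):
    """Return a 0/1 flag per concept, based on whole-word matches in a free-text note."""
    tokens = set(re.findall(r"[a-z]+", note.lower()))
    return {concept: int(bool(tokens & terms))
            for concept, terms in CONCEPT_TERMS.items()}


print(concept_flags("Member's SON called to ask about plan options"))
print(concept_flags("Reason for call: billing question"))
```

Matching on whole tokens rather than substrings keeps a word such as "reason" from triggering the "son" term. A flag like this would then be one more input column alongside claims and membership data.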
Model Prioritizes Interventions
Step 1, Predict & Score: Predict probability of event; predict cost of episode; create member score.
Step 2, Optimize / Engage: Refine score based on likelihood to impact and receptivity of member; produce refined member score.
Step 3, Deploy: Apply predictive models and business rules to assign members to optimal intervention; produce member list with recommended intervention.
Improved deployment of resources will lead to improved outcomes with lower investment.
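The three steps can be sketched as a small scoring pipeline. The slide does not say how the scores are combined, so this is a hedged illustration only: multiplying probability by cost, multiplying again by impactability and receptivity, the score thresholds, the intervention names, and all identifiers are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Member:
    member_id: str
    p_event: float         # Step 1: predicted probability of the event
    predicted_cost: float  # Step 1: predicted cost of the episode
    impactability: float   # Step 2: likelihood that an intervention has impact
    receptivity: float     # Step 2: likelihood that the member engages


def member_score(m):
    """Step 1 (assumed form): expected cost of the event."""
    return m.p_event * m.predicted_cost


def refined_score(m):
    """Step 2 (assumed form): discount by impactability and receptivity."""
    return member_score(m) * m.impactability * m.receptivity


def assign_intervention(score):
    """Step 3: hypothetical business rules mapping a refined score to an intervention."""
    if score >= 2000:
        return "nurse care management"
    if score >= 500:
        return "telephonic outreach"
    return "educational mailing"


members = [
    Member("A", 0.5, 20000.0, 0.5, 0.5),
    Member("B", 0.5, 20000.0, 0.25, 0.5),
    Member("C", 0.125, 8000.0, 0.25, 0.5),
]

# Member list with recommended intervention, highest refined score first.
ranked = sorted(members, key=refined_score, reverse=True)
worklist = [(m.member_id, refined_score(m), assign_intervention(refined_score(m)))
            for m in ranked]
for row in worklist:
    print(row)
```

Members A and B have the same Step 1 score but different refined scores, which is the point of Step 2: resources go first to members who are both at risk and likely to respond.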
Sample Output: Understanding Model Performance
COPYRIGHT, 2013 PREDILYTICS, INC. ALL RIGHTS
Sample Output
Risk Adjustment Optimization
Overview
Chart Review, CHA and hospitalization prediction model goals
Appropriate identification of member health status and risk:
Identify members who have undocumented chronic co-morbidities
Identify appropriate provider charts
Enhance Comprehensive Home Assessment yield by improving acceptance rates by 20% and yield per assessment by 25%
Increasing revenue capture
Identification of population appropriate for care management:
Decrease utilization & optimize revenue
Analytics charge:
Support year-round activities vs. end-of-year crunch
Better access to information and dissemination to providers
Provide decision support and reporting
Prospective targeting to increase recapture and support documentation
Support more effective provisioning of CHA resources
Executive Summary
Identified individuals in the target population with a high likelihood to accept an assessment (54% projected acceptance) and a high likelihood to yield the greatest value (0.52 projected incremental lift), representing a potential 50% improvement in yield.
Insights
Individuals likely to yield the greatest RAF value had the highest disease burden, including a high risk adjustment factor and previous hospital admissions. These individuals are also half as likely to be enrolled under a year.
Individuals most likely to accept an assessment:
Were more likely than those who declined an assessment to have a high disease burden
Were more likely to have previous hospital admissions
Were more likely to be more recent enrollees to Health Plan
Were more likely to be unmarried
The model found that the top 10% of individuals in the target population are 3 times more likely to be hospitalized in the next six months than average.
Those most likely to be hospitalized:
Had higher overall disease burden
Had more previous admissions
Had more prescriptions than those who were not hospitalized
Had an age and gender distribution similar to the member population
Were more likely to live in a multi-family structure
Optimized assessment targeting
Individuals projected to be more receptive to a home assessment and result in higher coding value
Table columns: Members; Projected Acceptance Rate; Projected Incremental Lift; Total Projected Value; Engagement / Outreach Options
Projected revenue opportunity indicates that at only 40% yield, Plan will reach the prior year performance point.
Projected incremental lift includes all members accepting assessments (including those with 0 incremental lift).
Total projected value = (members accepted) * (incremental lift) * ($870.42) * (12)
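The total projected value formula on this slide can be worked through in a few lines. The formula, the $870.42 figure, and the factor of 12 come from the slide; the 54% acceptance rate and 0.52 lift are the projections quoted in the Executive Summary. The targeted population of 1,000 members is a hypothetical number chosen for illustration, and reading the factor of 12 as months per year is an assumption.

```python
DOLLARS_PER_LIFT_UNIT = 870.42  # from the slide; presumably a monthly per-member amount
MONTHS_PER_YEAR = 12            # assumed meaning of the factor of 12 in the slide formula


def total_projected_value(members_accepted, incremental_lift):
    """Total projected value = members accepted * incremental lift * $870.42 * 12."""
    return members_accepted * incremental_lift * DOLLARS_PER_LIFT_UNIT * MONTHS_PER_YEAR


targeted_members = 1000   # hypothetical, not from the slides
acceptance_rate = 0.54    # projected acceptance from the Executive Summary
incremental_lift = 0.52   # projected incremental lift from the Executive Summary

members_accepted = round(targeted_members * acceptance_rate)
value = total_projected_value(members_accepted, incremental_lift)
print(f"{members_accepted} accepted -> ${value:,.2f} projected value")
```

Because the lift is averaged over all accepting members, including those with zero lift, multiplying by the count of accepted members gives the total without any further adjustment.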
Additional insight
Identify those who would benefit from care coordination
Figure: Prospective Vendor/High, Contacted for Assessments (2012), with Likelihood for Hospitalization overlay (High / Low)
Results: Opportunity value validation
1,520 members in the randomly selected member validation group (not used to create the model equation) had accepted assessments in 2012.
To verify the model's predictive power, the model equation was applied to this group as they appeared on the file in June 2012.
Chart: RAF lift (0.00 to 1.00) by decile (1 to 10), model projection vs. actual 2012 result
The model projection tracks closely with the actual 2012 results.
Results: Accepted assessment model validation
3,677 members in the randomly selected member validation group (not used to create the model equation) were selected for assessments in 2012.
To verify the model's predictive power, the model equation was applied to this group as they appeared on the file in June 2012.
Chart: Acceptance rate (0% to 70%) by decile (1 to 10), model projection vs. actual 2012 result
The model projection tracks closely with the actual 2012 results.
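A decile comparison of this kind can be sketched as follows. This is a minimal illustration on synthetic data, not the presenters' validation code: the member scores and outcomes are randomly generated, the function name is hypothetical, and treating decile 1 as the highest-scoring tenth is an assumption, since the slide does not state the ordering.

```python
import random


def decile_table(scores, outcomes, n_bins=10):
    """Rank members by model score and compare mean projected vs. actual rate per bin."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    table = []
    for d in range(n_bins):
        start = d * len(order) // n_bins
        end = (d + 1) * len(order) // n_bins
        idx = order[start:end]
        projected = sum(scores[i] for i in idx) / len(idx)
        actual = sum(outcomes[i] for i in idx) / len(idx)
        table.append((d + 1, projected, actual))
    return table


# Synthetic holdout group: each member accepts with probability equal to the score.
rng = random.Random(0)
scores = [rng.random() for _ in range(2000)]
outcomes = [1 if rng.random() < s else 0 for s in scores]

table = decile_table(scores, outcomes)
for decile, projected, actual in table:
    print(f"decile {decile:>2}: projected={projected:.2f}  actual={actual:.2f}")
```

When the projected and actual columns stay close across all ten rows on members that were held out of model building, the model is both ranking members well and producing well-calibrated rates.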
Discussion