Predictive Modelling of High Cost Healthcare Users in Ontario



Similar documents
Predicting High Cost Users of Ontario s Healthcare System. Health Analytics Branch November 19, 2012

High User Discussion Day. November 19 th, 2012 Nam Bains Health Analytics Branch, HSIMI

Binary Logistic Regression

Truven Health Analytics: Market Expert Inpatient Volume Projection Methodology

STATISTICA Formula Guide: Logistic Regression. Table of Contents

PLANNING CONSIDERATIONS FOR RE-CLASSIFICATION OF REHAB/CCC BEDS (PCRC) Final Report Recommendations for LHINs and HSPs March 2015

Patient Flow Pressures

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Evaluation of Predictive Models

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Best Practice Recommendations for Inpatient Stroke Care: Rationale and Evidence for Integrated Stroke Units in North Simcoe Muskoka LHIN

Guidelines on Person-Level Costing Using Administrative Databases in Ontario. Working Paper Series Volume 1 May 2013

PharmaSUG2011 Paper HS03

Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL

Developing Risk Adjustment Techniques Using the System for Assessing Health Care Quality in the

Health Analyst s Toolkit

A Property and Casualty Insurance Predictive Modeling Process in SAS

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic

Quality-Based Procedures

Interim Director, Health Data Branch. Fredrika Scarth Director, Health Quality Liaison and Program Development Branch

New Work Item for ISO Predictive Analytics (Initial Notes and Thoughts) Introduction

EXPANDING THE EVIDENCE BASE IN OUTCOMES RESEARCH: USING LINKED ELECTRONIC MEDICAL RECORDS (EMR) AND CLAIMS DATA

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

Risk Adjustment in the Medicare ACO Shared Savings Program

Introduction to Longitudinal Data Analysis

S The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY

Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign

Patient Reported Outcome Measures (PROMs) in England: Update to reporting and case-mix adjusting hip and knee procedure data.

The Common Quality Agenda Measuring Up. Technical Appendix

Customer Profiling for Marketing Strategies in a Healthcare Environment MaryAnne DePesquo, Phoenix, Arizona

Analysis of Care Coordination Outcomes /

LOGISTIC REGRESSION ANALYSIS

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Multi-Sector Accountability Agreement Compliance Reporting to Board of Directors

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Copyright 2006, SAS Institute Inc. All rights reserved. Predictive Modeling using SAS

Total Cost of Care and Resource Use Frequently Asked Questions (FAQ)

Medicine, Complex Continuing Care, and Rehab. Community Forum Presentation

11. Analysis of Case-control Studies Logistic Regression

Predictive Modeling of Titanic Survivors: a Learning Competition

The Sector Linkage Model for Improved Patient Flow. Dr. Peter Nord

Building risk prediction models - with a focus on Genome-Wide Association Studies. Charles Kooperberg

Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar

MORTGAGE LENDER PROTECTION UNDER INSURANCE ARRANGEMENTS Irina Genriha Latvian University, Tel ,

HealthForceOntario Ontario s Health Human Resources Strategy

A predictive analytics platform powered by non-medical staff reduces cost of care among high-utilizing Medicare fee-for-service beneficiaries

Agenda. Mathias Lanner Sas Institute. Predictive Modeling Applications. Predictive Modeling Training Data. Beslutsträd och andra prediktiva modeller

Building patient-level predictive models Martijn J. Schuemie, Marc A. Suchard and Patrick Ryan

PHARMACEUTICAL BIGDATA ANALYTICS

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

Advancing High Quality & High Value Hospice Palliative Care

The Basics of Regression Analysis. for TIPPS. Lehana Thabane. What does correlation measure? Correlation is a measure of strength, not causation!

How CDI is Revolutionizing the Transition to Value-Based Care

Building a High Performance Integrated Population Health Infrastructure. Fulfilling Our New Medical Management Responsibilities

An Introduction to HealthInfoNet s HIE Reporting & Analytics. 6th Annual APS Healthcare Maine Conference May 14, 2015

The Impact of Moving to Stroke Rehabilitation Best Practices in Ontario

Hospital Value-Based Purchasing (VBP) Program

Samuel Zuvekas and Gary Olin. Agency for Healthcare Research and Quality Working Paper No March 2008

MedInsight Healthcare Analytics Brief: Population Health Management Concepts

Neurodegenerative diseases Includes multiple sclerosis, Parkinson s disease, post-polio syndrome, rheumatoid arthritis, lupus

Integrated Comprehensive Care Bundled Care

Didacticiel Études de cas

Quick Start. Creating a Scoring Application. RStat. Based on a Decision Tree Model

Mississauga Halton Local Health Integration Network (MH LHIN) Health Service Providers Forum - May 5, 2009

CMS National Dry Run: All-Cause Unplanned Readmission Measure for 30 Days Post Discharge from Inpatient Rehabilitation Facilities

Overview of Factor Analysis

Explanation of care coordination payments as described in Section of the PCMH provider manual

College Readiness LINKING STUDY

A Property & Casualty Insurance Predictive Modeling Process in SAS

Relationship between patient reported experience (PREMs) and patient reported outcomes (PROMs) in elective surgery

What Providers Need To Know Before Adopting Bundling Payments

Statistics, Data Analysis & Econometrics

IBM SPSS Direct Marketing 23

North of Superior Healthcare Group

Leadership Summit for Hospital and Post-Acute Long Term Care Providers May 12, 2015

How To Identify A Churner

IBM SPSS Direct Marketing 22

Aaisha Ghauri Savvas Amber Curry

Interpretation of Somers D under four simple models

Data Mining Techniques Chapter 6: Decision Trees

Using Cost Accounting Data to Develop Capitation Rates

A LOGISTIC REGRESSION MODEL TO PREDICT FRESHMEN ENROLLMENTS Vijayalakshmi Sampath, Andrew Flagel, Carolina Figueroa

Medicare Risk-Adjustment & Correct Coding 101. Rev. 10_31_14. Provider Training

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Transcription:

Predictive Modelling of High Cost Healthcare Users in Ontario Health Analytics Branch, MOHLTC SAS Health User Group Forum April 12, 2013

Introduction High cost healthcare users (HCUs) are patients who incur the highest costs to healthcare The top 5% of users account for 68% of healthcare costs (FY 2010/11) Studying high cost healthcare users is important for: Improving health outcomes Effectively managing HCUs Providing appropriate care Allocating resources appropriately Easing fiscal pressures on healthcare 2

Introduction A predictive model is a statistical model that uses information on characteristics of units to predict a future outcome (for those units) We can use predictive modelling to predict who would become an HCU in the future, using various demographic, SES, clinical, and utilization information A Model that predicts who will become an HCU in the future can help: Forecast expenditures and manage budgets appropriately Implement proactive healthcare to prevent patients from becoming HCUs Reduce resource use/impact/cost of HCUs 3

Purpose of our predictive model: To predict who will / will not become an HCU in the immediate future year, given various patient-level characteristics in the current year and two previous years Study period: Model will estimate HCU status among patients from FY 10/11 using patient characteristics from FY 07/08 - FY 09/10 Model is validated by applying it to patient characteristics from FY 06/07 - FY 08/09 to predict HCU status in FY 09/10 (out of sample prediction power) Statistical technique: Logistic regression model 4

Population scope All Ontario residents that are serviced by the health care system in Ontario during FY 09/10 in one of the following care types (database in brackets): Physician services OHIP (CHDB) Acute care AIP (DAD) Day surgery DS (NACRS) Emergency ER (NACRS) Complex continuing care CCC (CCRS) Rehabilitation Rehab (NRS) Inpatient mental health MH (OMHRS) Long-term care LTC (CCRS) Home care HC (HCD) Dialysis (NACRS) Oncology (NACRS) Outpatient clinic (NACRS) 5

Population scope (Cont.) Exclusions: Patients who die during the FY 09/10 Patients who are under 5 years of age in FY 09/10 WSIB claims Telemedicine claims 6

Identify HCUs High cost healthcare users: the top 5% cost incurring users in FY 10/11 Procedure: 1. Sum costs across all in-scope care types for each user a) Patient cost for AIP, ER, DS, Rehab, CCC, MH, and HC are derived from unit cost X weighted volume of services b) Cost for OHIP claims are represented by fees approved c) Patient cost for LTC are estimated using average cost per patient per day X patient length of stay d) Oncology, dialysis, and outpatient clinic costs were not included 2. Sort users in descending order of total expenditures, and classify the top 5% of users as HCUs 3. Create and add a binary variable to the data to identify patients as either HCU or not 7

Data preparation Identify potentially relevant variables for predicting HCUs from corpus of healthcare databases Filter and extract data from various databases (e.g., remove duplicates, select only most updated record for an assessment) Create rules for resolving discrepancies (e.g., conflicting postal codes, records with overlapping assessment periods) Merge all data (care-specific databases, RPDB, PCCF+, etc) Derive predictor variables (e.g., visit count variables, clinical group variables) Transform continuous variables (e.g., on a log scale, or to a categorical variable) if necessary Reduce levels for a categorical variable through clustering Impute missing values using multiple imputation Reduce number of variables (e.g., identify co-linearity, cluster related variables, screen out redundant and irrelevant variables) 8

More details on imputation of missing variables Imputation of missing variables was performed using PROC MI (multiple imputation) procedure in SAS Monotone regression method was a method of choice In the regression method, a regression model is fitted for a variable with the covariates constructed from a set of effects. Based on the fitted regression model, a new regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for each variable 9

More details on variable reduction through variable clustering The PROC VARCLUS procedure in SAS divides a set of numeric variables into disjoint or hierarchical clusters PROC VARCLUS was used as a variable-reduction method. A large set of variables was replaced by the set of cluster components with little loss of information. A given number of cluster components does not generally explain as much variance as the same number of principal components on the full set of variables, but the cluster components are usually easier to interpret than the principal components. MAXEIGEN value was chosen 0.7 This option specifies that when choosing a cluster to split, VARCLUS should choose the cluster with the largest second eigenvalue, provided that its second eigenvalue is greater than the MAXEIGEN= value. 10

Creating a predictive model Using data, create a statistical regression model that estimates the outcome (HCU or not) using factors (covariates) that may be influencing the outcome, such as: Demographic variables (e.g., age, sex, RIO score) Clinical variables (e.g., ICD-10 based chapters, with additional splits for diabetes, CHF, COPD) SES variables (e.g., deprivation index (material and social deprivation)) Utilization variables for all care types from current year and previous two years, to account for disease progression (e.g., Number of visits, length of stay) Execute model in SAS using PROC LOGISTIC 11

Creating a predictive model (Cont.) Measure the performance of the model using: Model goodness-of-fit (e.g., Akaike Information Criterion (AIC)) Predictive ability of model for current year (e.g., c statistic) Significance and impact of parameter estimates (e.g., p-value, standardized estimates) 12

Evaluating the predictive power of the model Apply model to FY 06/07 FY 08/09 data to predict HCU status of each patient in FY 09/10, given knowledge of model covariates, as if HCU status is unknown (out of sample prediction power) Measure model performance using: Specificity and sensitivity, positive and negative predictive value, accuracy Select the top 1%, 5%, 10%, 15%, etc. of patients with the highest risk of becoming HCU Receiver operating characteristic (ROC) curve Calibration (goodness-of-fit) curve 13

Results Population of patients: 10,300,856 Number of HCUs: 520,492 (5% of population) Number of variables in initial model: 97 Variables that were transformed: 64 Number of variables reduced due to clustering: 28 Number of variables in final model: 69 14

Results Performance of model: C statistic: 0.865 Percent Concordant: 86.1 Percent Discordant: 13.0 Percent tied: 0.9 Predictive (out-of-sample) performance of model: Metric Selection of patients based on predicted probabilities the top: Formula Notes 1% 5% 10% 15% Sensitivity 15.8% 42.2% 57.1% 66.4% TP/(TP+FN) picks up % of all high users Specificity 99.8% 97.0% 92.5% 87.7% TN/(FP+TN) correctly identifies % of those who are not high users Positive Predictive Value 79.9% 42.6% 28.8% 22.4% TP/(TP+FP) good at confirming high users Negative reassuring that a patient will not 95.7% 96.9% 97.6% 98.0% TN/(FN+TN) Predictive Value become a high user % of true positive and true negative Accuracy 95.5% 94.2% 90.7% 86.7% (TP+TN)/(P+N) out of all patients 15

Results Receiver Operating Characteristic (ROC) plot of model performance on scored 2008 data 16

Results Goodness of fit (calibration) curve on scored 2008 data 17

Discussion Highlights Very strong (in sample) performance (c = 0.865) Graphs show very strong out-of-sample performance Sensitivity/specificity analysis shows good predictive power Limitations No population based case-mix groupers are accessible Used ICD-9 and ICD-10 codes to group patients into ICD-10 chapters as a proxy, where available Large number of predictor variables complicates the application of the model 18

Thank You Project members: Yuriy Chechulin, Senior Methodologist, Health Analytics Branch Amir Nazerian, Methodologist, Health Analytics Branch Saad Rais, Methodologist, Health Analytics Branch Kamil Malikov, Manager, Health Analytics Branch The Health Analytics Branch (HAB) of the Ministry of Health and Long-Term Care, provides high quality information, analyses, and methodological support to enhance evidence-based decision making in the health system. As part of the Health System Information Management and Investment (HSIMI) Division, HAB manages health analytics requests, identifies methods, and creates reports and tools to meet ministry, LHIN and other client needs for accurate, timely, and useful information. Health Analytics Branch: Evidence you can count on.