Using electronic health records to predict severity of condition for congestive heart failure patients



Similar documents
Uncovering Value in Healthcare Data with Cognitive Analytics. Christine Livingston, Perficient Ken Dugan, IBM

Using Predictive Analytics to Improve Sepsis Outcomes 4/23/2014

Big Data Integration and Governance Considerations for Healthcare

Mortality Assessment Technology: A New Tool for Life Insurance Underwriting

Electronic Medical Record Mining. Prafulla Dawadi School of Electrical Engineering and Computer Science

Advanced Ensemble Strategies for Polynomial Models

Measure Information Form (MIF) #275, adapted for quality measurement in Medicare Accountable Care Organizations

Divide-n-Discover Discretization based Data Exploration Framework for Healthcare Analytics

A Population Health Management Approach in the Home and Community-based Settings

Summary Evaluation of the Medicare Lifestyle Modification Program Demonstration and the Medicare Cardiac Rehabilitation Benefit

Reducing Readmissions with Predictive Analytics

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Home Health Care Today: Higher Acuity Level of Patients Highly skilled Professionals Costeffective Uses of Technology Innovative Care Techniques

The TeleHealth Model

Wireless Remote Monitoring System for ASTHMA Attack Detection and Classification

Risk Adjustment in the Medicare ACO Shared Savings Program

Measuring road crash injury severity in Western Australia using ICISS methodology

Leveraging EHR to Improve Patient Safety: A Davies Story

International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 8 August 2013

The Initial and 24 h (After the Patient Rehabilitation) Deficit of Arterial Blood Gases as Predictors of Patients Outcome

A General Approach to Incorporate Data Quality Matrices into Data Mining Algorithms

Accountable Care: Implications for Managing Health Information. Quality Healthcare Through Quality Information

Knowledge Discovery from patents using KMX Text Analytics

James E. Bartlett, II is Assistant Professor, Department of Business Education and Office Administration, Ball State University, Muncie, Indiana.

Clinic + - A Clinical Decision Support System Using Association Rule Mining

Five Myths Surrounding the Business of Population Health Management

D-optimal plans in observational studies

Intro Who should read this document 2 Key Messages 2 Background 2

Crime Hotspots Analysis in South Korea: A User-Oriented Approach

Course Requirements for the Ph.D., M.S. and Certificate Programs

Benjamin M. Marlin Department of Computer Science University of Massachusetts Amherst January 21, 2011

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Big Data Analytics in Health Care

Using EHRs, HIE, & Data Analytics to Support Accountable Care. Jonathan Shoemaker June 2014

Medical Care Costs for Diabetes Associated with Health Disparities Among Adults Enrolled in Medicaid in North Carolina

American Society of Addiction Medicine

Telemedicine as Part of Your Service Line Strategy. Howard J. Gershon, FACHE Principal, New Heights Group March 2011

A Guide for the Utilization of HIRA National Patient Samples. Logyoung Kim, Jee-Ae Kim, Sanghyun Kim. Health Insurance Review and Assessment Service

5/6/2014. Physiologic Monitoring Tools & Use with Patients with Chronic Health Conditions. Objectives. The Issue at Hand

CARE GUIDELINES FROM MCG

Using Predictive Analytics to Reduce COPD Readmissions

How To Create A Health Analytics Framework

Big Data Infrastructure and Analytics for Healthcare Applications Arjun Shankar, Ph.D.

Modernizing Healthcare

Clinic/Provider Name (Please Print or Type) North Dakota Medicaid ID Number

Speaker First Plenary Session THE USE OF "BIG DATA" - WHERE ARE WE AND WHAT DOES THE FUTURE HOLD? William H. Crown, PhD

MobiHealthcare System: Body Sensor Network Based M-Health System for Healthcare Application

Remote Patient Monitoring- An Implementation in ICU Ward

Exploration and Visualization of Post-Market Data

Total Cost of Care and Resource Use Frequently Asked Questions (FAQ)

The HYDRA project. Personal health monitoring

INSIGHT on the Issues

An Analysis of the Transitions between Mobile Application Usages based on Markov Chains

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

BASIC MEDICAL RECORD DEPARTMENT PROCEDURES

How To Transition From Icd 9 To Icd 10

CMS Innovation Center Improving Care for Complex Patients

Coding with. Snayhil Rana

BIG DATA IN HEALTHCARE THE NEXT FRONTIER

Data Mining using Artificial Neural Network Rules

White Paper. Medicare Part D Improves the Economic Well-Being of Low Income Seniors

Study of Wireless Sensor Networks and their application for Personal Health Monitoring. Abstract

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Results from the Commonwealth Fund s State Scorecard on Health System Performance Kansas in comparison to Iowa

A Demonstration of a Robust Context Classification System (CCS) and its Context ToolChain (CTC)

Problem-Centered Care Delivery

Improving Outcomes and Saving Lives in Real Time: How Hospitals Can Use Predictive Analytics Across the Care Continuum Essential Hospitals Engagement

Internet of Things on HealthCare and Chinese Wearable Medical Devices

Ruchika D. Husa, MD, MS Assistant t Professor of Medicine in the Division of Cardiology The Ohio State University Wexner Medical Center

Big Data Analytics Driving Healthcare Transformation

Patient Activation and Engagement for ACOs

New Ensemble Combination Scheme

Risk Adjustment Models for Medicare Part D Capitation Payments Modeling

Assessing the effectiveness of medical therapies finding the right research for each patient: Medical Evidence Matters

Brief Research Report: Fountain House and Use of Healthcare Resources

What Providers Need To Know Before Adopting Bundling Payments

DATA MINING TECHNIQUES AND APPLICATIONS

CMS-1461-P Medicare Program; Medicare Shared Savings Program: Accountable Care Organizations

PharmaSUG2011 Paper HS03

Transcription:

Using electronic health records to predict severity of condition for congestive heart failure patients Costas Sideris costas@cs.ucla.edu Behnam Shahbazi bshahbazi@cs.ucla.edu Mohammad Pourhomayoun mpourhoma@cs.ucla.edu Nabil Alshurafa nabil@cs.ucla.edu Majid Sarrafzadeh Boelter Hall majid@cs.ucla.edu Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. Ubicomp 14, September 13-17 2014, Seattle, WA, USA Copyright c 2014 ACM 978-1-4503-3047-3/14/09$15.00. http://dx.doi.org/10.1145/2638728.2638815 Abstract We propose a novel way to design an analytics engine based exclusively on electronic health records (EHR). We focus our efforts on Congestive Heart Failure (CHF) patients, although our approach could be extended to other chronic conditions. Our goal is to construct statistical models that predict a CHF patient s length of stay and by extension the severity of his/her condition. We show that it is possible to predict length of hospital stay based on physiological data collected from the first day of hospitalization. Using 10-fold cross validation we achieve accurate predictions with a root mean square error of 3.3 days for hospital stays that are less than 15 days in duration. We also propose a clustering of patients that organizes them to risk groups according to their estimated severity of condition. Author Keywords Electronic Health Records, Remote Monitoring Systems, Heart Failure, Intervention, Readmission ACM Classification Keywords J.3 [Computer Applications]: Life and Medical Sciences; H.1.2 [Information Systems]: Human information processing 1187

Introduction Recent advances in wireless sensors, mobile technologies, and cloud computing have made continuous remote monitoring of patients possible [1],[6],[8],[12],[13]. These Remote Health Monitoring Systems (RMS) provide a continuous stream of patient physiological data that allows nurses and doctors to make timely decisions and help patients manage their chronic conditions while minimizing hospital readmission rates. Conventional RMS rely on threshold based alerts. That is, thresholds based on medical expertise are put in place to alert clinicians when physiological data deviate from the set thresholds. Analytics-based RMS on the other hand employ machine learning algorithms to predict the risk of a medical adverse event. It has been shown [9] that analytics-based RMS work better than threshold-based ones and can help reduce treatment costs. Traditionally, designing such machine learning algorithms is guided by an initial, usually smaller pilot study that provides sufficient data for design and validation. This approach however suffers from several limitations. First of all, it requires a substantial effort to collect such initial data, spanning months or even years depending on the studied chronic condition. In addition, selecting which physiological data to collect as well as the collection frequency is based on guesswork since there is limited data to support the decision. In this paper we advocate the use of electronic health records to guide the design of such machine learning algorithms. The benefits of such an approach is that it is based on a vast amount of electronic health records that are available and can be created before the RMS is even deployed. We focus our efforts on Congestive Heart Failure (CHF) patients. We present two methodologies to predict and classify the severity of a patient s condition. These algorithms can be used to identify severity of condition for patients in an RMS and assess the need for intervention, as well as to help schedule clinician time. Methodology Data Processing We used electronic health records (EHR) from the Ronald Reagan UCLA Medical Center between 2005 and 2009. We first identified CHF hospitalizations for adult patients using the same International Classification of Diseases version 9 (ICD-9) codes used in the Centers for Medicare & Medicaid Services (CMS) 30-day readmission measure. A total of 1179 admissions were extracted with primary diagnosis a CHF related ICD-9 code. These records correspond to 913 unique patients out of which 169 had more than one CHF related admission. For each admission record we extracted 7 features: min, max, range, average, median, standard deviation and variance, computed from the patient s heart rate, systolic/diastolic blood pressure, and weight during the first day of hospital stay. Furthermore, 19 categorical features are calculated based on the patient s age, gender, race, ethnicity, ZIP code, type of external care, insurance coverage, perceived severity of condition, perceived mortality and the first 10 comorbidities reported. Attribute Selection We rank these features based on their correlation with a patient s length of stay in the hospital. Using correlation-based feature subset selection [5] we find that the most prominent categorical and numerical features are age, gender, ethnicity, minimum systolic pressure and heart rate range for the first day (See Table 1). 1188

WORKSHOP: SMARTHEALTHSYS 1. age 2. gender 3. ethnicity 4. min. systolic pressure first day 5. range of heart rate first day 6. std systolic pressure first day 7. std diastolic pressure first day 8. perceived severity 9. perceived mortality 10. insurance 11. type of outside care 12. 2nd co-morbidity 13. 4th co-morbidity 14. 5th co-morbidity 15. 8th co-morbidity Table 1: Attributes as ranked by the correlation-based feature subset selection Predicting Length of Stay A good indicator of a patient s severity of condition is the Length of Stay (LOS) in the hospital. As multiple previous studies have indicated [3],[10],[11] contextual information such as age, sex and comorbidities can affect the LOS of a patient along with the actual condition. To predict the LOS of a patient, we use the most correlated categorical features as well as the most correlated vitals that correspond to the first day. Using generalized linear regression we combine the effects of a patient s vitals with the effects of his contextual information to predict LOS. Table 2 presents the average root mean square error (RMSE) using 10-fold cross validation and including only patients that stayed less than 15 days, 30 days, and 60 days respectively. It can be seen that as we try to predict longer stays the RMSE increases.this could be due to the lack of data for these longer stays since the vast majority of patients stay less than 15 days. It could also be due to the fact that for longer stays we need intermediate information from a patient s hospital stay. LOS Considered RMSE (days) No.Records 15 days 3.2788 948 30 days 4.9527 1049 60 days 8.2770 1125 Table 2: Average root mean square error of generalized linear regression with 10-fold cross validation. Grouping Patients by Severity of Condition In the previous section we showed that it is possible to predict a patient s length of stay based on contextual information and first day vitals. This prediction model can be useful when evaluating a patient on a RMS. It provides nursing staff with an estimate of how long that patient would stay in the hospital if he/she were to be admitted that day. To simplify the decision process for the support staff, we design a clustering algorithm that groups patients in terms of their individual risk based on the top two vital features (minimum systolic pressure, heart rate range for the first day). In this study, we define LOS plus 3 times ICU stay as a risk factor : riskfactor = LOS + 3 ICU (1) We cluster the admission records based on hierarchical clustering [7]. Through experimentation we found that using Chebyshev s distance function [2] and complete linkage for the cost function [4] results in the most stable clusters. Figure 1 demonstrates the results of the hierarchical clustering using 12 clusters. We rank and color the clusters based on the average risk from low to 1189

high. Red corresponds to the highest and blue/green to the lowest risk. As it can be seen in Figure 1, there are some issues with our risk factor as the rankings do not follow medical expertise. For example, people with really high systolic blood pressure appear to have shorter duration of stay at the hospital than people with normal levels. The main reason for these discrepancies is that there is a strong age bias on how long a patient stays in the hospital as previous studies have shown [3]. Figure 2 shows that CHF patients between the ages of 55 and 65 years old for example end up staying in the hospital longer on average than older or younger patients. Figure 1: Hierarchical Clustering of patients according to their minimum systolic pressure, heart rate range during the first day of hospitalization. Colors represent average risk per group, with blue/green and red representing the lowest and highest risk respectively. Figure 2: Age versus Length of Stay for CHF patients. It can be seen that there is an age bias on how long a patient stays in the hospital. To amend these issues and obtain a more objective risk grouping regardless of age/sex, we normalize the risk factor of each patient by multiplying it with a value 1190

WORKSHOP: SMARTHEALTHSYS proportional to the average risk factor of his age and sex group. With that modification, and with the same clustering methodology, we obtain a risk ranking that is closely aligned with medical expertise. Figure 3 demonstrates the results of the hierarchical clustering with the modified risk function. Figure 3: Hierarchical Clustering of patients with the risk factor normalized with regards to age/sex. The above clustering suggests that it is possible to rank a patient s severity of condition based on systolic pressure and heart rate measurements. This information can be useful in the context of a RMS for clinicians. Threshold alerts such as systolic pressure above 130 millimeters of mercury end up overwhelming the clinicians as they tend to accumulate very fast with a growing number of patients. Our clustering methodology provides a much more detailed status for each patient as it allows ranking severity and making more informed decisions. In addition, such a scheme could be employed to suggest short term and long term health improvements for the patients. Conclusions Using statistical models constructed from electronic health records, we have shown that it is possible to predict the length of stay of a CHF patient with high accuracy. We have also presented a methodology for modeling patient groups in terms of severity of condition based solely on hospital records. These models can be used to analyze daily information collected from an RMS and allow a nurse/doctor to better design their intervention. The risk factor information could also be used to provide personalized advice to the patient on how to improve their condition in the short and long term as to avoid readmission. In the future we plan to further validate our approach by analyzing data collected by a RMS for congestive heart failure patients and comparing it with our predicted patient risk. We also plan to extend our approach to different chronic conditions such as diabetes and liver disease. 1191

References [1] Bar-Or, A., Healey, J., Kontothanassis, L., and Van Thong, J. Biostream: A system architecture for real-time processing of physiological signals. In Engineering in Medicine and Biology Society, 2004. IEMBS 04. 26th Annual International Conference of the IEEE, vol. 2, IEEE (2004), 3101 3104. [2] Cha, S.-H. Comprehensive survey on distance/similarity measures between probability density functions. City 1, 2 (2007), 1. [3] Charlson, M. E., Pompei, P., Ales, K. L., and MacKenzie, C. R. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. Journal of chronic diseases 40, 5 (1987), 373 383. [4] Defays, D. An efficient algorithm for a complete link method. The Computer Journal 20, 4 (1977), 364 366. [5] Hall, M. A., and Smith, L. A. Practical feature subset selection for machine learning. [6] Jovanov, E., Raskovic, D., Price, J., Krishnamurthy, A., Chapman, J., and Moore, A. Patient monitoring using personal area networks of wireless intelligent sensors. Biomedical Sciences Instrumentation 37 (2001), 373 378. [7] Kaufman, L., and Rousseeuw, P. J. Finding groups in data: an introduction to cluster analysis, vol. 344. John Wiley & Sons, 2009. [8] Lan, M., Samy, L., Alshurafa, N., Suh, M.-K., Ghasemzadeh, H., Macabasco-O Connell, A., and Sarrafzadeh, M. Wanda: An end-to-end remote health monitoring and analytics system for heart failure patients. In Proceedings of the Conference on Wireless Health, WH 12, ACM (New York, NY, USA, 2012), 9:1 9:8. [9] Lee, S. I., Ghasemzadeh, H., Mortazavi, B., Lan, M., Alshurafa, N., Ong, M., and Sarrafzadeh, M. Remote patient monitoring: What impact can data analytics have on cost? [10] Librero, J., Peiró, S., and Ordiñana, R. Chronic comorbidity and outcomes of hospital care: length of stay, mortality, and readmission at 30 and 365 days. Journal of clinical epidemiology 52, 3 (1999), 171 179. [11] Rochon, P. A., Katz, J. N., Morrow, L. A., McGlinchey-Berroth, R., Ahlquist, M. M., Sarkarati, M., and Minaker, K. L. Comorbid illness is associated with survival and length of hospital stay in patients with chronic disability: a prospective comparison of three comorbidity indices. Medical care 34, 11 (1996), 1093 1101. [12] Suh, M.-k., Chen, C.-A., Woodbridge, J., Tu, M. K., Kim, J. I., Nahapetian, A., Evangelista, L. S., and Sarrafzadeh, M. A remote patient monitoring system for congestive heart failure. Journal of medical systems 35, 5 (2011), 1165 1179. [13] Suh, M.-k., Moin, T., Woodbridge, J., Lan, M., Ghasemzadeh, H., Bui, A., Ahmadi, S., and Sarrafzadeh, M. Dynamic self-adaptive remote health monitoring system for diabetics. In Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE, IEEE (2012), 2223 2226. 1192