USING DATA SCIENCE TO DISCOVER INSIGHT OF MEDICAL PROVIDERS CHARGE FOR COMMON SERVICES




Irron Williams
Northwestern University
IrronWilliams2015@u.northwestern.edu

Abstract--Data science is evolving. In the minds of many business leaders, it has become a business necessity. Forward-thinking leaders have formed data science departments that have brought new business value across the entire organization. These departments are staffed with professionals who can work autonomously and build new data science applications designed to derive valuable insights from data. Effective data science teams have created substantial competitive advantages.

This paper explores the concept of data mining: the process of extracting knowledge from data using different technologies, processes and algorithms. The paper examines data mining through an analysis of data retrieved from a government website. The dataset compares the charges for the 100 most common inpatient services and 30 common outpatient services, and the analysis highlights the variation across the US in what medical providers charge for common services. The paper discusses data mining applications, data mining themes, CRISP-DM, data understanding, data preparation and the results of a clustering algorithm.

The complexity and size of healthcare data make them unlikely to be managed and processed by traditional data analysis methods. Data mining in the healthcare domain provides the methodology and technology to discover patterns, trends and relationships within complex healthcare data.
Healthcare organizations such as United HealthCare and The Group Health Cooperative, among others, have implemented effective data mining applications that have resulted in more effective treatment, better management of healthcare, improved customer relationships and detection of medical fraud and abuse.

The cluster analysis of the IPPS dataset retrieved from data.cms.gov produced six distinct clusters that highlight the variation in what medical providers charge for common hospital services. Each cluster is characterized by medical diagnosis, medical provider city and state, total number of discharges, average covered charges and average total payment received from Medicare. The analysis focused on medical providers in New York, Florida, Pennsylvania, California, Maryland and New Jersey.

Given the identified clusters, further analysis can determine whether operating expenses or capital expenses should be evaluated or adjusted in an effort to control costs and bring equity to the payment system. Additional data can be collected to develop a predictive modeling application, which can then be used to predict future expenses and billings. Artificial neural networks and decision trees can also be part of the predictive modeling strategy.

I. INTRODUCTION

Data are being released that show significant variation across the country and within communities in what medical providers charge for common hospital services. These data include information comparing the charges for the 100 most common inpatient services and 30 common outpatient services. The Department of Health and Human Services (DHHS) is looking to gain insight into the Inpatient Prospective Payment System (IPPS), the Medicare payment system for hospital services, which is based on diagnosis-related groups (DRGs). The DHHS has requested an analysis that identifies variations in hospital service charges among common patient services.
The analysis will provide detailed information on the characteristics of the dataset. The insight will help DHHS determine whether the base operating and capital rates paid to medical providers should be adjusted.

II. DATA MINING APPLICATIONS

The extensive amounts of data generated by healthcare transactions are both complex and voluminous. Their complexity and size make them unlikely to be managed and processed by traditional data analysis methods. Data mining provides the methodology and technology to discover patterns, trends and relationships within complex healthcare data, and to identify insights that improve decision making.

Several factors have increased the use of data mining applications in healthcare. Medical insurance fraud and abuse have led many healthcare providers to use data mining tools to decrease their losses, reduce healthcare fraud and detect abuse. Another contributing factor is that healthcare organizations are under heightened pressure to make sound decisions based on the analysis of clinical and financial data. Insights gained from data mining can influence cost, revenue and operating efficiency while maintaining a high level of care. Healthcare organizations that perform data mining are better positioned to meet their long-term needs (Koh & Tan).

There is vast potential for data mining applications in healthcare. Generally, these applications can be grouped into the evaluation of treatment effectiveness, management of healthcare, customer relationship management and detection of fraud and abuse. Data mining applications have been developed to evaluate the effectiveness of medical treatment. United HealthCare has developed an application that compares and contrasts causes, symptoms and courses of treatment to identify the most effective course of action.
To aid in healthcare management, data mining applications have been developed to better identify and track chronic disease states and high-risk patients, design appropriate interventions and reduce the number of hospital admissions and claims. For example, The Group Health Cooperative stratifies its patient populations by demographic characteristics and medical conditions to determine which groups use the most resources. The data mining application enables The Group Health Cooperative to develop programs that help educate these populations and prevent or manage their conditions.

Data mining applications have also been developed in the healthcare industry to determine the preferences, usage patterns and needs of individuals in order to improve their level of satisfaction. Customer Potential Management Corp has developed a Consumer Healthcare Utilization Index that indicates an individual's propensity to use specific healthcare services. With regard to fraud and abuse, the Utah Bureau of Medical Fraud has mined the mass of data generated by millions of prescriptions, operations and treatment courses to identify unusual patterns and uncover fraud (Koh & Tan).

III. DATA MINING THEMES

Data mining is a technology that blends traditional data analysis methods with sophisticated algorithms for processing large volumes of data (Tan, Kumar, Steinbach). It focuses specifically on processes and algorithms that discover patterns in raw data. A convenient way to organize this work is by data mining theme: tasks can be classified into themes such as classification, regression, association rule learning, clustering, anomaly detection, sequential pattern mining and artificial neural networks.

Classification analysis uses historical data patterns to predict the behavior of new data. An example of such a prediction model is one that identifies customers who are likely to respond to a given offer.
Regression analysis attempts to predict the numerical value of some variable. Predicting home prices on the East Coast over the next five years is an example of a regression analysis. Clustering attempts to group individuals in a population by their similarity. Grouping individuals by political association, gender and income is an example of a clustering analysis. Association rule learning identifies relationships between attributes within a dataset. A market analysis identifying items that are commonly purchased together is a type of association rule learning. Anomaly detection helps identify hard-to-detect patterns in a large dataset. Credit card companies and the IRS use such analysis to detect fraud. An artificial neural network is a connectionist computational system and a complex adaptive system. Artificial neural networks assist doctors with diagnosis by analyzing reported symptoms and/or image data such as MRIs or X-rays.

IV. DATA UNDERSTANDING

The analysis is based on a dataset retrieved from data.cms.gov. The dataset is a provider-level summary of Inpatient Prospective Payment System (IPPS) discharges, average charges and average Medicare payments for the Top 100 Diagnosis-Related Groups (DRGs). The dataset contains 163,065 instances and 11 attributes, with a mixture of nominal and numeric values, that is, discrete and continuous attributes. The attributes are defined as follows:

DRG Definition: Definition of the diagnosis.
Provider ID: Identifier of the provider billing for inpatient hospital services.
Provider Name: Name of the provider.
Provider Street Address: Street address at which the provider is physically located.
Provider City: City in which the provider is physically located.
Provider State: State in which the provider is physically located.
Provider Zip Code: Zip code in which the provider is physically located.
Provider Region: Region in which the provider is physically located.
Total Discharges: The number of discharges billed by the provider for inpatient hospital services.
Average Covered Charges: The provider's average charge for services covered by Medicare for all discharges in the DRG. These vary from hospital to hospital because of differences in hospital charge structures.
Average Total Payments: The average of Medicare payments to the provider for the DRG, including the DRG amount, teaching, disproportionate share, capital and outlier payments for all cases. Also included are the co-payment and deductible amounts that the patient is responsible for.

V. DATA PREPARATION

An initial review of the dataset retrieved from data.cms.gov found no issues with data quality. The dataset was checked for missing values, duplicate records and outliers, and none were found.
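The three checks named above (missing values, duplicate records, outliers) can be sketched in a few lines. The following is a minimal illustration in Python, not the procedure the analysis actually used: the sample records, the field names and the 1.5 x IQR outlier rule are all assumptions made for the example.

```python
from statistics import quantiles

# Hypothetical sample rows with made-up values (not taken from the CMS file).
records = [
    {"state": "NY", "discharges": 60, "charges": 30000.0},
    {"state": "FL", "discharges": 40, "charges": 28000.0},
    {"state": "PA", "discharges": 35, "charges": 32000.0},
    {"state": "CA", "discharges": 36, "charges": None},      # missing value
    {"state": "NY", "discharges": 60, "charges": 30000.0},   # repeats row 0
    {"state": "MD", "discharges": 37, "charges": 900000.0},  # extreme value
]

def missing_rows(rows):
    """Indices of rows with at least one missing (None or empty) value."""
    return [i for i, r in enumerate(rows)
            if any(v is None or v == "" for v in r.values())]

def duplicate_rows(rows):
    """Indices of rows that exactly repeat an earlier row."""
    seen, dupes = set(), []
    for i, r in enumerate(rows):
        key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
        if key in seen:
            dupes.append(i)
        seen.add(key)
    return dupes

def iqr_outliers(rows, field):
    """Indices of rows whose `field` falls outside the 1.5 * IQR fences."""
    values = [r[field] for r in rows if r[field] is not None]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [i for i, r in enumerate(rows)
            if r[field] is not None and not (low <= r[field] <= high)]

print(missing_rows(records))             # [3]
print(duplicate_rows(records))           # [4]
print(iqr_outliers(records, "charges"))  # [5]
```

The IQR fence is only one common convention for flagging outliers; whether a flagged charge is an error or a genuinely expensive provider still has to be judged case by case.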
Attributes such as Provider ID and those related to street address were excluded from the analysis.

VI. DATA MINING ALGORITHM

Cluster analysis groups data objects based on information found in the data that describes the objects and their relationships. The goal is for the objects within a group to be similar or related to one another and different from or unrelated to the objects in other groups. The greater the similarity or homogeneity within a group and the greater the difference between groups, the better or more distinct the clustering. Clustering is useful in data concept construction, pattern detection and unsupervised learning processes.

The most popular centroid-based clustering algorithm is k-means clustering. The k-means algorithm starts by choosing k initial cluster centers, one centroid per cluster. Each data point is compared with the centroids and assigned to the closest one, based on the distance between the data point and the centroid. Each centroid is then recomputed as the mean of the points assigned to it, and the process iterates until the clusters remain constant.

Considering the structure of the dataset and the objective of the analysis, clustering was selected as an appropriate data mining algorithm. I was in search of a richer IPPS data file that included discrete binary attributes, or an IPPS data file for another time period. A more robust or additional dataset would have allowed for additional analysis. Completing a cluster analysis on a more robust data file using the SimpleKMeans algorithm in WEKA would have resulted in a dataset that assigned each record to a cluster. This ancillary data source would have qualified as a dataset for a classification model. If the dataset retrieved from data.cms.gov had contained such data, I would have been able to apply the J48 algorithm to the resulting dataset in an effort to develop a predictive model. In essence, I would have used the original IPPS dataset to cluster the data into similar and dissimilar groups.
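The k-means procedure described in this section can be sketched as follows. This is a generic, minimal illustration in Python and not WEKA's SimpleKMeans implementation: it handles numeric attributes only, uses Euclidean distance, and takes the first k points as the initial centroids, all of which are simplifying assumptions. The sample points are hypothetical.

```python
from math import dist

def k_means(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on a list of numeric tuples."""
    # Simplification: use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids

# Hypothetical two-dimensional points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
labels, centroids = k_means(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the attributes would be scaled before clustering, since a dollar-valued attribute would otherwise dominate the distance, and the initial centroids would normally be chosen at random over several restarts.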
The resulting dataset, with each record assigned to a cluster, would then have served as the basis for a classification analysis. The classification analysis could be used to make predictions for unknown future records. This approach would have combined supervised and unsupervised learning analytics. Given the structure of the dataset retrieved from data.cms.gov, the classification algorithms were not available in WEKA. Thus a cluster analysis was applied to the IPPS dataset.

WEKA was the tool used to conduct the cluster analysis. WEKA is a collection of machine learning algorithms for data mining tasks. The k-means algorithm was selected and produced 6 clusters. Each cluster is unique in terms of IPPS characteristics. The 6 clusters are summarized below.

VII. RESULTS AND ANALYSIS

Cluster  Instances  Provider State  Provider Region   Total Discharges  Average Covered Charges  Average Total Payments  DRG Definition
0        25,310     NY              East Long Island  62                $29,810                  $11,220                 HEART FAILURE & SHOCK W CC
1        68,110     FL              Orlando           41                $21,244                  $5,821                  MISC DISORDERS OF NUTRITION, METABOLISM, FLUIDS/ELECTROLYTES W/O MCC
2        26,347     PA              Philadelphia      35                $32,253                  $8,120                  CHRONIC OBSTRUCTIVE PULMONARY DISEASE W/O CC/MCC
3        19,532     CA              Los Angeles       36                $61,401                  $11,656                 SEPTICEMIA OR SEVERE SEPSIS W/O MV 96+ HOURS W/O MCC
4        10,455     MD              Baltimore         37                $77,372                  $25,517                 MAJOR SMALL & LARGE BOWEL PROCEDURES W MCC
5        13,311     NJ              Camden            48                $62,556                  $14,575                 PERMANENT CARDIAC PACEMAKER IMPLANT W CC

Cluster 0: Heart Failure & Shock: Medical providers billing for inpatient hospital services tend to be in East Long Island NY. Medical providers in this region tend to bill for 62 inpatient hospital discharges. On average, the providers charge $29,810 for services covered by Medicare. Medical providers receive an average of $11,220 in Medicare payments.

Cluster 1: Nutrition & Metabolism: Medical providers billing for inpatient hospital services tend to be in Orlando FL.
Medical providers in Orlando tend to bill for 41 inpatient hospital discharges. On average, the providers charge $21,244 for services covered by Medicare. Medical providers receive an average of $5,821 in Medicare payments.

Cluster 2: Pulmonary Disease: Medical providers billing for inpatient hospital services tend to be in Philadelphia PA. Medical providers located in Philadelphia tend to bill for 35 inpatient hospital discharges. On average, the providers charge $32,253 for services covered by Medicare. Medical providers receive an average of $8,120 in Medicare payments.

Cluster 3: Severe Sepsis: Medical providers billing for inpatient hospital services tend to be in Los Angeles CA. Medical providers in this region tend to bill for 36 inpatient hospital discharges. On average, the providers charge $61,401 for services covered by Medicare. Medical providers receive an average of $11,656 in Medicare payments.

Cluster 4: Bowel Procedures: Medical providers billing for inpatient hospital services tend to be in Baltimore MD. Medical providers in Baltimore tend to bill for 37 inpatient hospital discharges. On average, the providers charge $77,372 for services covered by Medicare. Medical providers receive an average of $25,517 in Medicare payments.

Cluster 5: Cardiac Pacemaker: Medical providers billing for inpatient hospital services tend to be in Camden NJ. Medical providers in Camden NJ tend to bill for 48 inpatient hospital discharges. On average, the providers charge $62,556 for services covered by Medicare. Medical providers receive an average of $14,575 in Medicare payments.
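One way to compare the clusters is the share of the average covered charge that is actually paid. The short Python sketch below computes that share from the cluster figures reported above. The ratio is a derived figure rather than an attribute of the dataset, and the variable and function names are illustrative.

```python
# Cluster figures as reported in the results:
# (provider state, average covered charges, average total payments) in USD.
centroids = {
    0: ("NY", 29810, 11220),
    1: ("FL", 21244, 5821),
    2: ("PA", 32253, 8120),
    3: ("CA", 61401, 11656),
    4: ("MD", 77372, 25517),
    5: ("NJ", 62556, 14575),
}

def payment_share(charges, payments):
    """Fraction of the average covered charge that is paid on average."""
    return payments / charges

shares = {c: payment_share(ch, pay) for c, (_, ch, pay) in centroids.items()}

# List the clusters from the highest payment share to the lowest.
for c, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Cluster {c} ({centroids[c][0]}): {share:.1%}")
```

On these figures the Baltimore cluster receives the largest payments in absolute terms, while the East Long Island cluster recovers the largest share of its charges and the Los Angeles cluster the smallest.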

VIII. CONCLUSION

The characteristics of medical providers billing for inpatient hospital services vary. Specifically, the providers with the highest average billing to Medicare tend to be located in Baltimore MD and also have one of the lower inpatient discharge counts. Naturally, the medical providers with the highest average covered charges receive on average the highest IPPS payments. However, medical providers in Baltimore tend to receive only 33% of what is billed to Medicare. Medical providers in East Long Island NY tend to receive a higher percentage: 38% of what is billed to Medicare is received as IPPS payments. Medical providers in Los Angeles tend to have one of the lower inpatient discharge counts and also tend to receive the lowest percentage of IPPS payment, about 19% of what they bill to Medicare ($11,656 of $61,401).

The DHHS can use the clustering data to assess the significant variation across the country and within communities in what medical providers charge for common services. The DHHS can further analyze the data to determine whether the base operating and capital rates paid to hospitals should be adjusted in an effort to control costs and reduce the variation in what medical providers charge for services. Based on the analysis, cost adjustments can be made at the local, regional and national levels.

The analysis identified 6 clusters based on attributes unique in terms of IPPS characteristics. It clustered 6 medical procedures by region and provided metrics such as the number of discharges and average Medicare payments to medical providers. Further analysis can be conducted to determine whether operating expenses or capital expenses (significant variables that drive the IPPS payment structure) should be evaluated in an effort to control costs and bring equity to the payment system.

IX. FUTURE WORK

To continue work on this project, the next step will be to collect additional data that will allow for a predictive modeling application. The predictive model can be designed to predict future expenses and billings. A later phase of the analysis could also predict healthcare fraud. A model can also be developed to estimate or predict the length of time a patient will stay in a hospital. Artificial neural networks and decision trees can also be part of the predictive modeling strategy.

REFERENCES

Hian Chye Koh and Gerald Tan. Data Mining Applications in Healthcare. Journal of Healthcare Information Management, Vol. 19, No. 2.
Pang-Ning Tan, Vipin Kumar and Michael Steinbach. Introduction to Data Mining. Boston: Pearson Addison Wesley, 2005.
Retrieved Aug 25, 2014 from http://www.biomedcentral.com/1472-6963/13/333