A Multitier Fraud Analytics and Detection Approach
Jay Schindler, PhD MPH
DISCLAIMER: The views and opinions expressed in this presentation are those of the author and do not necessarily represent official policy or position of HIMSS.
Conflict of Interest Disclosure
Jay Schindler, PhD MPH
- Salary
- Stock Ownership
2013 HIMSS
Learning Objectives
- Describe the major components of a fraud analytics workstation
- Identify the major components of the CRISP-DM model as used within a fraud analytics framework
- List 2 different data visualization approaches or methods for high-dimensional data with 4 or 5 variables
- Identify at least 3 benefits of using a fraud analytics workstation when examining large datasets
Understanding big data
BIG DATA:
- Velocity
- Volume
- Variety
- Complexity
Based on Big Data in the Cloud by Jeffrey Yu, Director of Technology and Engineering, Civil Systems, Northrop Grumman
Explosion of Data in Healthcare
Frost & Sullivan estimates that picture archiving and communication system (PACS) storage requirements in U.S. hospitals grew at a rate of more than 20 percent per year for the past five years and reached 27,000 terabytes in 2011. As a result of the increased use of data in the provision of care, data storage and access solutions are becoming more strategic decisions and pressing issues for hospital administrators to address.
http://www.frost.com/sublib/display-market-insighttop.do?id=264652254
Volume of the Fraud/Abuse Problem
http://www.paymentaccuracy.gov/
2/22/2013
Analytics-Supported Decision Making
Health System*
- Resource Production: Workforce, Facilities, Commodities, Knowledge
- Management: Planning, Administration, Regulation, Legislation
- Organization of Programs: Public agencies, Private market, Voluntary agencies, Enterprises
- Service Delivery: Prevention; Primary, specialty; Secondary, Tertiary, Long-Term
- Economic Support: Private Insurance, Social Security, Governmental
Disparate Data
- Surveillance, Resources, ED, Emerg Srv., Long-Term, Registries, Interventions, Outpatient, Home Health, Self-Ins, Community Needs, Risk Factors, Inpatient, Mental Health, Private Ins., CHIP, Medicaid, Medicare
Integrated Insights
- Probable fraudulent claims
- County health service utilization
- ACO performance
- Projected Medicaid costs
- Resource allocation change impacts
- Practice performance
- Cost of care outliers
- Market-wide expenditures
- Regional health outcomes
Decisions & Actions
An integrated health analytics platform
- Provides decision-makers with a platform to visualize and analyze population health characteristics
  - Characterize costs of care
  - Analyze conditions and risk
  - Identify improvement opportunities
  - Estimate future costs
- Allows flexible and dynamic reporting capability
- Integrates disparate databases via data virtualization
- Serves as a foundational capability for health and human services
  - Fraud detection and prevention
  - EHR & Automated reporting
A Layered Framework for Health Analytics
Diagram labels:
- Service cost comparisons, outliers; Proactive fraud detection; Predictive Modeling; Program Integrity; Estimations of future costs; Statistical; Geospatial; Care seeking behavior of populations
- Schema Matching; Data Model; Semantic Integration; Ontology
- Encryption/Decryption; Data Cleansing; Data Governance; Data Standards; Data Security
- Need Analysis; Web Service; Data Virtualization
- Data Sources; Data Warehouse
- Clinical Informatics; Health Analytics; Public Health Surveillance; Systems Analyst
Conceptual Framework for Health Analytics Platform
- Case Management: Mobile Devices, Portals, Web Applications; Data Feeds, Workflows, Champion Challenger
- Data Visualization: Dashboards, Displays, Pattern Recognition; Exploratory Data Analysis, Interactive Iteration
- Analytics & Integration: Predictive Modeling, Simulation, Neural Nets; NLP, CRISP DM, Business Rules
- Virtual Data Layer: Enterprise Data Sharing & Integration; Data Federation, Linking, Matching
- Data Preparation: Data Quality, Cleaning, Transformation; Synthetic Variable Generation, Data Cubes
A Sample Scenario Architecture
Diagram labels:
- Analytic/Presentation Services: Statistics; Data Visualization; Data Mining / BI Tools
- Virtual Data Layer Services: Mem; Temp; Persisted; WS; Cloud
- Integration/Delivery; Discovery
- Database / Data Management: HCUP; BRFSS; PF; MSIS; Health Indicators Warehouse; Web Services
- Health Data Services Sources
Development Process via CRISP-DM
Integrated Health Analytics Lifecycle*
Business Understanding
- Determine business objectives
- Identify desired insights
- Assess environments
- Form project plan
Data Understanding
- Review data sources
- Verify data quality and completeness
- Form analytics plan
- Identify needed reference architecture elements
Platform & Data Prep
- Construct tailored platform
- Access data sources
- Preprocess data
- Format and integrate data
Exploration
- Apply analytics techniques
- Generate initial insights
- Describe findings
Evaluation
- Evaluate results
- Assess alignment with business objectives
- Plan for ongoing access
- Determine next steps
Production
- Add new analytic views
- Sustain platform
- Monitor and maintain data source access
Inset figures: Analytics Fan; Conceptual Framework (the five layers from Case Management down to Data Preparation); Tailored Platform
*Adapted from: Cross Industry Standard Process for Data Mining (CRISP-DM), Visual Guide by Nichole Leaper
Anomaly Detection: Payment per Medicare beneficiary by hospital type of service code
- Identify services and individual cases with extreme values
Cluster Analysis: Clusters of high average costs vs. low average costs in Medicare patients
- Investigation of patient groups & procedures
Predictive Modeling: Predicting number of child Medicaid beneficiaries from last 10 years
- Increased eligibility & reimbursement requirements for ARA
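The cluster analysis use case (high vs. low average cost groups) can be illustrated with a minimal sketch. It assumes a one-dimensional k-means with k = 2, which is a stand-in technique and not necessarily the method used in the presentation. The group names and cost figures are hypothetical.

```python
def two_means(values, max_iterations=100):
    """Split values into a low-cost and a high-cost cluster (1-D k-means, k=2)."""
    # Start the two centers at the extremes so "low" and "high" keep their meaning.
    low_center, high_center = min(values), max(values)
    low, high = list(values), []
    for _ in range(max_iterations):
        # Assignment step: each value joins the nearer center (ties go to "low").
        low = [v for v in values if abs(v - low_center) <= abs(v - high_center)]
        high = [v for v in values if abs(v - low_center) > abs(v - high_center)]
        if not high:  # every value is identical, so there is only one cluster
            break
        # Update step: move each center to the mean of its cluster.
        new_low = sum(low) / len(low)
        new_high = sum(high) / len(high)
        if (new_low, new_high) == (low_center, high_center):
            break  # converged
        low_center, high_center = new_low, new_high
    return low, high


# Hypothetical average cost per patient group (not taken from the slides).
avg_cost_by_group = {
    "group_1": 1200.0,
    "group_2": 1350.0,
    "group_3": 1100.0,
    "group_4": 9800.0,
    "group_5": 10400.0,
    "group_6": 1250.0,
    "group_7": 9900.0,
}

low_costs, high_costs = two_means(list(avg_cost_by_group.values()))
high_cost_groups = sorted(
    name for name, cost in avg_cost_by_group.items() if cost in high_costs
)
```

The groups that land in the high-cost cluster are the candidates for the follow-up investigation of patient groups and procedures that the slide mentions.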
Key Point
High-cost outliers for specific type of service codes are identified among diabetic claims.
- The user interface for the fraud analytics workstation (FAW) is used to demonstrate different fraud scenarios.
- The Outliers Detection tab connects to a SAS product for identifying anomalies.
- The data are a CMS public use file (PUF) sample of over 9.7 million rows of claims data from 2008, subset by ICD-9 coding to claims for diabetics.
- The tab identifies the high-cost outliers for different type of service codes.
- Several kinds of charts can be output for the user.
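The idea of flagging high-cost outliers within each type of service code can be sketched as follows. This is not the SAS procedure the workstation connects to. It assumes a simple interquartile-range (IQR) fence as a stand-in rule, and the field names and claim values are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def high_cost_outliers(claims, k=1.5):
    """Return claims whose payment exceeds Q3 + k * IQR within their service code."""
    by_code = defaultdict(list)
    for claim in claims:
        by_code[claim["service_code"]].append(claim)

    flagged = []
    for group in by_code.values():
        payments = [c["payment"] for c in group]
        if len(payments) < 4:
            continue  # too few claims to estimate quartiles sensibly
        q1, _, q3 = quantiles(payments, n=4)
        upper_fence = q3 + k * (q3 - q1)
        flagged.extend(c for c in group if c["payment"] > upper_fence)
    return flagged


# Hypothetical payments per type of service code (not CMS data).
payments_by_code = {
    "A": [95, 98, 100, 102, 105, 107, 110, 900],
    "B": [48, 49, 50, 51, 52, 53, 54, 55],
}
claims = [
    {"claim_id": f"{code}-{i}", "service_code": code, "payment": float(p)}
    for code, payments in payments_by_code.items()
    for i, p in enumerate(payments)
]

flagged = high_cost_outliers(claims)
flagged_ids = [c["claim_id"] for c in flagged]
```

Computing the fence separately per service code matters because a payment that is routine for one type of service can be extreme for another.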
This user interface tab shows a Flash file of a bubble chart that displays the percent of Medicaid eligibles and the percent of the population on SNAP (food stamps) over time. Bubbles float to show changes over time in population, percentage of SNAP recipients, and percentage of Medicaid eligibles for the counties shown.
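The data behind such an animated bubble chart can be sketched as one frame per year, with one bubble per county. The field names, the county names, the counts, and the mapping of percent Medicaid eligibles to x, percent SNAP to y, and population to bubble size are all assumptions for illustration.

```python
from collections import defaultdict


def bubble_frames(records):
    """Group county records by year and derive the values each bubble encodes."""
    frames = defaultdict(list)
    for r in records:
        population = r["population"]
        frames[r["year"]].append(
            {
                "county": r["county"],
                "x_pct_medicaid": 100.0 * r["medicaid_eligibles"] / population,
                "y_pct_snap": 100.0 * r["snap_recipients"] / population,
                "size_population": population,
            }
        )
    # Return frames in chronological order so they can be played as an animation.
    return dict(sorted(frames.items()))


# Hypothetical county counts (not from the presentation).
records = [
    {"county": "County A", "year": 2009, "population": 1000,
     "snap_recipients": 150, "medicaid_eligibles": 250},
    {"county": "County A", "year": 2008, "population": 1000,
     "snap_recipients": 100, "medicaid_eligibles": 200},
    {"county": "County B", "year": 2008, "population": 4000,
     "snap_recipients": 200, "medicaid_eligibles": 600},
]

frames = bubble_frames(records)
```

Each frame can then be handed to whatever charting tool renders the bubbles. Stepping through the frames in year order produces the floating effect described on the slide.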
Dynamic Cost Projections from Existing Data
Use Case: Enabling dynamic what-if scenarios to project future Medicaid costs
Context: LA Medicaid Director adjusts various population parameters to project annual cost with the new population
Results:
- Ability to estimate future costs based on historical data and growing understanding of future population
- Rapidly gain insights into the main factors contributing to Medicaid cost expenditures
- Explore correlations among variables to gain insights into key cost drivers using Profiler
Figure panels: Estimated Enrollment; Dynamic Cost Projection
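A what-if projection of this kind can be sketched as a function of adjustable population parameters. The model below is an assumption made for illustration (enrollment times average annual cost per enrollee, summed over segments), not the model used in the presentation. The segment names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    enrollment: int                  # current enrollees (hypothetical)
    annual_cost_per_enrollee: float  # historical average cost (hypothetical)


def project_cost(segments, enrollment_change=None, cost_trend=0.0):
    """Project total annual cost under a what-if scenario.

    enrollment_change maps a segment name to a fractional change (0.25 = +25%).
    cost_trend is a uniform fractional change in cost per enrollee.
    """
    enrollment_change = enrollment_change or {}
    total = 0.0
    for s in segments:
        enrollees = s.enrollment * (1 + enrollment_change.get(s.name, 0.0))
        total += enrollees * s.annual_cost_per_enrollee * (1 + cost_trend)
    return total


segments = [
    Segment("children", 1000, 2000.0),
    Segment("adults", 500, 4000.0),
    Segment("aged_disabled", 200, 15000.0),
]

baseline = project_cost(segments)
# Scenario: child enrollment grows by 25 percent, costs per enrollee unchanged.
scenario = project_cost(segments, enrollment_change={"children": 0.25})
added_cost = scenario - baseline
```

Keeping the parameters as plain function arguments is what makes the projection dynamic: a user interface can re-run `project_cost` each time a slider for enrollment or cost trend changes.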
Capabilities of an integrated health analytics platform
- Provides flexibility to work with pre-existing architecture as well as new architectures
- Reduces costs and time for integration among different data sources
- Offers robust analytics, visualizations and reporting customized to customer needs managing big data
- Cuts operational costs (e.g., eliminates need for a data warehouse)
- Generates resources and support for evidence-based decision-making within big data
Thank You!
Jay V Schindler, PhD MPH
jay.schindler@ngc.com