MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012



Similar documents
Data Refinery with Big Data Aspects

The Business Case for Using Big Data in Healthcare

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Hexaware E-book on Predictive Analytics

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence

From Data to Foresight:

An Introduction to Health Informatics for a Global Information Based Society

Global Headquarters: 5 Speen Street Framingham, MA USA P F

Big Data, Analytics, Intelligence: Potenziale und Nutzen

A Hurwitz white paper. Inventing the Future. Judith Hurwitz President and CEO. Sponsored by Hitachi

Uncovering Value in Healthcare Data with Cognitive Analytics. Christine Livingston, Perficient Ken Dugan, IBM

Find the signal in the noise

Big Data. Fast Forward. Putting data to productive use

locuz.com Big Data Services

Delivering Smart Answers!

Data Isn't Everything

Healthcare Measurement Analysis Using Data mining Techniques

Chapter 11. Managing Knowledge

Big Data Analytics in Health Care

Value of. Clinical and Business Data Analytics for. Healthcare Payers NOUS INFOSYSTEMS LEVERAGING INTELLECT

Big Data and Healthcare Payers WHITE PAPER

The Future of Business Analytics is Now! 2013 IBM Corporation

TEXT ANALYTICS INTEGRATION

Solutions For. Information, Insights, and Analysis to Help Manage Business Challenges

What Lies Ahead? Trends to Watch: Health Care Product Development in North America

An Enterprise Framework for Business Intelligence

Turning Big Data into Big Decisions Delivering on the High Demand for Data

MediSapiens Ltd. Bio-IT solutions for improving cancer patient care. Because data is not knowledge. 19th of March 2015

Extracting Value from Health Care Big Data with Predictive Analytics

Secure Cloud Computing Concepts Supporting Big Data in Healthcare. Ryan D. Pehrson Director, Solutions & Architecture Integrated Data Storage, LLC

Solution Overview. Fusion Health Advantage Big Data Accelerator Platform

Discover more, discover faster. High performance, flexible NLP-based text mining for life sciences

IBM's Fraud and Abuse, Analytics and Management Solution

STATE OF CONNECTICUT State Innovation Model Health Information Technology (HIT) Council Answers to Questions for Zato

Using Predictions to Power the Business. Wayne Eckerson Director of Research and Services, TDWI February 18, 2009

Exploration and Visualization of Post-Market Data

Improving Data Processing Speed in Big Data Analytics Using. HDFS Method

PREDICTIVE ANALYTICS FOR THE HEALTHCARE INDUSTRY

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Statistics for BIG data

Introduction to Data Mining

Introduction to Data Mining

The Canadian Realities of Big Data and Business Analytics. Utsav Arora February 12, 2014

Transforming the Telecoms Business using Big Data and Analytics

A leader in the development and application of information technology to prevent and treat disease.

Big Analytics: A Next Generation Roadmap

A New Era Of Analytic

Computer-assisted coding and documentation for a changing healthcare landscape

Big Data Integration: A Buyer's Guide

Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI

NOUS. Health Management. Importance of Population. White Paper INFOSYSTEMS LEVERAGING INTELLECT

Auto-Classification for Document Archiving and Records Declaration

FIVE INDUSTRIES. Where Big Data Is Making a Difference

Working with telecommunications

PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management

SUCCESSES AND CHALLENGES DEVELOPING AN ENTERPRISE CLINICAL ANALYTICS SOLUTION Ambulatory Information Management (AIM)

Data Mining Solutions for the Business Environment

SURVEY REPORT DATA SCIENCE SOCIETY 2014

The 4 Pillars of Technosoft s Big Data Practice

PALANTIR HEALTH. Maximizing data assets to improve quality, risk, and compliance. 100 Hamilton Ave, Suite 300 Palo Alto, California 94301

Master big data to optimize the oil and gas lifecycle

Beyond listening Driving better decisions with business intelligence from social sources

Data Mining Applications in Higher Education

BPM for Structural Integrity Management in Oil and Gas Industry

Loss Prevention Data Mining Using big data, predictive and prescriptive analytics to enpower loss prevention

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

3 MUST-HAVES IN PUBLIC SECTOR INFORMATION GOVERNANCE

Promises and Pitfalls of Big-Data-Predictive Analytics: Best Practices and Trends

Data Warehousing and Data Mining in Business Applications

Big Data 101: Harvest Real Value & Avoid Hollow Hype

How Big Data is Different

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management

INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR.

CyberArk Privileged Threat Analytics. Solution Brief

SUSTAINING COMPETITIVE DIFFERENTIATION

The Business Value of Predictive Analytics

Content Analyst's Cerebrant Combines SaaS Discovery, Machine Learning, and Content to Perform Next-Generation Research

GETTING REAL ABOUT SECURITY MANAGEMENT AND "BIG DATA"

Data Mining System, Functionalities and Applications: A Radical Review

Integrating a Big Data Platform into Government:

Big Data & Analytics for Semiconductor Manufacturing

IBM Content Analytics adds value to Cognos BI

GE Healthcare. Proven revenue cycle management supporting profitability in an era of healthcare reform.

Transcription:

MEDICAL DATA MINING Timothy Hays, PhD Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012

2 Healthcare in America Is a VERY Large Domain with Enormous Opportunities for Data Mining US Healthcare (2009) $2.5 Trillion 17.3% of GDP Healthcare system: Providers, Payers and Patients, Government (Federal and State) and Private/Commercial, Research to (Best) Practice, Regulations, Laws, and Policies (i.e. Affordable Care Act, etc.)

3 Healthcare / Medical Data Mining Patients and Consumers Providers: Government, Private or Commercial, hospitals, pharmacies, clinics, doctors offices, and other provider services Payers: employers, insurance carriers, other third-party payers, health plan sponsors (employers, unions, DOD, VA, HHS, etc.) Areas where data mining can help!

4 Healthcare / Medical Data Mining Healthcare crosscuts: Regulatory: laws, regulations, coding, guidelines, best practices, performance, costs, reporting (i.e., adverse event), etc. Research: basic research, pharmaceuticals, medical devices, genetics, drug-drug interaction, diagnostic test decision support, biomedical research data mining (basic or clinical results), etc. IT systems: interoperability, software development, information/data storage, security and access, reporting, Big Data and small data, usability, data transfer, training/educating/communicating, and so much more! Areas where data mining can help!

5 So How Do We Get There? Understanding and Then Tackling the Pieces!

6 Medical Data Mining Where are the opportunities?

7 Build On History and Knowledge Practice domains / Fields we can learn from: Knowledge management is the theory behind knowledge capture and use Informatics is the science of information, the practice of information processing, and the engineering of information systems Analytics is the practical application of tools (i.e. algorithms) upon information to gain new insights. Others: Business Intelligence Competitive Intelligence Computational Science Bioinformatics Health Informatics Predictive Modeling Decision Support Artificial Intelligence, etc.

8 Data Mining Effective Use of Data, Tools, and Analyses Decisions, Plans, Actions Knowledge Discovery and Analysis The application and productive use of information Information Data organized into meaningful patterns Data Unstructured, structured, Multiple sources

9 Medical Data Mining Leads to: Question based answers Anomaly based discovery New Knowledge discovery Informed decisions Probability measures Predictive modeling Decision support Improved health Personalized medicine

10 So, Again, How Do We Get There? What questions are you trying to answer? Ultimately, identify answers to questions we didn t know we had Do you have data, tools and analyses to answer the question? Example areas: Healthcare management (provider care practices) Fraud and abuse Treatment effectiveness Patient involvement and relationship Understanding and Then Tackling the Pieces!

11 Dilbert on Data There are Exabyte s of data! But is it the right data to answer your question?

So, Again, How Do We Get There? 12

13 Evolution of the Lung Cancer Portfolio: 2003-2009 2003 2008 2009 Abstracts

14 So, Again, How Do We Get There? Data: the V s (Volume, Velocity, Variety, Veracity, and Visibility) Tools/Applications: Search algorithms, text mining, natural language processing (NLP), machine learning, etc. clustering, predictive modeling, relationship and link analysis, taxonomy generation, statistical analysis, neural networks, visualization, heat maps, etc. Analysis: Methodologies (exploration and drill down) and subject matter/domain experts Understanding and Then Tackling the Pieces!

15 The Data V s Data Volume - large, small, combined, separate, etc. Velocity transfer: capture and retrieval includes streaming, batch processing, utilization, etc. Variety - text [structured unstructured], images, audio, video, etc. Veracity - meaningfulness [use], value, variability, quality, etc. Visibility - access, security, Understanding and Then Tackling the Pieces!

16 Tools / Applications 1000 s of Tools Integrated or add-on Question/Domain/Analysis specific Use with Repositories and Platforms Helpful but not necessary for analysis Relevant to data veracity/quality Understanding and Then Tackling the Pieces!

17 Analysis Methodologies Research Algorithms Proprietary Subject matter experts Domain (healthcare centric) expertise Analysis expertise Tool and application experience Understanding and Then Tackling the Pieces!

18 Medical Data Mining Spectrum Complexity Planning/Forecasting Modeling/Predicting Decision Support / Optimizing Dashboards Reporting Capability

Medical Data Mining 19 It sounds good, but are there standards for data capture, use, definitions, sharing, etc.? Are standards needed (yet)? Are there sufficient tools, applications, analyses and staff available to identify valuable information to improve healthcare? Are there sufficient benefits and incentives in core areas where data mining is essential? Personalized and predictive medicine Fraud and abuse Research advancements Improved treatments and medical devices

Dilbert on Federal & Corporate Realities 20 An integrated approach is key!

NIH Research, Condition, and Disease Categorization Project

22 National Institutes of Health (NIH): Case Study The purpose of the Research, Condition, and Disease Categorization (RCDC) project is to 1. Consistently categorize NIH-funded research projects according to research areas/categories 2. Use an automated process, and 3. Make the results available to Congress and the public

23 How does it do that? NIH: Case Study RCDC system uses Elsevier s Collexis technology to text mine biomedical concepts from research descriptions NIH research experts define a weighted classification system for each of 238 categories All NIH research is then categorized through an automated process The output is reported publicly at Report.NIH.Gov

24 NIH: Case Study The Research, Condition, and Disease Categorization (RCDC) project is an example of providing new, timely information out of unstructured and structured data RCDC pulls data from 7 databases (all containing well over 4 terabytes of content) that gets routed, compartmentalized, validated and ultimately used for regular reports and on-demand queries Allows research information to be explored proactively Is adaptable as 1) Science evolves over time and 2) Increased needs for data usage are identified

How Does RCDC Work? 25 Category Definition developed by NIH staff Matching Process Projects with matching categories Project Description Matching compares individual project descriptions to the category definition the resultant category matching score for a project is a reflection of how closely the project is related to a particular category

Category Definition Created in the New System 26 A definition is a list of scientific terms from a thesaurus (300,000 terms and synonyms). Terms are selected by NIH Scientific Experts to define that research category. Terms are weighted to fine-tune the matching process. Terms from grants/projects are matched against definitions to produce category project lists.

Sample: Sleep Research Draft Fingerprint 27 Analytical Tool

Prior View of Data (FY 2009) 28 http://www.nih.gov/news/fundingresearchareas.htm

29 Summary level data http://report.nih.gov/rcdc/categories/

30 Drill down level data, not available on web in the past http://report.nih.gov/rcdc/categories/

31 NIH: Case Study Ahead of its time RCDC opened the door for analysis and review of research that was not previously possible (including decision intelligence practices) Additional uses of new analytical, visualization, and exploration technologies are now taking place because the platform exists!

Benefits Enhanced NIH s ability to: Leverage existing information and processes Conduct text mining and perform scientific portfolio analysis Provide transparency into government spending on research Greatly improved process Consistent methodology (one definition per category) Reproducible numbers New open platform to support decision intelligence Improved public understanding of NIH spending Access to project listings not available previously Searchable, accessible query tools and reports 32

33 Summary The opportunity and future for Medical Data Mining is HUGE! Practice areas cover the landscape: Patient, Provider, Payer, Research, Regulatory and IT Tackle it in chucks! Question based data mining Don t try to build the be-all end-all data source use what s available to begin to answer critical questions sooner rather than later Aspects of Data are critical The right Tool for the right job Analysis requires well trained analysts