EHR CURATION FOR MEDICAL MINING

Similar documents
Ernestina Menasalvas Universidad Politécnica de Madrid

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

KNOWLEDGE DISCOVERY CHALLENGES ON BIG MEDICAL DATA. Tutorial at ECML PKDD 2014 Nancy, France, 09/2014

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

Massive Labeled Solar Image Data Benchmarks for Automated Feature Recognition

Open issues and research trends in Content-based Image Retrieval

Principles of Data Mining by Hand&Mannila&Smyth

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

SURVEY REPORT DATA SCIENCE SOCIETY 2014

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Natural Language Querying for Content Based Image Retrieval System

Machine Learning for Medical Image Analysis. A. Criminisi & the InnerEye MSRC

ENHANCED WEB IMAGE RE-RANKING USING SEMANTIC SIGNATURES

An Overview of Knowledge Discovery Database and Data mining Techniques

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

Introduction to Data Mining

Blog Post Extraction Using Title Finding

Keywords data mining, prediction techniques, decision making.

Multimedia Data Mining: A Survey

The Scientific Data Mining Process

Image Area. View Point. Medical Imaging. Advanced Imaging Solutions for Diagnosis, Localization, Treatment Planning and Monitoring.

Big Data: Image & Video Analytics

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Pattern Recognition

Knowledge Discovery from patents using KMX Text Analytics

Big Data: Rethinking Text Visualization

Introduction to Data Mining

A Semantic Model for Multimodal Data Mining in Healthcare Information Systems. D.K. Iakovidis & C. Smailis

A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan

Software Development Training Camp 1 (0-3) Prerequisite : Program development skill enhancement camp, at least 48 person-hours.

Master s Program in Information Systems

SPATIAL DATA CLASSIFICATION AND DATA MINING

Master of Science in Health Information Technology Degree Curriculum

ADVANCED MACHINE LEARNING. Introduction

How To Use Neural Networks In Data Mining

DATA MINING TECHNIQUES AND APPLICATIONS

Statistics for BIG data

The Data Mining Process

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Information Management course

Medical Big Data Interpretation

How To Make Sense Of Data With Altilia

Healthcare Measurement Analysis Using Data mining Techniques

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Chapter ML:XI. XI. Cluster Analysis

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Part 5. Prediction

TIETS34 Seminar: Data Mining on Biometric identification

Steven C.H. Hoi School of Information Systems Singapore Management University

ANALYTICS IN BIG DATA ERA

A Survey of Classification Techniques in the Area of Big Data.

MusicGuide: Album Reviews on the Go Serdar Sali

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

Industrial Challenges for Content-Based Image Retrieval

Galaxy Morphological Classification

Healthcare Data Mining: Prediction Inpatient Length of Stay

Using Artificial Intelligence to Manage Big Data for Litigation

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju

AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM

SIGNATURE VERIFICATION

Module II: Multimedia Data Mining

Clustering Technique in Data Mining for Text Documents

Promises and Pitfalls of Big-Data-Predictive Analytics: Best Practices and Trends

Integrated Data Mining and Knowledge Discovery Techniques in ERP

Hexaware E-book on Predictive Analytics

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

Multiscale Object-Based Classification of Satellite Images Merging Multispectral Information with Panchromatic Textural Features

Florida International University - University of Miami TRECVID 2014

Structural Health Monitoring Tools (SHMTools)

Annotated bibliographies for presentations in MUMT 611, Winter 2006

INTERACTIVE MANIPULATION, VISUALIZATION AND ANALYSIS OF LARGE SETS OF MULTIDIMENSIONAL TIME SERIES IN HEALTH INFORMATICS

User Needs and Requirements Analysis for Big Data Healthcare Applications

Data Mining and Machine Learning in Bioinformatics

Chapter 3 Data Warehouse - technological growth

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Introduction to Successful Association Data Mining

DICON: Visual Cluster Analysis in Support of Clinical Decision Intelligence

Specific Usage of Visual Data Analysis Techniques

I n t e r S y S t e m S W h I t e P a P e r F O R H E A L T H C A R E IT E X E C U T I V E S. In accountable care

COMPARISON OF OBJECT BASED AND PIXEL BASED CLASSIFICATION OF HIGH RESOLUTION SATELLITE IMAGES USING ARTIFICIAL NEURAL NETWORKS

(b) How data mining is different from knowledge discovery in databases (KDD)? Explain.

Spatial Data Mining Methods and Problems

Operations Research and Knowledge Modeling in Data Mining

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Deep Learning Meets Heterogeneous Computing. Dr. Ren Wu Distinguished Scientist, IDL, Baidu

Intelligent Tools For A Productive Radiologist Workflow: How Machine Learning Enriches Hanging Protocols

Simultaneous Gamma Correction and Registration in the Frequency Domain

Self Organizing Maps for Visualization of Categories

Introduction. A. Bellaachia Page: 1

A Dynamic Approach to Extract Texts and Captions from Videos

Biomedical Informatics Applications, Big Data, & Cloud Computing

Open & Big Data for Life Imaging Technical aspects : existing solutions, main difficulties. Pierre Mouillard MD

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications

Learning outcomes. Knowledge and understanding. Competence and skills

Part-Based Recognition

A Semantic Model for Multimodal Data Mining in Healthcare Information Systems

Business Analytics and Data Mining for CRM Business Analytics and Data Mining for CRM: Jumpstart workshop

Machine Learning: Overview

Transcription:

EHR CURATION FOR MEDICAL MINING Ernestina Menasalvas Medical Mining Tutorial@KDD 2015 Sydney, AUSTRALIA

2 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Agenda Motivation the potential of EHR EHR a closer look BIG DATA in the health domain Focus: Medical image data

3 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 MOTIVATION

4 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Motivation In 2012, worldwide digital healthcare data was estimated to be equal to 500 petabytes and is expected to reach 25,000 petabytes in 2020 Can we learn from the past to become better in the future? Healthcare Data is becoming more complex!! The problem : Milllions of reports, tasks, incidents, events, images, Complete availability Lack of protocols and structure Organization oriented processes Need for patient oriented processes

5 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 From Mckensey: Big data in health report 2013 From physicians judgment to evidence-based medicine Standard medical practice is moving from relatively adhoc and subjective decision making to evidence-based healthcare Is the health-care industry prepared to capture big data s full potential, or are there roadblocks that will hamper its use? Holistic, patient-centered approach to value, one that focuses equally on health-care spending and treatment outcomes.

6 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 ELECTRONIC HEALTH RECORDS (EHR) Benefits Roadblocks

7 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 EHR Knowledge Extraction Electronic Health Records use has been increasing in the last ten years. Digitalization of patients histories have led to enormous data stores. Most hospitals do not take advantage of analytic processes to improve patient care.

8 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 BIG DATA IN THE HEALTH DOMAIN

9 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 The average hospital (300 beds) 500.000 patients (reference population) 1300 users (250 physicians, 900 nurses and technicsian, 150 administrative tasks) Monthly activity: 20.000 consultations, 1300 admissions, 800 interventions 10.000 emergencies 75.000 annotations 25.000 reports 90.000 interdepartamental orders 450.000 lab results (analytical) 13.000 images analysis 24.000 pharmacological prescriptions

10 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Goal Provide right intervention to the right patient at the right time ACQUIRE, PROCESS, ANALYZE UNDERSTAND PREDICT Prediction will enable Personalized care to the patient. Early diagnose Lower cost Improved outcomes

11 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 The goal Integrate, annotate,. Analytics Pattern Extraction Evaluation Decission Support

12 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Process

13 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Process Data Acquisition Data processing Modelling Validation Apply

14 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 By 2015, the average hospital will have two-thirds of a petabyte of patient data, 80% of which will be unstructured image data like CT scans and X-rays. http://medcitynews.com/2013/03/the-body-inbytes-medical-images-as-a-source-ofhealthcare-big-data-infographic/

15 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 MEDICAL IMAGE DATA

16 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Most frequent ComputedTomography (CT), X-Ray, Positron Emission Tomography (PET) The main challenge with the image data is that it is not only huge, but is also high-dimensional and complex. Extraction of the important and relevant features is a daunting task.

17 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Technical Challenges Imaging Physics - be0er images by Detector design - Spa8al resolu8on - Sensi8vity Radiochemistry - be0er tracers (PET imaging) Image processing - Correc/ons for physical effects - Mul/modal image fusion - Image reconstruc/on algorithms Data Analysis à be<er interpreta/on of images

18 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology for image processing Overall process of image mining Data Preprocessing Extracting multidimensional feature vectors Mining of vectors and acquire high level knowledge

19 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology for image processing 1. Data pre-process Calibration: (depending on the device registering the image) Clean up the noise. (noisy pixels) Registration (check the stack of images) 2. Extracting multi-dimensional feature vectors Segmentation Algorithm. Search for homogenous voxels Super-Voxels have to be characterized using low-level features selection - Spectral! digital levels - Shape!compactness,.. - Textural! smooth, - Context! neighborhood supervoxels - Spatial relationship! up/down,left/right

20 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology for image processing 3. Mining of vectors and acquire high level knowledge Image annotation Indexing and retrieval

21 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Image annotation Image annotation Classical approach! manual annotation. It is impractical to annotate a huge amount of images manually Second approach! content based image retrieval (CBIR), where images are automatically indexed and retrieved with low level content features like color, shape and texture Third approach of image retrieval is the automatic image annotation

22 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Image annotation Automatic image annotation Single labelling annotation using conventional classification methods: methods (support vector machines (SVM), Artificial Neural Networks, Decision Tree) There three types of AIA approaches: Single labelling annotation using conventional classification methods (support vector machines (SVM), artificial neural network (ANN), and decision tree (DT)) Multi-labelling annotation! annotates an image with multiple concepts using the Bayesian methods

23 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Image annotation Automatic image annotation Single labelling annotation using conventional classification methods: methods (support vector machines (SVM), Artificial Neural Networks, Decision Tree) Binary classification! Tumor / non-tumor cell

24 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Image annotation Automatic image annotation Multi-labelling annotation! annotates an image with multiple semantic concepts/categories using the Bayesian methods Concept of multi-instance multi-label (MIML) represents an image with a bag of features or a bag of regions. The image is annotated with a concept label if any of the regions/instances in the bag is associated with the label. Then the image is annotated with multiple labels

25 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Image annotation Multi-labelling annotation Given a set of images I 1,I 2,...,I N classes { c 1,c 2,...,c n }. { } from a set of given semantic Bayesian models try to determine the posterior probability from the priors and conditional probabilities Model of conditional approaches Non-parametric Parametric

26 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Image annotation Model of conditional approaches Non-parametric No prior assumption about the distribution of the image features is considered. The actual feature distribution is learned from the features of the training samples using certain statistics.

27 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Automated image annotation: nonparametric approach Source: D. Zhang, M. M. Islam, and G. Lu, A review on automa8c image annota8on techniques, Pa0ern Recogni8on, vol. 45, no. 1, pp. 346 362, Jan. 2012.

28 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Image annotation Model of conditional approaches Parametric: The feature space is assumed to follow a certain type of known continuous distribution. Therefore, the conditional probability is modelled using this feature distribution and it is usually modelled as a multivariate Gaussian distribution.

29 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Indexing and retrieval Indexing and retrieval Two different frameworks - Text-based - Content-based Researh areas Low-level image feature extraction Similarity measurement Deriving high level sematic features

30 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Indexing and retrieval Levels of queries in CBIR: - Level 1: retrieval by primitive features (color, texture, spatial location, ).- Eg.: find pictures like this Semantic Gap - Level 2: retrieval of objects of given type identified by derived features, with some degree of logical inference.- Eg.: find a picture of a flower - Level 3: retrieval by abstract attributes (emotional, religious,.., significance). Eg.: find pictures of a joyful crowd

31 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Indexing and retrieval Most current systems perform retrieval at level 2 - Low-level image feature extraction (global/regions)! segmentation +characterization - Similarity measure Distances between regions! Distance at image level One-one match: each region in the query image is only allowed to match one region in the target image Many-many match: each region in the query image is allowed to match more than one region in the target image Semantic gap reduction

32 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Indexing and retrieval Narrowing down the semantic gap techniques - Object ontology to define high-level concepts - Machine learning to associate low-level features with query concepts - Relevance feedback to learn users intention - Generating semantic template to support high-level image retrieval

33 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Indexing and retrieval Source: Y. Liu, D. Zhang, G. Lu, and W.- Y. Ma, A survey of content- based image retrieval with high- level seman8cs, Pa#ern Recogni-on, vol. 40, no. 1, pp. 262 282, Jan. 2007.

34 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Indexing and retrieval Object ontology to define high-level concepts Source: Y. Liu, D. Zhang, G. Lu, and W.- Y. Ma, A survey of content- based image retrieval with high- level seman8cs, Pa#ern Recogni-on, vol. 40, no. 1, pp. 262 282, Jan. 2007.

35 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Indexing and retrieval Machine learning to associate low-level features with query concepts - Supervised learning - Unsupervised learning - Object recognition techniques

36 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Indexing and retrieval Relevance feedback to learn users intention - Typical scenario 1. The system provides initial retrieval results through query-by-example, sketch, etc. 2. User judges the above results as to whether and to what degree, they are relevant (positive examples)/irrelevant (negative examples) to the query. 3. Machine learning algorithm is applied to learn the user feedback. Then go back to (2).

37 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Indexing and retrieval Relevance feedback to learn users intention Source: Y. Liu, D. Zhang, G. Lu, and W.- Y. Ma, A survey of content- based image retrieval with high- level seman8cs, Pa#ern Recogni-on, vol. 40, no. 1, pp. 262 282, Jan. 2007.

38 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 Methodology: Indexing and retrieval Generating semantic template (ST) to support high-level image retrieval - ST is a map between high-level concept and low-level visual features - Different levels of user interaction

39 Ernestina Menasalvas "EHR Curation for Medical Mining" 08/2015 THANKS