Survey of clinical data mining applications on big data in health informatics



Matthew Herland, Taghi M. Khoshgoftaar, and Randall Wald
劉俊成

Introduction
- Data mining for health informatics
- Prediction, detection, classification
- Different types of data
  - Molecular level
  - Patient level
  - Tissue level: magnetic resonance images

Motivation
- Will a patient develop a disease?
- How severe is it?
- What is the correct response to an emergency?

Using molecular-level data
- Use gene microarray data to measure gene expression
- Predict early-stage colorectal cancer
- Categorize leukemia into two different subclasses
- Nearest Centroid Classifier
- Support Vector Machines

Nearest Centroid Classifier (NCC)
- Classification
- Calculate the means
- K-means method
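
The "calculate the means" step can be illustrated with a minimal sketch, assuming Euclidean distance and plain per-class means. The two-feature expression profiles and the labels "A" and "B" are hypothetical, not data from the survey.

```python
from math import dist

def fit_centroids(samples, labels):
    """Return {label: mean vector} computed from the training samples."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda y: dist(centroids[y], x))

# Hypothetical expression values for two genes, two subclasses.
train_x = [[1.0, 1.0], [1.0, 3.0], [5.0, 5.0], [7.0, 5.0]]
train_y = ["A", "A", "B", "B"]

centroids = fit_centroids(train_x, train_y)
print(centroids)
print(predict(centroids, [2.0, 2.0]))
```

A new sample is labeled by comparing it with one mean per class, so prediction cost grows with the number of classes rather than the number of training samples.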

Support Vector Machines (SVM)
- Linear classification
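
A linear SVM classifies a sample by the sign of w.x + b. The sketch below is one simple way to fit such a hyperplane: sub-gradient descent on the regularized hinge loss. This training scheme, the hyperparameters (lam, lr, epochs), and the toy samples are all assumptions for illustration, not the method used in the surveyed studies.

```python
def score(w, b, x):
    """Decision value w.x + b for sample x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=100):
    """Fit w and b by sub-gradient descent on the regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * score(w, b, x)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge-loss step: only samples inside the margin move the plane.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if score(w, b, x) >= 0 else -1

# Hypothetical, linearly separable samples with labels -1 / +1.
train_x = [[-2.0, -1.0], [-1.0, -2.0], [2.0, 1.0], [1.0, 2.0]]
train_y = [-1, -1, 1, 1]

w, b = train_linear_svm(train_x, train_y)
print([classify(w, b, x) for x in train_x])
```

In practice a library solver would normally be used instead; the loop only shows that samples violating the margin are the ones that shape the boundary.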

Using patient-level data
- Physiological status: heart rate, body temperature, blood oxygen
- Real-time prediction for emergencies
- IBM's method: similarity learning
- Decision Tree

IBM's data stream mining
- Uses the database
- Finds similar cases based on physiological status and various clinical data
- Similarity learning
- k-nearest neighbors

Survey of clinical data mining applications on big data in health Informatics 劉 俊 成 Decision Tree Decision Tree Set the issue to be the leaf node Very Fast Decision Tree

0 Survey of clinical data mining applications on big data in health Informatics 劉 俊 成 Conclude Introduce some examples of Health Informatics Different types of data

1 Survey of clinical data mining applications on big data in health Informatics 劉 俊 成 Grouper infected virus Grouper : one of the most important aquaculture fishes with high economic value all over the world High-density farming problems disease infection, horizontal transmission of virus Two common types of iridovirus Grouper Iridovirus of Taiwan (TGIV) of Megalocytivirus Grouper Iridovirus (GIV) of Ranavirus

2 Survey of clinical data mining applications on big data in health Informatics 劉 俊 成 NGS Next-generation sequencing technology (NGS) High throughput gene expression analysis De novo assembly vs Reference mapping approaches (model species vs. nonmodel species)

3 Survey of clinical data mining applications on big data in health Informatics 劉 俊 成 Biological pathway KEGG (Kyoto Encyclopedia of Genes and Genomes) Database for molecular-level biology problems (http://www.genome.jp/kegg/) KEGG Ontology(KO) A B KEGG Pathway A B

4 Survey of clinical data mining applications on big data in health Informatics 劉 俊 成 Genes Analysis Overlap gene sets under different setting ratios Show different levels of unique gene clusters between TGIV and GIV infected groupers gene appeared both in M R gene appeared in M Gene appeared in R 14

5 Survey of clinical data mining applications on big data in health Informatics 劉 俊 成 Pathway analysis Choose zebrafish as model species Pathway enrichment analysis Applying Hypergeometric distribution model Calculate p-value N = all genes amount n = selected sample in all genes m = all genes in each pathway k = selected sample in each pathway

6 Survey of clinical data mining applications on big data in health Informatics 劉 俊 成 Result-genes analysis TGIV GIV DE Gene Name PSMB2 CASP3 U2AF2 RP-L4e RPL4

7 Comparing differentially Survey of clinical data mining applications on big data in health Informatics 劉 俊 成 expressed genes in ECMreceptor interaction TGIV GIV 17

Thanks for listening