Survey of clinical data mining applications on big data in health informatics
Matthew Herland, Taghi M. Khoshgoftaar, and Randall Wald
劉俊成
Introduction
Data mining for health informatics: prediction, detection, and classification.
Different types of data: molecular level, patient level, and tissue level (e.g., magnetic resonance images).
Motivation
Will a patient develop a disease? How severe will it be? What is the correct response in an emergency?
Using molecular-level data
Gene microarray data are used to measure gene expression, for example to predict early-stage colorectal cancer and to categorize leukemia into two subclasses.
Methods: Nearest Centroid Classifier, Support Vector Machines.
Nearest Centroid Classifier (NCC)
Classification: calculate the mean (centroid) of each class, then assign a new sample to the class with the nearest centroid. The centroid calculation is the same idea used in the k-means method.
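A minimal sketch of the nearest centroid idea, assuming each sample is a list of expression values and Euclidean distance is the similarity measure (the slide does not name a distance). The function names and the two-gene toy data with subclasses "A" and "B" are hypothetical and only illustrate the mechanics.

```python
from math import dist  # Euclidean distance, Python 3.8+

def fit_centroids(samples, labels):
    """Return {label: centroid}, each centroid being the per-feature mean of that class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict_nearest_centroid(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: dist(centroids[y], x))

# Hypothetical toy expression profiles (2 genes per sample), two subclasses "A" and "B".
samples = [[2.0, 0.1], [1.8, 0.3], [0.2, 1.9], [0.4, 2.2]]
labels = ["A", "A", "B", "B"]
centroids = fit_centroids(samples, labels)
print(predict_nearest_centroid(centroids, [1.5, 0.5]))  # prints A
```

Unlike k-means, the class labels are known in advance, so each centroid is computed once from labeled data instead of being re-estimated iteratively.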
Support Vector Machines (SVM)
Linear classification: find the separating hyperplane with the maximum margin between two classes.
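A minimal sketch of a linear soft-margin SVM, assuming labels in {+1, -1}. It uses full-batch subgradient descent on the regularized hinge loss rather than the quadratic-programming or SMO solvers used by standard SVM libraries. The step size, C, epoch count, and toy data are arbitrary illustrative choices.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Soft-margin linear SVM: full-batch subgradient descent on
    0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)), labels y_i in {+1, -1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin: hinge term is active
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_svm(w, b, x):
    """Linear decision rule: the sign of w . x + b."""
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical, linearly separable toy data.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict_svm(w, b, x) for x in X])  # prints [1, 1, 1, -1, -1, -1]
```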
Using patient-level data
Physiological status: heart rate, body temperature, blood oxygen.
Real-time prediction for emergencies.
Methods: IBM's method (similarity learning), decision tree.
IBM's data stream mining
Uses a database to find similar past cases based on physiological status and various clinical data.
Similarity learning: k-nearest neighbors.
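A minimal k-nearest-neighbors sketch of "finding the similar case", assuming each stored case is a vector of (heart rate, body temperature, blood oxygen) with a known outcome. It z-scores each feature and uses a fixed Euclidean metric, so it is a simplification. True similarity learning would learn the metric itself, and this is not IBM's implementation. All values and outcome labels are hypothetical.

```python
from collections import Counter
from math import dist
from statistics import mean, pstdev

def fit_scaler(rows):
    """Per-feature mean and standard deviation, so features on different scales are comparable."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]

def scale(row, mus, sds):
    return [(v - m) / s for v, m, s in zip(row, mus, sds)]

def knn_predict(cases, outcomes, query, k=3):
    """Majority outcome among the k stored cases closest to `query` (Euclidean distance after z-scoring)."""
    mus, sds = fit_scaler(cases)
    scaled = [scale(c, mus, sds) for c in cases]
    q = scale(query, mus, sds)
    nearest = sorted(range(len(cases)), key=lambda i: dist(scaled[i], q))[:k]
    return Counter(outcomes[i] for i in nearest).most_common(1)[0][0]

# Hypothetical stored cases: [heart rate (bpm), body temperature (deg C), SpO2 (%)].
cases = [[72, 36.8, 98], [80, 37.0, 97], [68, 36.6, 99],
         [125, 38.9, 88], [118, 39.2, 90], [130, 38.5, 86]]
outcomes = ["stable"] * 3 + ["deteriorating"] * 3
print(knn_predict(cases, outcomes, [122, 38.7, 89]))  # prints deteriorating
```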
Decision Tree
Decision tree: the outcome of interest (the class to predict) is placed at the leaf nodes.
Very Fast Decision Tree (VFDT).
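The Very Fast Decision Tree decides when a leaf has seen enough streaming examples to split by using the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)). A minimal sketch of that split test follows, assuming information gain as the split measure (range R = log2 of the number of classes). The delta and tie-threshold defaults are illustrative assumptions. A full VFDT also maintains per-leaf sufficient statistics, which are not shown here.

```python
from math import log, log2, sqrt

def hoeffding_bound(value_range, delta, n):
    """epsilon such that, with probability 1 - delta, the true mean of a variable
    with range R lies within epsilon of the mean observed over n samples."""
    return sqrt(value_range ** 2 * log(1 / delta) / (2 * n))

def should_split(best_gain, second_gain, n, n_classes, delta=1e-7, tie_threshold=0.05):
    """VFDT-style split test at a leaf that has seen n examples."""
    eps = hoeffding_bound(log2(n_classes), delta, n)  # information gain lies in [0, log2(c)]
    # Split if the best attribute is clearly better, or if eps is so small that the two are effectively tied.
    return best_gain - second_gain > eps or eps < tie_threshold

print(round(hoeffding_bound(1.0, 1e-7, 100), 4))      # 0.2839
print(should_split(0.30, 0.20, n=100, n_classes=2))   # False
print(should_split(0.30, 0.20, n=5000, n_classes=2))  # True
```

Because epsilon shrinks as 1/sqrt(n), the tree can commit to a split after seeing a bounded number of examples instead of storing the whole stream.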
Conclusion
Introduced several examples of health informatics applications that use different types of data.
Grouper infected by viruses
Grouper: one of the most important aquaculture fishes, with high economic value worldwide.
Problems of high-density farming: disease infection and horizontal transmission of viruses.
Two common types of iridovirus: Grouper Iridovirus of Taiwan (TGIV), genus Megalocytivirus, and Grouper Iridovirus (GIV), genus Ranavirus.
NGS
Next-generation sequencing (NGS) technology: high-throughput gene expression analysis.
De novo assembly (non-model species) vs. reference mapping (model species) approaches.
Biological pathway
KEGG (Kyoto Encyclopedia of Genes and Genomes): a database for molecular-level biology problems (http://www.genome.jp/kegg/).
KEGG Orthology (KO); KEGG Pathway.
Gene analysis
Overlapping gene sets under different ratio settings show different levels of unique gene clusters between TGIV- and GIV-infected groupers.
Legend: genes appearing in both M and R; genes appearing in M; genes appearing in R.
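A minimal sketch of the overlap comparison, assuming M and R label the two infection groups and that a gene is "selected" when its fold change passes a ratio threshold in either direction. The slide does not state the selection rule, so `selected_genes`, the thresholds, and the gene IDs and fold changes below are all hypothetical.

```python
def selected_genes(fold_changes, ratio):
    """Genes up- or down-regulated by at least `ratio`-fold (hypothetical selection rule)."""
    return {g for g, fc in fold_changes.items() if fc >= ratio or fc <= 1 / ratio}

def overlap(m_genes, r_genes):
    """Venn-style partition of two gene sets."""
    return {"both": m_genes & r_genes, "only_M": m_genes - r_genes, "only_R": r_genes - m_genes}

# Hypothetical fold changes for the M and R groups.
m_fc = {"g1": 4.0, "g2": 1.2, "g3": 0.2, "g4": 2.5}
r_fc = {"g1": 3.0, "g2": 2.2, "g3": 1.0, "g5": 0.4}
for ratio in (2, 3):
    parts = overlap(selected_genes(m_fc, ratio), selected_genes(r_fc, ratio))
    print(ratio, {k: sorted(v) for k, v in parts.items()})
```

Raising the ratio shrinks both sets, which is one way the number of shared and unique genes can change across settings.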
Pathway analysis
Zebrafish is chosen as the model species.
Pathway enrichment analysis applies the hypergeometric distribution model to calculate a p-value, where N = total number of genes, n = number of selected (sample) genes among all genes, m = number of genes in each pathway, and k = number of selected genes in each pathway.
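With the slide's symbols, a common form of this test is the upper-tail (over-representation) p-value p = sum over i from k to min(n, m) of C(m, i) C(N - m, n - i) / C(N, n). The slide does not say which tail is used, so the one-sided form here is an assumption. A minimal sketch using only the standard library:

```python
from math import comb

def enrichment_p_value(N, n, m, k):
    """Upper-tail hypergeometric p-value P(X >= k): the chance of drawing k or more pathway
    genes when n genes are sampled without replacement from N genes, m of which are in the pathway."""
    upper = min(n, m)
    hits = sum(comb(m, i) * comb(N - m, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Toy numbers: 10 genes in total, 3 selected, 4 in the pathway, 2 of the selected in the pathway.
print(enrichment_p_value(N=10, n=3, m=4, k=2))  # (36 + 4) / 120 = 0.333...
```

A small p-value means the pathway contains more selected genes than random sampling would be expected to produce.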
Results: gene analysis
Differentially expressed (DE) genes in TGIV- and GIV-infected groupers (gene names): PSMB2, CASP3, U2AF2, RP-L4e (RPL4).
Comparing differentially expressed genes in the ECM-receptor interaction pathway: TGIV vs. GIV
Thanks for listening