Big Data Analytics for Healthcare
1 Big Data Analytics for Healthcare
Jimeng Sun, Healthcare Analytics Department, IBM TJ Watson Research Center
Chandan K. Reddy, Department of Computer Science, Wayne State University
2 Healthcare Analytics using Electronic Health Records (EHR)
Old way: data are expensive and small. Input data come from clinical trials, which are small and costly. Modeling effort is small since the data are limited. A single model can still take months.
EHR era: data are cheap and large. Broader patient population; noisy data; heterogeneous data; diverse scale; complex use cases.
3 Heterogeneous Medical Data: diagnosis, genetic data, medication, images, lab, clinical notes
4 Challenges of Healthcare Analytics: collaboration across domains, analytic platform, intuitive results, scalable computation
5 PARALLEL MODEL BUILDING
6 Motivation
Predictive modeling using EHR data is growing, with an explosion in interest.
Scalable predictive modeling platforms/systems are needed because of increased computational requirements from: processing EHR data (volume, variability, and heterogeneity); building accurate models; building clinically meaningful models; validating models for accuracy and generalizability.
7 What does it take to develop a predictive model using EHR?
Marina, IBM Analytics Consultant: Within 3 months, we need to 1. understand the business case, 2. obtain the data, 3. prepare the data, 4. develop predictive models, 5. deliver the final model.
David Gotz, Harry Stavropoulos, Jimeng Sun, Fei Wang. ICDA: A Platform for Intelligent Care Delivery Analytics, AMIA
8 A Generalized Predictive Modeling Pipeline
Model specification
Cohort Construction: Find an appropriate set of patients with the specified target condition and a corresponding set of control patients without the condition.
Feature Construction: Compute a feature vector representation for each patient based on the patient's EHR data.
Cross Validation: Partition the data into complementary subsets for use in model training and validation testing.
Feature Selection: Rank the input features and select a subset of relevant features for use in the model.
Classification: Train and evaluate a model for a specific classifier.
Output: Clean up intermediate files and put results into their final locations.
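9 Cohort Construction
From all patients, cases and controls are selected for each data set (D1, D2, D3).
D1: disease target hypertension control
D2: disease target heart failure onset
D3: disease target hypertension diagnosis
Samples: only partly recoverable (K, 300K)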
10 Feature Construction
Timeline labels: observation window, index date, prediction window, diagnosis date.
We define the diagnosis date and the index date, and the prediction and observation windows.
Features are constructed from the observation window and used to predict HF onset after the prediction window.
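The windowing on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the event codes, and the window lengths (365 and 180 days) are assumptions chosen for the example. It assumes the index date sits one prediction window before the diagnosis date and that only events inside the observation window, which ends at the index date, become features.

```python
from collections import Counter
from datetime import date, timedelta


def index_date_for(diagnosis_date, prediction_days):
    """Index date = diagnosis date minus the prediction window (assumed layout)."""
    return diagnosis_date - timedelta(days=prediction_days)


def count_features(events, index_date, observation_days):
    """Count events per code inside [index_date - observation_days, index_date)."""
    start = index_date - timedelta(days=observation_days)
    return Counter(code for day, code in events if start <= day < index_date)


# Hypothetical patient timeline: (event date, event code).
events = [
    (date(2010, 1, 5), "dx:hypertension"),
    (date(2010, 6, 1), "rx:diuretic"),
    (date(2010, 9, 15), "dx:hypertension"),
    (date(2011, 2, 1), "dx:hypertension"),  # falls in the prediction window, so it is ignored
]

diagnosis = date(2011, 6, 1)
index = index_date_for(diagnosis, prediction_days=180)
features = count_features(events, index, observation_days=365)
print(index, dict(features))
```

Keeping events from the prediction window out of the feature vector is what prevents information close to the diagnosis from leaking into the model.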
15 PARAMO: A Parallel Predictive Modeling Platform
Pipeline specifications -> dependency graph generator (remove redundancies, identify dependencies) -> dependency graph -> dependency graph execution engine (prioritization, scheduling, parallel execution) -> results
The execution engine runs on a parallelization infrastructure.
16 An Example Set of Pipeline Specifications
Cohort construction: one patient data set
Feature construction (feature type: aggregation): Diagnoses: count; Medications: count; Symptoms: count; Labs: mean
Cross-validation: 2-fold cross-validation
Feature selection: Information Gain, Fisher Score
Classification: Naïve Bayes, Logistic Regression, Random Forest
17 Dependency Graph Generator
Input: pipeline specifications
Output: dependency graph
Function: remove redundancies, identify dependencies, encode parallel jobs
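To make the redundancy removal concrete, here is a small Python sketch that expands a specification like the example one (one data set, 2 folds, 2 feature selectors, 3 classifiers) into tasks and merges identical tasks into single graph nodes. The task tuples, the stage granularity, and the function name are assumptions made for illustration; PARAMO's actual graph encoding is not given in the slides.

```python
from itertools import product

# Hypothetical specification mirroring the example pipeline specifications.
spec = {
    "cohort": ["one_patient_data_set"],
    "folds": [0, 1],
    "feature_selection": ["information_gain", "fisher_score"],
    "classifier": ["naive_bayes", "logistic_regression", "random_forest"],
}

STAGES_PER_PIPELINE = 5  # cohort, features, cv_split, select, classify


def build_dependency_graph(spec):
    """Map each task (a tuple) to the set of tasks it depends on.

    Identical tasks shared by several pipelines collapse into one dict key,
    which is the redundancy removal step in this sketch.
    """
    graph = {}
    combos = product(spec["cohort"], spec["folds"],
                     spec["feature_selection"], spec["classifier"])
    for cohort, fold, selector, clf in combos:
        chain = [
            ("cohort", cohort),
            ("features", cohort),
            ("cv_split", cohort, fold),
            ("select", cohort, fold, selector),
            ("classify", cohort, fold, selector, clf),
        ]
        previous = None
        for task in chain:
            deps = graph.setdefault(task, set())
            if previous is not None:
                deps.add(previous)
            previous = task
    return graph


graph = build_dependency_graph(spec)
n_pipelines = (len(spec["cohort"]) * len(spec["folds"])
               * len(spec["feature_selection"]) * len(spec["classifier"]))
naive_task_count = n_pipelines * STAGES_PER_PIPELINE
print(naive_task_count, "tasks without sharing,", len(graph), "after merging")
```

Under these assumptions, every classifier that uses the same fold and selector depends on one shared feature selection node, so that step runs once rather than once per classifier.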
18 Dependency Graph
19 Dependency Graph Execution Engine
Parallelization infrastructure (diagram labels): Hadoop cluster, Nimble, multi-process single server
Input: dependency graph
Output: results (models, scores, etc.)
Function: schedules tasks in a topological ordering of the graph; prioritizes pending tasks using information from already completed tasks; executes tasks in parallel via the parallelization infrastructure
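The scheduling idea, running tasks in a topological order and running independent tasks together, can be sketched with Python's standard `graphlib` module (Python 3.9 or later). This is an illustrative toy, not the PARAMO engine: the task names are hypothetical, the "waves" stand in for batches handed to parallel workers, and the prioritization based on completed tasks is not modeled.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each task maps to the tasks it depends on.
graph = {
    "features": {"cohort"},
    "cv_split": {"features"},
    "select_ig": {"cv_split"},
    "select_fisher": {"cv_split"},
    "nb_ig": {"select_ig"},
    "lr_ig": {"select_ig"},
    "nb_fisher": {"select_fisher"},
}


def parallel_waves(graph):
    """Group tasks into waves; every task in a wave has all dependencies done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        waves.append(ready)
        # A real engine would dispatch `ready` to workers here and call done()
        # as each task finishes; this sketch completes the whole wave at once.
        sorter.done(*ready)
    return waves


waves = parallel_waves(graph)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The width of the widest wave bounds how many workers this particular graph can keep busy at once.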
20 Dependency Graph Runtime Analysis (Hadoop, 20 Concurrent Tasks)
21 Experimental Data Sets
Columns: data set; years of data; number of patients; number of features; number of records; number of cases; number of controls; target condition (several numeric cells are truncated in this transcription and are kept as found)
Small: 3 years; 4, ,312, ; Hypertension Control
Medium: 10 years; 32, ,719, ; Heart Failure Onset
Large: 4 years; 319, ,531, ; Hypertension Onset
22 Experimental Pipeline Specifications
Cohort construction: one patient data set (Small, Medium, or Large)
Feature construction (feature type: aggregation): Diagnoses: count; Medications: count; Procedures: count; Symptoms: count; Labs: mean
Cross-validation: 10 x 10-fold cross-validation
Feature selection: Information Gain, Fisher Score
Classification: k-NN, Naïve Bayes, Logistic Regression, Random Forest
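A 10 x 10-fold cross-validation produces 100 train/test splits per pipeline, which is one reason the task count grows quickly. The sketch below shows one plausible way to generate such splits in plain Python; the function name, the seed, and the round-robin fold assignment are assumptions for illustration, and stratification by case/control label is left out.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        folds = [order[i::n_folds] for i in range(n_folds)]
        for k, test in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield repeat, k, train, test


splits = list(repeated_kfold(n_samples=50))
print(len(splits), "train/test splits")
```

Each split is an independent unit of work, so feature selection and classification on different splits can run concurrently.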
23 Running Time vs. Parallelism Level
[Figure: runtime (s) against number of concurrent tasks (axis labels include Serial and 160) for the Large, Medium, and Small data sets, with time-scale markers for hours and days and a 72X speed-up annotation]
Small: 5,000 patients, Medium: 33K, Large: 319K
10 times 10-fold cross-validation
Dependency graph: 1808 nodes and 3610 edges
24 Summary
Predictive modeling in healthcare research is becoming more prevalent
Electronic health record (EHR) adoption continues to accelerate
Scalable predictive modeling platforms/systems are needed
PARAMO is a parallel predictive modeling platform for EHR data
PARAMO can facilitate large-scale modeling endeavors and speed up the research workflow
Tests on real EHR data show significant performance gains
25 PATIENT SIMILARITY PLATFORM
26 Patient Similarity Problem
Similarity search (figure labels: patient, doctor)
28 Challenges of Patient Similarity
Traditional setting: a query patient's feature vector is matched against the EHR database to retrieve similar patients.
Modern setting: with a >100k-dimensional feature vector there is no exact match.
As the size of a feature vector increases, it becomes hard to find an exact match on all features.
How do we find patients relevant to a query for a specific clinical context?
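When no exact match exists, retrieval falls back to ranking patients by a distance. The following Python sketch shows a plain Euclidean top-k search over sparse feature vectors; the patient identifiers, feature names, and values are made up for illustration, and Euclidean distance is used only as the simplest baseline, not as the measure the authors recommend.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two sparse feature dicts (missing keys count as 0)."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def most_similar(query, patients, k=2):
    """Return the ids of the k patients closest to the query, nearest first."""
    return heapq.nsmallest(k, patients, key=lambda pid: euclidean(query, patients[pid]))


# Hypothetical toy database; real vectors would have far more dimensions.
patients = {
    "p1": {"dx:diabetes": 3, "lab:hba1c": 8.1},
    "p2": {"dx:asthma": 2},
    "p3": {"dx:diabetes": 2, "lab:hba1c": 7.9},
}
query = {"dx:diabetes": 2, "lab:hba1c": 8.0}

neighbors = most_similar(query, patients, k=2)
print(neighbors)
```

A distance that weights every feature equally ignores the clinical context, which is the gap a learned, context-specific similarity measure is meant to close.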
29 Our Approach
Query patient (>100k-dimensional feature vector) -> 1. Feature selection & generalization -> important features -> 2. Patient similarity learning
For a clinical context: 1. What are the important features? 2. What is the right similarity measure?
30 Healthcare Analytic Platform
Large-scale Analytics Platform
31 Healthcare Analytic Platform
Healthcare analytics: information extraction, data mining, visualization
32 Healthcare Analytic Platform
Information extraction: structured EHR, unstructured EHR, feature extraction, patient representation
Data mining: context, feature selection, patient similarity
Visualization
33 Feature Extraction from Unstructured EHR data
34 Motivations for Early Detection of Heart Failure
Heart failure (HF) is a complex disease with a huge societal burden.
Diagnoses are usually made late, even though symptoms are documented in clinical notes beforehand.
Our method for extracting HF symptoms achieves precision 0.925, recall 0.896, and F-score
Roy J. Byrd, Steven R. Steinhubl, Jimeng Sun, Shahram Ebadollahi. Automatic identification of heart failure diagnostic criteria, using text analysis of clinical notes from electronic health records. International Journal of Medical Informatics
35 Potential Impact on Evidence-based Therapies
Timeline labels: no symptoms, Framingham symptoms, clinical diagnosis, opportunity for early intervention
3,168 patients, all eventually diagnosed with HF
[Figure: percentage axis from 0% to 70%; series: preceding Framingham diagnosis, after Framingham diagnosis, after clinical diagnosis]
Applying text mining to extract Framingham symptoms can help trigger early intervention.
Vijayakrishnan R, Steinhubl SR, Sun J, et al. Potential impact of predictive models for early detection of heart failure on the initiation of evidence-based therapies. J Am Coll Cardiol. 2012;59(13s1):E949-E
36 Knowledge plus Data Feature Selection
37 Combining Knowledge- and Data-driven Risk Factors
Knowledge: knowledge base -> risk factor gathering -> knowledge risk factors
Data: clinical data -> data processing -> potential risk factors
Combination: risk factor augmentation, given the target condition -> combined risk factors
Jimeng Sun, Jianying Hu, Dijun Luo, Marianthi Markatou, Fei Wang, Shahram Ebadollahi, Steven E. Steinhubl, Zahra Daar, Walter F. Stewart. Combining Knowledge and Data Driven Insights for Identifying Risk Factors using Electronic Health Records. AMIA 2012
Dijun Luo, Fei Wang, Jimeng Sun, Marianthi Markatou, Jianying Hu, Shahram Ebadollahi. SOR: Scalable Orthogonal Regression for Low-Redundancy Feature Selection and its Healthcare Applications. SDM '12
38 Risk Factor Augmentation
Sparse learning objective formulation, with terms for: model error; correlation among data-driven features; correlation between data- and knowledge-driven features; sparse penalty
39 Prediction Results using Selected Features
[Figure: AUC against number of features; annotations: CAD, Hypertension, +diabetes, all knowledge features]
Knowledge-driven features improve AUC slightly.
Data-driven features improve AUC significantly.
40 Top-10 Selected Data-driven Features
Features (each marked with its relevancy to HF): Dyslipidemia, Thiazide-like Diuretics, Antihypertensive Combinations, Aminopenicillins, Bone density regulators, Natriuretic Peptide, Rales, Diuretic Combinations, S3 Gallop, NSAIDs
9 out of 10 are considered relevant to HF
41 Top-10 Selected Data-driven Features
Feature categories: diagnosis, medication, lab, symptom
The data-driven features are complementary to the existing knowledge-driven features
42 Patient Similarity
43 Patient Similarity through Locally Supervised Metric Learning
Under a specific clinical context, similar patients are sought for a query patient.
Jimeng Sun, Fei Wang, Jianying Hu, Shahram Ebadollahi: Supervised patient similarity measure of heterogeneous patient records. SIGKDD Explorations 14(1): (2012)
44 Patient Similarity through Locally Supervised Metric Learning
Under a specific clinical context, the query patient has a homogeneous neighborhood and a heterogeneous neighborhood.
Homogeneous neighbors: true positives
Heterogeneous neighbors: false positives
45 Patient Similarity through Locally Supervised Metric Learning
Under a specific clinical context, for the query patient: shrink the homogeneous neighborhood and grow the heterogeneous neighborhood.
46 Locally Supervised Metric Learning (LSML)
Goal: learn a generalized Mahalanobis distance for a specific clinical context (target label)
Margin for xi: total distance to heterogeneous neighbors minus total distance to homogeneous neighbors, summed over the heterogeneous and homogeneous neighborhoods for xi
Objective: maximize the total margin
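The margin on this slide can be illustrated numerically. The sketch below assumes the generalized Mahalanobis distance is written as the squared Euclidean distance after a linear map (rows of `Wt`), and that the margin for a patient is the total distance to heterogeneous neighbors minus the total distance to homogeneous neighbors. The function names, the toy points, and the two candidate maps are assumptions for illustration; the optimization that LSML uses to find the map is not shown.

```python
def sq_dist(Wt, a, b):
    """Squared distance ||Wt (a - b)||^2, a generalized Mahalanobis form."""
    diff = [ai - bi for ai, bi in zip(a, b)]
    projected = [sum(w * d for w, d in zip(row, diff)) for row in Wt]
    return sum(p * p for p in projected)


def margin(Wt, X, i, homogeneous, heterogeneous):
    """Total distance to heterogeneous neighbors minus total to homogeneous ones."""
    far = sum(sq_dist(Wt, X[i], X[k]) for k in heterogeneous)
    near = sum(sq_dist(Wt, X[i], X[j]) for j in homogeneous)
    return far - near


# Toy data: patient 1 shares the label of patient 0, patient 2 does not.
X = [[0.0, 0.0], [0.0, 2.0], [1.0, 0.0]]

identity = [[1.0, 0.0], [0.0, 1.0]]   # plain Euclidean distance
stretched = [[3.0, 0.0], [0.0, 0.5]]  # hypothetical learned map

m_identity = margin(identity, X, 0, homogeneous=[1], heterogeneous=[2])
m_stretched = margin(stretched, X, 0, homogeneous=[1], heterogeneous=[2])
print(m_identity, m_stretched)
```

The second map stretches the axis that separates the differently labeled patient and compresses the other, so the margin moves from negative to positive, which is the effect of shrinking the homogeneous neighborhood while growing the heterogeneous one.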
47 Patient Similarity Experiment Design
200K samples, 195 features; cases and controls for each of D1 to D6
Data set: disease target: samples
D1: DM with Acute Complications: 4,392
D2: DM without Complications: 10,734
D3: Depression: 6,794
D4: Heart Failure: 5,262
D5: Asthma: 6,606
D6: Lung Cancers: 1,172
48 Prediction Results on Patient Similarity
Baselines: EUC (Euclidean distance), PCA (principal component analysis), LDA (linear discriminant analysis)
Observations: LDA does not perform well because the resulting dimensionality is too low. The LSML algorithm performs best among all methods.
[Figure: results for Euclidean, LDA, PCA, and LSML on DM with Acute Complications, DM without Complications, Depression, Congestive Heart Failure, Asthma, and Lung Cancers]
49 Clinical Relevancy of Patient Similarity Results
[Figure: percentage (0% to 100%) of top-ranked comorbidities rated yes, possible, or no for DM with Acute Complications, DM without Complications, Depression, Heart Failure, Asthma, and Lung Cancers]
Retrieving the top-ranked comorbidities among similar patients, over 80% of them are considered relevant to the target disease.
50 Summary on Patient Similarity
LSML learns a customized distance metric
Extension 1: Composite distance integration (Comdi) [1]: how to combine multiple patient similarity measures?
Extension 2: Interactive metric update (iMet) [2]: how to update an existing distance measure?
1. Fei Wang, Jimeng Sun, Shahram Ebadollahi: Integrating Distance Metrics Learned from Multiple Experts and its Application in Inter-Patient Similarity Assessment. SDM 2011
2. Fei Wang, Jimeng Sun, Jianying Hu, Shahram Ebadollahi: iMet: Interactive Metric Learning in Healthcare Applications. SDM 2011
51 Visualization
52 Visualization: FacetAtlas (InfoVis '10), DICON (InfoVis '11), SolarMap (ICDM '11), MatrixFlow (AMIA '12)
53 Conclusions
Information extraction: structured EHR, unstructured EHR, feature extraction, patient representation
Data mining: context, feature selection, patient similarity
Visualization
Scalable healthcare analytic research platform: enables efficient collaboration across domains; extensible analytic platform; intuitive analytic results; scalable computation engine
54 Future of Healthcare Analytics
Diagram labels: health analytic apps (for example, a heart disease predictor for $5.99); user; analytic cloud with data miner, visualization, and privacy engine; clinical data, social data, behavior data, genomic data
Research challenges: data analytic techniques for each data modality; privacy-preserving data sharing; visual analytic techniques
Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center
Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II
Exploration and Visualization of Post-Market Data
Exploration and Visualization of Post-Market Data Jianying Hu, PhD Joint work with David Gotz, Shahram Ebadollahi, Jimeng Sun, Fei Wang, Marianthi Markatou Healthcare Analytics Research IBM T.J. Watson
Predictive Care Models to Improve Outcomes Brendan Fowkes Sr. Healthcare Solution Executive May 14, 2013
Predictive Care Models to Improve Outcomes Brendan Fowkes Sr. Healthcare Solution Executive May 14, 2013 Technology Insights Performance Outcomes 2013 2012 IBM Corporation Healthcare Transformation: A
Big Data Analytics and Healthcare
Big Data Analytics and Healthcare Anup Kumar, Professor and Director of MINDS Lab Computer Engineering and Computer Science Department University of Louisville Road Map Introduction Data Sources Structured
Electronic Medical Record Mining. Prafulla Dawadi School of Electrical Engineering and Computer Science
Electronic Medical Record Mining Prafulla Dawadi School of Electrical Engineering and Computer Science Introduction An electronic health record is a systematic collection of electronic health information
Smarter Healthcare@IBM Research. Joseph M. Jasinski, Ph.D. Distinguished Engineer IBM Research
Smarter Healthcare@IBM Research Joseph M. Jasinski, Ph.D. Distinguished Engineer IBM Research Our researchers work on a wide spectrum of topics Basic Science Industry specific innovation Nanotechnology
Big Data Analytics Opportunities and Challenges
Big Data Analytics Opportunities and Challenges Anup Kumar, Professor and Director of MINDS Lab Computer Engineering and Computer Science Department University of Louisville Road Map Introduction Hadoop
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Investigating Clinical Care Pathways Correlated with Outcomes
Investigating Clinical Care Pathways Correlated with Outcomes Geetika T. Lakshmanan, Szabolcs Rozsnyai, Fei Wang IBM T. J. Watson Research Center, NY, USA August 2013 Outline Care Pathways Typical Challenges
An EVIDENCE-ENHANCED HEALTHCARE ECOSYSTEM for Cancer: I/T perspectives
An EVIDENCE-ENHANCED HEALTHCARE ECOSYSTEM for Cancer: I/T perspectives Chalapathy Neti, Ph.D. Associate Director, Healthcare Transformation, Shahram Ebadollahi, Ph.D. Research Staff Memeber IBM Research,
Putting IBM Watson to Work In Healthcare
Martin S. Kohn, MD, MS, FACEP, FACPE Chief Medical Scientist, Care Delivery Systems IBM Research [email protected] Putting IBM Watson to Work In Healthcare 2 SB 1275 Medical data in an electronic or
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
Uncovering Value in Healthcare Data with Cognitive Analytics. Christine Livingston, Perficient Ken Dugan, IBM
Uncovering Value in Healthcare Data with Cognitive Analytics Christine Livingston, Perficient Ken Dugan, IBM Conflict of Interest Christine Livingston Ken Dugan Has no real or apparent conflicts of interest
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
DICON: Visual Cluster Analysis in Support of Clinical Decision Intelligence
DICON: Visual Cluster Analysis in Support of Clinical Decision Intelligence Abstract David Gotz, PhD 1, Jimeng Sun, PhD 1, Nan Cao, MS 2, Shahram Ebadollahi, PhD 1 1 IBM T.J. Watson Research Center, New
Healthcare Big Data Exploration in Real-Time
Healthcare Big Data Exploration in Real-Time Muaz A Mian A Project Submitted in partial fulfillment of the requirements for degree of Masters of Science in Computer Science and Systems University of Washington
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
Big Data Text Mining and Visualization. Anton Heijs
Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features
Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features Charlie Berger, MS Eng, MBA Sr. Director Product Management, Data Mining and Advanced Analytics [email protected] www.twitter.com/charliedatamine
Data Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario [email protected]
Data Mining in CRM & Direct Marketing Jun Du The University of Western Ontario [email protected] Outline Why CRM & Marketing Goals in CRM & Marketing Models and Methodologies Case Study: Response Model Case
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Find the signal in the noise
Find the signal in the noise Electronic Health Records: The challenge The adoption of Electronic Health Records (EHRs) in the USA is rapidly increasing, due to the Health Information Technology and Clinical
BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, [email protected]) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
Big Data Mining Services and Knowledge Discovery Applications on Clouds
Big Data Mining Services and Knowledge Discovery Applications on Clouds Domenico Talia DIMES, Università della Calabria & DtoK Lab Italy [email protected] Data Availability or Data Deluge? Some decades
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
Role of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
CloudAtlas: Cloud-Based Predictive Modeling Platform for Healthcare Research
CloudAtlas: Cloud-Based Predictive Modeling Platform for Healthcare Research Hang Su Georgia Institute of Technology [email protected] Fei Xia Tsinghua University [email protected] Jimeng Sun
Divide-n-Discover Discretization based Data Exploration Framework for Healthcare Analytics
for Healthcare Analytics Si-Chi Chin,KiyanaZolfaghar,SenjutiBasuRoy,AnkurTeredesai,andPaulAmoroso Institute of Technology, The University of Washington -Tacoma,900CommerceStreet,Tacoma,WA980-00,U.S.A.
Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012
Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering
Using EHRs for Heart Failure Therapy Recommendation Using Multidimensional Patient Similarity Analytics
Digital Healthcare Empowering Europeans R. Cornet et al. (Eds.) 2015 European Federation for Medical Informatics (EFMI). This article is published online with Open Access by IOS Press and distributed under
MACHINE LEARNING BASICS WITH R
MACHINE LEARNING [Hands-on Introduction of Supervised Machine Learning Methods] DURATION 2 DAY The field of machine learning is concerned with the question of how to construct computer programs that automatically
2015 Workshops for Professors
SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market
Machine Learning with MATLAB David Willingham Application Engineer
Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the
Large-scale Data Mining: MapReduce and Beyond Part 2: Algorithms. Spiros Papadimitriou, IBM Research Jimeng Sun, IBM Research Rong Yan, Facebook
Large-scale Data Mining: MapReduce and Beyond Part 2: Algorithms Spiros Papadimitriou, IBM Research Jimeng Sun, IBM Research Rong Yan, Facebook Part 2:Mining using MapReduce Mining algorithms using MapReduce
How To Perform An Ensemble Analysis
Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information
IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS
IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS V.Sudhakar 1 and G. Draksha 2 Abstract:- Collective behavior refers to the behaviors of individuals
Big Data Analytics for Mitigating Insider Risks in Electronic Medical Records
Big Data Analytics for Mitigating Insider Risks in Electronic Medical Records Bradley Malin, Ph.D. Associate Prof. & Vice Chair of Biomedical Informatics, School of Medicine Associate Prof. of Computer
KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics
ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM KATE GLEASON COLLEGE OF ENGINEERING John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE (KGCOE- CQAS- 747- Principles of
Maximizing Return and Minimizing Cost with the Decision Management Systems
KDD 2012: Beijing 18 th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Rich Holada, Vice President, IBM SPSS Predictive Analytics Maximizing Return and Minimizing Cost with the Decision Management
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
Using multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
Tax Fraud in Increasing
Preventing Fraud with Through Analytics Satya Bhamidipati Data Scientist Business Analytics Product Group Copyright 2014 Oracle and/or its affiliates. All rights reserved. 2 Tax Fraud in Increasing 27%
Travis Goodwin & Sanda Harabagiu
Automatic Generation of a Qualified Medical Knowledge Graph and its Usage for Retrieving Patient Cohorts from Electronic Medical Records Travis Goodwin & Sanda Harabagiu Human Language Technology Research
Big Data 101: Harvest Real Value & Avoid Hollow Hype
Big Data 101: Harvest Real Value & Avoid Hollow Hype 2 Executive Summary Odds are you are hearing the growing hype around the potential for big data to revolutionize our ability to assimilate and act on
Healthcare data analytics. Da-Wei Wang Institute of Information Science
Healthcare data analytics Da-Wei Wang Institute of Information Science Outline Data Science Enabling technologies Grand goals Issues Google flu trend Privacy Conclusion Analytics
Final Project Report
CPSC545 Introduction to Data Mining, by Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID: 904907866 Due Date: May 7, 2007 Introduction Final Project Report Pseudogenes
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Let the data speak to you. Look Who's Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data
CS535 Big Data W1.A.1 CS535 BIG DATA W1.A.2 Let the data speak to you Medication Adherence Score How likely people are to take their medication, based on: How long people have lived at the same address
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University My Definition of Bioinformatics
WHITE PAPER. Caradigm Healthcare Analytics. Healthcare Analytics
WHITE PAPER We have witnessed a major paradigm shift in healthcare information management over the past decade, instigated by the birth of electronic medical records and medical informatics. This new world
Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC
Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,
A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace
A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace what is this class about? health informatics managing and making sense of biomedical information but mostly from an
Copyright 2014. This report and/or appended material may not be partly or completely published or
Aalborg University Copenhagen Semester: 4th Title: Personalized Medicine based on patient journals and family medical history records Aalborg University Copenhagen A.C. Meyers Vænge 15 2450 København
Performance Metrics for Graph Mining Tasks
Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Introduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Sanjeev Kumar. contribute
RESEARCH ISSUES IN DATA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 1. Introduction The field of data mining and knowledge discovery is emerging as a
Learning from Diversity
Learning from Diversity Epitope Prediction with Sequence and Structure Features using an Ensemble of Support Vector Machines Rob Patro and Carl Kingsford Center for Bioinformatics and Computational Biology
Supervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
Maschinelles Lernen mit MATLAB
Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical
Introduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
Hexaware E-book on Predictive Analytics
Hexaware E-book on Predictive Analytics Business Intelligence & Analytics Actionable Intelligence Enabled Published on : Feb 7, 2012 Hexaware E-book on Predictive Analytics What is Data mining? Data mining,
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015
W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction
Statistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA Bulletin, 2014) Before
KnowledgeSEEKER Marketing Edition
KnowledgeSEEKER Marketing Edition Predictive Analytics for Marketing The Easiest to Use Marketing Analytics Tool KnowledgeSEEKER Marketing Edition is a predictive analytics tool designed for marketers
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, cross-validation, and the bootstrap are common techniques for
Hadoop's Advantages for Machine Learning and Predictive Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis
Webinar will begin shortly Hadoop's Advantages for Machine Learning and Predictive Analytics Presented by Hortonworks & Zementis September 10, 2014 Copyright 2014 Zementis, Inc. All rights reserved. 2
WHITE PAPER. QualityAnalytics. Bridging Clinical Documentation and Quality of Care
WHITE PAPER QualityAnalytics Bridging Clinical Documentation and Quality of Care 2 EXECUTIVE SUMMARY The US Healthcare system is undergoing a gradual, but steady transformation. At the center of this transformation
