Mathematical Models of Supervised Learning and their Application to Medical Diagnosis
1 Genomic, Proteomic and Transcriptomic Lab, High Performance Computing and Networking Institute, National Research Council, Italy. Mathematical Models of Supervised Learning and their Application to Medical Diagnosis. Mario Rosario Guarracino.
2 Acknowledgements: prof. Franco Giannessi, U. of Pisa; prof. Panos Pardalos, CAO UFL; Onur Seref, CAO UFL; Claudio Cifarelli, HP.
3 Agenda
Mathematical models of supervised learning
Purpose of incremental learning
Subset selection algorithm
Initial points selection
Accuracy results
Conclusion and future work
4 Introduction Supervised learning refers to the capability of a system to learn from examples (the training set). The trained system is able to provide an answer (output) for each new question (input). Supervised means the desired output for the training set is provided by an external teacher. Binary classification is among the most successful methods for supervised learning.
5 Applications There are many applications in biology and medicine: tissues that are prone to cancer can be detected with high accuracy; new genes or isoforms of gene expression can be identified in large datasets; new DNA sequences or proteins can be traced back to their origins; data dimensionality can be analyzed and reduced to the principal characteristics for drug design.
6 Problem characteristics Data produced in biomedical applications will increase exponentially in the coming years. Gene expression data contain tens of thousands of features. In genomic/proteomic applications, data are often updated, which poses problems for the training step. Current classification methods can over-fit the problem, producing models that do not generalize well.
7 Linear discriminant planes Consider a binary classification task with points in two linearly separable sets A and B. There exists a plane that correctly classifies all points of the two sets; in fact, there are infinitely many planes that correctly classify the training data.
8 SVM classification A different approach, yielding the same solution, is to maximize the margin between support planes. A support plane leaves all points of one class on one side. The support planes of the two classes A and B are pushed apart until they bump into a small set of data points (the support vectors).
9 SVM classification Support Vector Machines are the state of the art among existing classification methods. Their robustness is due to the strong foundations of statistical learning theory. Training relies on the optimization of a quadratic convex cost function, for which many methods are available. Available software includes SVMlight and LIBSVM. These techniques can be extended to nonlinear discrimination by embedding the data in a nonlinear space using kernel functions.
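As an illustration of the kernel-SVM training step mentioned above, here is a minimal sketch using scikit-learn, whose SVC classifier wraps LIBSVM. The synthetic two-class data and the hyperparameter values are assumptions made for the example, not taken from the talk.

```python
# Minimal sketch: kernel SVM classification with scikit-learn (which wraps LIBSVM).
# The synthetic data and hyperparameters are illustrative only.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Two classes A and B drawn from shifted Gaussians.
A = rng.normal(loc=-1.0, scale=1.0, size=(100, 2))
B = rng.normal(loc=+1.0, scale=1.0, size=(100, 2))
X = np.vstack([A, B])
y = np.array([0] * len(A) + [1] * len(B))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# The RBF kernel embeds the data nonlinearly; C controls the margin/error trade-off.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("number of support vectors:", clf.support_vectors_.shape[0])
print("test accuracy:", clf.score(X_test, y_test))
```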
10 A different religion The binary classification problem can be formulated as a generalized eigenvalue problem (GEPSVM): find the plane x'w_1 = γ_1 that is closest to the points of A and farthest from those of B. O. Mangasarian et al. (2006), IEEE Trans. PAMI.
11 ReGEC technique Let [w_1 γ_1] and [w_m γ_m] be the eigenvectors associated with the minimum and maximum eigenvalues of G x = λ H x. Then each a ∈ A is closer to x'w_1 - γ_1 = 0 than to x'w_m - γ_m = 0, and each b ∈ B is closer to x'w_m - γ_m = 0 than to x'w_1 - γ_1 = 0. M.R. Guarracino et al. (2007), OMS.
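To make the formulation concrete, the following is a hedged sketch of a linear classifier of this kind: the two planes come from the extremal eigenvectors of G z = λ H z, where G and H are built from the class matrices, and a new point is assigned to the class whose plane is nearer. The regularization that gives ReGEC its name and the kernel version are omitted, so this illustrates only the eigenvalue formulation, not the published algorithm.

```python
# Hedged sketch of a linear generalized-eigenvalue classifier in the spirit of
# GEPSVM/ReGEC. Regularization and kernelization of the published method are omitted.
import numpy as np
from scipy.linalg import eig

def train_planes(A, B):
    """Return (w, gamma) for the plane close to A and the plane close to B."""
    Ga = np.hstack([A, -np.ones((A.shape[0], 1))])   # rows [a_i, -1]
    Gb = np.hstack([B, -np.ones((B.shape[0], 1))])   # rows [b_i, -1]
    G = Ga.T @ Ga                                    # distance of A from a plane
    H = Gb.T @ Gb                                    # distance of B from a plane
    vals, vecs = eig(G, H)                           # solve G z = lambda H z
    order = np.argsort(vals.real)
    z_min = vecs[:, order[0]].real                   # min eigenvalue: plane near A
    z_max = vecs[:, order[-1]].real                  # max eigenvalue: plane near B
    # z = [w; gamma] defines the plane x'w - gamma = 0.
    return (z_min[:-1], z_min[-1]), (z_max[:-1], z_max[-1])

def predict(x, plane_A, plane_B):
    """Assign x to the class whose plane is closer."""
    dist = lambda p: abs(x @ p[0] - p[1]) / np.linalg.norm(p[0])
    return "A" if dist(plane_A) < dist(plane_B) else "B"
```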
12 Nonlinear classification When classes cannot be linearly separated, nonlinear discrimination is needed. The resulting classification surfaces can be very tangled: such a model accurately describes the original data, but does not generalize to new data (over-fitting).
13 How to solve the problem?
14 Incremental classification A possible solution is to find a small and robust subset of the training set that provides comparable accuracy results. A smaller set of points reduces the probability of over-fitting the problem and is computationally more efficient when predicting new points. As new points become available, the cost of retraining the algorithm decreases if the influence of the new points is evaluated only with respect to the small subset.
15 I-ReGEC: Incremental learning algorithm
1: Γ_0 = C \ C_0
2: {M_0, Acc_0} = Classify(C; C_0)
3: k = 1
4: while |Γ_{k-1}| > 0 do
5:   x_k = arg max_{x ∈ M_{k-1} ∩ Γ_{k-1}} dist(x, P_class(x))
6:   {M_k, Acc_k} = Classify(C; C_{k-1} ∪ {x_k})
7:   if Acc_k > Acc_{k-1} then
8:     C_k = C_{k-1} ∪ {x_k}
9:     k = k + 1
10:  end if
11:  Γ_k = Γ_{k-1} \ {x_k}
12: end while
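A hedged Python rendering of this loop is sketched below. The classify and dist_to_own_plane callables are hypothetical stand-ins for the ReGEC training/prediction step and the point-to-plane distance; the handling of rejected points follows one reasonable reading of the pseudocode rather than the published implementation.

```python
# Sketch of the incremental subset-selection loop, working on point indices.
def incremental_selection(n_points, init_idx, classify, dist_to_own_plane):
    """n_points: size of the training set C; init_idx: indices of the initial subset C_0.
    classify(subset_idx) -> (misclassified_idx_set, accuracy) over the whole set C.
    dist_to_own_plane(i) -> distance of point i from its own class plane."""
    subset = list(init_idx)
    gamma = set(range(n_points)) - set(init_idx)   # Γ_0 = C \ C_0
    misclassified, acc = classify(subset)          # {M_0, Acc_0}
    while gamma:
        candidates = misclassified & gamma         # misclassified points still available
        if not candidates:
            break
        # Pick the candidate farthest from the plane of its own class.
        x_k = max(candidates, key=dist_to_own_plane)
        m_k, acc_k = classify(subset + [x_k])
        if acc_k > acc:                            # keep x_k only if accuracy improves
            subset.append(x_k)
            misclassified, acc = m_k, acc_k
        gamma.discard(x_k)                         # Γ_k = Γ_{k-1} \ {x_k}
    return subset
```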
16 I-ReGEC overfitting ReGEC accuracy = 84.44. When the ReGEC algorithm is trained on all points, the decision surfaces are affected by noisy points (left); I-ReGEC achieves clearly defined boundaries while preserving accuracy (right). Less than 5% of the points are needed for training!
17 Initial points selection Unsupervised clustering techniques can be adapted to select the initial points. We compare the classification obtained with k randomly selected starting points per class against k points determined by the k-means method. Results show higher classification accuracy and a more consistent representation of the training set when k-means is used instead of random selection.
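One plausible implementation of this selection step is sketched below: for each class, cluster its points with k-means and keep the training point nearest to each centroid as an initial point. The details (for example, using the centroids themselves) may differ from the setup used in the talk.

```python
# Hedged sketch of k-means-based initial-point selection, per class.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_initial_points(X, y, k, seed=0):
    """Return the indices of k representative training points per class."""
    init_idx = []
    for label in np.unique(y):
        class_idx = np.where(y == label)[0]
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[class_idx])
        # For each centroid, keep the closest actual training point.
        for c in km.cluster_centers_:
            nearest = class_idx[np.argmin(np.linalg.norm(X[class_idx] - c, axis=1))]
            init_idx.append(int(nearest))
    return init_idx
```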
18 Initial points selection Starting points C_i are chosen randomly (top) or by k-means (bottom). For each kernel produced by a choice of C_i, a set of evenly distributed points x is classified; the procedure is repeated 100 times. Let y_i ∈ {1, -1} be the classification based on C_i; the average of the y_i estimates the probability that x is classified in one class. Random: acc = 84, std = 0.05; k-means: acc = 85, std = 0.01.
19 Initial points selection Same setup as the previous slide. Random: acc = 72.1, std = 1.45; k-means: acc = 97.6, std = 0.04.
20 Initial point selection Effect of increasing the number of initial points k chosen with k-means on the Chessboard dataset. The graph shows classification accuracy versus the total number of initial points, 2k, from both classes. The result empirically shows that there is a minimum k for which maximum accuracy is reached.
21 Initial point selection The bottom figure shows k versus the number of additional points included in the incremental dataset.
22 Dataset reduction Experiments on real and synthetic datasets confirm the training data reduction. [Table: for each dataset (Banana, German, Diabetis, Haberman, Bupa, Votes, WPBC, Thyroid, Flare-solar), the chunk size and the percentage of the training set retained by I-ReGEC; for Banana, chunk = 15.7 and 3.92% of the training set.]
23 Accuracy results Classification accuracy with the incremental technique compares well with standard methods. [Table: ReGEC (training size, accuracy), I-ReGEC (chunk, k, accuracy) and SVM (accuracy) on the Banana, German, Diabetis, Haberman, Bupa, Votes, WPBC, Thyroid and Flare-solar datasets.]
24 Positive results Incremental learning, in conjunction with ReGEC, reduces the training set size. Accuracy results compare well with those obtained using all training points. Classification surfaces generalize better.
25 Ongoing research Microarray technology can scan the expression levels of tens of thousands of genes to classify patients into different groups. For example, it is possible to classify types of cancer with respect to the patterns of gene activity in the tumor cells. Standard methods fail to derive the grouping of genes responsible for the classification.
26 Examples of microarray analysis
Breast cancer: BRCA1 vs. BRCA2 and sporadic mutations, I. Hedenfalk et al., NEJM.
Prostate cancer: prediction of patient outcome after prostatectomy, D. Singh et al., Cancer Cell.
Malignant glioma survival: gene expression vs. histological classification, C. Nutt et al., Cancer Res.
Clinical outcome of breast cancer, L. van 't Veer et al., Nature.
Recurrence of hepatocellular carcinoma after curative resection, N. Iizuka et al., Lancet.
Tumor vs. normal colon tissues, A. Alon et al., PNAS.
Acute myeloid vs. lymphoblastic leukemia, T. Golub et al., Science.
27 Feature selection techniques Standard methods (PCA, SVD, ICA, ...) need long and memory-intensive computations. Statistical techniques (FDA, LDA, ...) are much faster, but can produce low-accuracy results. There is a need for hybrid techniques that can take advantage of both approaches.
28 ILDC-ReGEC Simultaneous incremental learning and decremental characterization make it possible to acquire knowledge about gene grouping during the classification process. The technique relies on standard statistical indexes (mean µ and standard deviation σ).
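The index itself does not appear in the transcription. Purely as an illustration of a mean/std-based gene score, the sketch below ranks genes by a signal-to-noise-style statistic, (µ1 - µ2) / (σ1 + σ2); this is an assumed example, not necessarily the index used by ILDC-ReGEC.

```python
# Illustrative mean/std gene ranking (signal-to-noise style); not the ILDC-ReGEC formula.
import numpy as np

def rank_genes(X, y, eps=1e-12):
    """X: (patients x genes) expression matrix; y: binary class labels (0/1).
    Returns gene indices sorted from most to least discriminative."""
    X1, X2 = X[y == 0], X[y == 1]
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    s1, s2 = X1.std(axis=0), X2.std(axis=0)
    score = np.abs(mu1 - mu2) / (s1 + s2 + eps)   # eps guards against zero variance
    return np.argsort(score)[::-1]
```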
29 ILDC-ReGEC: Golub dataset About 100 genes out of 7129 are selected as responsible for the discrimination between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). The selected genes are in agreement with previous studies. Fewer than 10 patients, out of 72, are needed for training. Classification accuracy: 96.86%.
30 ILDC-ReGEC: Golub dataset Different techniques agree on the misclassified patient!
31 Gene expression analysis with ILDC-ReGEC Incremental classification with feature selection for microarray datasets: few experiments and genes are selected as important for the discrimination. [Table: for each dataset (H-BRCA1 22 x 3226, H-BRCA2 22 x 3226, H-Sporadic 22 x 3226, Singh 136 x …, Nutt 50 x …, Vantveer 98 x …, Iizuka 60 x 7129, Alon 62 x 2000, Golub 72 x …), the chunk size, % of training samples, number of features and % of features retained.]
32 ILDC-ReGEC: gene expression analysis [Table: classification accuracy of LLS-SVM, KLS-SVM, UPCA-FDA, SPCA-FDA, LUPCA-FDA, LSPCA-FDA, KUPCA-FDA, KSPCA-FDA and ILDC-ReGEC on the H-BRCA1 (22 x 3226), H-BRCA2 (22 x 3226), H-Sporadic (22 x 3226), Singh, Nutt, Vantveer, Iizuka, Alon and Golub datasets; several of the PCA/FDA variants are not applicable (n.a.) to the Singh, Nutt, Vantveer and Iizuka datasets.]
33 Conclusions ReGEC is a competitive classification method. Incremental learning reduces redundancy in training sets and can help avoid over-fitting. The subset selection algorithm provides a constructive way to reduce the complexity of kernel-based classification algorithms. The initial points selection strategy can help find regions where knowledge is missing. I-ReGEC can be a starting point for exploring very large problems.