Bio-molecular diagnosis through Random Subspace Ensembles of Learning Machines
- Caroline Ellis
- 10 years ago
Transcription
1 Convegno Calcolo ad Alte Prestazioni "Biocomputing"
Bio-molecular diagnosis through Random Subspace Ensembles of Learning Machines
Alberto Bertoni, Raffaella Folgieri, Giorgio Valentini
DSI, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano
2 Outline
- Bio-molecular diagnosis of tumors using machine learning methods
- Current approaches to automatic bio-molecular diagnosis
- Random Subspace (RS) ensemble: experimental results on a case study
- Combining feature selection and RS ensemble
- On-going work: RP-ensembles
3 Bio-molecular diagnosis of malignancies: motivations
- Traditional clinical diagnostic approaches may sometimes fail to detect tumors (Alizadeh et al., 2001).
- Several results have shown that bio-molecular analysis (e.g. gene expression profiling) may help to better characterize malignancies.
- Information supporting both diagnosis and prognosis of malignancies at the bio-molecular level may be obtained from high-throughput biotechnologies (e.g. DNA microarrays).
4 Bio-molecular diagnosis of malignancies: current approaches
- A huge amount of data is available from biotechnologies: analysis and extraction of significant biological knowledge is critical.
- Current approaches: statistical methods and machine learning methods (Golub et al., 1999; Furey et al., 2000; Ramaswamy et al., 2001; Dudoit et al., 2002; Lee & Lee, 2003; Weston et al., 2003; Dettling et al., 2003; Dettling, 2004; Zhou et al., 2005; Zhang et al., 2006).
5 Main problems with gene expression data for bio-molecular diagnosis
- High dimensionality
- Low cardinality
- Curse of dimensionality
- Data are usually noisy:
  - Gene expression measurements
  - Labeling errors
6 Current approaches against the curse of dimensionality
- Selection of significant subsets of components (genes), e.g. filter methods, forward selection, backward selection, recursive feature elimination, entropy and mutual information based feature selection methods (see Guyon & Elisseeff, 2003 for a recent review).
- Extraction of significant subsets of features, e.g. Principal Component Analysis or Independent Component Analysis.
- However, both approaches have problems...
7 An alternative approach based on ensemble methods
- Random subspace (RS) ensembles: RS (Ho, 1998) reduces the high dimensionality of the data by randomly selecting subsets of genes.
- Aggregation of different base learners trained on different subsets of features may reduce variance and improve diversity.
[Figure: from the data set D, the subsets D_1, ..., D_m are derived; the learning algorithm produces the base learners h_1, ..., h_m, which are aggregated into the ensemble hypothesis h.]
8 The RS algorithm
Input:
- a d-dimensional labelled gene expression data set D
- a learning algorithm L
- subspace dimension n < d
- number of base learners I
Output:
- final hypothesis $h_{ran} : X \to C$ computed by the ensemble
begin
  for i = 1 to I
  begin
    D_i = Subspace_projection(D, n)
    h_i = L(D_i)
  end
  $h_{ran}(x) = \arg\max_{t \in C} \mathrm{card}(\{ i \mid h_i(x) = t \})$
end
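The pseudocode above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' C++ NEURObjects implementation: the function names are hypothetical, and a nearest-centroid classifier stands in for the learning algorithm L (the slides use linear SVMs) so that the example needs only the standard library.

```python
import random
from collections import Counter


def subspace_projection(X, n, rng):
    """Pick n of the d features at random (without replacement) and project X onto them."""
    d = len(X[0])
    idx = sorted(rng.sample(range(d), n))
    return idx, [[row[j] for j in idx] for row in X]


def train_nearest_centroid(X, y):
    """Stand-in base learner L (an assumption): classify by the closest class centroid."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def train_rs_ensemble(X, y, n, I, seed=0):
    """Train I base learners, each on its own random n-dimensional subspace."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(I):
        idx, X_i = subspace_projection(X, n, rng)   # D_i = Subspace_projection(D, n)
        h_i = train_nearest_centroid(X_i, y)        # h_i = L(D_i)
        ensemble.append((idx, h_i))
    return ensemble


def predict_rs(ensemble, x):
    """Majority vote: h_ran(x) = argmax_t card({i | h_i(x) = t})."""
    votes = Counter(h_i([x[j] for j in idx]) for idx, h_i in ensemble)
    return votes.most_common(1)[0][0]


# Toy data (invented for illustration): class 0 near 0, class 1 near 5.
X_toy = [[0, 0, 0, 0, 0, 0], [1, 0, 1, 0, 1, 0],
         [5, 5, 5, 5, 5, 5], [4, 5, 4, 5, 4, 5]]
y_toy = [0, 0, 1, 1]
rs = train_rs_ensemble(X_toy, y_toy, n=3, I=5)
print(predict_rs(rs, [0.5] * 6), predict_rs(rs, [4.8] * 6))
```

Each base learner stores the feature indices it was trained on, so a new sample is projected onto the same subspace before voting. Ties in the vote are broken arbitrarily here; an odd number of base learners avoids them in the two-class case.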
9 Reasons for applying RS ensembles to the bio-molecular diagnosis of tumors
- Gene expression data are usually very high dimensional; RS ensembles reduce the dimensionality and are effective with high dimensional data (Skurichina and Duin, 2002).
- Co-regulated genes show correlated gene expression levels (Gasch and Eisen, 2002), and RS ensembles are effective with correlated sets of features (Bingham and Mannila, 2001).
- Random projections may improve the diversity between base learners.
- The overall accuracy of the ensemble may be enhanced through aggregation techniques (at least w.r.t. the variance component of the error).
10 Colon adenocarcinoma diagnosis
- Data (Alon et al., 1999): 62 samples (40 colon tumors, 22 normal colon samples), 2000 genes
- Methods: RS ensembles with linear SVMs as base learners; single linear SVMs
- Software: C++ NEURObjects library
- Hardware: Avogadro cluster of Xeon dual-processor workstations (Arlandini, 2005)
11 Results
Colon tumor prediction (5-fold cross-validation)
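The result figures for this slide are not preserved in the transcript. As a reminder of how a 5-fold cross-validation error estimate is computed, here is a minimal sketch. The helper names are hypothetical, the folds are not stratified, and the labels are synthetic placeholders that only reproduce the class counts of the colon data set (40 tumors, 22 normal); the trivial majority-class learner is an assumption used to keep the example self-contained.

```python
import random
from collections import Counter


def k_fold_indices(m, k, seed=0):
    """Shuffle the sample indices 0..m-1 and deal them into k disjoint folds."""
    idx = list(range(m))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validation_error(X, y, train, k=5, seed=0):
    """Fraction of samples misclassified when each fold is held out once."""
    errors = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        X_tr = [X[i] for i in range(len(X)) if i not in held_out]
        y_tr = [y[i] for i in range(len(X)) if i not in held_out]
        predict = train(X_tr, y_tr)          # fit on the other k-1 folds only
        errors += sum(predict(X[i]) != y[i] for i in test_idx)
    return errors / len(X)


def train_majority(X, y):
    """Trivial stand-in learner: always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


# Synthetic placeholder data with the class counts of the colon data set.
y_demo = [1] * 40 + [0] * 22
X_demo = [[0.0] for _ in y_demo]
cv_error = cross_validation_error(X_demo, y_demo, train_majority, k=5)
print(f"5-fold CV error of the majority-class baseline: {cv_error:.3f}")
```

Any `train(X, y)` function that returns a predictor can be plugged in, so the same loop could evaluate a single classifier or an RS ensemble.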
12 Colon tumor prediction: error as a function of the subspace dimension
[Plot legend: single SVM test error]
13 Average base learner error
- The better accuracy of the RS ensemble does not simply depend on the better accuracy of its component base learners.
14 Open problems with RS methods
1. Can we explain the effectiveness of RS through the diversity of the base learners?
2. Can we get a bias-variance interpretation?
3. What is the optimal subspace dimension?
4. Are feature selection and random subspace ensemble approaches alternatives, or may it be useful to combine them?
15 Combining feature selection and random subspace ensemble methods
Random Subspace on Selected Features (RS-SF algorithm), a two-step algorithm:
1. Select a subset of features (genes) according to a suitable feature selection method.
2. Apply the random subspace ensemble method to the subset of selected features.
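The two steps can be sketched as below. The slide does not say which feature selection method is used, so this example assumes a simple signal-to-noise filter for two classes labelled 0 and 1; the nearest-centroid base learner is also a stand-in (the slides use linear SVMs), and all function names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def s2n_scores(X, y):
    """Per-feature filter score |mean_1 - mean_0| / (std_1 + std_0) for labels 0/1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, t in zip(X, y) if t == 0]
        b = [row[j] for row, t in zip(X, y) if t == 1]
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12))
    return scores


def select_features(X, y, n_selected):
    """Step 1: keep the indices of the n_selected best-scoring features."""
    scores = s2n_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_selected])


def train_centroid(X, y):
    """Stand-in base learner (an assumption): nearest class centroid."""
    cents = {c: [mean(col) for col in zip(*[r for r, t in zip(X, y) if t == c])]
             for c in set(y)}
    return lambda x: min(cents,
                         key=lambda c: sum((u - v) ** 2 for u, v in zip(x, cents[c])))


def train_rs_sf(X, y, n_selected, n, I, seed=0):
    """Step 2: random subspace ensemble restricted to the selected features."""
    selected = select_features(X, y, n_selected)
    rng = random.Random(seed)
    ensemble = []
    for _ in range(I):
        idx = rng.sample(selected, n)                 # subspace of the selected genes
        X_i = [[row[j] for j in idx] for row in X]
        ensemble.append((idx, train_centroid(X_i, y)))
    return selected, ensemble


def predict_rs_sf(ensemble, x):
    """Majority vote over the base learners."""
    votes = Counter(h([x[j] for j in idx]) for idx, h in ensemble)
    return votes.most_common(1)[0][0]


# Toy data (invented): features 0 and 1 separate the classes, 2 and 3 do not.
X_toy = [[0, 0, 1, 7], [1, 0, 2, 3], [5, 6, 1, 7], [6, 6, 2, 3]]
y_toy = [0, 0, 1, 1]
selected, rs_sf = train_rs_sf(X_toy, y_toy, n_selected=2, n=1, I=3)
print(selected, predict_rs_sf(rs_sf, [0.5, 0.2, 9, 9]))
```

When the error of such a pipeline is estimated by cross-validation, the selection step should be repeated inside each training fold; selecting genes on the full data set first would bias the estimate downward.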
16 Results on combining feature selection with random subspace ensembles
Colon data set (Alon et al., 1999), 5-fold cross-validation
17 Comparison with other methods
Table columns: Methods, Estimated error
- LogitBoost (Dettling and Bühlmann, 2003)
- Bagging (Valentini et al., 2004)
- BagBoost (Dettling, 2004)
- Random Forest (Breiman, 2001)
- Random Subspace SVM
- PAM (Tibshirani et al., 2002)
- DLDA (Dudoit et al., 2002)
- kNN
Colon data set: generalization error estimated through cross-validation or multiple hold-out techniques
18 An on-going development: Supervised Randomly Projected Ensembles (RP-ensembles)
- Recent work on unsupervised analysis of complex bio-molecular data (Bertoni and Valentini, 2006) showed that random projections obeying the Johnson-Lindenstrauss lemma can be used for:
  - Discovering structures in bio-molecular data
  - Validating clustering results
  - Improving clustering results
- Can random projections to lower dimensional subspaces also be applied to supervised analysis (e.g. bio-molecular diagnosis)?
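To make the Johnson-Lindenstrauss idea concrete, the sketch below projects two points from d = 500 to k = 200 dimensions and compares their distance before and after. It assumes a matrix of independent Gaussian entries scaled by 1/sqrt(k), which is one common construction and not necessarily the one the authors use; the function names and the test vectors are invented for illustration.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """k x d matrix with i.i.d. N(0, 1/k) entries (assumed Gaussian construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(R, x):
    """Map a d-dimensional point x to k dimensions: R x."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in R]


def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


d, k = 500, 200
a = [math.sin(i) for i in range(d)]
b = [math.cos(i) for i in range(d)]
R = random_projection_matrix(d, k, seed=0)
ratio = distance(project(R, a), project(R, b)) / distance(a, b)
print(f"distance ratio after projection: {ratio:.3f}")
```

With this scaling the expected squared distance is preserved, so the printed ratio should be close to 1, and the spread around 1 shrinks as k grows. Unlike random subspace selection, every projected coordinate mixes all the original features.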
19 Conclusions
- RS ensembles can improve the accuracy of bio-molecular diagnosis characterized by very high dimensional data.
- They could also be easily applied to heterogeneous bio-molecular and clinical data.
- A promising new approach consists in combining state-of-the-art feature (gene) selection methods and RS ensembles.
- RS ensembles are computationally intensive but can be easily parallelized using clusters of workstations (e.g. in an MPI framework).
