Nonnegative Matrix Factorization (NMF) in Semisupervised Learning: Reducing Dimension and Maintaining Meaning


1 Nonnegative Matrix Factorization (NMF) in Semisupervised Learning: Reducing Dimension and Maintaining Meaning. SAMSI, 10 May 2013
2 Outline: Introduction to NMF; Applications; Motivations; NMF as a middle step in a semisupervised learning framework (support vector machines, random forests); Future directions and Q&A
3 Introduction. Consider a matrix $X_{p \times n}$ s.t. $X_{i,j} \ge 0$ for all $i, j$. Nonstandard interpretation for statisticians: rows are features, columns are samples. Nonnegative matrix factorization: $X_{p \times n} = W_{p \times k} H_{k \times n} + E_{p \times n}$, where $W \in \mathbb{R}_{+}^{p \times k}$ is the basis matrix, $H \in \mathbb{R}_{+}^{k \times n}$ is the coefficient matrix, and $E$ is the error matrix. Advantage: NMF decomposes the original matrix into a parts-based representation that gives a better interpretation of the factor matrices for nonnegative data.
4 Better Interpretation: Lee and Seung, 1999
5 Why do we care? Many real-world applications, some in health care! 1. Text mining: document clustering, topic detection and trend tracking. 2. Image analysis: feature representation, sparse coding, video tracking, image compression, image reconstruction, semisupervised learning. 3. Social/interaction networks: community detection, recommendation systems. 4. Bioinformatics: omics data analysis. 5. Acoustic signal processing: blind source separation. 6. Data clustering. ...to name a few!
6 Community Detection. H can be interpreted as indicating community membership. Illustrated here using a cell phone network of 177 cell towers in DR: matrix of normalized call flows (i.e., the $ij$-th element = proportion of calls from $i$ to $j$). [Figure: plot with axes x and y] Kenneth K. Lopiano
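To make the membership reading of H concrete, here is a minimal sketch. The matrix `H` below is made up for illustration (2 communities, 5 towers) and is not taken from the call-flow data; assigning each column to its largest coefficient is one common convention, not necessarily the rule used in the talk.

```python
import numpy as np

# Hypothetical coefficient matrix H (k = 2 communities, n = 5 towers).
# Values are invented for illustration only.
H = np.array([[0.9, 0.7, 0.1, 0.2, 0.5],
              [0.1, 0.2, 0.8, 0.9, 0.4]])

# Hard assignment: each column (tower) goes to the row with the largest weight.
membership = H.argmax(axis=0)

# Soft assignment: normalize each column so the weights sum to one.
soft = H / H.sum(axis=0, keepdims=True)

print(membership)
print(soft.round(2))
```

The soft version keeps the information that the last tower sits almost evenly between the two communities, which the hard assignment hides.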
7 Community Detection. Clear separation in the capital city: west is higher income, east is lower income. [Figure: plot with axes x and y]
8 Metagenomics: Brunet et al. NMF is an efficient method for identifying distinct molecular patterns and provides a powerful method for class discovery.
9 Audio Source Separation: Battenberg and Wessel. Matrix: number of positive frequency bins by number of analysis frames. CUDA implementation: ...the newer Geforce GTX 280, with 30 multiprocessors at 1.3GHz, runs the CUDA implementation over 30x faster than the optimized Matlab implementation.
10 So it matters, now what? Can I estimate W and H? How is this done? What are the properties of my estimators? Developing algorithms to answer these questions has been a fruitful area of NMF research. However, I am not interested in improving or comparing algorithms; I am interested in using the algorithms in different applications and understanding the unique benefits of NMF.
11 Loss Functions. For completeness, a brief review. Frobenius norm: $\min_{W,H} \|X - WH\|_F^2$ s.t. $W, H \ge 0$. KL divergence: $\min_{W,H} \sum_{i,j} \left( X_{ij} \log \frac{X_{ij}}{(WH)_{ij}} - X_{ij} + (WH)_{ij} \right)$ s.t. $W, H \ge 0$. Sparsity constraints on H (similarly defined for sparsity on W): $\min_{W,H} \|X - WH\|_F^2 + \eta \|W\|_F^2 + \beta \sum_{j=1}^{n} \|H(:, j)\|_1^2$ s.t. $W \ge 0, H \ge 0$. ...and many more.
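The three objectives on this slide can be written out directly. The following is a minimal NumPy sketch; the function names, the `eps` guard against log(0), and the toy matrices are assumptions made for illustration, not part of the talk.

```python
import numpy as np

def frobenius_loss(X, W, H):
    """Squared Frobenius norm ||X - WH||_F^2."""
    R = X - W @ H
    return float(np.sum(R * R))

def kl_loss(X, W, H, eps=1e-12):
    """Sum over i,j of X_ij * log(X_ij / (WH)_ij) - X_ij + (WH)_ij."""
    WH = W @ H
    # eps (an assumption) avoids log(0); entries with X_ij = 0 contribute only (WH)_ij.
    log_term = np.where(X > 0, X * np.log((X + eps) / (WH + eps)), 0.0)
    return float(np.sum(log_term - X + WH))

def sparse_loss(X, W, H, eta, beta):
    """Frobenius loss + eta * ||W||_F^2 + beta * sum_j ||H(:, j)||_1^2."""
    col_l1 = np.sum(np.abs(H), axis=0)  # L1 norm of each column of H
    return (frobenius_loss(X, W, H)
            + eta * float(np.sum(W * W))
            + beta * float(np.sum(col_l1 ** 2)))

# Toy check: X is built as an exact product, so the first two losses are ~0.
rng = np.random.default_rng(0)
W = rng.random((6, 2))
H = rng.random((2, 5))
X = W @ H
print(frobenius_loss(X, W, H), kl_loss(X, W, H), sparse_loss(X, W, H, eta=0.1, beta=0.1))
```

With an exact factorization only the penalty terms remain in the sparse objective, which is a quick way to see what `eta` and `beta` are trading off against the fit.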
12 Algorithms
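As one concrete instance, the sketch below implements the standard multiplicative update rules for the Frobenius loss, commonly attributed to Lee and Seung. The slides name a multiplicative updates algorithm for the MNIST example but do not state the exact variant, so the function name `nmf_multiplicative`, the random initialization, the iteration count, and the `eps` guard are all assumptions.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate X (p x n, nonnegative) by W (p x k) @ H (k x n)."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    W = rng.random((p, k)) + eps
    H = rng.random((k, n)) + eps
    losses = []
    for _ in range(n_iter):
        # Elementwise updates keep W and H nonnegative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        losses.append(float(np.linalg.norm(X - W @ H, "fro") ** 2))
    return W, H, losses

# Toy data: a nonnegative matrix of rank 3 (rows are features, columns are samples).
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 30))
W, H, losses = nmf_multiplicative(X, k=3)
print(losses[0], losses[-1])
```

Because each update multiplies by a nonnegative ratio, no projection step is needed to enforce the constraints; the cost is that convergence can be slow compared with other solvers.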
13 NMF for Partially Labeled Data. NMF is an unsupervised learning algorithm to reduce the dimension of the original data. Question: Suppose some observations are labeled (e.g., diseased versus not diseased). If the weight vectors are used as covariates in a statistical learning framework, does NMF give any clear advantages over other dimension reduction techniques (e.g., PCA)?
14 Semisupervised Dimensionality Reduction. NMF is an unsupervised learning algorithm to reduce the dimension of the original data. Goal: Incorporate information from labeled examples to estimate the rank of the lower-dimensional data. Example: MNIST data, comparing 4s and 9s: $n = 13782$, $p = 784$ pixels, $m = 11791$; $Y_i$, $i = 1, \ldots, m$, is an indicator that the $i$-th observation is a 4 or 9. Use NMF to project the observations from 784 dimensions to $k$ = 8, 16, 32, 64, 128, and 256 dimensions (multiplicative updates algorithm). Use support vector machines to train a classifier using (a) the full training data: observations and 784 covariates; (b) the reduced training data: observations and $k$ dimensions. Predict the class membership of the $n - m = 1991$ validation observations.
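The workflow on this slide can be sketched with scikit-learn. Several things here are assumptions rather than the talk's setup: the small 8x8 `load_digits` data (64 pixels) stands in for MNIST, the 80/20 split and the RBF kernel are arbitrary choices, and NMF is fit on labeled and validation observations together because it does not use labels. Note also that scikit-learn stores samples in rows, so its data matrix is the transpose of the slides' X, and `transform` returns the analogue of the transposed coefficient matrix.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): 8x8 digit images, keeping only 4s and 9s.
digits = load_digits()
mask = np.isin(digits.target, [4, 9])
X = digits.data[mask]                       # rows are samples, columns are pixels
y = (digits.target[mask] == 9).astype(int)  # 1 if the digit is a 9, else 0

X_lab, X_val, y_lab, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

def validation_error(Z_lab, Z_val):
    """Train an SVM on labeled rows and return the validation error rate."""
    clf = SVC(kernel="rbf", gamma="scale").fit(Z_lab, y_lab)
    return float(np.mean(clf.predict(Z_val) != y_val))

errors = {"full": validation_error(X_lab, X_val)}
for k in (8, 16, 32):
    nmf = NMF(n_components=k, init="random", solver="mu",
              max_iter=500, random_state=0)
    nmf.fit(np.vstack([X_lab, X_val]))      # unsupervised: labels are not used
    errors[k] = validation_error(nmf.transform(X_lab), nmf.transform(X_val))

for name, err in errors.items():
    print(name, round(100 * err, 2))
```

Swapping `NMF` for `sklearn.decomposition.PCA` in the loop would give the PCA side of the comparison under the same split.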
21 Results [Figure: Prediction Error in Reduced Dimension; y-axis: Prediction Error (%); x-axis: Dimension k; series: NMF, PC, Full]
22 Idea: Reducing Dimension and Maintaining Meaning. NMF gives factors that are more interpretable than those obtained from PCA or SVD. Does this mean that the importance of the variables in the reduced dimension can be interpreted?
23 Random Forests and Variable Importance. Random forest: a machine learning algorithm used for classification and regression; many decision trees are trained on many samples of the original data and combined to form the final classification rules (details omitted here). Variable importance measures can be used to identify which variables are important for the learning task. Gini impurity: $I = 1 - \sum_{i=1}^{2} f_i^2$, where $f_i$ = fraction of items labeled $i$ in the set. Gini importance: Every time a split of a node is made on a variable the gini impurity criterion for the two descendent nodes is less than the parent node. Adding up the gini decreases for each individual variable over all trees in the forest gives a fast variable importance... breiman/randomforests/cc home.htm
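A small worked example of the Gini quantities above, in plain Python. The labels are invented, and weighting each child node by its share of the parent's items is the usual convention but is an assumption here, since the slide does not spell out the weighting.

```python
from collections import Counter

def gini_impurity(labels):
    """I = 1 - sum_i f_i^2, with f_i the fraction of items labeled i."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
             + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted

# Invented node: three 4s and three 9s, split into two child nodes.
parent = [4, 4, 4, 9, 9, 9]
left, right = [4, 4, 4, 9], [9, 9]

print(gini_impurity(parent))               # 0.5
print(gini_decrease(parent, left, right))  # 0.25
```

Summing such decreases over every split made on a given variable, across all trees, gives that variable's Gini importance.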
24 Results: k = 32, 64, 128. Random forest using the k factors obtained through NMF and the first k principal components. The 4 most important variables are plotted for both NMF and PCA.
25 Examples [Figure panels: Example 4, Mean 4, Example 9, Mean 9]
26 Results [Figure: panels for k = 32, 64, and 128]
27 Results, k = 32 [Figure]
28 Results, k = 64 [Figure]
29 Results, k = 128 [Figure]
30 Moving forward. NMF as a middle step: classification or prediction as the final step. More examples: genetics and medical imaging. With prediction or classification in mind: minimize mean squared prediction error and use cross validation to choose k.
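Choosing k by cross validation can be sketched as a grid search over a two-step pipeline. This is only an illustration under assumptions: the small `load_digits` data stands in for a real application, the candidate values of k and the 3-fold split are arbitrary, and classification accuracy is used as the score in place of mean squared prediction error because the example task is classification.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Stand-in data (assumption): 8x8 digit images, 4s versus 9s.
digits = load_digits()
mask = np.isin(digits.target, [4, 9])
X = digits.data[mask]
y = (digits.target[mask] == 9).astype(int)

# NMF as the middle step, SVM as the final step.
pipe = Pipeline([
    ("nmf", NMF(init="random", solver="mu", max_iter=300, random_state=0)),
    ("svm", SVC(kernel="rbf", gamma="scale")),
])

# Each candidate k is scored by 3-fold cross-validated accuracy.
search = GridSearchCV(pipe, {"nmf__n_components": [4, 8, 16]},
                      cv=3, scoring="accuracy")
search.fit(X, y)

best_k = search.best_params_["nmf__n_components"]
print(best_k, round(search.best_score_, 3))
```

Inside the grid search NMF is refit on each training fold only, which is stricter than fitting it once on all observations as a semisupervised analysis might.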
31 References
Battenberg, E. and Wessel, D. (2009) Accelerating nonnegative matrix factorization for audio source separation on multicore and many core architectures. Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR 2009).
Brunet et al. (2004) Metagenes and molecular pattern discovery using matrix factorization. PNAS, vol 101, no 12.
Jiang, X. et al. (2012) A nonnegative matrix factorization framework for identifying modular patterns in metagenomic profile data. Journal of Mathematical Biology, vol 64, pp
Kim, J. and Park, H. (2008) Sparse NMF for Clustering. hpark/papers/gtcse pdf
Mazack, M. (2009) NonNegative Matrix Factorization with Applications to Handwritten Digit Recognition. Working Paper, University of Minnesota. nmf paper.pdf
Wang, F. et al. (2010) Community discovery using NMF. Data Mining and Knowledge Discovery.
More informationMachine learning for algo trading
Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with
More informationMapReduce for Machine Learning on Multicore
MapReduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers  dual core to 12+core Shift to more concurrent programming paradigms and languages Erlang,
More informationPrincipal Component Analysis
Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded
More informationPrinciple Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate
More informationDiscriminant nonstationary signal features clustering using hard and fuzzy cluster labeling
Ghoraani and Krishnan EURASIP Journal on Advances in Signal Processing 2012, 2012:250 RESEARCH Open Access Discriminant nonstationary signal features clustering using hard and fuzzy cluster labeling Behnaz
More informationIntroduction to Pattern Recognition
Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
More informationBig Data Summarization Using Semantic. Feture for IoT on Cloud
Contemporary Engineering Sciences, Vol. 7, 2014, no. 22, 10951103 HIKARI Ltd, www.mhikari.com http://dx.doi.org/10.12988/ces.2014.49137 Big Data Summarization Using Semantic Feture for IoT on Cloud YooKang
More informationImplementation of the 5/3 Lifting 2D Discrete Wavelet Transform
Implementation of the 5/3 Lifting 2D Discrete Wavelet Transform 1 Jinal Patel, 2 Ketki Pathak 1 Post Graduate Student, 2 Assistant Professor Electronics and Communication Engineering Department, Sarvajanik
More information6. Cholesky factorization
6. Cholesky factorization EE103 (Fall 201112) triangular matrices forward and backward substitution the Cholesky factorization solving Ax = b with A positive definite inverse of a positive definite matrix
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDDLAB ISTI CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More informationDecision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
More informationUNIVERSAL SPEECH MODELS FOR SPEAKER INDEPENDENT SINGLE CHANNEL SOURCE SEPARATION
UNIVERSAL SPEECH MODELS FOR SPEAKER INDEPENDENT SINGLE CHANNEL SOURCE SEPARATION Dennis L. Sun Department of Statistics Stanford University Gautham J. Mysore Adobe Research ABSTRACT Supervised and semisupervised
More informationStandardization and Its Effects on KMeans Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 3993303, 03 ISSN: 0407459; eissn: 0407467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
More informationMaschinelles Lernen mit MATLAB
Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical
More informationReview Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 03 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
More informationSoft Clustering with Projections: PCA, ICA, and Laplacian
1 Soft Clustering with Projections: PCA, ICA, and Laplacian David Gleich and Leonid Zhukov Abstract In this paper we present a comparison of three projection methods that use the eigenvectors of a matrix
More information01219211 Software Development Training Camp 1 (03) Prerequisite : 01204214 Program development skill enhancement camp, at least 48 personhours.
(International Program) 01219141 ObjectOriented Modeling and Programming 3 (30) Object concepts, objectoriented design and analysis, objectoriented analysis relating to developing conceptual models
More informationUnsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
More informationData Clustering. Dec 2nd, 2013 Kyrylo Bessonov
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms kmeans Hierarchical Main
More information