Spectral Methods for Learning Latent Variable Models: Unsupervised and Supervised Settings
1 Spectral Methods for Learning Latent Variable Models: Unsupervised and Supervised Settings Anima Anandkumar U.C. Irvine
2 Learning with Big Data
5 Data vs. Information
Messy data: missing observations, gross corruptions, outliers.
High-dimensional regime: as data grows, more variables!
Useful information: low-dimensional structures.
Learning with big data is an ill-posed problem: learning is finding a needle in a haystack.
Learning with big data is computationally challenging!
Principled approaches for finding low-dimensional structures?
8 How to model information structures?
Latent variable models: incorporate hidden or latent variables.
Information structures: relationships between latent variables and observed data.
Basic approach: mixtures/clusters. Hidden variable is categorical.
Advanced: probabilistic models. Hidden variables have more general distributions. Can model mixed membership/hierarchical groups.
[Figure: graphical model with hidden variables h_1, h_2, h_3 and observed variables x_1, ..., x_5]
9 Latent Variable Models (LVMs)
Document modeling. Observed: words. Hidden: topics.
Social network modeling. Observed: social interactions. Hidden: communities, relationships.
Recommendation systems. Observed: recommendations (e.g., reviews). Hidden: user and business attributes.
Unsupervised learning: learn LVM without labeled examples.
10 LVM for Feature Engineering
Learn good features/representations for classification tasks, e.g., computer vision and NLP.
Sparse coding/dictionary learning: sparse representations, low-dimensional hidden structures.
A few dictionary elements make complicated shapes.
14 Associative Latent Variable Models
Supervised learning: given labeled examples $\{(x_i, y_i)\}$, learn a classifier $\hat{y} = f(x)$.
Associative/conditional models: $p(y \mid x)$.
Example: logistic regression: $E[y \mid x] = \sigma(\langle u, x \rangle)$.
Mixture of logistic regressions: $E[y \mid x, h] = g(\langle U h, x \rangle + \langle b, h \rangle)$.
Multi-layer/deep network: $E[y \mid x] = \sigma_d(A_d\, \sigma_{d-1}(A_{d-1}\, \sigma_{d-2}(\cdots A_2\, \sigma_1(A_1 x))))$.
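The three conditional models on this slide can be written as small forward functions. This is a minimal sketch under assumptions that are not in the slides: the link function g and every activation σ_i are taken to be the logistic sigmoid, h is a one-hot vector, and all function names and sizes are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression(x, u):
    """E[y | x] = sigmoid(<u, x>)."""
    return sigmoid(u @ x)

def mixture_of_logistic(x, h, U, b):
    """E[y | x, h] = g(<U h, x> + <b, h>), with g assumed to be the sigmoid."""
    return sigmoid((U @ h) @ x + b @ h)

def deep_network(x, weights):
    """E[y | x] = sigma_d(A_d ... sigma_1(A_1 x)), all sigma_i assumed sigmoid."""
    out = x
    for A in weights:
        out = sigmoid(A @ out)
    return out

# Illustrative sizes and random parameters (assumptions, not from the slides).
rng = np.random.default_rng(0)
d, k = 4, 3
U = rng.standard_normal((d, k))   # column i holds the weights of component i
b = rng.standard_normal(k)
x = rng.standard_normal(d)
h = np.eye(k)[1]                  # one-hot hidden variable: component 1 is active

p_mix = mixture_of_logistic(x, h, U, b)
weights = [rng.standard_normal((5, d)), rng.standard_normal((1, 5))]
p_deep = deep_network(x, weights)
```

With a one-hot h, the mixture reduces to an ordinary logistic regression using the selected column of U and entry of b.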
15 Challenges in Learning LVMs
Computational challenges: maximum likelihood is NP-hard in most scenarios. In practice, local search approaches such as back-propagation, EM, and variational Bayes have no consistency guarantees.
Sample complexity: exponential (w.r.t. hidden variable dimension) for many learning methods.
Guaranteed and efficient learning through spectral methods.
16 Outline
1 Introduction
2 Spectral Methods
  Classical Matrix Methods
  Beyond Matrices: Tensors
3 Moment Tensors for Latent Variable Models
  Topic Models
  Network Community Models
  Experimental Results
4 Moment Tensors in Supervised Setting
5 Conclusion
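18 Classical Spectral Methods: Matrix PCA and CCA
Unsupervised setting: PCA. For centered samples $\{x_i\}$, find projection $P$ with $\mathrm{Rank}(P) = k$ s.t.
$\min_P \frac{1}{n} \sum_{i \in [n]} \| x_i - P x_i \|^2$.
Result: eigen-decomposition of $S = \mathrm{Cov}(X)$.
Supervised setting: CCA. For centered samples $\{x_i, y_i\}$, find
$\max_{a,b} \frac{a^\top \hat{E}[x y^\top]\, b}{\sqrt{a^\top \hat{E}[x x^\top]\, a \cdot b^\top \hat{E}[y y^\top]\, b}}$.
Result: generalized eigen-decomposition.
[Figure: $x$ projected to $\langle a, x \rangle$ and $y$ projected to $\langle b, y \rangle$]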
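The PCA objective above can be checked numerically. The sketch below is illustrative only: the synthetic data, the sizes n, d, k, and all variable names are assumptions. It builds the rank-k projection from the top eigenvectors of the empirical covariance and evaluates the reconstruction error, which equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 5, 2

# Synthetic data close to a k-dimensional subspace (illustrative assumption).
basis = np.linalg.qr(rng.standard_normal((d, k)))[0]
X = 3.0 * rng.standard_normal((n, k)) @ basis.T + 0.1 * rng.standard_normal((n, d))
X = X - X.mean(axis=0)                  # center the samples

S = X.T @ X / n                         # empirical covariance
eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order
U = eigvecs[:, -k:]                     # top-k eigenvectors
P = U @ U.T                             # rank-k orthogonal projection

# (1/n) * sum_i ||x_i - P x_i||^2; P is symmetric, so rows of X @ P are P x_i.
residual = np.mean(np.sum((X - X @ P) ** 2, axis=1))
```

CCA is not shown here; it would replace the eigen-decomposition with a generalized one, as the slide states.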
21 Shortcomings of Matrix Methods
Learning through spectral clustering: dimension reduction through PCA (on the data matrix), then clustering on the projected vectors (e.g. k-means).
Basic method works only for single memberships.
Failure to cluster under small separation.
Efficient learning without separation constraints?
23 Beyond SVD: Spectral Methods on Tensors
How to learn mixture models without separation constraints? PCA uses the covariance matrix of the data. Are higher-order moments helpful?
Unified framework? Moment-based estimation of probabilistic latent variable models?
SVD gives the spectral decomposition of matrices. What are the analogues for tensors?
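25 Moment Matrices and Tensors
Multivariate moments in unsupervised setting: $M_1 := E[x]$, $M_2 := E[x \otimes x]$, $M_3 := E[x \otimes x \otimes x]$.
Matrix $E[x \otimes x] \in \mathbb{R}^{d \times d}$ is a second-order tensor: $E[x \otimes x]_{i_1, i_2} = E[x_{i_1} x_{i_2}]$. For matrices: $E[x \otimes x] = E[x x^\top]$.
Tensor $E[x \otimes x \otimes x] \in \mathbb{R}^{d \times d \times d}$ is a third-order tensor: $E[x \otimes x \otimes x]_{i_1, i_2, i_3} = E[x_{i_1} x_{i_2} x_{i_3}]$.
Multivariate moments in supervised setting: $M_1 := E[x], E[y]$, $M_2 := E[x \otimes y]$, $M_3 := E[x \otimes x \otimes y]$.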
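Empirical versions of these moments are sample averages of outer products. The following sketch uses `numpy.einsum`; the random data, the sizes, and the variable names are placeholders chosen for illustration, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 1000, 4, 2
X = rng.standard_normal((n, d))     # n samples of x in R^d (placeholder data)
Y = rng.standard_normal((n, m))     # n samples of y in R^m (placeholder data)

# Unsupervised moments.
M1 = X.mean(axis=0)                                   # E[x]
M2 = np.einsum('ni,nj->ij', X, X) / n                 # E[x (x) x], d x d
M3 = np.einsum('ni,nj,nk->ijk', X, X, X) / n          # E[x (x) x (x) x], d x d x d

# Supervised cross-moments.
M2_xy = np.einsum('ni,nm->im', X, Y) / n              # E[x (x) y], d x m
M3_xxy = np.einsum('ni,nj,nm->ijm', X, X, Y) / n      # E[x (x) x (x) y], d x d x m
```

Entry (i1, i2, i3) of `M3` is the sample mean of x_{i1} x_{i2} x_{i3}, matching the entrywise definition on the slide.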
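27 Spectral Decomposition of Tensors
Matrix: $M_2 = \sum_i \lambda_i\, u_i \otimes v_i = \lambda_1\, u_1 \otimes v_1 + \lambda_2\, u_2 \otimes v_2 + \cdots$
Tensor: $M_3 = \sum_i \lambda_i\, u_i \otimes v_i \otimes w_i = \lambda_1\, u_1 \otimes v_1 \otimes w_1 + \lambda_2\, u_2 \otimes v_2 \otimes w_2 + \cdots$
$u \otimes v \otimes w$ is a rank-1 tensor since its $(i_1, i_2, i_3)$-th entry is $u_{i_1} v_{i_2} w_{i_3}$.
How to solve this non-convex problem?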
31 Decomposition of Orthogonal Tensors
$M_3 = \sum_i w_i\, a_i \otimes a_i \otimes a_i$. Suppose $A$ has orthogonal (unit-norm) columns.
$M_3(I, a_1, a_1) = \sum_i w_i \langle a_i, a_1 \rangle^2 a_i = w_1 a_1$.
The $a_i$ are eigenvectors of the tensor $M_3$. Analogous to matrix eigenvectors: $M v = M(I, v) = \lambda v$.
Two problems: How to find eigenvectors of a tensor? $A$ is not orthogonal in general.
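The eigenvector identity on this slide is easy to verify numerically. The sketch below assumes orthonormal columns for A (so that the inner products are 0 or 1); the sizes, the weights, and the helper name `contract` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 6, 3
A = np.linalg.qr(rng.standard_normal((d, k)))[0]   # orthonormal columns (assumption)
w = np.array([3.0, 2.0, 1.0])                      # illustrative weights

# M3 = sum_i w_i a_i (x) a_i (x) a_i
M3 = np.einsum('r,ir,jr,kr->ijk', w, A, A, A)

def contract(T, u, v):
    """Multilinear form T(I, u, v): contract the last two modes with u and v."""
    return np.einsum('ijk,j,k->i', T, u, v)

a1 = A[:, 0]
lhs = contract(M3, a1, a1)     # expected to equal w_1 * a_1
```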
37 Orthogonal Tensor Power Method
Symmetric orthogonal tensor $T \in \mathbb{R}^{d \times d \times d}$: $T = \sum_{i \in [k]} \lambda_i\, v_i \otimes v_i \otimes v_i$.
Recall matrix power method: $v \mapsto \frac{M(I, v)}{\| M(I, v) \|}$.
Algorithm: tensor power method: $v \mapsto \frac{T(I, v, v)}{\| T(I, v, v) \|}$.
How do we avoid spurious solutions (not part of the decomposition)?
The $\{v_i\}$'s are the only robust fixed points. All other eigenvectors are saddle points.
For an orthogonal tensor, no spurious local optima!
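A minimal sketch of the power update with deflation follows. It assumes an exactly orthogonal symmetric tensor with positive eigenvalues; the random restarts, the iteration counts, and the function names are illustrative choices rather than the speaker's implementation.

```python
import numpy as np

def tensor_apply(T, v):
    """Return T(I, v, v)."""
    return np.einsum('ijk,j,k->i', T, v, v)

def power_iteration(T, n_iter=100, n_restarts=10, rng=None):
    """Run v <- T(I,v,v)/||T(I,v,v)|| from random starts; keep the best eigenpair."""
    rng = np.random.default_rng() if rng is None else rng
    best_lam, best_v = -np.inf, None
    for _ in range(n_restarts):
        v = rng.standard_normal(T.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            u = tensor_apply(T, v)
            v = u / np.linalg.norm(u)
        lam = v @ tensor_apply(T, v)          # eigenvalue estimate T(v, v, v)
        if lam > best_lam:
            best_lam, best_v = lam, v
    return best_lam, best_v

def decompose(T, k, rng=None):
    """Extract k eigenpairs, subtracting each rank-1 term once found (deflation)."""
    T = T.copy()
    lams, vecs = [], []
    for _ in range(k):
        lam, v = power_iteration(T, rng=rng)
        lams.append(lam)
        vecs.append(v)
        T -= lam * np.einsum('i,j,k->ijk', v, v, v)
    return np.array(lams), np.column_stack(vecs)

# Illustrative orthogonal tensor with known components.
rng = np.random.default_rng(0)
d, k = 5, 3
V = np.linalg.qr(rng.standard_normal((d, k)))[0]
lam_true = np.array([3.0, 2.0, 1.0])
T = np.einsum('r,ir,jr,kr->ijk', lam_true, V, V, V)

lams, vecs = decompose(T, k, rng=rng)
```

The recovered pairs match the true ones up to ordering; because the tensor has odd order, the sign of each vector is determined.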
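38 Whitening: Conversion to Orthogonal Tensor
$M_3 = \sum_i w_i\, a_i \otimes a_i \otimes a_i$, $M_2 = \sum_i w_i\, a_i \otimes a_i$.
Find whitening matrix $W$ s.t. $W^\top A = V$ is an orthogonal matrix.
When $A \in \mathbb{R}^{d \times k}$ has full column rank, it is an invertible transformation.
Use pairwise moments $M_2$ to find $W$. SVD of $M_2$ is needed.
[Figure: $W$ maps the non-orthogonal vectors $a_1, a_2, a_3$ to orthogonal vectors $v_1, v_2, v_3$]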
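One way to build a whitening matrix from M_2 is through its top-k eigenpairs, sketched below. The construction W = U D^{-1/2} is a common choice and an assumption here, as are the sizes and variable names. Note that with this W the columns of W^T A are mutually orthogonal, and they become orthonormal after scaling column i by sqrt(w_i).

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 8, 3
A = rng.standard_normal((d, k))          # full column rank with probability 1
w = np.array([0.5, 0.3, 0.2])            # illustrative positive weights

M2 = (A * w) @ A.T                       # M2 = sum_i w_i a_i a_i^T, rank k

eigvals, eigvecs = np.linalg.eigh(M2)    # ascending eigenvalues
U, D = eigvecs[:, -k:], eigvals[-k:]     # top-k eigenpairs
W = U / np.sqrt(D)                       # d x k whitening matrix U D^{-1/2}

V = W.T @ (A * np.sqrt(w))               # columns sqrt(w_i) W^T a_i
gram = (W.T @ A).T @ (W.T @ A)           # Gram matrix of the columns of W^T A
```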
41 Putting it together
Non-orthogonal tensor: $M_3 = \sum_i w_i\, a_i \otimes a_i \otimes a_i$, $M_2 = \sum_i w_i\, a_i \otimes a_i$.
Whitening matrix $W$. Multilinear transform: $T = M_3(W, W, W)$.
[Figure: $W$ maps tensor $M_3$ with components $a_1, a_2, a_3$ to tensor $T$ with orthogonal components $v_1, v_2, v_3$]
Tensor decomposition: guaranteed non-convex optimization!
For what latent variable models can we obtain $M_2$ and $M_3$ forms?
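The whole pipeline (whiten, decompose, un-whiten) can be sketched end to end on exact moments. Everything below is an illustrative assumption: exact rather than sampled moments, the W = U D^{-1/2} whitening, power iteration with deflation, and the function names. It relies on the fact that after whitening the eigenvalues are λ_i = w_i^{-1/2}, so that w_i = 1/λ_i² and a_i = λ_i U D^{1/2} v_i.

```python
import numpy as np

def power_method(T, k, n_iter=200, n_restarts=10, seed=0):
    """Tensor power iteration with deflation on a symmetric orthogonal tensor."""
    rng = np.random.default_rng(seed)
    T = T.copy()
    lams, vecs = [], []
    for _ in range(k):
        best_lam, best_v = -np.inf, None
        for _ in range(n_restarts):
            v = rng.standard_normal(T.shape[0])
            v /= np.linalg.norm(v)
            for _ in range(n_iter):
                v = np.einsum('ijk,j,k->i', T, v, v)
                v /= np.linalg.norm(v)
            lam = np.einsum('ijk,i,j,k->', T, v, v, v)
            if lam > best_lam:
                best_lam, best_v = lam, v
        lams.append(best_lam)
        vecs.append(best_v)
        T -= best_lam * np.einsum('i,j,k->ijk', best_v, best_v, best_v)
    return np.array(lams), np.column_stack(vecs)

def learn_from_moments(M2, M3, k):
    """Recover (w, A) from M2 = sum w_i a_i a_i^T and M3 = sum w_i a_i^(x)3."""
    eigvals, eigvecs = np.linalg.eigh(M2)
    U, D = eigvecs[:, -k:], eigvals[-k:]
    W = U / np.sqrt(D)                                  # whitening matrix
    T = np.einsum('abc,ai,bj,ck->ijk', M3, W, W, W)     # T = M3(W, W, W)
    lams, V = power_method(T, k)
    w_hat = 1.0 / lams ** 2                             # lambda_i = w_i^{-1/2}
    A_hat = ((U * np.sqrt(D)) @ V) * lams               # a_i = lambda_i U D^{1/2} v_i
    return w_hat, A_hat

# Illustrative ground truth and exact moments.
rng = np.random.default_rng(1)
d, k = 8, 3
A = rng.standard_normal((d, k))
w = np.array([0.5, 0.3, 0.2])
M2 = np.einsum('r,ir,jr->ij', w, A, A)
M3 = np.einsum('r,ir,jr,kr->ijk', w, A, A, A)

w_hat, A_hat = learn_from_moments(M2, M3, k)
```

With sampled moments instead of exact ones, the same steps apply but the recovered parameters carry estimation error.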
43 Types of Latent Variable Models
What is the form of the hidden variables h?
Basic approach: mixtures/clusters. Hidden variable h is categorical.
Advanced: probabilistic models. Hidden variable h has a more general distribution. Can model mixed memberships, e.g. Dirichlet distribution.
[Figure: graphical model with hidden variables h_1, h_2, h_3 and observed variables x_1, ..., x_5]
45 Topic Modeling
46 Geometric Picture for Topic Models Topic proportions vector (h) Document
49 Geometric Picture for Topic Models
Single topic (h).
Word generation ($x_1, x_2, \ldots$).
Linear model: $E[x_i \mid h] = A h$.
[Figure: topic h generating words $x_1, x_2, x_3$ through the matrix $A$]
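51 Moments for Single Topic Models
$E[x_i \mid h] = A h$. $w := E[h]$. Learn topic-word matrix $A$ and vector $w$.
[Figure: hidden topic h generating words $x_1, \ldots, x_5$, each through $A$]
Pairwise co-occurrence matrix $M_2$:
$M_2 := E[x_1 \otimes x_2] = E[E[x_1 \otimes x_2 \mid h]] = \sum_{i=1}^{k} w_i\, a_i \otimes a_i$
Triples tensor $M_3$:
$M_3 := E[x_1 \otimes x_2 \otimes x_3] = E[E[x_1 \otimes x_2 \otimes x_3 \mid h]] = \sum_{i=1}^{k} w_i\, a_i \otimes a_i \otimes a_i$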
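The pairwise identity can be checked by simulation: draw one topic per document, draw two words independently given the topic, and compare the empirical co-occurrence matrix with Σ_i w_i a_i ⊗ a_i. The vocabulary size, topic-word matrix, topic weights, and sample size below are made-up values for illustration; words are represented by their indices, which corresponds to one-hot encoding of x_1 and x_2.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 2, 200_000

# Hypothetical topic-word matrix (each column sums to 1) and topic weights.
A = np.array([[0.5, 0.1],
              [0.2, 0.1],
              [0.1, 0.2],
              [0.1, 0.3],
              [0.1, 0.3]])
w = np.array([0.6, 0.4])

h = rng.choice(k, size=n, p=w)          # one topic per document
x1 = np.empty(n, dtype=int)
x2 = np.empty(n, dtype=int)
for i in range(k):
    idx = np.flatnonzero(h == i)
    # Two words drawn independently from topic i's word distribution.
    x1[idx] = rng.choice(d, size=idx.size, p=A[:, i])
    x2[idx] = rng.choice(d, size=idx.size, p=A[:, i])

# Empirical E[x1 (x) x2] for one-hot words: normalized co-occurrence counts.
M2_hat = np.zeros((d, d))
np.add.at(M2_hat, (x1, x2), 1.0)
M2_hat /= n

M2 = (A * w) @ A.T                       # sum_i w_i a_i a_i^T
max_err = np.abs(M2_hat - M2).max()
```

The same construction with three words per document gives the triples tensor M_3.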
52 Moments under LDA
$M_2 := E[x_1 \otimes x_2] - \frac{\alpha_0}{\alpha_0 + 1} E[x_1] \otimes E[x_1]$
$M_3 := E[x_1 \otimes x_2 \otimes x_3] - \frac{\alpha_0}{\alpha_0 + 2} E[x_1 \otimes x_2 \otimes E[x_1]]$ more stuff...
Then $M_2 = \sum_i w_i\, a_i \otimes a_i$ and $M_3 = \sum_i w_i\, a_i \otimes a_i \otimes a_i$.
Three words per document suffice for learning LDA.
Similar forms for HMM, ICA, sparse coding etc.
"Tensor Decompositions for Learning Latent Variable Models" by A. Anandkumar, R. Ge, D. Hsu, S.M. Kakade and M. Telgarsky. JMLR 2014.
54 Network Community Models
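62 Subgraph Counts as Graph Moments
3-star count tensor:
$M_3(a, b, c) = \frac{1}{|X|}\, \#\text{ of common neighbors in } X = \frac{1}{|X|} \sum_{x \in X} G(x, a)\, G(x, b)\, G(x, c)$.
$M_3 = \frac{1}{|X|} \sum_{x \in X} [G_{x,A}^\top \otimes G_{x,B}^\top \otimes G_{x,C}^\top]$
[Figure: node partition X, A, B, C with a 3-star from $x \in X$ to $a \in A$, $b \in B$, $c \in C$]
"A Tensor Spectral Approach to Learning Mixed Membership Community Models" by A. Anandkumar, R. Ge, D. Hsu, and S.M. Kakade. COLT 2013.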
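The 3-star count tensor can be computed directly from an adjacency matrix once the nodes are split into four groups. In the sketch below the random graph, its density, the equal-size partition, and the variable names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40

# Random undirected graph without self-loops (placeholder for a real network).
G = (rng.random((n, n)) < 0.3).astype(float)
G = np.triu(G, 1)
G = G + G.T

# Partition the nodes into four equal groups X, A, B, C.
X, A, B, C = np.split(rng.permutation(n), 4)

G_XA = G[np.ix_(X, A)]      # edges from X to A, |X| x |A|
G_XB = G[np.ix_(X, B)]
G_XC = G[np.ix_(X, C)]

# M3[a, b, c] = (1/|X|) * sum_{x in X} G(x, a) G(x, b) G(x, c)
M3 = np.einsum('xa,xb,xc->abc', G_XA, G_XB, G_XC) / len(X)

# Direct count for one triple, for comparison.
common = sum(1 for x in X if G[x, A[0]] and G[x, B[1]] and G[x, C[2]])
```

Each entry of `M3` is the fraction of nodes in X that are adjacent to all three of a, b, and c.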
64 Computational Complexity ($k \ll n$)
n = # of nodes, N = # of iterations, k = # of communities, c = # of cores.
         Whiten              STGD            Unwhiten
Space    O(nk)               O(k^2)          O(nk)
Time     O(nsk/c + k^3)      O(N k^3 / c)    O(nsk/c)
Whiten: matrix/vector products and SVD.
STGD: Stochastic Tensor Gradient Descent.
Unwhiten: matrix/vector products.
Our approach: $O(nsk/c + k^3)$. Embarrassingly parallel and fast!
65 Tensor Decomposition on GPUs
[Figure: running time (secs) vs. number of communities k, for MATLAB Tensor Toolbox (CPU), CULA Standard Interface (GPU), CULA Device Interface (GPU), and Eigen Sparse (CPU)]
66 Summary of Results
Datasets: Facebook (users, friend links), n ~ 20k; Yelp (businesses, users, reviews), n ~ 40k; DBLP (authors, coauthor links), n ~ 1 million (sub: ~100k).
Error (E) and recovery ratio (R):
Dataset             k̂     Method        Running time   E   R
Facebook (k=360)    500   ours                             %
Facebook (k=360)    500   variational   86,                %
Yelp (k=159)        100   ours                             %
Yelp (k=159)        100   variational   N.A.
DBLP sub (k=250)    500   ours          10,                %
DBLP sub (k=250)    500   variational   558,               %
DBLP (k=6000)       100   ours                             %
Thanks to Prem Gopalan and David Mimno for providing variational code.
68 Experimental Results on Yelp
Lowest error business categories and largest weight businesses (in rank order):
Category          Business
Latin American    Salvadoreno Restaurant
Gluten Free       P.F. Chang's China Bistro
Hobby Shops       Make Meaning
Mass Media        KJZZ 91.5FM
Yoga              Sutra Midtown
Bridgeness: distance from the vector $[1/\hat{k}, \ldots, 1/\hat{k}]$.
Top-5 bridging nodes (businesses):
Business                Categories
Four Peaks Brewing      Restaurants, Bars, American, Nightlife, Food, Pubs, Tempe
Pizzeria Bianco         Restaurants, Pizza, Phoenix
FEZ                     Restaurants, Bars, American, Nightlife, Mediterranean, Lounges, Phoenix
Matt's Big Breakfast    Restaurants, Phoenix, Breakfast & Brunch
Cornish Pasty Co        Restaurants, Bars, Nightlife, Pubs, Tempe
70 Moment Tensors for Associative Models
Multivariate moments: many possibilities... $E[x \otimes y]$, $E[x \otimes x \otimes y]$, $E[\psi(x) \otimes y]$, ...
Feature transformations of the input: $x \mapsto \psi(x)$.
How to exploit them? Are moments $E[\psi(x) \otimes y]$ useful?
If $\psi(x)$ is a matrix/tensor, we have matrix/tensor moments. Can carry out spectral decomposition of the moments.
71 Score Function Features
Higher order score function: $S_m(x) := (-1)^m \frac{\nabla^{(m)} p(x)}{p(x)}$. Can be a matrix or a tensor instead of a vector. Derivative w.r.t. parameter or input.
Form the cross-moments: $E[y \cdot S_m(x)]$.
Extension of Stein's lemma: $E[y \cdot S_m(x)] = E[\nabla^{(m)} G(x)]$ when $E[y \mid x] := G(x)$.
Spectral decomposition: $E[\nabla^{(m)} G(x)] = \sum_{j \in [k]} u_j^{\otimes m}$.
Can be applied for learning of associative latent variable models.
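As a worked special case of the definitions above, take the input to be standard Gaussian, $x \sim \mathcal{N}(0, I)$. This choice of input distribution, and the logistic example that follows, are assumptions made here for illustration; the slide itself does not fix $p(x)$.

```latex
% Standard Gaussian density: p(x) \propto \exp(-\|x\|^2 / 2), so \nabla p(x) = -x\, p(x).
\begin{align*}
S_1(x) &= -\frac{\nabla p(x)}{p(x)} = x, \\
S_2(x) &= \frac{\nabla^{(2)} p(x)}{p(x)} = x \otimes x - I, \\
E[y \cdot S_1(x)] &= E[G(x)\, x] = E[\nabla G(x)]
  \quad \text{(first-order Stein's lemma)}. \\
\intertext{For logistic regression, $G(x) = \sigma(\langle u, x \rangle)$, this gives}
E[y \cdot x] &= E[\sigma'(\langle u, x \rangle)]\, u .
\end{align*}
```

In this case the first-order cross-moment is a scalar multiple of the weight vector $u$, which is the simplest instance of reading parameters off a moment.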
72 Learning Deep Neural Networks
Realizable setting: $E[y \mid x] = \sigma_d(A_d\, \sigma_{d-1}(A_{d-1}\, \sigma_{d-2}(\cdots A_2\, \sigma_1(A_1 x))))$.
$M_3 = E[y \cdot S_3(x)] = \sum_{i \in [r]} \lambda_i\, u_i^{\otimes 3}$, where $u_i = e_i^\top A_1$ are rows of $A_1$.
Guaranteed learning of weights (layer-by-layer) via tensor decomposition.
Similar guarantees for learning mixture of classifiers.
73 Automated Extraction of Discriminative Features
75 Conclusion: Guaranteed Non-Convex Optimization
Tensor decomposition: efficient sample and computational complexities. Better performance compared to EM, variational Bayes etc.
In practice: scalable and embarrassingly parallel, handles large datasets. Efficient performance: perplexity or ground truth validation.
Related topics:
Overcomplete tensor decomposition: neural networks, sparse coding and ICA models tend to be overcomplete (more neurons than input dimensions).
Provable non-convex iterative methods: robust PCA, dictionary learning etc.
76 My Research Group and Resources Furong Huang Majid Janzamin Hanie Sedghi Niranjan UN Forough Arabshahi ML summer school lectures available at
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationNeural Networks for Machine Learning. Lecture 13a The ups and downs of backpropagation
Neural Networks for Machine Learning Lecture 13a The ups and downs of backpropagation Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed A brief history of backpropagation
More informationMACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
More informationCS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
More informationDimension Reduction. Wei-Ta Chu 2014/10/22. Multimedia Content Analysis, CSIE, CCU
1 Dimension Reduction Wei-Ta Chu 2014/10/22 2 1.1 Principal Component Analysis (PCA) Widely used in dimensionality reduction, lossy data compression, feature extraction, and data visualization Also known
More informationUnsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
More informationlarge-scale machine learning revisited Léon Bottou Microsoft Research (NYC)
large-scale machine learning revisited Léon Bottou Microsoft Research (NYC) 1 three frequent ideas in machine learning. independent and identically distributed data This experimental paradigm has driven
More informationMachine Learning in Computer Vision A Tutorial. Ajay Joshi, Anoop Cherian and Ravishankar Shivalingam Dept. of Computer Science, UMN
Machine Learning in Computer Vision A Tutorial Ajay Joshi, Anoop Cherian and Ravishankar Shivalingam Dept. of Computer Science, UMN Outline Introduction Supervised Learning Unsupervised Learning Semi-Supervised
More informationScalable Machine Learning - or what to do with all that Big Data infrastructure
- or what to do with all that Big Data infrastructure TU Berlin blog.mikiobraun.de Strata+Hadoop World London, 2015 1 Complex Data Analysis at Scale Click-through prediction Personalized Spam Detection
More informationEM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
More informationPrinciples of Data Mining by Hand&Mannila&Smyth
Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences
More informationCommon factor analysis
Common factor analysis This is what people generally mean when they say "factor analysis" This family of techniques uses an estimate of common variance among the original variables to generate the factor
More informationRandomized Robust Linear Regression for big data applications
Randomized Robust Linear Regression for big data applications Yannis Kopsinis 1 Dept. of Informatics & Telecommunications, UoA Thursday, Apr 16, 2015 In collaboration with S. Chouvardas, Harris Georgiou,
More informationBig learning: challenges and opportunities
Big learning: challenges and opportunities Francis Bach SIERRA Project-team, INRIA - Ecole Normale Supérieure December 2013 Omnipresent digital media Scientific context Big data Multimedia, sensors, indicators,
More informationAzure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
More informationSoft Clustering with Projections: PCA, ICA, and Laplacian
1 Soft Clustering with Projections: PCA, ICA, and Laplacian David Gleich and Leonid Zhukov Abstract In this paper we present a comparison of three projection methods that use the eigenvectors of a matrix
More informationEnvironmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
More informationThe Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
More informationRank one SVD: un algorithm pour la visualisation d une matrice non négative
Rank one SVD: un algorithm pour la visualisation d une matrice non négative L. Labiod and M. Nadif LIPADE - Universite ParisDescartes, France ECAIS 2013 November 7, 2013 Outline Outline 1 Data visualization
More informationDistance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center
Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II
More informationCollaborative Filtering. Radek Pelánek
Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains
More informationMachine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu
Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel
More informationISOMETRIES OF R n KEITH CONRAD
ISOMETRIES OF R n KEITH CONRAD 1. Introduction An isometry of R n is a function h: R n R n that preserves the distance between vectors: h(v) h(w) = v w for all v and w in R n, where (x 1,..., x n ) = x
More informationCCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not
More informationClass #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
More informationExploratory Data Analysis with MATLAB
Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Second Edition Wendy L Martinez Angel R. Martinez Jeffrey L. Solka ( r ec) CRC Press VV J Taylor & Francis Group Boca Raton
More informationHow To Understand Multivariate Models
Neil H. Timm Applied Multivariate Analysis With 42 Figures Springer Contents Preface Acknowledgments List of Tables List of Figures vii ix xix xxiii 1 Introduction 1 1.1 Overview 1 1.2 Multivariate Models
More informationNonlinear Iterative Partial Least Squares Method
Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for
More informationMehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics
INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree
More informationPerformance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations
Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Roy D. Williams, 1990 Presented by Chris Eldred Outline Summary Finite Element Solver Load Balancing Results Types Conclusions
More informationPart 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
More informationCS 688 Pattern Recognition Lecture 4. Linear Models for Classification
CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationManifold Learning Examples PCA, LLE and ISOMAP
Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition
More informationMachine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
More informationFactor analysis. Angela Montanari
Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number
More informationMedical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu
Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
More informationMLlib: Scalable Machine Learning on Spark
MLlib: Scalable Machine Learning on Spark Xiangrui Meng Collaborators: Ameet Talwalkar, Evan Sparks, Virginia Smith, Xinghao Pan, Shivaram Venkataraman, Matei Zaharia, Rean Griffith, John Duchi, Joseph
More informationJournée Thématique Big Data 13/03/2015
Journée Thématique Big Data 13/03/2015 1 Agenda About Flaminem What Do We Want To Predict? What Is The Machine Learning Theory Behind It? How Does It Work In Practice? What Is Happening When Data Gets
More informationData Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
More informationStatistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant
Statistical Analysis NBAF-B Metabolomics Masterclass Mark Viant 1. Introduction 2. Univariate analysis Overview of lecture 3. Unsupervised multivariate analysis Principal components analysis (PCA) Interpreting
More informationChapter 7. Cluster Analysis
Chapter 7. Cluster Analysis. What is Cluster Analysis?. A Categorization of Major Clustering Methods. Partitioning Methods. Hierarchical Methods 5. Density-Based Methods 6. Grid-Based Methods 7. Model-Based
More informationPartial Least Squares (PLS) Regression.
Partial Least Squares (PLS) Regression. Hervé Abdi 1 The University of Texas at Dallas Introduction Pls regression is a recent technique that generalizes and combines features from principal component
More information10-601. Machine Learning. http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
More information