Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu


1 Learning Gaussian process models from big data. Alan Qi, Purdue University. Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu. Machine learning seminar at University of Cambridge, July
2 Data: a lot of data.
3 Outline: (1) nonlinear tensor models (and stochastic blockmodels); (2) sparse Gaussian process models.
4 Tensor data: multiple aspects (medicines × patients × biomarkers). $Y_{i,j,k}$: the value of the $k$th biomarker (i.e., cell population) for the $j$th patient after taking the $i$th medicine. Task: predict drug response.
5 Matrix data: networks (people × people). $Y_{i,j} = 1$ if node $i$ is linked to node $j$, and 0 otherwise. Task: discover communities and predict unknown interactions.
6 Goals: (1) predict unknown elements (e.g., drug response and network interactions); (2) identify latent multi-aspect groups (communities).
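7 Classical Tucker decomposition: a generalization of matrix factorization. 3D case, with core tensor $\mathcal{G}$ and loading matrices $U$, $Z$, $V$: $\mathcal{Y} = \mathcal{G} \times_1 U \times_2 Z \times_3 V$, i.e., $y_{ijk} = \sum_{r} \sum_{s} \sum_{t} g_{rst}\, u_{ir}\, z_{js}\, v_{kt}$ (Sun et al. 2008).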
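The element-wise Tucker formula on this slide can be checked numerically. What follows is a minimal NumPy sketch; the tensor sizes, the random values, and the function name `tucker_reconstruct` are illustrative assumptions and do not come from the talk.

```python
import numpy as np

def tucker_reconstruct(G, U, Z, V):
    """Return Y with y_ijk = sum_{r,s,t} g_rst * u_ir * z_js * v_kt."""
    return np.einsum("rst,ir,js,kt->ijk", G, U, Z, V)

rng = np.random.default_rng(0)
G = rng.standard_normal((2, 3, 4))  # core tensor (illustrative sizes)
U = rng.standard_normal((5, 2))     # loading matrix for mode 1 (index i)
Z = rng.standard_normal((6, 3))     # loading matrix for mode 2 (index j)
V = rng.standard_normal((7, 4))     # loading matrix for mode 3 (index k)

Y = tucker_reconstruct(G, U, Z, V)
print(Y.shape)  # (5, 6, 7)
```

With row-major flattening, the same tensor is `np.kron(U, np.kron(Z, V)) @ G.reshape(-1)`, which is the Kronecker structure that the later complexity slides rely on.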
8 Assumptions: complete, continuous, multilinear.
9 Solution: sparse latent Gaussian processes on tensors.
10 Latent sparse GP on tensors (medicines × patients × biomarkers). Element $(i, j, k)$ is characterized by $u_i$, a sparse loading vector in latent medicine groups; $z_j$, a sparse loading vector in latent patient groups; and $v_k$, a sparse loading vector in latent biomarker groups.
11 Separate covariance for each dimension: a separate covariance/kernel function for each dimension ($K_u$ for medicines, $K_z$ for patients, $K_v$ for biomarkers). $K_u(i, r) = k(u_i, u_r)$ models a nonlinear relationship between medicines $i$ and $r$: the more similar the loading vectors, the larger the covariance function value.
12 GP on tensors. A GP on a tensor is a stochastic process in an infinite tensor space. Its evaluation on any tensor of finite size is a tensor-valued Gaussian distribution: $\mathcal{TN}(\mathcal{F} \mid 0; K_u, K_v, K_z) = (2\pi)^{-N^3/2} \prod_{s=u,v,z} |K_s|^{-N^2/2} \exp\{-\tfrac{1}{2} \| \mathcal{F} \times_1 K_u^{-1/2} \times_2 K_v^{-1/2} \times_3 K_z^{-1/2} \|^2\}$.
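The tensor-valued Gaussian density never requires forming the full Kronecker covariance. The sketch below evaluates the log density mode by mode and is meant as an illustration only: the function names, sizes, and random inputs are assumptions, and Cholesky factors are used in place of symmetric square roots (both give the same quadratic form).

```python
import numpy as np

def tensor_normal_logpdf(F, Ku, Kv, Kz):
    """Log density of a zero-mean tensor Gaussian with one covariance per mode."""
    n1, n2, n3 = F.shape
    n = n1 * n2 * n3
    # K_s = L_s L_s^T; applying L_s^{-1} whitens mode s just as K_s^{-1/2} would.
    Lu, Lv, Lz = (np.linalg.cholesky(K) for K in (Ku, Kv, Kz))
    Lu_inv, Lv_inv, Lz_inv = (np.linalg.inv(L) for L in (Lu, Lv, Lz))
    # Mode products F x_1 Lu^{-1} x_2 Lv^{-1} x_3 Lz^{-1}.
    W = np.einsum("ia,jb,kc,abc->ijk", Lu_inv, Lv_inv, Lz_inv, F)
    quad = np.sum(W ** 2)
    # log|Ku (x) Kv (x) Kz| = n2*n3*log|Ku| + n1*n3*log|Kv| + n1*n2*log|Kz|.
    logdet = sum(
        (n // L.shape[0]) * 2.0 * np.sum(np.log(np.diag(L)))
        for L in (Lu, Lv, Lz)
    )
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

def random_spd(n, rng):
    """Random symmetric positive definite matrix (illustrative helper)."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

rng = np.random.default_rng(1)
Ku, Kv, Kz = random_spd(3, rng), random_spd(4, rng), random_spd(2, rng)
F = rng.standard_normal((3, 4, 2))
log_p = tensor_normal_logpdf(F, Ku, Kv, Kz)
print(log_p)
```

When all three modes have size $N$, the log-determinant term reduces to $N^2 \sum_s \log |K_s|$, matching the $|K_s|^{-N^2/2}$ factors in the density.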
13 Predict unknown tensor elements. Simple illustration: 1) Based on observed data, estimate the loading vectors $[u_i, z_j, v_k]$. 2) Compute weights (similarities) $w(ijk, rst)$ between unknown and observed elements from their loading vectors $[u_i, z_j, v_k]$ and $[u_r, z_s, v_t]$. 3) Predict the unknown element: $y_{i,j,k} = \sum_{r,s,t} w(ijk, rst)\, y_{r,s,t}$.
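The three-step illustration can be sketched as a kernel-weighted predictor. The slide does not specify the weight function, so this sketch assumes one common choice: a GP posterior mean under a product of RBF kernels on the loading vectors. All names (`rbf`, `product_kernel`, `predict`), the `noise` parameter, and the random loadings are hypothetical; in the talk the loadings are estimated from data rather than drawn at random.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """RBF kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

def product_kernel(idx_a, idx_b, U, Z, V):
    """k((i,j,k),(r,s,t)) = k(u_i,u_r) * k(z_j,z_s) * k(v_k,v_t)."""
    ia, ja, ka = idx_a.T
    ib, jb, kb = idx_b.T
    return rbf(U[ia], U[ib]) * rbf(Z[ja], Z[jb]) * rbf(V[ka], V[kb])

def predict(idx_new, idx_obs, y_obs, U, Z, V, noise=1e-2):
    """Return predictions and the weight matrix w(new element, observed element)."""
    K = product_kernel(idx_obs, idx_obs, U, Z, V)
    k_star = product_kernel(idx_new, idx_obs, U, Z, V)
    # Each row of `weights` holds the weights for one unknown element.
    weights = np.linalg.solve(K + noise * np.eye(len(y_obs)), k_star.T).T
    return weights @ y_obs, weights

rng = np.random.default_rng(2)
# Step 1 (assumed done): loading vectors, here random placeholders.
U, Z, V = (rng.standard_normal((4, 2)) for _ in range(3))
idx_obs = np.array([[0, 0, 0], [1, 2, 3], [2, 1, 0], [3, 3, 1]])
y_obs = np.array([1.0, -0.5, 0.3, 2.0])
idx_new = np.array([[0, 1, 2]])
# Steps 2 and 3: weights from loading-vector similarity, then a weighted sum.
y_pred, w = predict(idx_new, idx_obs, y_obs, U, Z, V)
print(y_pred, w)
```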
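14 Graphical model representation. Sparse loading vectors: $U = [u_1, \ldots, u_N]$ with $p(u_i) \propto \exp(-\lambda \|u_i\|_1)$; similarly, sample $V$ and $Z$. Latent tensor: $\mathcal{F} \sim \mathcal{TN}(0; K_u, K_v, K_z)$. Data: $Y_U$ is the unknown data and $Y_O$ the observed data, with $p(Y_O \mid \mathcal{F}) = \prod_{(i,j,k) \in O} p(y_{ijk} \mid f_{ijk})$, where $p(y_{ijk} \mid f_{ijk})$ is Gaussian for continuous data, probit for binary data, and Poisson for count data.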
15 Benefits: handle binary and missing data; discover block/group structures; avoid overfitting through adaptive nonparametric model complexity; model prediction uncertainty; incorporate additional side information (Yan, Xu & Qi, 2011; Xu, Yan & Qi, 2011).
16 Algorithm: variational EM. Objective: marginal likelihood plus prior, $\log p(Y_O \mid U, V, Z) + \log p(U, V, Z)$. Variational approximation; iterations.
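17 Algorithm: explore model structures. Example: $\mathrm{Trace}\{(I + K_u \otimes K_v \otimes K_z)^{-1}\}$. Direct computation: inverting an $N^3 \times N^3$ matrix costs $O(N^9)$. Alternative: exploit Kronecker product operations.
18 Properties of Kronecker product. Eigendecomposition: $K = W \Lambda W^T$. If $K = K_u \otimes K_v \otimes K_z$, then $W = W_u \otimes W_v \otimes W_z$ and $\Lambda = \Lambda_u \otimes \Lambda_v \otimes \Lambda_z$, where $W_u$ holds the eigenvectors of $K_u$ and $\Lambda_u = \mathrm{diag}\{\lambda^u\}$ holds the eigenvalues of $K_u$.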
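19 Reduced computational complexity. Example: $\mathrm{Trace}\{(I + K_u \otimes K_v \otimes K_z)^{-1}\}$. Direct computation: $O(N^9)$. Using the new theorem and trace properties: $\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} \frac{1}{1 + \lambda_i^u \lambda_j^v \lambda_k^z}$, at $O(N^3)$. With $M \ll N$ eigenvalues per mode, the sum becomes $\sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{k=1}^{M} \frac{1}{1 + \lambda_i^u \lambda_j^v \lambda_k^z}$, at $O(N^2 M)$.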
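The trace example can be verified on small matrices. This sketch compares the direct computation against the eigenvalue sum; the helper names and the random test kernels are illustrative assumptions.

```python
import numpy as np

def trace_inv_dense(Ku, Kv, Kz):
    """Direct route: build the full Kronecker product and invert it."""
    K = np.kron(Ku, np.kron(Kv, Kz))
    return np.trace(np.linalg.inv(np.eye(K.shape[0]) + K))

def trace_inv_kron(Ku, Kv, Kz):
    """Structured route: only the eigenvalues of the three small kernels are needed."""
    lam_u, lam_v, lam_z = (np.linalg.eigvalsh(K) for K in (Ku, Kv, Kz))
    # Eigenvalues of the Kronecker product are all products lam_u[i]*lam_v[j]*lam_z[k].
    lam = lam_u[:, None, None] * lam_v[None, :, None] * lam_z[None, None, :]
    return np.sum(1.0 / (1.0 + lam))

def random_psd(n, rng):
    """Random positive semidefinite matrix standing in for a kernel matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T / n

rng = np.random.default_rng(3)
Ku, Kv, Kz = (random_psd(4, rng) for _ in range(3))
print(trace_inv_dense(Ku, Kv, Kz), trace_inv_kron(Ku, Kv, Kz))
```

The structured route touches three $N \times N$ eigenproblems and $N^3$ scalar terms, instead of one $N^3 \times N^3$ inversion.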
20 2D case: GP stochastic blockmodels. Undirected networks (friend relationships and protein-protein interactions), represented by symmetric adjacency matrices (Yan, Xu & Qi, UAI 2011; Xu, Yan & Qi, AAAI 2011).
21 2D: Coauthor networks. [Figure: AUC values vs. number of latent groups for SMGB, MMSB, LEM, and Ours; group memberships (groups vs. nodes) of NIPS authors produced by LEM, MMSB, SMGB, and Ours.] Coauthorship dataset: coauthorship links from the 100 authors who have the largest number of coauthors from NIPS 1-17.
22 3D: Enron emails. Enron dataset: emails from senior management of Enron before its bankruptcy. 3D tensor representation: Sender-Recipient-Subject. [Figure: AUC values vs. number of factors; legend: InfTucker tp InfTucker gp CP TD HOSVD NCP WCP PTD Ours.] (Xu, Yan & Qi, ICML 2012)
23 4D: Digg. Digg dataset: social news from digg.com. 4D tensor representation: user-news-keywords-category. [Figure: AUC values vs. number of factors; legend: InfTucker tp InfTucker gp CP TD HOSVD NCP WCP PTD Ours.]
24 Outline: (1) Bayesian nonparametric stochastic blockmodels; (2) sparse Gaussian process models.
25 Gaussian process: a nonparametric Bayesian prior over functions. Computational bottleneck: $O(N^3)$ for regression.
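To make the $O(N^3)$ bottleneck concrete, here is a minimal sketch of exact GP regression in NumPy. The RBF kernel, the noise level, and the toy sine data are assumptions chosen for illustration; the cubic cost comes from the Cholesky factorization of the $N \times N$ kernel matrix.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """RBF kernel between the rows of A and the rows of B (unit prior variance)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

def gp_regression(X, y, X_star, noise=0.1):
    """Posterior mean and variance of the latent function at X_star."""
    K = rbf_kernel(X, X) + noise ** 2 * np.eye(len(X))
    L = np.linalg.cholesky(K)  # O(N^3): the bottleneck named on the slide
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_star = rbf_kernel(X, X_star)
    mean = K_star.T @ alpha
    v = np.linalg.solve(L, K_star)
    var = 1.0 - np.sum(v ** 2, axis=0)  # k(x, x) = 1 for this kernel
    return mean, var

X = np.linspace(0.0, 5.0, 30)[:, None]
y = np.sin(X[:, 0])
X_star = np.array([[1.0], [2.5], [6.0]])
mean, var = gp_regression(X, y, X_star)
print(mean, var)
```

The last test point lies outside the training range, so its predictive variance is larger than at the interior points.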
26 Sparse GP models: low-rank approximation by the Nyström approximation (Williams & Seeger 2001); summarize data by a few pseudo inputs (Snelson & Ghahramani 2006); summarize data by a few pseudo data clouds (Qi et al. 2010); a unifying view (Quiñonero-Candela & Rasmussen 2005).
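As an illustration of the first item, the Nyström approximation replaces the full kernel matrix $K$ with $K_{NM} K_{MM}^{-1} K_{MN}$ built from $M$ landmark points. The sketch below assumes an RBF kernel, evenly spaced 1D inputs, and a small `jitter` term for numerical stability; these choices and the function names are not from the talk.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """RBF kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

def nystrom_approximation(X, landmark_idx, jitter=1e-10):
    """Rank-M approximation K_nm K_mm^{-1} K_mn of the full kernel matrix."""
    K_nm = rbf_kernel(X, X[landmark_idx])
    K_mm = rbf_kernel(X[landmark_idx], X[landmark_idx])
    K_mm = K_mm + jitter * np.eye(len(landmark_idx))
    return K_nm @ np.linalg.solve(K_mm, K_nm.T)

X = np.linspace(0.0, 10.0, 21)[:, None]
landmark_idx = np.arange(0, 21, 5)  # 5 landmarks out of 21 points
K_full = rbf_kernel(X, X)
K_low = nystrom_approximation(X, landmark_idx)
print(np.abs(K_full - K_low).max())  # worst-case entrywise error
```

The approximation reproduces the kernel exactly (up to jitter) on the landmark block and underestimates it elsewhere, since the residual $K - \tilde{K}$ is a Schur complement and therefore positive semidefinite.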
27 Summarization by data clouds (Qi et al. UAI 2010). The exact posterior process is replaced by a data cloud approximation with $M \ll N$ clouds that carry local manifold information: $q(f) \propto GP(f \mid 0, K) \prod_{j=1}^{M} p(u_j \mid \int f(x)\phi(x)\,dx, \lambda_j^{-1})$. When $\phi(x) = \delta(x)$, this approximation reduces to FITC (i.e., the pseudo input approximation).
28 [Figure: panels over axes $X_1$ and $X_2$ comparing exact GP, pseudo input (i.e., FITC), SASPA_sphere, and SASPA (Qi et al. UAI 2010).]
29 Key observation. Previous approaches compress data into a sparse representation, including pseudo data points or clouds. PCA gives the optimal compact representation among all orthogonal bases.
30 EigenGP: sparse PCA + GP. KL expansion of the GP prior by the Nyström method: $O(N^2) \to O(M^2 N)$. Select eigenfunctions by evidence maximization (Qi, Dai & Zhu, NIPS submission 2012).
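The idea of a Nyström-based KL expansion can be sketched as follows. This is not the authors' implementation: it uses the textbook Nyström estimate of kernel eigenfunctions from $M$ basis points and plain Bayesian linear regression on those features, and it omits evidence maximization entirely. The `weights` argument is a hypothetical hook where selected or re-estimated eigenvalues could be supplied; all other names and the toy data are assumptions as well.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """RBF kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

def nystrom_eigenfunctions(B, lengthscale=1.0):
    """Approximate kernel eigenvalues and eigenfunctions from M basis points B."""
    M = len(B)
    lam, U = np.linalg.eigh(rbf_kernel(B, B, lengthscale))
    def phi(X):
        # Column j is phi_j(x) = sqrt(M) / lam_j * k(x, B) @ u_j.
        return np.sqrt(M) * rbf_kernel(X, B, lengthscale) @ U / lam
    return lam / M, phi

def eigen_regression(X, y, X_star, B, weights=None, noise=0.1, lengthscale=1.0):
    """Posterior mean of f(x) = sum_j alpha_j phi_j(x) with alpha_j ~ N(0, w_j)."""
    w_default, phi = nystrom_eigenfunctions(B, lengthscale)
    w = w_default if weights is None else weights
    Phi, Phi_star = phi(X), phi(X_star)
    # Only an M x M system is solved; forming Phi.T @ Phi costs O(M^2 N).
    A = Phi.T @ Phi / noise ** 2 + np.diag(1.0 / w)
    return Phi_star @ np.linalg.solve(A, Phi.T @ y) / noise ** 2

B = np.linspace(0.0, 10.0, 5)[:, None]   # M = 5 basis points
X = np.linspace(0.0, 10.0, 21)[:, None]  # N = 21 training inputs
y = np.sin(X[:, 0])
X_star = np.array([[1.0], [4.0]])
mean = eigen_regression(X, y, X_star, B)
print(mean)
```

With the default weights $w_j = \lambda_j / M$, the implied prior covariance equals the Nyström approximation of the kernel; changing the weights is what separates this family of models from a fixed low-rank matrix approximation.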
31 Eigenfunctions of Gaussian kernel. [Figure: top four eigenfunctions and selected eigenfunctions.]
32 [Figure: RMSE vs. number of basis/rank for FULL GP, FITC, NYSTROM, EIGEN GP, and EIGEN GP* on Boston Housing (400/506 for training) and Pumadyn-8nm (2000/8192 for training).] RMSE of Nystrom: 312.5, 41.68, and ...
33 Classification results on digits. [Figure: classification error vs. number of basis/rank for EIGEN GP* and EIGEN GP.] EigenGP*: fix the eigenvalues. EigenGP: sparsify eigenvalues.
34 Classification results. [Figure: classification error rate vs. number of basis/rank on Spambase, 3 vs 8, and 5 vs 8; legend: FITC EP FULL GP EP NYSTROM LAPLACE SOGP EIGEN GP.]
35 Semi-supervised classification. [Figure: classification error rate vs. number of labeled points on Ionosphere (351 points), 20 Newsgroup (1976 points), and TDT2 (3672 points); legend: SVM SEB GR LAPSVM EIGEN GP.]
36 Why EigenGP is better than Nystrom. Nystrom method: a numerical matrix approximation and NOT a valid probabilistic model; breaks down when the rank is low; always uses the top eigenvectors; converges to full GP. EigenGP: valid low-rank GP models; robust when the rank is small (i.e., HIGHLY sparse models); can choose eigenvectors based on labeled information; semi-supervised learning; can outperform full GP, esp. for classification (exploring the clustering property).
37 Conclusions. Latent GP models for graphs and tensors: 20% improvement in prediction accuracy (Xu et al., 2012). EigenGP: fast inference with the potential of outperforming full GP on prediction accuracy (Qi et al., 2012).
More informationFinal Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
More informationMACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
More informationIntroduction to machine learning and pattern recognition Lecture 1 Coryn BailerJones
Introduction to machine learning and pattern recognition Lecture 1 Coryn BailerJones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationPerformance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations
Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Roy D. Williams, 1990 Presented by Chris Eldred Outline Summary Finite Element Solver Load Balancing Results Types Conclusions
More informationMACHINE LEARNING. Introduction. Alessandro Moschitti
MACHINE LEARNING Introduction Alessandro Moschitti Department of Computer Science and Information Engineering University of Trento Email: moschitti@disi.unitn.it Course Schedule Lectures Tuesday, 14:0016:00
More informationSupervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
More informationLecture 11: Graphical Models for Inference
Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference  the Bayesian network and the Join tree. These two both represent the same joint probability
More information