Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu
|
|
|
- Violet Crawford
- 10 years ago
- Views:
Transcription
1 Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July
2 Data A lot of data
3 Outline Nonlinear tensor models (and stochastic blockmodels) Sparse Gaussian process models
4 Tensor data: multiple aspects Patients Biomarkers Medicines Yi,j,k: the value of the k-th biomarker (i.e., cell population) for the j-th patient after taking the i-th medicine Predict drug response
5 Matrix data: networks people people Yi,j: 1 if node i is linked to node j, 0 otherwise. Discover communities and predict unknown interactions
6 Goals - Predict unknown elements (e.g., drug response and network interactions) - Identify latent multi-aspect groups (communities)
7 Classical Tucker decomposition Generalization of matrix factorization 3D case: core tensor loading matrices U Y Z! Y = G 1 U 2 Z 3 V y ijk = g rst u ir z js v ks r s t Sun et al. 2008
8 Assumptions Complete Continuous Multi-linear
9 Solution Sparse latent Gaussian processes on tensors
10 Latent sparse GP on tensors Patients Biomarkers (i, j, k) u i : Medicines Element u i : z j : v k : (i, j, k) is characterized by Sparse loading vector in latent medicine groups Sparse loading vector in latent patient groups Sparse loading vector in latent biomarker groups
11 Separate covariance for each dimension K v Biomarkers -Separate covariance/kernel function for each dimension Kz Patients i r K u (i, r) =k(u i, u r ) Medicines Nonlinear relationship between medicines i and r -The more similar loading vectors, the larger the covariance function value
12 GP on tensors GP on a tensor: stochastic process in an infinite tensor space Tensor N (F 0; K u, K v, K z )=(2π) 3N 2 s=u,v,z K s N2 2 Evaluations of GP on any tensor of finite size is a tensor-valued Gaussian distribution exp{ 1 2 (F 1 K 1 u 2 K 1 v 3 K 1 z ) F 2 }
13 Predict unknown tensor elements Kv Patients K v Biomarkers (i, j, k) Medicines K u w (r, s, t) Simple illustration: 1) Based on observed data, estimate loading vectors: [u i, z j, v k ] 2) Compute weights (similarities) between unknown and observed elements: w(ijk, rst) w([u i, z j, v k ], [u r, z s, v t ]) 3) Predict the unknown element: y i,j,k = r,s,t w(ijk, rst)y r,s,t
14 Graphical model representation U =[u 1,...,u N ] U V Z Sparse loading vectors u i exp( λ u i ) Similarly, sample V and Z F Latent tensor F N(0; K u, K v, K z ) Y U Unknown data Y O Observed data Y O p(y ijk f ijk ) (i,j,k) O p(y ijk f ijk ) : Gaussian for continuous data Probit for binary data Possion for count data
15 Benefits Handle binary and missing data Discover block/group structures Avoid overfitting: adaptive nonparametric model complexity Model prediction uncertainty Incorporate additional side information Yan, Xu & Qi, 2011; Xu, Yan & Qi 2011
16 Algorithm: Variational EM Marginal likelihood log p(y O U, V, Z) + log p(u, V, Z) Variational approximation Iterations
17 Algorithm: explore model structures Example: Trace{(I + K u K v K z ) 1 } Direct computation: Matrix inversion N 3 by N 3 O(N 9 ) Kronecker product operation:
18 Properties of Kronecker product Properties: Eigen-decomposition K = WΛW T If K = K u K v K z then W = W u W v W z Λ = Λ u Λ v Λ z W u : eigenvectors of K u diag{λ u } : eigenvalues of K u
19 Reduced computational complexity Example: Trace{(I + K u K v K z ) 1 } Direct computation Using the new theorem and trace properties O(N 9 ) N N N i=1 j=1 k=1 1 1+λ u i λv j λz k O(N 3 ) = M M M i=1 j=1 k=1 1 1+λ u i λv j λz k O(N 2 M) M<<N
20 2D case: GP stochastic blockmodels - Undirected networks (friend relationships and protein-protein interactions) - Represented by symmetric adjacent matrices Yan, Xu & Qi, UAI 2011; Xu, Yan & Qi AAAI 2011
21 2D: Coauthor networks Membership produced by LEM Area Under Curve AUC values SMGB MMSB LEM Ours Number of latent groups Groups Groups Groups Nodes Membership produced by MMSB Nodes Membership produced by SMGB Nodes NIPS authors Ours Co-authorship dataset: co-authorship links from100 authors who have the largest number of co-authors from NIPS 1-17.
22 3D: Enron s Enron dataset: s from senior management of Enron before its bankruptcy in D tensor representation: Sender-Recipient-Subject Area Under Curve AUC values InfTucker tp InfTucker gp CP TD HOSVD NCP WCP PTD Ours Number of Factors Xu, Yan & Qi ICML 2012
23 4D: Digg Area Under Curve AUC values Ours Number of Factors Digg dataset: Social news from digg.com D tensor representation: user-news-keywords-category AUC values InfTucker tp InfTucker gp CP TD HOSVD NCP WCP PTD 3 5
24 Outline Bayesian nonparametric stochastic blockmodels Sparse Gaussian process models
25 Gaussian process Nonparametric Bayesian prior over functions Computational bottleneck: O(N 3 ) for regression
26 Sparse GP models Low-rank approximation by Nystrom approximation (Williams & Seeger 2001) Summarize data by a few pseudo inputs (Snelson & Ghahramani 2006) Summarize data by a few pseudo data clouds (Qi et al. 2010) A unifying view (Quiñonero-Candela & Rasmussen 2005)
27 Summarization by data clouds (Qi et al. UAI 2010) Exact posterior process: Data cloud approximation: M<<N Local manifold information q(f) GP (f 0,K) M j=1 p(u j f(x)φ(x)dx,λ 1 j ) When φ(x) =δ(x), this approximation reduces to FITC (i.e,. pseudo input approximation).
28 X 2 0 X X 1 Exact GP X 1 Pseudo input (i.e., FITC) X 2 0 X X 1 X 1 SASPA_sphere SASPA (Qi et al. UAI 2010)
29 Key observation Previous approaches: compress data into a sparse representation including pseudo data points or clouds PCA: the optimal compact representation among all orthogonal bases
30 EigenGP: sparse PCA + GP KL expansion of GP prior by Nystrom method: O(N 2 ) -> O(M 2 N) Select eigenfunctions by evidence maximization Qi, Dai & Zhu NIPS submission 2012
31 Eigenfunctions of Gaussian kernel Top four eigenfunctions Selected eigenfunctions
32 RMSE RMSE FULL GP FITC NYSTROM EIGEN GP EIGEN GP* Number of Basis/Rank Number of Basis/Rank Boston Housing (400/506 for training) RMSE of Nystrom: 312.5, 41.68, and Pumadyn-8nm (2000/8192 for training)
33 Classification results on digits 0.02 EIGEN GP* EIGEN GP Classification Errror Number of Basis/Rank EigenGP*: fix the eigenvalues EigenGP: sparsify eigenvalues
34 Classification results Classification Error Rate FITC EP FULL GP EP NYSTROM LAPLACE SOGP EIGEN GP Classification Error Rate FITC EP FULL GP EP NYSTROM LAPLACE SOGP EIGEN GP Classification Error Rate FITC EP FULL GP EP NYSTROM LAPLACE SOGP EIGEN GP Number of Basis/Rank Number of Basis/Rank Number of Basis/Rank Spambase 3 vs 8 5 vs 8
35 Semi-supervised classification Classification Error Rate SEB GR LAPSVM EIGEN GP Classification Error Rate SVM SEB GR LAPSVM EIGEN GP Classification Error Rate SVM SEB GR LAPSVM EIGEN GP Number of Labeled Points Ionospher e (351 points) Number of Labeled Points 20 Newsgroup (1976 points) Number of Labeled Points TDT2 (3672 points)
36 Why EigenGP is better than Nystrom Nystrom method: - Numerical matrix approximation and NOT a valid probabilistic model -- break down when the rank is low. - Always the top eigenvectors - Converges to full GP EigenGP: - Valid low-rank GP models -- robust when the rank is small (i.e., HIGHLY sparse models) - Can choose eigenvectors based on labeled information -- semisupervised learning - Can outperform full GP, esp. for classification (exploring clustering property)
37 Conclusions Latent GP models for graphs and tensors: 20% improvement in prediction accuracy (Xu et al., 2012) EigenGP: fast inference with potential of outperforming full GP on prediction accuracy (Qi et al., 2012)
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
STA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! [email protected]! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
Gaussian Processes in Machine Learning
Gaussian Processes in Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany [email protected] WWW home page: http://www.tuebingen.mpg.de/ carl
Semi-Supervised Support Vector Machines and Application to Spam Filtering
Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University
APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
Latent variable and deep modeling with Gaussian processes; application to system identification. Andreas Damianou
Latent variable and deep modeling with Gaussian processes; application to system identification Andreas Damianou Department of Computer Science, University of Sheffield, UK Brown University, 17 Feb. 2016
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
Tensor Factorization for Multi-Relational Learning
Tensor Factorization for Multi-Relational Learning Maximilian Nickel 1 and Volker Tresp 2 1 Ludwig Maximilian University, Oettingenstr. 67, Munich, Germany [email protected] 2 Siemens AG, Corporate
Flexible and efficient Gaussian process models for machine learning
Flexible and efficient Gaussian process models for machine learning Edward Lloyd Snelson M.A., M.Sci., Physics, University of Cambridge, UK (2001) Gatsby Computational Neuroscience Unit University College
Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.
Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,
CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on
CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #18: Dimensionality Reduc7on Dimensionality Reduc=on Assump=on: Data lies on or near a low d- dimensional subspace Axes of this subspace
Local Gaussian Process Regression for Real Time Online Model Learning and Control
Local Gaussian Process Regression for Real Time Online Model Learning and Control Duy Nguyen-Tuong Jan Peters Matthias Seeger Max Planck Institute for Biological Cybernetics Spemannstraße 38, 776 Tübingen,
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
DATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
Bayes and Naïve Bayes. cs534-machine Learning
Bayes and aïve Bayes cs534-machine Learning Bayes Classifier Generative model learns Prediction is made by and where This is often referred to as the Bayes Classifier, because of the use of the Bayes rule
Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University [email protected]
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University [email protected] 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
HT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
CS 688 Pattern Recognition Lecture 4. Linear Models for Classification
CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(
Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data
Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data Neil D. Lawrence Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield,
Learning with Local and Global Consistency
Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
Learning with Local and Global Consistency
Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany
Bayesian Statistics: Indian Buffet Process
Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note
Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
Statistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
Christfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
Gaussian Process Training with Input Noise
Gaussian Process Training with Input Noise Andrew McHutchon Department of Engineering Cambridge University Cambridge, CB PZ [email protected] Carl Edward Rasmussen Department of Engineering Cambridge University
Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: [email protected] Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
Linear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
How To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph
BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, [email protected]) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
Classification with Hybrid Generative/Discriminative Models
Classification with Hybrid Generative/Discriminative Models Rajat Raina, Yirong Shen, Andrew Y. Ng Computer Science Department Stanford University Stanford, CA 94305 Andrew McCallum Department of Computer
Mixtures of Robust Probabilistic Principal Component Analyzers
Mixtures of Robust Probabilistic Principal Component Analyzers Cédric Archambeau, Nicolas Delannay 2 and Michel Verleysen 2 - University College London, Dept. of Computer Science Gower Street, London WCE
arxiv:1410.4984v1 [cs.dc] 18 Oct 2014
Gaussian Process Models with Parallelization and GPU acceleration arxiv:1410.4984v1 [cs.dc] 18 Oct 2014 Zhenwen Dai Andreas Damianou James Hensman Neil Lawrence Department of Computer Science University
Spectral Methods for Learning Latent Variable Models: Unsupervised and Supervised Settings
Spectral Methods for Learning Latent Variable Models: Unsupervised and Supervised Settings Anima Anandkumar U.C. Irvine Learning with Big Data Data vs. Information Messy Data Missing observations, gross
Dirichlet Processes A gentle tutorial
Dirichlet Processes A gentle tutorial SELECT Lab Meeting October 14, 2008 Khalid El-Arini Motivation We are given a data set, and are told that it was generated from a mixture of Gaussian distributions.
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2 1 Institute for Information Transmission
203.4770: Introduction to Machine Learning Dr. Rita Osadchy
203.4770: Introduction to Machine Learning Dr. Rita Osadchy 1 Outline 1. About the Course 2. What is Machine Learning? 3. Types of problems and Situations 4. ML Example 2 About the course Course Homepage:
Neural Networks for Machine Learning. Lecture 13a The ups and downs of backpropagation
Neural Networks for Machine Learning Lecture 13a The ups and downs of backpropagation Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed A brief history of backpropagation
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. [email protected]
Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang [email protected] http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel
The Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
Computing with Finite and Infinite Networks
Computing with Finite and Infinite Networks Ole Winther Theoretical Physics, Lund University Sölvegatan 14 A, S-223 62 Lund, Sweden [email protected] Abstract Using statistical mechanics results,
Bayesian Factorization Machines
Bayesian Factorization Machines Christoph Freudenthaler, Lars Schmidt-Thieme Information Systems & Machine Learning Lab University of Hildesheim 31141 Hildesheim {freudenthaler, schmidt-thieme}@ismll.de
Gaussian Processes for Big Data
Gaussian Processes for Big Data James Hensman Dept. Computer Science The University of Sheffield Sheffield, UK Nicolò Fusi Dept. Computer Science The University of Sheffield Sheffield, UK Neil D. Lawrence
Unsupervised and supervised dimension reduction: Algorithms and connections
Unsupervised and supervised dimension reduction: Algorithms and connections Jieping Ye Department of Computer Science and Engineering Evolutionary Functional Genomics Center The Biodesign Institute Arizona
A Unifying View of Sparse Approximate Gaussian Process Regression
Journal of Machine Learning Research 6 (2005) 1939 1959 Submitted 10/05; Published 12/05 A Unifying View of Sparse Approximate Gaussian Process Regression Joaquin Quiñonero-Candela Carl Edward Rasmussen
Simple and efficient online algorithms for real world applications
Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
Neural Networks: a replacement for Gaussian Processes?
Neural Networks: a replacement for Gaussian Processes? Matthew Lilley and Marcus Frean Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand [email protected] http://www.mcs.vuw.ac.nz/
Network Intrusion Detection using Semi Supervised Support Vector Machine
Network Intrusion Detection using Semi Supervised Support Vector Machine Jyoti Haweliya Department of Computer Engineering Institute of Engineering & Technology, Devi Ahilya University Indore, India ABSTRACT
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate
Mathematical Models of Supervised Learning and their Application to Medical Diagnosis
Genomic, Proteomic and Transcriptomic Lab High Performance Computing and Networking Institute National Research Council, Italy Mathematical Models of Supervised Learning and their Application to Medical
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
Mixture Modeling of Individual Learning Curves
Mixture Modeling of Individual Learning Curves Matthew Streeter Duolingo, Inc. Pittsburgh, PA [email protected] ABSTRACT We show that student learning can be accurately modeled using a mixture of learning
Probabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur
Probabilistic Linear Classification: Logistic Regression Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 18, 2016 Probabilistic Machine Learning (CS772A) Probabilistic Linear Classification:
Methods of Data Analysis Working with probability distributions
Methods of Data Analysis Working with probability distributions Week 4 1 Motivation One of the key problems in non-parametric data analysis is to create a good model of a generating probability distribution,
MACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
Big Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
Final Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations
Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Roy D. Williams, 1990 Presented by Chris Eldred Outline Summary Finite Element Solver Load Balancing Results Types Conclusions
Collaborative Filtering. Radek Pelánek
Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains
Acknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues
Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the
Regression III: Advanced Methods
Lecture 4: Transformations Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture The Ladder of Roots and Powers Changing the shape of distributions Transforming
Nonlinear Iterative Partial Least Squares Method
Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for
Introduction to Online Learning Theory
Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent
UW CSE Technical Report 03-06-01 Probabilistic Bilinear Models for Appearance-Based Vision
UW CSE Technical Report 03-06-01 Probabilistic Bilinear Models for Appearance-Based Vision D.B. Grimes A.P. Shon R.P.N. Rao Dept. of Computer Science and Engineering University of Washington Seattle, WA
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
Basics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu [email protected] Modern machine learning is rooted in statistics. You will find many familiar
T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577
T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or
Multivariate Statistical Inference and Applications
Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim
Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center
Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II
An Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia [email protected] Tata Institute, Pune,
Statistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
Extracting correlation structure from large random matrices
Extracting correlation structure from large random matrices Alfred Hero University of Michigan - Ann Arbor Feb. 17, 2012 1 / 46 1 Background 2 Graphical models 3 Screening for hubs in graphical model 4
A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions
A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center
Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012
Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering
On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers
On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers Francis R. Bach Computer Science Division University of California Berkeley, CA 9472 [email protected] Abstract
10-601. Machine Learning. http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
Linear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
