Bayesian Factorization Machines
Christoph Freudenthaler, Lars Schmidt-Thieme
Information Systems & Machine Learning Lab, University of Hildesheim, Hildesheim, Germany

Steffen Rendle
Social Network Analysis, University of Konstanz, Konstanz, Germany

Abstract

This work presents simple and fast structured Bayesian learning for matrix and tensor factorization models. An unblocked Gibbs sampler is proposed for factorization machines (FM), a general class of latent variable models subsuming matrix, tensor and many other factorization models. We show empirically on the large Netflix challenge dataset that Bayesian FM are fast, scalable and more accurate than state-of-the-art factorization models.

1 Introduction

Many problems in machine learning, e.g., in computer vision, computational biology or recommender systems, involve sparse data. For such problems, e.g., the Netflix prize problem, sparse representations such as latent variable models are typically used. The most basic variant is the matrix factorization model, which extends to tensor factorization models for more than two categorical dimensions. In [5], Rendle generalizes these factorization models to Factorization Machines (FM) and shows that a simple input vector consisting of the appropriate predictors is enough to mimic most factorization models. The general principle is the same as for standard linear regression. For instance, factorizing a matrix that contains the rating of a user u ∈ U, represented by row r = 1, ..., |U|, for a specific item i ∈ I, represented by column c = 1, ..., |I|, amounts to defining an input vector of length p = |U| + |I| consisting of p indicator variables. Among the first |U| indicators exactly one is on - the one identifying the current user. Among the last |I| indicators again exactly one is on - now for the current item. Typically, the number of latent dimensions k for latent variable models is not known.
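The indicator encoding just described can be sketched as follows; this is a minimal illustration with hypothetical helper names, not code from the paper:

```python
import numpy as np

def encode_user_item(u, i, n_users, n_items):
    """One-hot encode a (user, item) pair as an FM input vector of length
    p = |U| + |I|.  Hypothetical helper; u and i are 0-based indices."""
    x = np.zeros(n_users + n_items)
    x[u] = 1.0              # user indicator block: exactly one entry on
    x[n_users + i] = 1.0    # item indicator block: exactly one entry on
    return x

x = encode_user_item(2, 1, n_users=4, n_items=3)
# exactly two indicators are active: one per block
```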
Too small a k underfits, too large a k overfits the data. Structured Bayesian methods using hierarchical priors overcome brute-force grid search for k, e.g., [7, 9] for matrix factorization, [11] for tensor factorization, and [10] for multi-relational learning. Each of these papers derived its own learning algorithm. In this work we present one structured Bayesian learning algorithm for a much wider class of factorization models, subsuming among others matrix, tensor and multi-relational factorization models. Moreover, the runtime complexity of existing algorithms is typically non-linear in k. A first attempt to tackle this issue was made by Zhu et al. [12]. They adapt the Gibbs sampler of BPMF [7] by unblocking some sampling steps, leading to a final runtime complexity of O(nk + k^2(|U| + |I|)). In our work, we derive a fast and unblocked Gibbs sampler where each iteration is linear in k. The improvement is based on the work of Rendle et al. [6], which presents a linear-time learning algorithm for FM using alternating least-squares. As the posterior mean and variance of the factors are closely related to the (conditional) least-squares solution of [6], we can transfer these ideas to derive fast Bayesian FM.
Figure 1: Comparison of the probabilistic interpretation of standard Factorization Machines (left) to Bayesian Factorization Machines (right). BFM extend the standard model by using hyperpriors.

2 Bayesian Factorization Machines

2.1 Factorization Machines (FM)

Factorization Machines [5] are a regression model for a target y using p explanatory variables x ∈ R^p:

    \hat{y}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{d=2}^{D} \sum_{j_1=1}^{p} \cdots \sum_{j_d > j_{d-1}} w_{j_1,\dots,j_d} \prod_{l=1}^{d} x_{j_l}    (1)

where second- and higher-order parameters w_{j_1,...,j_d} are factorized using PARAFAC [2]:

    w_{j_1,\dots,j_d} := \sum_{f=1}^{k_d} \prod_{l=1}^{d} v^{(d)}_{j_l,f},    d ≥ 2.    (2)

FM subsume (among others) D-way tensor factorization, which is representable by engineering the input features x: writing x = (x_1, ..., x_D) as a sequence of disjoint subsequences x_i = (x_{i,1}, ..., x_{i,p_i}), i = 1, ..., D, where within each subsequence exactly one entry x_{i,j} = 1 and all other entries are 0, gives a D-way tensor factorization model. Note that FM are more general than tensor factorization models, as x may also contain real-valued predictor variables.

2.2 Bayesian Factorization Machines

So far FM lack Bayesian inference. For obvious reasons, such as better accuracy and automatic hyperparameter learning (no grid search, no tuning of learning hyperparameters such as the learning rate of gradient descent), we want to enhance FM with structured Bayesian inference. Bayesian theory [1] suggests models with hierarchical hyperpriors, as they regularize the underlying model parameters. The intuition is that additional (hierarchical) structure put on the model parameters shrinks them towards each other as long as no evidence in the data clearly indicates the opposite. Without loss of generality, but for better readability, we present Bayesian FM (BFM) of order D = 2.
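For order D = 2, the model reduces to \hat{y}(x) = w_0 + \sum_j w_j x_j + \sum_{j<l} \langle v_j, v_l \rangle x_j x_l, and the pairwise term can be evaluated in O(kp) via Rendle's reformulation [5]. A sketch in Python (variable names are ours, not the paper's):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Order-2 FM prediction.  V has shape (p, k); row j is factor vector v_j.
    Uses the O(kp) identity:
    sum_{j<l} <v_j, v_l> x_j x_l
        = 0.5 * sum_f ((sum_j V[j,f]*x[j])**2 - sum_j V[j,f]**2 * x[j]**2)."""
    s = V.T @ x                  # (k,) per-factor weighted sums
    s2 = (V ** 2).T @ (x ** 2)   # (k,) per-factor sums of squares
    return w0 + w @ x + 0.5 * float(np.sum(s ** 2 - s2))

# sanity check against the naive O(k p^2) double loop
rng = np.random.default_rng(42)
p, k = 5, 3
x, w, V = rng.normal(size=p), rng.normal(size=p), rng.normal(size=(p, k))
naive = 0.1 + w @ x + sum(float(V[j] @ V[l]) * x[j] * x[l]
                          for j in range(p) for l in range(j + 1, p))
fast = fm_predict(x, 0.1, w, V)
```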
Our hierarchical Bayesian model adds to the standard regularized regression model Gaussian hyperpriors for each mean μ_θ of all model parameters θ ∈ Θ = {w_0, w_1, ..., w_p, v_{1,1}, ..., v_{p,k}} except μ_{w_0}, as well as Gamma hyperpriors for the noise precision α and each prior precision λ_θ except λ_{w_0}. The standard L2-regularized regression model with hyperparameters Θ_H = {λ_θ, μ_θ : θ ∈ Θ} is, for a sample of n observations (y_i, x_i) ∈ R^{1+p}:

    p(Θ | y, X, Θ_H) ∝ \prod_{i=1}^{n} \sqrt{α} \, e^{-\frac{α}{2}(y_i - \hat{y}(x_i, Θ))^2} \prod_{θ ∈ Θ} \sqrt{λ_θ} \, e^{-\frac{λ_θ}{2}(θ - μ_θ)^2}    (3)

Figure 1 depicts the graphical representations of the standard and our hierarchically extended Bayesian model: we define two regularization populations, one for all linear effects w_j and another for the latent variables v_{1,1}, ..., v_{p,k}.
2.3 Inference: Efficient Gibbs Sampling

Since posterior inference for FM is analytically intractable, we use Gibbs sampling to draw from the posterior. Standard Gibbs sampling is blocked Gibbs sampling, i.e. all inferred variables Θ and Θ_H are divided into b disjoint blocks of r_1, ..., r_b variables. However, blocked Gibbs sampling for our hierarchical Bayes approach leads to higher time complexity in the final Gibbs sampler, as for each block of size r_i a matrix must be inverted, which is O(r_i^3). Therefore, we propose single-parameter Gibbs sampling. For notational readability, we exploit the multi-linear nature of FM, i.e. for each θ ∈ Θ, eq. (1) can be rewritten as [6]:

    \hat{y}(x | Θ) = g_θ(x) + θ \, h_θ(x)    ∀ θ ∈ Θ    (4)

where g_θ and h_θ comprise all terms additively resp. multiplicatively connected to θ (see [6] for details). Using this notation and the parameters Θ_0 = {γ_0, μ_0, α_0, β_0, α_λ, β_λ} of the conjugate hyperpriors, the conditional posterior distributions for each θ, μ_θ, λ_θ, and α resulting from our hierarchical Bayesian model are:

    p(θ | y, X, Θ \ {θ}, Θ_H, Θ_0) = N(μ_θ^*, σ_θ^2),    p(μ_θ | y, X, Θ, Θ_H \ {μ_θ}, Θ_0) = N(μ_{μ_θ}, σ_{μ_θ}^2)
    p(λ_θ | y, X, Θ, Θ_H \ {λ_θ}, Θ_0) = Γ(α_θ, β_θ),    p(α | y, X, Θ, Θ_H \ {α}, Θ_0) = Γ(α_n, β_n)

with:

    μ_θ^* = σ_θ^2 \left( α \sum_{i=1}^{n} (y_i - g_θ(x_i)) h_θ(x_i) + μ_θ λ_θ \right),    σ_θ^2 = \left( α \sum_{i=1}^{n} h_θ(x_i)^2 + λ_θ \right)^{-1}    (5)

    μ_{μ_θ} = σ_{μ_θ}^2 \left( λ_θ \sum_{j=1}^{p} θ_j + γ_0 μ_0 \right),    σ_{μ_θ}^2 = ((p + γ_0) λ_θ)^{-1}    (6)

    α_θ = (α_λ + p + 1)/2,    β_θ = \left( \sum_{j=1}^{p} (θ_j - μ_θ)^2 + γ_0 (μ_θ - μ_0)^2 + β_λ \right)/2    (7)

    α_n = (α_0 + n)/2,    β_n = \left( \sum_{i=1}^{n} (y_i - \hat{y}(x_i, Θ))^2 + β_0 \right)/2    (8)

The final algorithm iteratively draws samples for the hyperparameters Θ_H and the model parameters Θ. Sampling the hyperparameters using eqs. (6)-(8) is simple; with a straightforward implementation it is done in O(\sum_{i=1}^{n} k \, m(x_i)) = O(k N_z).² Efficient computation of the posterior mean and variance in eq. (5) is more complicated.
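A single-parameter draw from the Gaussian conditional in eq. (5) needs only the residual terms g_θ and h_θ; no matrix inversion is involved, which is what keeps each sweep linear in k. A minimal sketch under our own variable names, not the authors' reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def theta_posterior(y, g, h, alpha, mu_theta, lam_theta):
    """Posterior mean and variance of one model parameter theta (eq. 5),
    given the decomposition y(x) = g_theta(x) + theta * h_theta(x)."""
    var = 1.0 / (alpha * np.dot(h, h) + lam_theta)
    mean = var * (alpha * np.dot(y - g, h) + mu_theta * lam_theta)
    return mean, var

# one unblocked Gibbs step: compute the conditional, then draw theta from it
mean, var = theta_posterior(y=np.array([1.0, 1.0]), g=np.zeros(2), h=np.ones(2),
                            alpha=1.0, mu_theta=0.0, lam_theta=1.0)
theta_new = rng.normal(mean, np.sqrt(var))
```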
However, the posterior mean is identical to the alternating least-squares solution, for which an efficient algorithm has already been proposed [6] with an overall complexity of O(k N_z) for an update of all model parameters.

3 Evaluation

We evaluate BFM on the Netflix challenge dataset, which is well known in the recommender systems community. It consists of n = 100,480,507 ratings by 480,189 users of 17,770 movies. The train/test split is the same as in the challenge; this allows direct comparison with existing results reported in [8, 7, 3, 11]. For the parameters of the hyperpriors Θ_0 we have chosen trivial values: α_0 = β_0 = γ_0 = α_λ = β_λ = 1, μ_0 = 0. Note that, due to the high number of explanatory variables, their impact on the Gibbs sampler is negligible (cf. eqs. 6-8).

With the derivation of BFM it is possible to learn many different models (e.g. matrix/tensor factorization) in a Bayesian way by appropriate specification of the input data x. Exploiting this flexibility, we tested several models by compiling the feature vector x from the following components. (1) Binary indicators: for each user (u), each item (i), each day of rating (t), each user-day combination (ut), each item-day combination (it)³, each integer value of the logarithm of the user's rating frequency on the day of rating (f), each item-frequency combination (if), and for the items the current user has co-rated (++). (2) Real-valued variables: the rating values for the co-rated items (KNN)⁴.

Figure 2: Predictive accuracy of various instances of Bayesian Factorization Machines (BFM) compared to state-of-the-art baselines: SGD PMF [8], BPMF [7], SGD KNN/KNN++ [3]. Right panel: convergence of unblocked Gibbs sampling (BFM) is comparable to blocked Gibbs sampling (BPMF [7]).

We index BFM with the features they use, e.g., BFM (u, i) is a Bayesian matrix factorization model using the user and item indicators u and i; it is similar to BPMF [7]. BFM (u, i, t) uses time information like the Bayesian PARAFAC model BPTF [11]. BFM KNN and BFM KNN++ are the Bayesian equivalents of SGD KNN(++) [3], etc. Figure 2 plots the quality for a varying number of factorization dimensions k. As expected, posterior averages in BPMF and BFM (u, i) outperform posterior modes in standard matrix factorization. For the same reason, BFM KNN(++) outperform SGD KNN(++). Moreover, the Bayesian approaches achieve this without a hyperparameter search. Next, we compare our generic BFM approach to the corresponding specialized models for which Gibbs samplers [7, 11] already exist. Comparing BFM (u, i) to BPMF [7] shows that BFM outperform BPMF slightly; the reason is the additional linear effects in BFM. For the time-aware Bayesian factorization model BPTF [11], the authors report their best RMSE at k = 30. For comparison we also ran a BFM (u, i, t), which largely outperforms this at k = 32 and k = 64. Moreover, in contrast to BPMF and BPTF, which use blocked Gibbs sampling, BFM achieve these results with a Gibbs sampler of lower computational complexity.

² m(x_i) is the number of non-zero elements in the i-th row of the design matrix X; N_z := \sum_{i=1}^{n} m(x_i) is the number of non-zero elements in X.
³ Time is binned into 50-day intervals (following [4]).
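The indicator components above are just concatenated one-hot blocks. A hypothetical sketch of how such a feature vector could be compiled (block names, ordering, and the time-bin count are our assumptions; the user and item counts are those of the Netflix dataset):

```python
import numpy as np

def encode(active, sizes):
    """Concatenate one-hot indicator blocks into a single FM input vector.
    `active` maps a block name to the index that is on inside that block;
    `sizes` gives each block's length.  Hypothetical helper mirroring the
    (u, i, t, ...) feature compilation described in the text."""
    x = np.zeros(sum(sizes.values()))
    offset = 0
    for name, size in sizes.items():  # dict insertion order fixes the layout
        x[offset + active[name]] = 1.0
        offset += size
    return x

# BFM (u, i, t): user 3, item 7, 50-day time bin 2 (40 bins assumed)
x = encode({"u": 3, "i": 7, "t": 2},
           {"u": 480189, "i": 17770, "t": 40})
```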
Encouraged by the quality of the single factorization models, we also computed a linear ensemble of three BFM: a BFM (u, i, t, f) with k = 200, a BFM KNN256++, and a BFM (u, i, ut, it, if) with k = 64. The result of this ensemble is shown in Figure 2 (BFM Ensemble). The right panel of Figure 2 depicts the convergence rate of blocked⁵ and unblocked Gibbs sampling. Surprisingly, the numbers of iterations needed for similar models (matrix factorization, k ≈ 30) using unblocked and blocked Gibbs sampling are comparable.

4 Conclusion

We proposed BFM, a computationally efficient Gibbs sampler for FM. Instead of cubic complexity, BFM are linear in k. Empirically tested on the Netflix challenge, BFM converge as fast as models using blocked Gibbs sampling. Thus, BFM combine high accuracy and more scalable learning with the generality of FM, which subsume many matrix/tensor factorization models and other (factorized) polynomial regression models like KNN models.

⁴ For ++ and KNN we follow the pruning step of [3] and take only the 256 most similar items with respect to the target item.
⁵ Parameters are grouped in blocks of size k.
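The paper does not specify how the blending weights of the three-model ensemble were obtained; a common choice is to fit linear weights by least squares on held-out predictions. A sketch under that assumption, with toy data in place of the real model outputs:

```python
import numpy as np

def fit_blend_weights(preds, y):
    """Fit intercept + per-model linear blending weights by least squares.
    `preds` has shape (n, m): one column of predictions per model.
    An assumed blending scheme, not the paper's documented procedure."""
    P = np.column_stack([np.ones(len(y)), *preds.T])  # intercept + model columns
    w, *_ = np.linalg.lstsq(P, y, rcond=None)
    return w

# blend two toy predictors of y
y = np.array([1.0, 2.0, 3.0, 4.0])
preds = np.column_stack([y * 0.5, y + 1.0])
w = fit_blend_weights(preds, y)
blended = np.column_stack([np.ones(4), preds]) @ w
```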
References

[1] Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis. Chapman and Hall/CRC, 2nd edition, 2003.
[2] Richard A. Harshman. Foundations of the PARAFAC procedure: Models and conditions for an "exploratory" multimodal factor analysis. UCLA Working Papers in Phonetics, 16:1-84, 1970.
[3] Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008.
[4] Yehuda Koren. Collaborative filtering with temporal dynamics. In KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009.
[5] Steffen Rendle. Factorization machines. In Proceedings of the 10th IEEE International Conference on Data Mining. IEEE Computer Society, 2010.
[6] Steffen Rendle, Zeno Gantner, Christoph Freudenthaler, and Lars Schmidt-Thieme. Fast context-aware recommendations with factorization machines. In Proceedings of the 34th ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2011.
[7] Ruslan Salakhutdinov and Andriy Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning, 2008.
[8] Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, volume 20, 2008.
[9] Mikkel Schmidt. Linearly constrained Bayesian matrix factorization for blind source separation. In Advances in Neural Information Processing Systems 22, 2009.
[10] Ilya Sutskever, Ruslan Salakhutdinov, and Joshua Tenenbaum. Modelling relational data using Bayesian clustered tensor factorization. In Advances in Neural Information Processing Systems 22, 2009.
[11] Liang Xiong, Xi Chen, Tzu-Kuo Huang, Jeff Schneider, and Jaime G. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of the SIAM International Conference on Data Mining (SDM 2010). SIAM, 2010.
[12] Shenghuo Zhu, Kai Yu, and Yihong Gong. Stochastic relational models for large-scale dyadic data using MCMC. In Advances in Neural Information Processing Systems, 2008.
Fundamental Analysis Challenge
All Together Now: A Perspective on the NETFLIX PRIZE Robert M. Bell, Yehuda Koren, and Chris Volinsky When the Netflix Prize was announced in October of 6, we initially approached it as a fun diversion
A Logistic Regression Approach to Ad Click Prediction
A Logistic Regression Approach to Ad Click Prediction Gouthami Kondakindi [email protected] Satakshi Rana [email protected] Aswin Rajkumar [email protected] Sai Kaushik Ponnekanti [email protected] Vinit Parakh
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
Learning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
Impact of Boolean factorization as preprocessing methods for classification of Boolean data
Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,
Fast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
How I won the Chess Ratings: Elo vs the rest of the world Competition
How I won the Chess Ratings: Elo vs the rest of the world Competition Yannis Sismanis November 2010 Abstract This article discusses in detail the rating system that won the kaggle competition Chess Ratings:
Efficient Cluster Detection and Network Marketing in a Nautural Environment
A Probabilistic Model for Online Document Clustering with Application to Novelty Detection Jian Zhang School of Computer Science Cargenie Mellon University Pittsburgh, PA 15213 [email protected] Zoubin
