# Iterative Splits of Quadratic Bounds for Scalable Binary Tensor Factorization

Save this PDF as:

Size: px
Start display at page:

## Transcription

3 Figure 1: Comparison of the logistic function and the Gaussian distribution. The Gaussian distribution (the mean is and the variance is 5.25) approximates well the logistic function in a range of values (gray area in the range [-2,2]) that corresponds to probabilities between 0.12 and 0.88 (dotted lines). In most of recommender systems applications, the probability of correctly predicting user choice is in this range, which explains that RMSE loss is reasonable. through the data has been completed, but this case is relatively rare in practice. Our objective is to study methods that have a sub-linear complexity in the number of training sample, i.e. we can take into account all the Ω training samples while having a complexity that scales only in the number n + = Ω + of non-zero elements. In the following we show that this complexity can be obtained by using a quadratic approximation to the logistic loss, leading to considerable speedup in our experiments. 3 QUADRATIC LOSS: FAST BUT OFTEN INACCURATE As illustrated in Figure 1, the logistic loss can be reasonably approximated by a quadratic function, so the Root Mean Square Error (RMSE) should be a a good surrogate function to minimize. A naive computation of the square loss L(θ) = t Ω (y t z t (θ)) 2 would require O(KD d n d) operations, since there are d n d possible predictions, but simple algebra shows that it is equal to: L(θ) = t Ω(y t z t ) 2 (2) = n + 2 t Ω + z t (θ) + K K D k=1 k =1 d=1 M (d) kk (3) where the K K matrices M (d) are defined by M (d) kk := d j=1 θ(d) jk θ(d) jk. Hence, for tensors of high dimension, we get a significant speedup as Equation (3) can be computed in O(Kn + + K 2 d n d) operations.as an example, assume one wishes to compute the loss of a tensor containing 10 5 entries and the low-rank approximation has rank K = 100. Then, we can see that there are basic operations in the formula of Equation (2), while the formula of Equation (3) contains basic operations. This means that it will be 5000 times faster to compute exactly the same quantity! If we minimize the loss L(θ) with respect to θ using blockcoordinate descent, the iterations end up being least squares problem with a per-iteration complexity that scales linearly with the number of positive examples only. This corresponds exactly to the itals algorithm [22], which is the tensor generalization of the alternating least squares algorithm of [9]. In our experiments, we used gradient descent to minimize the objective function, using Equation (3) to compute the gradient efficiently (the complexity is the same as the function evaluation). 4 SPEED-ACCURACY TRADE-OFF BY BOUNDING SPLITS 4.1 UPPER BOUNDING THE LOSS To speed-up computation, we minimize a quadratic upper bound to the logistic loss. We use Jaakkola s bound to the logistic loss [10]: log(1 + e z ) λ(ξ)(z 2 ξ 2 ) (z ξ) + log(1 + eξ ), where λ(ξ) := 1 2ξ ( e ξ 2 ) and ξ is a variational parameter. We keep the same value for ξ for all the elements of the tensor Z, so that the upper bound has exactly the form required to apply the computational speedup described in the previous section. L(θ, ξ) = λ(ξ) t Ω ( z t 2y ) 2 t + c(ξ) 4λ(ξ) where c(ξ) is a constant function that does not depend on θ: ( c(ξ) = Ω log(1 + e ξ ) λ(ξ)ξ ξ 1 ) 16λ(ξ) The optimization of this bound with respect to ξ gives exactly the Frobenius norm of the tensor: z t (θ) 2 = arg min L(θ, ξ). (4) ξ t Ω Note that Z(θ) is low rank in general. This means that it can also be efficiently computed using the third term of Equation (3). We have now an upper bound to the original loss L that needs to be minimized: L(θ) L Ω (θ, ξ). (5) As usual with bound optimization, we alternate between two steps: 1) minimizing the bound with respect to the variational parameter ξ using closed form updates (e.g. when using Jaakkola s bound) or dichotomic search where

4 z b 1 (x) f(x) B 1 =({1,2,4,6},{1,3,4}) B 2 =({1,2,4,6},{2,5}) B 3 =({3,4},{1,2,3,4,5}) split B 1 B 1 =({1,6},{1,3,4}) B 2 =({1,2,4,6},{2,5}) B 3 =({3,4},{1,2,3,4,5}) B 4 =({2,4},{1,3,4}) b 2 (x) f(x) z z b 3 (x) f(x) Figure 2: Illustration of the idea of bound refinement. The main idea is that computing the integral of the quadratic upper bound (such as b 1 in the top graph) is much faster than computing the integral of f directly. To improve the accuracy, we use piecewise bounds. To choose the domain of the pieces, we use a greedy algorithm that identifies the partition that leads to the diminished upper bound (leading to the upper bounds b 2 and b 3. no closed form solution exists; and 2) minimizing the bound with respect to θ using a standard gradient descent algorithm. This algorithm is sometimes referred as Majorization-Minimization algorithm in the literature [16]. The algorithm minimizes the loss with a complexity per iteration equal to O ( Kn + + K 2 d n d) In the experiments, this algorithm is called Quad-App. It is detailed in Algorithm SPLIT THE DATA TO IMPROVE ACCURACY The drawback of the previous approach, even with one or two orders of magnitude speedups, the resulting quadratic approximation can be quite loose for some data, and the accuracy of the method can be too low. We give here a family of approximation that interpolates between this fast Figure 3: Example of matrix split to improve upper bound accuracy. Colors identify blocks. Numbers in the matrix represent predictions. The block 1 is decomposed into blocks 1 and 4, both having a small variance on the absolute value of the predictions. but inaccurate quadratic approximation and the slow but exact minimization of the non-quadratic loss. We propose to take advantage of the speedup due to the quadratic upper bound by applying it on a partition of the the original tensor: we select a set B = {B 1,, B B } of disjoint blocks that partitions the space of possible observation indices Ω. On each of these blocks, the bounding technique described in the previous section is applied, the main difference being that the minimization with respect to θ is done jointly on all the blocks. This process is illustrated in Figure 2. Formally, each block B b, b {1,, B } is identified by D sets of indices which represent the dimensions that are selected in the given block b. The split of these indices are illustrated in a toy matrix example in Figure 3. Blocks for tensors are computed the same way as matrices, but the depth indices are also splitted: At each refinement step, we choose to partition the rows indices, column indices or depth indices. To select the blocks, we use a greedy construction of the blocks: starting with a single block containing all the indices, i.e. B = {Ω}, we iteratively refine the blocks using the following two-step procedure, called RefineBlocks: 1. select the block b to split that has the maximal variance in the absolute values of the predictions z t ; 2. split the block b into two block so that the variance of the absolute values of the predictions z t is minimized. An example of such a split is shown in Figure 3. This method is fully described in Algorithm 1, which uses Al-

8 7 CONCLUSION There were several important techniques used in this paper: 1) the decomposition of the loss into a small positive part and a large but structured negative space; 2) the use of the squared norm trick that reduces the complexity of squared loss computation and 3) the use of the partitioning technique to gradually reduce the gap introduced by the usage of quadratic upper bounds for non-quadratic losses, particularly useful in the case of binary or count data. This combination of techniques can be applied in a broad range of other problems, such as probabilistic CCA, collectivematrix factorization, non-negative matrix factorization, as well as non-factorial models such as time series [7]. ACKNOWLEDGEMENTS This work has been supported by the Fupol and Fusepool FP7 European project. We thank anonymous reviewers, Jean-Marc Andreoli and Jos-Antonio Rodriguez for their comments. References [1] D. Bohning. Multinomial logistic regression algorithm. Annals of the Inst. of Statistical Math., 44:197200, [2] Antoine Bordes, Xavier Glorot, Jason Weston, and Yoshua Bengio. A semantic matching energy function for learning with multi-relational data. Machine Learning: Special Issue on Learning Semantics, [3] Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. Learning structured embeddings of knowledge bases. In AAAI, [4] Iván Cantador, Peter Brusilovsky, and Tsvi Kuflik. 2nd workshop on information heterogeneity and fusion in recommender systems (hetrec 2011). In Proceedings of the 5th ACM conference on Recommender systems, RecSys 2011, New York, NY, USA, ACM. [5] Michael Collins, Sanjoy Dasgupta, and Robert E Schapire. A generalization of principal component analysis to the exponential family. In NIPS, volume 13, page 23, [6] Richard A Harshman and Margaret E Lundy. The PARAFAC model for three-way factor analysis and multidimensional scaling. Research methods for multimode data analysis, pages , [7] Joo F. Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista. Exploiting the circulant structure of tracking-bydetection with kernels. In ECCV (4), pages , [8] Balzs Hidasi and Domonkos Tikk. Fast ALS-Based tensor factorization for context-aware recommendation from implicit feedback. ECML PKDD, page 6782, Bristol, UK, [9] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In ICDM, page , [10] T Jaakkola and M Jordan. A variational approach to Bayesian logistic regression models and their extensions. In Sixth International Workshop on Artificial Intelligence and Statistics, [11] Rodolphe Jenatton, Nicolas Le Roux, Antoine Bordes, and Guillaume Obozinski. A latent factor model for highly multi-relational data. In NIPS, pages , [12] Toshihiro Kamishima. Nantonac collaborative filtering: Recommendation based on order responses. In Proceedings of the 9th International Conference on Knowledge Discovery and Data Mining, page ACM, August [13] Mohammad Emtiyaz Khan, Benjamin M. Marlin, Guillaume Bouchard, and Kevin P. Murphy. Variational bounds for mixed-data factor analysis. In NIPS, pages , [14] Ben London, Theodoros Rekatsinas, Bert Huang, and Lise Getoor. Multi-relational learning using weighted tensor decomposition with modular loss. arxiv preprint arxiv: , [15] Laurens Maaten, Minmin Chen, Stephen Tyree, and Kilian Q Weinberger. Learning with marginalized corrupted features. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), page , [16] Julien Mairal. Optimization with first-order surrogate functions. In ICML, [17] Benjamin M. Marlin, Mohammad Emtiyaz Khan, and Kevin P. Murphy. Piecewise Bounds for Estimating Bernoulli-Logistic Latent Gaussian Models. In ICML, pages , [18] Maximilian Nickel and Volker Tresp. Logistic tensor factorization for multi-relational data. CoRR, abs/ , [19] Maximilian Nickel and Volker Tresp. Tensor factorization for multi-relational learning. In Machine Learning and Knowledge Discovery in Databases, page Springer, [20] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. A three-way model for collective learning on multirelational data. In ICML, page , [21] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. Factorizing YAGO: scalable machine learning for linked data. In WWW, page , Lyon, France, [22] Istvn Pilszy, Dvid Zibriczky, and Domonkos Tikk. Fast ALS-based matrix factorization for explicit and implicit feedback datasets. In ACM RecSys, page 7178, [23] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt- Thieme. Factorizing personalized Markov chains for nextbasket recommendation. In WWW, page , [24] Matthias Seeger and Guillaume Bouchard. Fast Variational Bayesian Inference for Non-Conjugate Matrix Factorization Models. Journal of Machine Learning Research - Proceedings Track, 22: , [25] Ajit P. Singh and Geoffrey J. Gordon. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), page ACM, New York, NY, USA, 2008.

### Tensor Factorization for Multi-Relational Learning

Tensor Factorization for Multi-Relational Learning Maximilian Nickel 1 and Volker Tresp 2 1 Ludwig Maximilian University, Oettingenstr. 67, Munich, Germany nickel@dbs.ifi.lmu.de 2 Siemens AG, Corporate

### Bayesian Factorization Machines

Bayesian Factorization Machines Christoph Freudenthaler, Lars Schmidt-Thieme Information Systems & Machine Learning Lab University of Hildesheim 31141 Hildesheim {freudenthaler, schmidt-thieme}@ismll.de

### Factorization Machines

Factorization Machines Steffen Rendle Department of Reasoning for Intelligence The Institute of Scientific and Industrial Research Osaka University, Japan rendle@ar.sanken.osaka-u.ac.jp Abstract In this

Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains

### Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### Machine learning challenges for big data

Machine learning challenges for big data Francis Bach SIERRA Project-team, INRIA - Ecole Normale Supérieure Joint work with R. Jenatton, J. Mairal, G. Obozinski, N. Le Roux, M. Schmidt - December 2012

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large

### Simple and efficient online algorithms for real world applications

Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,

### Parallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014

Parallel Data Mining Team 2 Flash Coders Team Research Investigation Presentation 2 Foundations of Parallel Computing Oct 2014 Agenda Overview of topic Analysis of research papers Software design Overview

### Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu

Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of

### CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /

### A Logistic Regression Approach to Ad Click Prediction

A Logistic Regression Approach to Ad Click Prediction Gouthami Kondakindi kondakin@usc.edu Satakshi Rana satakshr@usc.edu Aswin Rajkumar aswinraj@usc.edu Sai Kaushik Ponnekanti ponnekan@usc.edu Vinit Parakh

### The Need for Training in Big Data: Experiences and Case Studies

The Need for Training in Big Data: Experiences and Case Studies Guy Lebanon Amazon Background and Disclaimer All opinions are mine; other perspectives are legitimate. Based on my experience as a professor

### Logistic Regression for Spam Filtering

Logistic Regression for Spam Filtering Nikhila Arkalgud February 14, 28 Abstract The goal of the spam filtering problem is to identify an email as a spam or not spam. One of the classic techniques used

### Cheng Soon Ong & Christfried Webers. Canberra February June 2016

c Cheng Soon Ong & Christfried Webers Research Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 31 c Part I

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### The Data Mining Process

Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

### Multiple Kernel Learning on the Limit Order Book

JMLR: Workshop and Conference Proceedings 11 (2010) 167 174 Workshop on Applications of Pattern Analysis Multiple Kernel Learning on the Limit Order Book Tristan Fletcher Zakria Hussain John Shawe-Taylor

### Regression Using Support Vector Machines: Basic Foundations

Regression Using Support Vector Machines: Basic Foundations Technical Report December 2004 Aly Farag and Refaat M Mohamed Computer Vision and Image Processing Laboratory Electrical and Computer Engineering

### Identifying SPAM with Predictive Models

Identifying SPAM with Predictive Models Dan Steinberg and Mikhaylo Golovnya Salford Systems 1 Introduction The ECML-PKDD 2006 Discovery Challenge posed a topical problem for predictive modelers: how to

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### CSE 494 CSE/CBS 598 (Fall 2007): Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye

CSE 494 CSE/CBS 598 Fall 2007: Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye 1 Introduction One important method for data compression and classification is to organize

### Federated Optimization: Distributed Optimization Beyond the Datacenter

Federated Optimization: Distributed Optimization Beyond the Datacenter Jakub Konečný School of Mathematics University of Edinburgh J.Konecny@sms.ed.ac.uk H. Brendan McMahan Google, Inc. Seattle, WA 98103

### Introduction to Online Learning Theory

Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent

### Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

### Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

### Big learning: challenges and opportunities

Big learning: challenges and opportunities Francis Bach SIERRA Project-team, INRIA - Ecole Normale Supérieure December 2013 Omnipresent digital media Scientific context Big data Multimedia, sensors, indicators,

### Bootstrapping Big Data

Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

### Statistical machine learning, high dimension and big data

Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,

### Big Data - Lecture 1 Optimization reminders

Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics

### Support Vector Machines Explained

March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

### Introduction to Logistic Regression

OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction

### Beyond stochastic gradient descent for large-scale machine learning

Beyond stochastic gradient descent for large-scale machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines, Nicolas Le Roux and Mark Schmidt - ECML-PKDD,

### Factorization Machines

Factorization Machines Factorized Polynomial Regression Models Christoph Freudenthaler, Lars Schmidt-Thieme and Steffen Rendle 2 Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim,

### Bilinear Prediction Using Low-Rank Models

Bilinear Prediction Using Low-Rank Models Inderjit S. Dhillon Dept of Computer Science UT Austin 26th International Conference on Algorithmic Learning Theory Banff, Canada Oct 6, 2015 Joint work with C-J.

1 1 Introduction Lecturer: Prof. Aude Billard (aude.billard@epfl.ch) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures

### A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center

### Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University

### T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or

### Large-Scale Similarity and Distance Metric Learning

Large-Scale Similarity and Distance Metric Learning Aurélien Bellet Télécom ParisTech Joint work with K. Liu, Y. Shi and F. Sha (USC), S. Clémençon and I. Colin (Télécom ParisTech) Séminaire Criteo March

### Supporting Online Material for

www.sciencemag.org/cgi/content/full/313/5786/504/dc1 Supporting Online Material for Reducing the Dimensionality of Data with Neural Networks G. E. Hinton* and R. R. Salakhutdinov *To whom correspondence

### Predict Influencers in the Social Network

Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

### EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

### Maximum-Margin Matrix Factorization

Maximum-Margin Matrix Factorization Nathan Srebro Dept. of Computer Science University of Toronto Toronto, ON, CANADA nati@cs.toronto.edu Jason D. M. Rennie Tommi S. Jaakkola Computer Science and Artificial

### (Quasi-)Newton methods

(Quasi-)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable non-linear function g, x such that g(x) = 0, where g : R n R n. Given a starting

### Translating Embeddings for Modeling Multi-relational Data

Translating Embeddings for Modeling Multi-relational Data Antoine Bordes, Nicolas Usunier, Alberto Garcia-Durán Université de Technologie de Compiègne CNRS Heudiasyc UMR 7253 Compiègne, France {bordesan,

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

### Steven C.H. Hoi. School of Computer Engineering Nanyang Technological University Singapore

Steven C.H. Hoi School of Computer Engineering Nanyang Technological University Singapore Acknowledgments: Peilin Zhao, Jialei Wang, Hao Xia, Jing Lu, Rong Jin, Pengcheng Wu, Dayong Wang, etc. 2 Agenda

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Fast Maximum Margin Matrix Factorization for Collaborative Prediction

for Collaborative Prediction Jason D. M. Rennie JRENNIE@CSAIL.MIT.EDU Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA Nathan Srebro NATI@CS.TORONTO.EDU

### Neural Networks. CAP5610 Machine Learning Instructor: Guo-Jun Qi

Neural Networks CAP5610 Machine Learning Instructor: Guo-Jun Qi Recap: linear classifier Logistic regression Maximizing the posterior distribution of class Y conditional on the input vector X Support vector

### Big Data Analytics. Lucas Rego Drumond

Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Going For Large Scale Application Scenario: Recommender

### Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about

### Clustering Very Large Data Sets with Principal Direction Divisive Partitioning

Clustering Very Large Data Sets with Principal Direction Divisive Partitioning David Littau 1 and Daniel Boley 2 1 University of Minnesota, Minneapolis MN 55455 littau@cs.umn.edu 2 University of Minnesota,

### Data Mining Practical Machine Learning Tools and Techniques

Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

### Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

### II. RELATED WORK. Sentiment Mining

Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract

### Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation

### Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step

### Blind Deconvolution of Corrupted Barcode Signals

Blind Deconvolution of Corrupted Barcode Signals Everardo Uribe and Yifan Zhang Advisors: Ernie Esser and Yifei Lou Interdisciplinary Computational and Applied Mathematics Program University of California,

### Predict the Popularity of YouTube Videos Using Early View Data

000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

### The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of

### The Scientific Data Mining Process

Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

### CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### Standardization and Its Effects on K-Means Clustering Algorithm

Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03

### Big Data Science. Prof. Lise Getoor University of Maryland, College Park. http://www.cs.umd.edu/~getoor. October 17, 2013

Big Data Science Prof Lise Getoor University of Maryland, College Park October 17, 2013 http://wwwcsumdedu/~getoor BIG Data is not flat 2004-2013 lonnitaylor Data is multi-modal, multi-relational, spatio-temporal,

2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously

### Introduction to Machine Learning

Introduction to Machine Learning Prof. Alexander Ihler Prof. Max Welling icamp Tutorial July 22 What is machine learning? The ability of a machine to improve its performance based on previous results:

Data Mining: An Overview David Madigan http://www.stat.columbia.edu/~madigan Overview Brief Introduction to Data Mining Data Mining Algorithms Specific Eamples Algorithms: Disease Clusters Algorithms:

### Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

### FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

### Getting Even More Out of Ensemble Selection

Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand qs12@cs.waikato.ac.nz ABSTRACT Ensemble Selection uses forward stepwise

### Impact of Boolean factorization as preprocessing methods for classification of Boolean data

Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,

### INTRODUCTION TO NEURAL NETWORKS

INTRODUCTION TO NEURAL NETWORKS Pictures are taken from http://www.cs.cmu.edu/~tom/mlbook-chapter-slides.html http://research.microsoft.com/~cmbishop/prml/index.htm By Nobel Khandaker Neural Networks An

### Classifying Chess Positions

Classifying Chess Positions Christopher De Sa December 14, 2012 Chess was one of the first problems studied by the AI community. While currently, chessplaying programs perform very well using primarily

### Cross Validation. Dr. Thomas Jensen Expedia.com

Cross Validation Dr. Thomas Jensen Expedia.com About Me PhD from ETH Used to be a statistician at Link, now Senior Business Analyst at Expedia Manage a database with 720,000 Hotels that are not on contract

### Probabilistic Matrix Factorization

Probabilistic Matrix Factorization Ruslan Salakhutdinov and Andriy Mnih Department of Computer Science, University of Toronto 6 King s College Rd, M5S 3G4, Canada {rsalakhu,amnih}@cs.toronto.edu Abstract

### Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University

Collective Behavior Prediction in Social Media Lei Tang Data Mining & Machine Learning Group Arizona State University Social Media Landscape Social Network Content Sharing Social Media Blogs Wiki Forum

### BIG DATA PROBLEMS AND LARGE-SCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION

BIG DATA PROBLEMS AND LARGE-SCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION Ş. İlker Birbil Sabancı University Ali Taylan Cemgil 1, Hazal Koptagel 1, Figen Öztoprak 2, Umut Şimşekli

### Response prediction using collaborative filtering with hierarchies and side-information

Response prediction using collaborative filtering with hierarchies and side-information Aditya Krishna Menon 1 Krishna-Prasad Chitrapura 2 Sachin Garg 2 Deepak Agarwal 3 Nagaraj Kota 2 1 UC San Diego 2

### Invited Applications Paper

Invited Applications Paper - - Thore Graepel Joaquin Quiñonero Candela Thomas Borchert Ralf Herbrich Microsoft Research Ltd., 7 J J Thomson Avenue, Cambridge CB3 0FB, UK THOREG@MICROSOFT.COM JOAQUINC@MICROSOFT.COM

### Stock Market Forecasting Using Machine Learning Algorithms

Stock Market Forecasting Using Machine Learning Algorithms Shunrong Shen, Haomiao Jiang Department of Electrical Engineering Stanford University {conank,hjiang36}@stanford.edu Tongda Zhang Department of

### Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014

### IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 20, NO. 7, JULY 2009 1181

IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 20, NO. 7, JULY 2009 1181 The Global Kernel k-means Algorithm for Clustering in Feature Space Grigorios F. Tzortzis and Aristidis C. Likas, Senior Member, IEEE

### Regularized Logistic Regression for Mind Reading with Parallel Validation

Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, Jukka-Pekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland

### Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

### Linear Classification. Volker Tresp Summer 2015

Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

### Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

### Data Mining. Nonlinear Classification

Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

### Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

### Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

M A C H I N E L E A R N I N G P R O J E C T F I N A L R E P O R T F A L L 2 7 C S 6 8 9 CLASSIFICATION OF TRADING STRATEGIES IN ADAPTIVE MARKETS MARK GRUMAN MANJUNATH NARAYANA Abstract In the CAT Tournament,