Factorization Machines
Factorized Polynomial Regression Models

Christoph Freudenthaler (1), Lars Schmidt-Thieme (1), and Steffen Rendle (2)

(1) Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, 31141 Hildesheim, Germany
(2) Social Network Analysis, University of Konstanz, Konstanz, Germany
{freudenthaler,schmidt-thieme}@ismll.uni-hildesheim.de, [email protected]

Abstract. Factorization Machines (FM) are a new model class that combines the advantages of polynomial regression models with factorization models. Like polynomial regression models, FMs are a general model class working with any real-valued feature vector as input for the prediction of real-valued, ordinal or categorical dependent variables as output. However, in contrast to polynomial regression models, FMs replace two-way and all other higher-order interaction effects by their factorized analogues. The factorization of higher-order interactions enables efficient parameter estimation even for sparse datasets where just a few or even no observations for those higher-order effects are available; polynomial regression models without this factorization fail. This work discusses the relationship of FMs to polynomial regression models and the conceptual difference between factorizing and non-factorizing model parameters of polynomial regression models. Additionally, we show that the model equation of factorized polynomial regression models can be calculated in linear time and thus FMs can be learned efficiently. Apart from polynomial regression models, we also focus on the other origin of FMs: factorization models. We show that the standard factorization models matrix factorization and parallel factor analysis are special cases of FMs, and additionally how recently proposed and successfully applied factorization models like SVD++ can be represented by FMs. The drawback of typical factorization models is that they are not applicable to general prediction tasks but work only for categorical input data. Furthermore, their model equations and optimization algorithms are derived individually for each task. By knowing that FMs can mimic these models just by choosing the appropriate input data, we finally conclude that deriving a learning algorithm, e.g., Stochastic Gradient Descent (SGD) or Alternating Least Squares (ALS), for FMs once is sufficient to get the learning algorithms for all factorization models automatically, thus saving a lot of time and effort.

1 Introduction

Recommender systems are an important feature of modern websites. Especially commercial websites benefit from a boost in customer loyalty, click-through rates
and revenue when implementing recommender systems. For example, online shopping websites like Amazon give each customer personalized product recommendations that the user is probably interested in. Other examples are video portals like YouTube that recommend movies to customers. For the majority of researchers and practitioners in the field of recommender systems, the main aim of their research has been to accurately predict ratings. This focus on rating prediction, i.e. predicting a metric, real-valued variable, was particularly triggered by the Netflix prize challenge, which took place from 2006 to 2009. The task of the Netflix challenge was to predict ratings of users for movies the movie rental company Netflix offered. The challenge was structured into three different data sets: the training dataset consisting of more than 100 million ratings of almost 500,000 registered users for 17,770 different movies, a held-out validation set and a test set. The first team which was able to improve Netflix's standard recommender system by 10% in terms of RMSE on the provided test set was awarded the grand prize of one million US-$.

The Netflix prize especially popularized factorization models since the best approaches [1, 3, 2] for the Netflix challenge are based on this model class. Also on the ECML/PKDD discovery challenge for tag recommendation, a factorization model based on tensor decomposition [11] outperformed the other approaches. Despite their predictive power, factorization models have the disadvantage that they have to be devised individually for each problem and each set of categorical predictors. This also requires individual derivations of learning algorithms such as stochastic gradient descent [5, 6] or Bayesian inference such as Gibbs sampling [12, 16] for each factorization model. Generalizing factorization models to polynomial regression models with factorized higher-order effects, i.e. Factorization Machines (FM) [8], solves this caveat and provides in total the following advantages over hand-crafted factorization models:

- Categorical, ordinal, and metric predictors are possible
- A single derivation of a learning algorithm is valid for all emulated factorization models
- Emulation of state-of-the-art factorization models by selection of an appropriate input feature vector

Compared to polynomial regression models, FMs bear other important improvements:

- The model equation can be computed in linear time
- The number of latent dimensions can be used to fine-tune the amount of expressiveness of higher-order interaction effects
- For a sufficient number of latent dimensions, FMs are as expressive as polynomial regression models
- Factorization of parameters allows for estimation of higher-order interaction effects even if no observations for the interaction are available
Thus, the general class of FMs brings together the advantages of polynomial regression models and factorization models: general applicability and predictive accuracy. Moreover, inference in factorized polynomial regression models can be understood as an intertwined combination of a Principal Component Analysis-like feature selection step and a linear regression step on the resulting latent feature space representations. Please note that, in contrast to PCA regression, both steps are intertwined.

This paper is structured as follows. First, we recapitulate polynomial regression, go on to factorized polynomial regression, and finally introduce FMs. Next, we compare FMs to state-of-the-art factorization models and show how these models can be emulated by FMs. During this process, we discuss the conceptual commonalities and differences between polynomial regression, factorized polynomial regression, FMs and factorization models. Finally, in our evaluation section we make use of previously published results on the Netflix dataset to show the equivalence of state-of-the-art factorization models to their emulating FM counterparts. This is possible since, after the challenge had ended, the Netflix dataset including the ratings on the test set was published, which made it a standard dataset for research on recommender systems. Many publications on factorization models are based on the provided train-validation-test split. Besides the equivalence study, we show results on some new factorization models which we easily developed by applying appropriate features, and compare them to their non-factorized counterparts. Please note that apart from rating prediction, factorization models may also be used for item prediction, i.e. classifying items into relevant/non-relevant or sorting items according to an estimated, ordinal relevance criterion. Without loss of generality, we will deal in the evaluation section with the rating prediction case since the models for rating and item prediction are identical; just the applied loss function changes.

2 Factorization Machines - Properly Factorized Polynomial Regression

Factorization Machines can be seen as equivalent to polynomial regression models where the model parameters β are replaced by factorized representations thereof.

2.1 Polynomial Regression

The well-known polynomial regression model of order o, for an input vector x ∈ IR^{p+1} consisting of all p predictor variables and the intercept term indicator x_0 = 1, reads for o = 3 as given in equation (1).

Footnote: Please note that we use upper-case Latin letters for denoting variables in general as well as matrices, and lower-case Latin letters for specific realizations of a variable. Bold printed variables are vectors or matrices.
y(x | β) = β_0 x_0 + \sum_{i=1}^{p} β_i x_i + \sum_{i=1}^{p} \sum_{j \geq i} β_{i,j} x_i x_j + \sum_{i=1}^{p} \sum_{j \geq i} \sum_{l \geq j} β_{i,j,l} x_i x_j x_l    (1)

Obviously, this model is an extension of the standard linear regression model. It can represent any non-linear relationship up to order o between the input variables x and the target variable Y ∈ IR. A problem of polynomial regression models is the estimation of higher-order interaction effects, because each interaction effect can only be learned if these interactions are explicitly recorded in the available data set. Typically, this is not an issue for real-valued predictors since for each of these effects all training data points (except missing values) provide information about the interaction. However, for categorical predictors this is not the case because for each category linear effects are created separately, and each effect can only be estimated from observations of the related category. Training rows in which the categorical variable is not observed do not contribute to the estimation of the corresponding categorical effect. This sparsity problem for categorical variables gets even worse if higher-order interaction effects of categorical variables are considered. For example, the estimation of two-way interaction effects for two categorical variables requires both categories to be observed jointly in train. Obviously, this is a stricter requirement than for linear effects, leading to fewer training instances fulfilling it. This sparsity problem worsens with the order of an effect since the number of appropriate training rows further decreases.

Equation (2) gives an example, for a movie rental service, of a design matrix X containing real-valued and categorical variables. The first two columns show the age of the lender (in years) and the age of the selected movie (in months), i.e. two real-valued predictors. The last five columns show the ID of 5 different lenders, i.e. realizations of a categorical variable with five states. While higher-order interaction effects for the real-valued variables are not critical since all train rows can be used, estimation of categorical effects is problematic as the number of appropriate train rows declines with the number of categories.

X = …    (2)

Typically, the number of users is much higher; for recommender systems it is several (hundreds of) thousands. Thus, the number of predictors gets very large and the sparsity for estimating categorical linear effects gets worse. If one also wants to include the IDs of lent movies into the polynomial regression model, i.e. into the design matrix, this would finally give a matrix with at least several thousands of columns - only for the linear regression model. In the case
of the Netflix dataset, the number of users is about 480,000 and the number of movies is 17,770. For a polynomial regression model of order o = 2, estimation would become infeasible as there would be about 8.5 billion predictors just for the user-item interaction. Figure 1 depicts such an extended design matrix including dummy variables for each user and for each movie, fractions summing to one for all other movies rated by the same user, a real number for the age of the movie, and an indicator for the last rated movie.

Fig. 1. Example of an extended design matrix having the categorical variables user ID, item ID and last rated movie ID as well as age as real-valued predictors. (Columns are grouped into User, Movie, Other Movies rated, Time, and Last Movie rated; rows are feature vectors x(1), ..., x(7) with targets y(1), ..., y(7).)

To summarize, for polynomial regression models the number of predictors is polynomial of order o in the number of predictor variables p, which is particularly harmful for any categorical interaction effect because its amount of training data is not the training data size but the number of occurrences of that categorical interaction. Please note that the data sparsity problem of categorical predictors also holds true for real-valued predictors if splines and interactions of those splines are used and if the support of the spline is a proper subset of the real numbers. Besides data sparsity, another caveat of polynomial regression models is their computation time, which is polynomial in p with exponent o, i.e. O(p^o). These shortcomings are tackled by factorizing higher-order interaction effects.
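To make the shape of such an extended design matrix concrete, the following sketch (not from the paper; user and movie identifiers, helper names and values are illustrative) assembles one feature row in the layout of Fig. 1 by one-hot encoding the categorical variables.

```python
# Illustrative sketch (not from the paper): building one row of the extended
# design matrix of Fig. 1 for a single (user, movie) rating event.
import numpy as np

users = ["A", "B", "C"]                      # toy user IDs (hypothetical)
movies = ["TI", "NH", "SW", "ST"]            # toy movie IDs (hypothetical)

def encode_row(user, movie, other_rated, movie_age, last_movie):
    """One-hot user and movie, fractions for other rated movies,
    a real-valued age, and a one-hot indicator of the last rated movie."""
    x_user = np.array([1.0 if u == user else 0.0 for u in users])
    x_movie = np.array([1.0 if m == movie else 0.0 for m in movies])
    # fractions summing to one over all other movies rated by the same user
    x_other = np.zeros(len(movies))
    if other_rated:
        x_other[[movies.index(m) for m in other_rated]] = 1.0 / len(other_rated)
    x_last = np.array([1.0 if m == last_movie else 0.0 for m in movies])
    return np.concatenate([x_user, x_movie, x_other, [movie_age], x_last])

x = encode_row("A", "TI", other_rated=["NH", "SW"], movie_age=13.0, last_movie="NH")
print(x.shape, x)   # a sparse, mostly-zero vector of length 3 + 4 + 4 + 1 + 4 = 16
```

Already for this toy setting most entries of the row are zero; with hundreds of thousands of users and movies, the one-hot blocks dominate and the sparsity issues described above appear.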
2.2 Factorized Polynomial Regression

The rationale behind factorized polynomial regression is to remove the necessity of observing interactions for estimating interaction effects. While for polynomial regression models this necessity is active and leads to poor parameter estimates for higher-order interactions, as discussed in the previous section, factorized polynomial regression models detach higher-order interaction effect estimation from the occurrence of that interaction in train, which improves parameter estimation in sparse data settings considerably. The approach of factorized polynomial regression is to replace all effects by their factorized representations. This yields for o = 3:

y(x | β_0, V) = β_0 x_0 + \sum_{i=1}^{p} \sum_{f=1}^{k_1} v_{i,f} x_i + \sum_{i=1}^{p} \sum_{j \geq i} \sum_{f=1}^{k_2} v_{i,f} v_{j,f} x_i x_j + \sum_{i=1}^{p} \sum_{j \geq i} \sum_{l \geq j} \sum_{f=1}^{k_3} v_{i,f} v_{j,f} v_{l,f} x_i x_j x_l    (3)

Since the parameter factorization β_i = \sum_{f=1}^{k_1} v_{i,f} of one-way effects is redundant, the number of latent dimensions is fixed to k_1 = 1, which is equivalent to no factorization. In general, effects of different order have different numbers of factorization dimensions k_1, ..., k_o. This is because all effects of the same order o are factorized using the multi-modal parallel factor analysis (PARAFAC) factorization model of Harshman [4], which is defined as:

β_{i_1, ..., i_o} = \sum_{f=1}^{k_o} \prod_{j=1}^{o} v_{i_j, f}    (4)

Figure 2 depicts PARAFAC models for o ∈ {2, 3}.

Fig. 2. PARAFAC: one parallel factor analysis model is used for each degree of interaction o = 1, ..., o. The left panel shows the PARAFAC model for o = 2 and the decomposition of the user-item interaction. The right panel shows PARAFAC for o = 3 and the decomposition of a user-item-point-in-time interaction.

The replacement of all model parameters β by factorized versions thereof replaces the original parameters β by a set of latent factor variables

V = {V_o : o = 1, ..., o},   V_o ∈ IR^{p × k_o}    (5)

(Note that each matrix V_o has p instead of p+1 rows, as the intercept β_0 is not factorized.) Additionally, to emphasize that k_1 = 1, i.e. there is no change in the parametrization of linear effects, we change notation for the one-column matrix V_1 ∈ IR^{p×1} to the parameter vector w ∈ IR^p and also for β_0 to w_0. This changes (3) to:
y(x | w, V) = w_0 x_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j \geq i} \sum_{f=1}^{k_2} v_{i,f} v_{j,f} x_i x_j + \sum_{i=1}^{p} \sum_{j \geq i} \sum_{l \geq j} \sum_{f=1}^{k_3} v_{i,f} v_{j,f} v_{l,f} x_i x_j x_l    (6)

Thus, each predictor has for each order o a set of latent features. This property of factorized polynomial regression models entails several advantages:

- The number of latent dimensions can be used to fine-tune the amount of expressiveness of higher-order interaction effects.
- For a sufficient number of latent dimensions, factorized polynomial regression is as expressive as polynomial regression.
- Factorization of parameters allows for estimation of higher-order interaction effects even if no observations for the interaction are available.
- Fast computation of the model equation.

By selecting the number of latent dimensions, the complexity of the corresponding interaction effects can be adjusted; thus, the complexity of the model is more fine-grained. The number of latent dimensions, i.e. the optimal model complexity, is typically determined by cross-validation. Obviously, if the number of latent dimensions is large enough, the model complexity of factorized polynomial regression can equal that of polynomial regression.

Besides the adaptive model complexity, another advantage is more efficient parameter estimation of higher-order interactions. As discussed in section 2.1, higher-order interactions suffer from data sparsity because with increasing order of an interaction effect the available data for parameter estimation decreases. However, in the case of factorized effects this dependence is eliminated, e.g., a factorized user-item interaction is learned as soon as either the corresponding item ID or user ID is observed, whereas in the non-factorized case both item ID and user ID have to be observed jointly.

The last advantage of factorized polynomial regression compared to standard polynomial regression is that model predictions can be computed in linear time. Typically, the runtime complexity of calculating the prediction function grows with the number of parameters, which is O(p^o). By factorization of the model parameters this complexity can be reduced to O(p \sum_{o=1}^{o} k_o) by rearranging the terms in equation (6) to

y(x | w, V) = w_0 x_0 + \sum_{i=1}^{p} w_i x_i + \underbrace{\sum_{f=1}^{k_2} \sum_{i=1}^{p} v_{i,f} x_i \sum_{j \geq i} v_{j,f} x_j}_{A} + \underbrace{\sum_{f=1}^{k_3} \sum_{i=1}^{p} v_{i,f} x_i \sum_{j \geq i} v_{j,f} x_j \sum_{l \geq j} v_{l,f} x_l}_{B}    (7)
and exploiting that A and B can be calculated in linear time:

A = \sum_{f=1}^{k_2} \frac{1}{2} \left[ \left( \sum_{i=1}^{p} v_{i,f} x_i \right)^2 + \sum_{i=1}^{p} v_{i,f}^2 x_i^2 \right]

B = \sum_{f=1}^{k_3} \frac{1}{6} \left[ \left( \sum_{i=1}^{p} v_{i,f} x_i \right)^3 + 3 \left( \sum_{i=1}^{p} v_{i,f} x_i \right) \left( \sum_{i=1}^{p} v_{i,f}^2 x_i^2 \right) + 2 \sum_{i=1}^{p} v_{i,f}^3 x_i^3 \right]    (8)

Thus, model prediction is linear in p for a fixed order o and speeds up considerably for large p. The reason for the speed-up is that the nested sums can be decomposed into linear combinations of fast-to-compute (O(p)) functions of the latent factor variables. This decomposition can be derived by combinatorial analysis. For increasing order o this derivation is still possible but becomes a more and more tedious task; however, for standard applications like recommender systems it is not necessary to derive it for higher-order interactions, as two-way interactions typically suffice.
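To make the rearrangement concrete, here is a small numerical sketch (not from the paper; sizes and variable names are illustrative) that evaluates the order-2 term A both naively and with the identity above.

```python
# Illustrative check (not from the paper): the order-2 interaction term A can be
# computed in O(k*p) instead of O(k*p^2) using the identity
#   sum_{i<=j} (v_if x_i)(v_jf x_j) = 0.5 * [ (sum_i v_if x_i)^2 + sum_i v_if^2 x_i^2 ]
import numpy as np

rng = np.random.default_rng(0)
p, k2 = 200, 8                      # number of predictors and latent dimensions
x = rng.normal(size=p)              # a dense toy feature vector
V = rng.normal(size=(p, k2))        # latent factors for the order-2 effects

# naive O(k*p^2) evaluation of A = sum_f sum_{i<=j} v_if v_jf x_i x_j
a = V * x[:, None]                  # a[i, f] = v_if * x_i
naive = sum(a[i, f] * a[j, f] for f in range(k2)
            for i in range(p) for j in range(i, p))

# linear-time O(k*p) evaluation using the rearrangement above
fast = 0.5 * (a.sum(axis=0) ** 2 + (a ** 2).sum(axis=0)).sum()

print(np.isclose(naive, fast))      # True up to floating point error
```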
2.3 Factorization Machines - Properly Factorized Polynomial Regression

So far we have discussed factorized polynomial regression and its appealing advantages. However, up to now we have ignored an important problem of factorized polynomial regression using the PARAFAC factorization model: it is unable to model negative effects for polynomial terms like x_i^2, x_i^4, ..., i.e. for terms of a polynomial with even exponents. That is why we call this factorized polynomial regression improper. A proper solution to this issue is represented by Factorization Machines, discussed in this section.

Factorization Machines were introduced by Rendle [8]. They follow the idea of (improperly) factorized polynomial regression, thus leveraging all previously described advantages, i.e. adaptive model complexity, faster model prediction, and efficient parameter estimation in sparse data settings. However, FMs exclude those effects that involve squares, cubes, etc. of the same predictor variable, e.g. x_i^2, x_i^3, or x_i^2 x_j, which leads to the following model equation for FMs:

y(x | Θ) = w_0 x_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j > i} \sum_{f=1}^{k_2} v_{i,f} v_{j,f} x_i x_j + \sum_{i=1}^{p} \sum_{j > i} \sum_{l > j} \sum_{f=1}^{k_3} v_{i,f} v_{j,f} v_{l,f} x_i x_j x_l    (9)

This has no impact on the advantages of factorized polynomial regression models discussed in the previous section, but it has two important advantages:

Properness: FMs eliminate the major drawback of factorized polynomial regression, namely its inability to represent negative effects for polynomial functions like x_i^2, x_i^4, etc. Please note that this does not eliminate the ability of FMs to model this kind of higher-order interaction. If one wants to model such an effect, this can be done easily by adding the corresponding transformed predictor, e.g., x_i^2, to the set of predictors, i.e. by extending the design matrix.

Multilinearity: FMs are multilinear in their parameters θ ∈ Θ = {w, V}, thus the model equation can be rewritten for each parameter as:

y(x | Θ) = g(x, Θ \ θ) + θ h(x, Θ \ θ)    (10)

where both functions g(x, Θ \ θ) and h(x, Θ \ θ) depend on the selected parameter θ. For a better understanding of both functions, we rewrite this as

y(x | Θ) - g(x, Θ \ θ) = θ h(x, Θ \ θ)    (11)

and obtain the semantics of y - g(x, Θ \ θ) as the residual error without parameter θ, i.e. with θ = 0, and of h(x, Θ \ θ) as the gradient of the FM w.r.t. θ. In [10] more detailed examples of both functions for FMs are given. This general, yet simple, dependency on all parameters θ ∈ Θ makes parameter inference much easier, as exploited in [8] for stochastic gradient descent (SGD) and in [10] for the alternating least squares (ALS) method.

2.4 Factorization Machines vs. Factorization Models

So far factorization models have been devised for each problem independently, and no effort has been made to generalize all of these models. This results in many different papers, e.g., [5, 6, 3, 2, 9], each defining a new model and a tailored learning algorithm. With FMs this is no longer necessary, just as it is not necessary to invent a new polynomial regression model for each new set of predictors. By deriving a learning algorithm for FMs, all subsumed factorization models can be learned as well. In this section, we show the relation between FMs and other factorization models to demonstrate how easily state-of-the-art models can be represented as FMs.

In general, the best-known factorization models are tensor factorizations (like Tucker Decomposition [15], PARAFAC [4] or Matrix Factorization) that factorize an m-ary relation over m categorical variables. From a feature point of view these models require feature vectors x that are divided into m groups where, in each group, exactly one value has to be 1 and the rest 0. All existing factorization models are derived from those more general decomposition models. However, these factorization models are not as general as FMs, as they are devised for categorical predictors only. Moreover, they operate with a fixed order of interactions, e.g., 2-way interactions for matrix decomposition or 3-way interactions for 3-mode tensor decomposition. In the following, we will show, according to [8], how state-of-the-art factorization models are subsumed by FMs by (i) defining a real-valued feature vector x that is created from transaction data S and (ii) analyzing what the factorization machine with such features would exactly correspond to.
Matrix Factorization. Matrix factorization (MF) is one of the most studied factorization models, e.g., [14, 5, 12, 7]. It factorizes a relationship between two categorical variables, e.g., a set of users U and a set of items I:

y(u, i | Θ_MF) := \sum_{f=1}^{k} v_{u,f} v_{i,f}    (12)

with model parameters Θ_MF:

V_U ∈ IR^{|U| × k},   V_I ∈ IR^{|I| × k}    (13)

An extension of this model, e.g., [5], adds bias terms for the variables:

y(u, i | Θ_MFb) := w_0 + w_u + w_i + \sum_{f=1}^{k} v_{u,f} v_{i,f}    (14)

with the following parameters extending Θ_MF to Θ_MFb:

w_0 ∈ IR,   w_U ∈ IR^{|U|},   w_I ∈ IR^{|I|}

This model is exactly the same as a factorization machine with o = 2 and two categorical predictors, i.e. users and items. Figure 3 shows the corresponding design matrix. With the corresponding feature vectors x, the FM becomes:

y(x | Θ) = w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{l > j} \sum_{f=1}^{k} v_{j,f} v_{l,f} x_j x_l = w_0 + w_u + w_i + \sum_{f=1}^{k} v_{u,f} v_{i,f}    (15)

which is the same as the biased matrix factorization model (eq. (14)). The reason is that x_j is only non-zero for user u and item i, so all other biases and interactions vanish.

Tensor Factorization. Tensor factorization (TF) models generalize MF to an arbitrary number of categorical variables. Let us assume that we have a relation over m categorical variables: X_1 × ... × X_m, where x_i ∈ {x_{i,1}, x_{i,2}, ...}. The parallel factor analysis model (PARAFAC) [4] for a realization of an m-tuple (x_1, ..., x_m) is defined as:

y(x_1, ..., x_m | Θ_PARAFAC) := \sum_{f=1}^{k} \prod_{i=1}^{m} v_{x_i, f}    (16)

with parameters Θ_PARAFAC:

V^1 ∈ IR^{|X_1| × k}, ..., V^m ∈ IR^{|X_m| × k}
Fig. 3. Design matrix of an FM emulating a biased matrix factorization. (Each row is a one-hot encoding of the user ID, here A, B, C, concatenated with a one-hot encoding of the movie ID, here TI, NH, SW, ST.)

The factorization machine (with o := m) includes the PARAFAC model if it uses all categories as single predictors (cf. fig. 4). Then, instead of just factorizing the m-way interaction like PARAFAC, the FM factorizes all interactions up to order o = m.

The matrix and tensor factorization models described so far work only on 2-ary or m-ary relations over categorical variables. They cannot work, e.g., with sets of categorical variables or continuous attributes like FMs do. There are several specialized models that try to integrate other kinds of information into basic tensor factorization models. Next, we describe one of them. For further mappings of successful factorization models onto FMs please refer to [8].

Fig. 4. Design matrix of an FM emulating a mode-3 tensor factorization. The modes are user IDs, song IDs and tag IDs. This design matrix might be used for tag recommendation, i.e. to predict the propensity of a user to assign a specific tag to the currently selected song.
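As a small illustration of eq. (15) and Fig. 3, the following sketch (not from the paper; sizes, indices and names are illustrative) evaluates an order-2 FM on a one-hot (user, item) feature vector and checks that it reduces to the biased MF prediction w_0 + w_u + w_i + <v_u, v_i>.

```python
# Illustrative sketch (not from the paper): an order-2 FM evaluated on the
# one-hot feature vector of Fig. 3 reproduces biased matrix factorization.
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, k = 3, 4, 5
p = n_users + n_items               # length of the concatenated one-hot vector

w0 = rng.normal()
w = rng.normal(size=p)              # linear weights: user biases, then item biases
V = rng.normal(size=(p, k))         # latent factors: user rows, then item rows

def fm_order2(x):
    """Order-2 FM, eq. (9) with k_3 = 0, using the linear-time computation."""
    linear = w0 + w @ x
    a = V * x[:, None]
    pairwise = 0.5 * ((a.sum(axis=0) ** 2 - (a ** 2).sum(axis=0)).sum())
    return linear + pairwise

u, i = 1, 2                         # a toy user index and item index
x = np.zeros(p)
x[u] = 1.0                          # one-hot user indicator
x[n_users + i] = 1.0                # one-hot item indicator

mf = w0 + w[u] + w[n_users + i] + V[u] @ V[n_users + i]
print(np.isclose(fm_order2(x), mf))  # True: the FM collapses to biased MF
```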
SVD++. For the task of rating prediction, i.e. regression, Koren [5] improves the matrix factorization model to the SVD++ model. In our notation the model can be written as:

y(u, i | Θ_SVD++) := w_0 + w_u + w_i + \sum_{f=1}^{k} v_{i,f} v_{u,f} + \frac{1}{\sqrt{|N_u|}} \sum_{l \in N_u} \sum_{f=1}^{k} v_{i,f} v_{l,f}    (17)

where N_u is the set of all movies the user has ever rated, and the model parameters Θ_SVD++ are:

w_0 ∈ IR,   w_U ∈ IR^{|U|},   w_I ∈ IR^{|I|},   V ∈ IR^{(|U| + 2|I|) × k}    (18)

The information of N_u can also be encoded in the feature vector x using indicator variables for N_u. With the design matrix of fig. 5 the FM (o = 2) becomes:

y(x | Θ) = w_0 + w_u + w_i + \frac{1}{\sqrt{|N_u|}} \sum_{l \in N_u} w_l + \sum_{f=1}^{k} v_{u,f} v_{i,f} + \frac{1}{\sqrt{|N_u|}} \sum_{l \in N_u} \sum_{f=1}^{k} v_{u,f} v_{l,f} + \frac{1}{\sqrt{|N_u|}} \sum_{l \in N_u} \sum_{f=1}^{k} v_{i,f} v_{l,f} + \frac{1}{|N_u|} \sum_{l \in N_u} \sum_{l' \in N_u, l' > l} \sum_{f=1}^{k} v_{l,f} v_{l',f}    (19)

Comparing this to SVD++ (eq. (17)), one can see that the factorization machine with this feature vector models exactly the same interactions as SVD++, but the FM additionally contains interactions between the user and the movies in N_u, as well as basic effects for the movies in N_u and interactions between pairs of movies in N_u. This is a general observation when comparing FMs to factorization models: it is possible to model all factorization models, but due to the completeness of FMs in terms of interaction effects, they typically include some additional interactions.

Fig. 5. Design matrix of an FM emulating SVD++.

In total, FMs and factorization models have the following differing properties:

- Standard factorization models like PARAFAC or MF are not general prediction models like factorization machines. Instead they require that the feature vector is partitioned into m parts and that in each part exactly one element is 1 and the rest 0.
- There are many proposals for specialized factorization models designed for a single task. Factorization machines can emulate many of the most successful factorization models (including MF, SVD++, etc.) just by feature extraction.

Thus, developing one learning algorithm for FMs suffices to get learning algorithms for all subsumed factorization models automatically.
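Since a single learning algorithm then covers all of these emulated models, here is a minimal SGD sketch for an order-2 FM under squared loss (an assumed setup, not the authors' implementation); the gradient expressions follow from the multilinearity discussed in section 2.3.

```python
# Minimal SGD sketch (assumed setup, not the authors' implementation) for an
# order-2 FM, eq. (9) with k_3 = 0, trained with squared loss.
import numpy as np

def sgd_train(X, y, k=8, epochs=10, lr=0.01, reg=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w0, w = 0.0, np.zeros(p)
    V = 0.01 * rng.normal(size=(p, k))
    for _ in range(epochs):
        for idx in rng.permutation(n):
            x, target = X[idx], y[idx]
            a = V * x[:, None]                        # a[i, f] = v_if * x_i
            s = a.sum(axis=0)                         # s_f = sum_i v_if x_i
            pred = w0 + w @ x + 0.5 * (s ** 2 - (a ** 2).sum(axis=0)).sum()
            err = pred - target
            # squared-loss gradients; dpred/dv_if = x_i * (s_f - v_if * x_i)
            w0 -= lr * err
            w -= lr * (err * x + reg * w)
            V -= lr * (err * (x[:, None] * (s[None, :] - a)) + reg * V)
    return w0, w, V
```

On the one-hot design matrices shown above, this single routine trains the emulated biased MF, tensor factorization and SVD++ variants alike; only the feature vectors change.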
3 Experiments

After developing FMs and discussing their relationship to (factorized) polynomial regression and some factorization models, we show in this section results on the Netflix dataset, thus empirically comparing FMs to some factorization models and to polynomial regression models (fig. 6). For all shown FMs we set o = 2. The left panel of fig. 6 shall show the ease with which different factorization models can be represented just by selecting different design matrices. By comparing them to standard factorization models (cf. fig. 6, left panel) it becomes clear that both are equivalent. The right panel shows how FMs outperform non-factorized polynomial regression models in sparse data settings. The statistics of the Netflix data set are as follows:

Dataset    # users    # movies    size train    size test
Netflix    480,000    17,770

3.1 FMs vs. State-of-the-Art Factorization Models

The left panel of fig. 6 shows the evolution of five different models on Netflix over different sizes of the latent space k. Since all models are of order o = 2, the size of the latent space is determined by k = k_2. Results for MF and SVD++ are taken from the literature [2, 5]; the other three models are FMs with different feature vectors. The names of the FMs indicate the chosen features (a construction sketch follows the list):

- u ... user IDs
- i ... item IDs (movie IDs)
- t ... day on which the rating happened
- f ... natural logarithm of the number of ratings the user has made on the day the sought rating happened
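As an illustration of these feature groups, the sketch below (not from the paper; the index layout, encoding of the day feature and helper names are assumptions) assembles one (u, i, t, f) feature vector of the kind used for the FM (u,i,t,f) model.

```python
# Illustrative sketch (not from the paper): one (u, i, t, f) feature row for the
# FM models of section 3.1; index layout and counts are assumptions.
import numpy as np

n_users, n_items, n_days = 1000, 500, 2243   # toy sizes (hypothetical)

def encode_uitf(user, item, day, n_ratings_that_day):
    p = n_users + n_items + n_days + 1
    x = np.zeros(p)
    x[user] = 1.0                                   # u: one-hot user ID
    x[n_users + item] = 1.0                         # i: one-hot item (movie) ID
    x[n_users + n_items + day] = 1.0                # t: one-hot day of rating
    x[-1] = np.log(n_ratings_that_day)              # f: ln(#ratings by user that day)
    return x

x = encode_uitf(user=42, item=7, day=815, n_ratings_that_day=3)
```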
Fig. 6. Empirical comparison of several instances of FMs to selected factorization models (left panel) and to non-factorized polynomial regression models (PR, right panel). Both panels plot RMSE on Netflix against the number of latent dimensions k; the left panel shows MF, SVD++ and the FMs (u,i), (u,i,t), (u,i,t,f); the right panel shows the FMs (u,i), (u,i,t), (u,i,t,f) and the polynomial regression models PR: (u,i) and PR: (u,i,t,f).

By comparing the accuracy of MF and FM (u, i), the equivalence of both models can be seen. However, the FM model with features (u, i) is always slightly better than MF (eq. (13)), which can be attributed to the additional bias terms for users and items (eq. (15)). The accuracies of the remaining models indicate that the additional features time t, frequency f, and co-rated items (SVD++) entail several significantly different two-way interactions of the standard user and item features with time, frequency or co-rated items, i.e. user-time, item-time, user-frequency, etc. This can be seen from the increasing gap between the (u, i)-models and the extended models as the number of latent dimensions increases. The higher the complexity, i.e. k, the better the predictive accuracy. This means that there exist several significant two-way interactions besides the user-item interaction which cannot be captured with a low number of latent dimensions, since the smaller the latent space for the modeled interactions, the lower their possible variety.

3.2 FMs vs. Polynomial Regression

Please note that the Netflix dataset is not amenable to non-factorized polynomial regression models of order o > 1 if the chosen predictors are categorical variables with too many categories and/or too few observations for their interactions, like users and items, as the corresponding estimated model parameters would suffer from too sparse data, i.e. overfitting. For instance, for the user-item interaction the Netflix dataset contains at most one observation of a user-item rating, but typically no observation at all. Even worse, if the rating for an item by a user is sought in test, there is no rating for that user-item pair available in train. This is a prime example for many recommender systems, e.g., online shops, where customers buy an item, e.g., movies, songs, etc., at most once. With such data,
two-way or higher-order interaction effects cannot be reliably estimated; just simple linear effects are feasible. Thus, in the case of Netflix, where only user IDs, item IDs and the time of rating are available for predicting the dependent variable, polynomial regression models have to drop the user-item interaction to avoid overfitting. Therefore, the polynomial regression model with only user ID and item ID (PR: (u, i)) as predictors in the right panel of fig. 6 is linear. The other polynomial regression model (PR: (u, i, t, f)) is able to include two-way interactions between either user ID or item ID and the time-derived covariates t and f because there are enough observations per interaction available. In the right panel of fig. 6 we compare the predictive accuracies of both non-factorized polynomial regression models and of their factorized siblings, which are able to estimate two-way interaction effects from sparse data, i.e. the user-item interaction effect for Netflix.

The first empirical finding is that the polynomial regression model (u, i, t, f) including time-derived predictors obtains significantly better results than the simpler linear (u, i)-model. This is due to the added time information, but more importantly due to the added two-way effects, as can be seen from the left panel of fig. 6 and from the discussion in the previous section 3.1 about the increasing gap between the user-item model and the time-enhanced models: for k = 0, i.e. only linear effects, the difference in accuracies is small; however, with increasing complexity of the modeled two-way interactions, i.e. k, the accuracy increases as well. Thus, in addition to the second-order user-item interaction there are other relevant second-order interaction effects in the data. The second important finding is that FMs clearly outperform their non-factorized siblings. The reason is that the second-order interaction effect between users and items is very important for rating prediction, as different users have different preferences for items, which linear effects are unable to capture.

4 Conclusion

In this paper we have discussed the relationship of polynomial regression to factorized polynomial regression and Factorization Machines. There are several advantages of factorized polynomial regression over standard polynomial regression; especially in sparse data settings, the factorization of higher-order effects improves the predictive accuracy. With factorized polynomial regression, particularly FMs, the derivation of individual factorization models and their learning algorithms is no longer needed. It is enough to specify the feature vectors, and a preselected learning algorithm such as SGD or ALS can be used to train all factorization models. Implementations of FMs for SGD are already available; thus FMs are an important generalization step saving a lot of time and effort for practitioners and researchers.
References

[1] R. M. Bell, Y. Koren, and C. Volinsky. The BellKor Solution to the Netflix Prize, 2009.
[2] A. Töscher, M. Jahrer, and R. M. Bell. The BigChaos Solution to the Netflix Grand Prize, 2009.
[3] M. Piotte and M. Chabbert. The Pragmatic Theory Solution to the Netflix Grand Prize, 2009.
[4] R. A. Harshman. Foundations of the PARAFAC procedure: models and conditions for an exploratory multimodal factor analysis. UCLA Working Papers in Phonetics, volume 16, pages 1-84, 1970.
[5] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426-434, New York, 2008. ACM.
[6] Y. Koren. Collaborative filtering with temporal dynamics. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 447-456, New York, 2009. ACM.
[7] R. Pan, Y. Zhou, B. Cao, N. N. Liu, R. M. Lukose, M. Scholz, and Q. Yang. One-Class Collaborative Filtering. In Proceedings of the IEEE International Conference on Data Mining, pages 502-511, 2008.
[8] S. Rendle. Factorization Machines. In Proceedings of the 10th IEEE International Conference on Data Mining, pages 995-1000, 2010.
[9] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme. Factorizing Personalized Markov Chains for Next-Basket Recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 811-820, New York, 2010. ACM.
[10] S. Rendle, Z. Gantner, C. Freudenthaler, and L. Schmidt-Thieme. Fast Context-aware Recommendations with Factorization Machines. In Proceedings of the 34th ACM SIGIR Conference on Research and Development in Information Retrieval, New York, 2011. ACM.
[11] S. Rendle and L. Schmidt-Thieme. Pairwise interaction tensor factorization for personalized tag recommendation. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, pages 81-90, New York, 2010. ACM.
[12] R. Salakhutdinov and A. Mnih. Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning, 2008.
[13] R. Salakhutdinov and A. Mnih. Probabilistic Matrix Factorization. In Advances in Neural Information Processing Systems, 2008.
[14] N. Srebro, J. D. M. Rennie, and T. S. Jaakkola. Maximum-Margin Matrix Factorization. In Advances in Neural Information Processing Systems, 2005. MIT Press.
[15] L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, volume 31, pages 279-311, 1966.
[16] L. Xiong, X. Chen, T. Huang, J. Schneider, and J. G. Carbonell. Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization. In Proceedings of the SIAM International Conference on Data Mining, pages 211-222, 2010.