# Addressing Cold Start in Recommender Systems: A Semi-supervised Co-training Algorithm


```
Input:  the training set L with labeled (rated) examples, and the
        unlabeled (unrated) example set U
Output: regressor h(u, i) for the rating prediction

Step 1: Construct multiple regressors.
  Generate the two regressors h1 and h2 by manipulating the training
  set (M_s) or manipulating the attributes (M_v, Section 4.1);
  Create a pool U' by randomly picking examples from U;
  Initialize the teaching sets T1 = ∅, T2 = ∅.

Step 2: Co-training.
  repeat
    for j = 1 to 2 do
      foreach x_ui ∈ U' do
        Obtain the confidence C_j(x_ui) by Eq. 10:
          C_j(x_ui) = d_u^(j) · d_i^(j) · ∏_{c ∈ G,O,A,S} d_c^(j) / N;
      end
      Select N examples to form T_j by the Roulette algorithm with the
      probability Pr(x_ui, j) given by Eq. 11: T_j ← Roulette(Pr(x_ui, j));
      U' = U' \ T_j;
    end
    Teach the peer regressors: L1 = L1 ∪ T2; L2 = L2 ∪ T1;
    Update the two regressors: h1 ← L1; h2 ← L2;
  until t rounds;

Step 3: Assemble the results (with the methods in Section 4.3):
  h(u, i) = Assemble(h1(u, i), h2(u, i)).
```

Algorithm 1: Semi-supervised Co-Training (CSEL).

Algorithm 1 gives the overall framework. In the following, more details are given about constructing the multiple regressors, constructing the teaching sets, and assembling the results. Note that in principle the algorithm can easily be extended to accommodate more than two regressors; in this work, for the sake of simplicity, we focus on two.

## 4.1 Constructing Multiple Regressors

Generally speaking, there are two ways to generate different learners for an ensemble: train the models on different examples by manipulating the training set, or build up different views by manipulating the attributes [6]. Both are adopted in this paper to generate multiple regressors for co-training.

**Constructing regressors by manipulating the training set.** A straightforward way of manipulating the training set is Bagging.
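The loop of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: `train`, `predict`, and `confidence` are caller-supplied stand-ins for the factorization learner and the confidence of Eq. 10, and the roulette draw is simplified (duplicates are possible).

```python
import random

def co_train(L1, L2, U, train, predict, confidence, rounds=3, pool_size=50, N=5):
    """Sketch of the CSEL co-training loop (Algorithm 1)."""
    h1, h2 = train(L1), train(L2)
    for _ in range(rounds):
        # draw a smaller pool U' from the unlabeled set U
        pool = random.sample(U, min(pool_size, len(U)))
        teaching = []
        for h in (h1, h2):
            k = min(N, len(pool))
            if k == 0:
                teaching.append([])
                continue
            # roulette draw proportional to confidence (Eq. 11)
            T = random.choices(pool, weights=[confidence(h, x) for x in pool], k=k)
            # label the selected examples with the current regressor
            teaching.append([(x, predict(h, x)) for x in T])
            pool = [x for x in pool if x not in T]
        # each regressor teaches its peer with its newly labeled examples
        L1, L2 = L1 + teaching[1], L2 + teaching[0]
        h1, h2 = train(L1), train(L2)
    return h1, h2
```

With stub learners (e.g., `train` returning the mean label), the returned regressors stay within the range of the seed labels, which makes the teaching mechanics easy to trace.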
In each run, Bagging presents the learning algorithm with a training set consisting of k training examples drawn randomly with replacement from the original training set; such a set is called a bootstrap replicate of the original training set. In this work, two subsets are generated from the original training set with the Bagging method, and two regressors are trained on the two subsets, respectively, so that they can work collaboratively in the co-training process. The basic learner for generating the regressors can be the full factorization model (Eq. 4), i.e.,

h1(u, i) = h2(u, i) = µ + b_u + b_i + q_i^T p_u + Σ_{g ∈ genres(i)} b_ug + b_ia + b_io + b_is    (6)

Some related work uses diverse regressors to reduce the negative influence of newly labeled noisy data [30]. Our experiments also showed that good diversity between the regressors helps considerably in improving the performance of co-training; it is thus an important condition for generating good combinations of regressors. Intuitively, the diversity between two regressors generated in this way comes from the different examples they are trained on, and can be evaluated by the difference in their predictions. Another important condition for a good ensemble is that the two regressors should be sufficient and redundant; translated to this case, each training set should be sufficient for learning, and the predictions made by the two individual regressors should be as accurate as possible. However, these two conditions, diversity and sufficiency, contradict each other. Let the size of both Bagging subsets be k; with the original training set size fixed, when k is large the overlap between the two subsets is large as well, which results in small diversity between the regressors.
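The M_s construction above amounts to drawing two bootstrap replicates, one per regressor. A minimal sketch (the seeded RNG is an illustrative choice, not from the paper):

```python
import random

def bagging_subsets(train_set, k, seed=0):
    """Draw two bootstrap replicates of size k, sampling with replacement,
    as in the M_s construction of the two training subsets."""
    rng = random.Random(seed)
    s1 = [rng.choice(train_set) for _ in range(k)]
    s2 = [rng.choice(train_set) for _ in range(k)]
    return s1, s2
```

The overlap between `s1` and `s2` grows with k, which is exactly the diversity/sufficiency trade-off discussed above.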
Meanwhile, the accuracy of the individual regressors will be good with a big set of training examples, and vice versa when k is small. Thus, in the experiments we need to find an appropriate value of k to trade off these two criteria. This method is referred to as M_s hereafter.

**Constructing regressors by manipulating the attributes.** Another way to generate multiple regressors is to divide the attributes into multiple views, such as

h1(u, i) = µ + b_u + b_i + q_i^T p_u + Σ_{g ∈ genres(i)} b_ug    (7)

h2(u, i) = µ + b_u + b_i + q_i^T p_u + b_ia + b_io + b_is    (8)

The two regressors are each a part of the model given in Eq. 4. Unlike M_s, here both regressors are trained on the entire training set. To guarantee enough diversity between the constructed regressors, the contexts are separated; specifically, user-related contexts and item-related contexts are given to different regressors. Further, in order to keep good prediction accuracy (with respect to the sufficiency condition) for the individual regressors, the common part, µ + b_u + b_i + q_i^T p_u, is kept in both regressors. This method is referred to as M_v hereafter.

**Constructing regressors by a hybrid method.** As described above, M_s trains the same model on different subsets, whereas M_v constructs different models and trains them on a single set. The hybrid method constructs different regressors (different views) and also trains them on different subsets of the examples. Obviously, this brings in more diversity between the regressors. The hybrid method is referred to as M_sv.

## 4.2 Semi-supervised Co-training

Now, with the multiple regressors at hand, the task becomes how to co-train them. Specifically, we need to construct a teaching set T_j for each regressor j and use it to teach its peer (the other regressor). This is a key step for launching the semi-supervised learning (SSL).
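The two view-split predictors of Eqs. 7 and 8 can be written out directly. This is a sketch with hypothetical parameter values; the shared factorization part is identical in both, and only the context biases differ:

```python
MU = 3.5  # global rating mean (hypothetical value for illustration)

def dot(q, p):
    """Inner product q_i^T p_u of the latent factor vectors."""
    return sum(qk * pk for qk, pk in zip(q, p))

def h1(b_u, b_i, q_i, p_u, b_ug, genres):
    """Eq. 7: shared factorization part plus the genre biases of item i."""
    return MU + b_u + b_i + dot(q_i, p_u) + sum(b_ug[g] for g in genres)

def h2(b_u, b_i, q_i, p_u, b_ia, b_io, b_is):
    """Eq. 8: shared factorization part plus the demographic biases."""
    return MU + b_u + b_i + dot(q_i, p_u) + b_ia + b_io + b_is
```

Keeping `MU + b_u + b_i + dot(q_i, p_u)` in both views is the sufficiency condition; the disjoint bias terms are what create the diversity.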
One challenge here is how to determine the criterion for selecting unlabeled examples from U to build the teaching set. The criterion used in this work is the regressor's confidence: every candidate example has a confidence value reflecting the probability that its prediction is accurate. As given in the algorithm, before SSL proceeds, a smaller pool U' ⊂ U is drawn randomly from U, because U can be huge.

**Confidence for the FactCF model.** In many cases the user can benefit from observing confidence scores [11]; e.g., when the system reports low confidence in a recommended item, the user may tend to research the item further before making a decision. However, not every regression algorithm can generate a confidence by itself, the factorization regressor this work is based on being one example, so a confidence measure needs to be designed for it. In [30] the authors proposed a predictive confidence estimation for k-NN regressors: the most confidently labeled example of a regressor should, if utilized, most decrease the regressor's error on the labeled example set. However, this requires the example to be merged with the labeled data and the model re-trained, a process that must be repeated for all candidate examples. Similarly, in [10] another confidence measure was proposed, again for the k-NN regressor, based on the level of conflict between the neighbors of the current item for the current user. The above two methods are both based on the k nearest neighbors of the current user and cannot be directly applied to our model. In the following we define a new confidence measure that suits our model.

Confidence in recommendation can be defined as the system's trust in its own recommendations or predictions [11]. As noted above, collaborative filtering recommenders tend to improve in accuracy as the amount of data over items/users grows. In fact, how well the factorization regressor works depends directly on the number of labeled examples (ratings) associated with each factor in the model, and the factors in the model pile up to have an aggregate impact on the final results. Based on this idea, we propose a simple but effective definition of the confidence in a prediction made by regressor j for example x_ui:

C_j(x_ui) = d_u^(j) · d_i^(j) / N    (9)

where N is the normalization term, and d_u^(j) and d_i^(j) are user u's and item i's popularities associated with regressor j. The idea is: the more chances (times) a factor gets to be updated during training, the more confident the model is in the predictions made with it.
The number of ratings associated with the user-related factors b_u and p_u in the model is expressed by d_u, while the number associated with the item-related factors b_i and q_i is expressed by d_i. The underlying assumption here is that the effects of the parameters are cumulative.

**Confidence for the context-aware models.** With a similar idea, the confidence for the context-aware models is defined as:

C_j(x_ui) = d_u^(j) · d_i^(j) · ∏_{c ∈ G,O,A,S} d_c^(j) / N    (10)

where c ranges over the general information, including genre, age, gender, occupation, etc.; d_c represents the fraction of items that fall in category c. For example, d_g1 is the fraction of items that fall in the genre Fiction, d_g2 the fraction in the genre Action, etc., with G = {g_1, ..., g_18} (there are 18 genres in the MovieLens dataset). Likewise, d_a, d_s and d_o are associated with the categories A = {a_1, ..., a_7} (the entire age range is divided into 7 intervals), S = {male, female} and O = {o_1, ..., o_20} (20 kinds of occupations), respectively, and are obtained in the same way. Specifically, for the model with the item view, i.e., h1 in Eq. 7, the confidence for x_ui is C_1(x_ui) = d_u^(1) · d_i^(1) · d̄_g, where d̄_g = (1/k) Σ_{g_k ∈ genres(i)} d_gk and k is the number of genres item i belongs to. On the other hand, for the model with the user view, i.e., h2 in Eq. 8, the confidence is C_2(x_ui) = d_u^(2) · d_i^(2) · d_a · d_s · d_o. In this way the confidence depends on the view applied by the model. With this definition, the two regressors (given by Eq. 6) trained on different sample sets (M_s) will naturally have different d_u, d_i and d_c: for a user u, d_u^(1) for h1 is the number of u's ratings falling in the first subset, while d_u^(2) for h2 is the number falling in the second subset, and likewise for d_i. This results in two different confidence values for the two models.
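The confidence of Eq. 10 is just a product of popularity counts and category fractions. A sketch, with hypothetical inputs and the normalization N omitted (it is constant within a model, as noted below):

```python
from math import prod

def confidence(d_u, d_i, d_context):
    """Unnormalized confidence of Eq. 10: the product of the user popularity
    d_u, the item popularity d_i, and the context-category fractions d_c
    for c in G, O, A, S (passed as a list)."""
    return d_u * d_i * prod(d_context)
```

With an empty context list the product reduces to d_u · d_i, i.e., the plain FactCF confidence of Eq. 9.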
Note that in this work the confidence value C_j(x_ui) is used for selecting examples within a single model j, in order to steer the process of semi-supervised learning; confidence values are therefore not comparable between different models. For the same reason, the normalization term N is constant within a model and can thus be omitted.

**Constructing and co-training with the teaching set.** For each regressor j, when constructing the teaching set T_j with C_j(x_ui), to avoid focusing on examples associated with the most popular users and items (which have relatively high d_u^(j) and d_i^(j)), rather than directly selecting the examples with the highest confidence, we derive a probability for each candidate example from C_j(x_ui) and select with the Roulette algorithm [1]. Specifically, the probability given to an example x_ui is calculated by:

Pr(x_ui, j) = C_j(x_ui) / Σ_{x_k ∈ U'} C_j(x_k)    (11)

Noteworthily, in the experiments we found that simply selecting examples with Pr(x_ui, j) sometimes cannot guarantee good performance. Moreover, we observed that a big diversity between the outputs of the two regressors is important for the optimization: the experiments show that, for the selected examples, if the difference between the predictions made for them is not big enough, the teaching effect can be trivial. This is consistent with the conclusion in [20], where the authors emphasize that diversity between the two learners is important. To guarantee progress in SSL, we apply an extra condition in the teaching-set selection: only examples with a big difference between the predictions made by the two regressors are selected. Specifically, when an example x_ui is drawn from U' with Pr(x_ui, j), the predictions made for it by the two regressors are compared against a threshold τ; if the difference between them is bigger than τ, then x_ui is put into T_j, otherwise it is discarded.
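The selection step above combines a roulette-wheel draw over Eq. 11 with the disagreement filter. A sketch; `h1`/`h2` map an example to a predicted rating, and all names here are illustrative rather than from the paper's code:

```python
import random

def select_teaching_set(pool, conf, h1, h2, N, tau, seed=0):
    """Draw candidates with probability proportional to confidence (Eq. 11),
    keeping only those on which the two regressors disagree by more than tau."""
    rng = random.Random(seed)
    candidates = list(pool)
    T = []
    while candidates and len(T) < N:
        # roulette-wheel draw: weights need not be normalized
        x = rng.choices(candidates, weights=[conf(c) for c in candidates], k=1)[0]
        candidates.remove(x)
        if abs(h1(x) - h2(x)) > tau:  # the extra diversity condition
            T.append(x)
    return T
```

Discarded low-disagreement draws are simply skipped, so the returned set contains at most N examples that satisfy both conditions.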
Next, the examples in the teaching set are labeled with the predictions made by the current regressor, and likewise for its peer regressor. Note again that, for the sake of diversity, the teaching sets for the two regressors should be mutually exclusive. At last, the regressors are updated (re-trained) with this newly labeled data; in this way the unlabeled data are incorporated into the learning process. According to the results given in the experiments, incorporating the unlabeled data gives the CSEL strategy a significant boost in system performance. This process of constructing the teaching sets and co-training is repeated for t iterations in the algorithm.

## 4.3 Assembling the Results

The final step of the algorithm is to assemble the results of the regressors after the optimization by SSL. A straightforward method is to take the average, i.e.,

h(u, i) = (1/l) Σ_{j=1..l} h_j(u, i)    (12)

where l is the number of regressors being assembled. In our case, assembling the results of two regressors gives h(u, i) = (1/2) [h_1(u, i) + h_2(u, i)].

In the evaluation, the available ratings were split into training, validation and test sets such that 10% of each user's ratings were placed in the test set, another 10% were used in the validation set, and the rest were placed in the training set. The validation set is used for early termination and for setting meta-parameters; we then fixed the meta-parameters and re-built our final model using both the training and validation sets. The results on the test set are reported.

**Comparison algorithms.** We compare the following algorithms:

- FactCF: the factorization-based collaborative filtering algorithm given by Eq. 1 [15];
- UB k-NN and IB k-NN: the user-based and item-based k-nearest-neighbor collaborative filtering algorithms;
- Context: the context-aware model we propose in Section 3, given by Eq. 4;
- Ensemble_s and Ensemble_v: directly assembling the results of h1 and h2 generated by manipulating the training set (M_s) and manipulating the attributes (M_v, Section 4.1), respectively. Among the ensemble options in Section 4.3, the weighted method (Eq. 13) gives slightly better results than the others and is thus adopted;
- CSEL_s, CSEL_v and CSEL_sv: the semi-supervised co-training algorithms using the h1 and h2 generated by M_s, M_v and M_sv (cf. Section 4.1), respectively.

Besides the state-of-the-art FactCF algorithm, we also use another standard algorithm, k-NN, as a baseline. Two kinds of k-NN algorithms are implemented and compared with our methods. With user-based k-NN, to provide a prediction for x_ui, the k nearest neighbors of u who have also rated i are selected according to their similarities to u, and the prediction is made by taking the weighted average of their ratings. With item-based k-NN, the k nearest neighbors of i that have also been rated by u are selected to form the prediction. Note that, as another standard recommendation approach, the content-based algorithm has a different task from ours and is thus not considered in this work.
It aims to provide top-N recommendations with the best content similarity to the user, rather than rating predictions. When training the models we mainly tuned the parameters manually and sometimes resorted to an automatic parameter tuner (APT) to find the best constants (learning rates, regularization, and log basis). Specifically, we used APT2, which is described in [27]. All the algorithms are run with 10-fold cross-validation.

**Evaluation measures.** The quality of the results is measured by the root mean squared error of the predictions:

RMSE = sqrt( Σ_{(u,i) ∈ TestSet} (r_ui − r̂_ui)² / |TestSet| )

where r_ui is the real rating and r̂_ui is the rating estimated by a recommendation model.

## 6.2 Results

The overall performance on the four datasets is given in Table 1. The CSEL methods outperform both the k-NN and FactCF algorithms in terms of overall RMSE. Specifically, compared to FactCF, the prediction accuracy averaged over all test examples is improved by 3.4%, 3.6%, 3.2% and 3% on the four datasets respectively, with CSEL_v performing best among the methods.

Table 1: The overall RMSE performance on the four datasets (rows: UB k-NN, IB k-NN, FactCF, Context, Ensemble_s, Ensemble_v, CSEL_s, CSEL_v, CSEL_sv; numeric entries omitted).

An unexpected result is that the hybrid method CSEL_sv does not outperform CSEL_v. This is because the double splitting, on both the training set and the views, leads to bigger diversity (see Table 4) but also worse accuracy of the individual regressors. As expected, FactCF outperforms k-NN on overall performance by exploiting the latent factors. We tried different values of k from 10 to 100; k = 20 gives the best results. CSEL presents a much bigger advantage over the k-NN algorithms, i.e., 12% over UB and 16% over IB. The best results are obtained by CSEL_v on all datasets.

**Cold-start performance.** Beyond the overall RMSE, we also present the recommendation performance according to the popularity of items.
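The RMSE measure above is straightforward to compute over (true, predicted) rating pairs; a minimal sketch:

```python
from math import sqrt

def rmse(pairs):
    """Root mean squared error over (r_ui, r_hat_ui) pairs,
    matching the evaluation formula above."""
    return sqrt(sum((r - rh) ** 2 for r, rh in pairs) / len(pairs))
```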
Specifically, we estimate the popularity of each item based on its number of ratings and partition all items into 10 equal-sized bins according to popularity; the average RMSE score is obtained for each bin. The results for these 10 bins on dataset D2 are given in Table 2, corresponding to Bin_i^(1) through Bin_i^(10). Likewise, all users are partitioned into 10 bins, with the average RMSE reported per bin, corresponding to Bin_u^(1) through Bin_u^(10) in the table. The average rating number in each bin is given, too. With the users and items grouped by popularity, we can observe how well the cold-start problem is solved. For the presented results, the differences between the algorithms are statistically significant (p < 0.05 with a t-test). Due to space limitations, we do not list results on all the datasets; according to the experimental results, the behaviors of the algorithms on the four datasets turned out to be similar to each other.

Table 2 demonstrates how well the cold-start problem can be addressed by the proposed CSEL algorithm. The performance of the k-NN algorithms shows again that the cold-start problem is common to all kinds of recommendation algorithms: items and users with fewer ratings receive less accurate predictions. The reason is that the effectiveness of k-NN collaborative filtering depends on the availability of sets of users/items with similar preferences [2]. Compared to the baseline algorithms, CSEL successfully tackles the cold-start problem by largely improving the performance on the unpopular bins. Again, CSEL_v provides the best results: compared to FactCF, RMSE drops by up to 8.3% for the most unpopular user bins and 10.1% for the most unpopular item bins; compared to UB k-NN, it drops by up to 13.5% for the most unpopular user bins and 22.3% for the most unpopular item bins, respectively.
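The equal-sized popularity binning used for Table 2 can be sketched as follows, assuming a mapping from each item (or user) id to its rating count:

```python
def popularity_bins(rating_counts, n_bins=10):
    """Partition ids into n_bins equal-sized bins by rating count,
    most popular first, as in the per-bin RMSE analysis."""
    ranked = sorted(rating_counts, key=rating_counts.get, reverse=True)
    size = -(-len(ranked) // n_bins)  # ceiling division
    return [ranked[i * size:(i + 1) * size] for i in range(n_bins)]
```

Bin 1 then holds the most popular ids and the last bin the cold-start ids, so averaging RMSE within each bin reproduces the Table 2 layout.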
## 6.3 Analysis and Discussion

We further evaluate how the different components (context-aware factorization, ensemble methods, etc.) contribute to the proposed algorithm framework. In the following, the performance of the proposed context-aware model and the ensemble methods is given, and the results of CSEL are further analyzed.

Table 2: RMSE performance of the different algorithms on D2. Bin_u^(1): the most active users; Bin_u^(10): the most inactive users. Bin_i^(1): the most popular items; Bin_i^(10): the most unpopular items. (Rows: average rating number per bin, with 106 ratings per user and 60 per item on average; UB k-NN, IB k-NN, FactCF, Context, Ensemble_s, Ensemble_v, CSEL_s, CSEL_v, CSEL_sv; numeric entries omitted.)

Table 3: RMSE performance of the evolving model; RMSE is reduced as model components are added (numeric entries omitted):
1) µ + b_u + b_i + q_i^T p_u
2) 1) + b_io + b_ia + b_is (h_v2)
3) 1) + Σ_{g ∈ genres(i)} b_ug (h_v1)
4) 2) + Σ_{g ∈ genres(i)} b_ug

**Context-aware model.** Table 3 depicts the results of the evolving model, adding the item-side and user-side context biases separately, so that the effect of each part can be observed. Obviously, b_ug contributes more than the demographic biases to the final improvement in RMSE. The reason can be traced back to the numbers of ratings falling into each category. Specifically, the average number of ratings per item is 60 and per user 106; the context biases divide these ratings over the categories S, O, A and G. Roughly speaking, the ratings for each item are divided by 2 for S, 20 for O and 7 for A, respectively, whereas the ratings for each user are shared among 18 genres, but not exclusively, which leaves relatively richer information for b_ug than for b_io, b_ia and b_is. With the context-aware model, the overall RMSE is improved by 1.7%, 1.3%, 1.5% and 1% on the four datasets respectively, compared to FactCF. The performance on the cold-start problem is depicted in Table 2.
There is a certain level of improvement on all bins compared to the baseline algorithms.

**Ensemble results.** In Table 1 we give the ensemble results obtained by the two approaches, M_s and M_v. Let h_s1 and h_s2 be the regressors generated by M_s, and h_v1 and h_v2 the ones generated by M_v. Their performance serves as the starting point for the SSL process. The corresponding ensemble results are denoted Ensemble_s and Ensemble_v, respectively. The ensemble methods achieve much better overall RMSE than the FactCF algorithm. Given the performance of the individual regressors h_v1 and h_v2 in Table 3, simply assembling them together, before any optimization with SSL, already improves accuracy considerably. Table 2 shows the performance per bin: compared to Context, with slightly better overall RMSE, their performance on the unpopular bins is better. The biggest improvement is observed in the last bin, where RMSE is reduced by 4.6% for users and 6% for items. These empirical results demonstrate that ensemble methods can be used to address the cold-start problem; for a more theoretical analysis of ensemble methods, please refer to [6].

**Semi-supervised co-training algorithm (CSEL).** In order to generate the bootstrap replicates with the M_s method, we set a parameter ζ (0 < ζ ≤ 1) as the proportion to take from the original training set. In our evaluation, after trying different values, ζ is set to 0.8, i.e., 0.8|T| examples are randomly selected for each replicate, which provides the best performance in most cases. As mentioned in Section 4.2, selecting examples with a big difference between h1 and h2 to form the teaching set is essential to the success of the SSL process. To do so, a threshold β is set to select the examples with the top 10% biggest differences: according to the distribution of the differences depicted in Figure 2, we have β = 0.675 for M_s, 0.85 for M_v and 1 for M_sv, respectively.
As expected, M_sv generates the most diverse regressors, and M_s the least. During the iterations the difference distribution changes with the updated regressors; if not enough unlabeled examples meet the condition, the threshold automatically shrinks by β = 0.8β. The performance on the cold-start problem on D2 is depicted by the comparison of the left and right plots in Figure 3, which show the results on user bins for the FactCF method and the CSEL_v method, respectively.

Table 4: Diversity of the individual regressors h1 and h2 before and after the CSEL methods, on the four datasets (rows: before/after CSEL_s, before/after CSEL_v, before/after CSEL_sv; numeric entries omitted).

Figure 2: Difference distribution between the multiple regressors generated for the CSEL methods (on D2). We estimate the difference between the predictions made by the regressors generated with M_s, M_v and M_sv; the difference threshold for selecting teaching-set examples is set according to these distributions.

Table 5: Predictions made for users and items of different popularity (columns: x_11, x_12, x_21, x_22; rows: real rating, FactCF, Context, Ensemble_v, CSEL_s, CSEL_v; numeric entries omitted).

Figure 3: Comparison between the FactCF algorithm (left) and CSEL_v (right) on the cold-start problem for D2; the average RMSE scores on user bins of decreasing popularity are given.

The RMSE on the cold-start bins drops dramatically; the performance on those bins is now much closer to the performance on the popular bins. In other words, the distribution of RMSE becomes much more balanced than with FactCF. Meanwhile, the performance on the popular bins has also been improved, which means the overall RMSE performance is generally improved. The improvement on the unpopular bins means the cold-start problem is well addressed, in the sense that predictions for users and items with limited ratings become more accurate, making it possible to provide accurate recommendations for them.

**Efficiency performance.** Roughly speaking, the cost of the algorithm is t times that of the standard factorization algorithm, since the individual models are re-trained t times. In the experiments we found that the teaching-set size N has a big influence on convergence.
We tried many possible values of N from 1,000 up to 20,000 with respect to the optimization effect and time efficiency, and found that 20% of the training-set size is a good choice, e.g., N = 8,000 for D1 and N = 6,000 for D2, which leads to fast convergence and the best final results. Besides, we found that the quality of the teaching set, i.e., the difference and accuracy of the predictions made for the selected examples, has an even bigger impact on the performance and convergence of CSEL. With the methods described in Section 4.2 we managed to guarantee that the best examples satisfying the conditions are selected. The convergence point is reached when the RMSE on the validation set does not drop any more for 2 iterations in a row. The entire training process is performed off-line.

**Diversity analysis.** The diversity of the regressors before and after performing the CSEL methods is depicted in Table 4 for all the datasets under estimation. Here, diversity is defined as the average difference between the predictions made by the regressors. In the experiments we observed that the diversity drops rapidly as the accuracy goes up; at the point of convergence the regressors have become very similar to each other, and the ensemble of their results does not bring any more improvement at this point. As mentioned before, diversity is an important condition for achieving good performance; apparently, simply keeping the teaching sets of the two regressors distinct does not help enough with maintaining diversity, and further strategies could be designed in this respect. Moreover, when generating the multiple regressors with M_v or M_s there is a setback in their performance: h1 and h2 do not perform as well as the full model (Eq. 4) trained on the full training set. For example, as shown in Table 3, h_v1 and h_v2 have a setback in accuracy compared to the full model.
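The diversity measure used in Table 4, the average difference between the two regressors' predictions, can be sketched as follows (the mean-absolute-difference form is an assumption consistent with the description above):

```python
def diversity(h1_preds, h2_preds):
    """Average absolute difference between the two regressors' predictions
    on a common set of examples, as tracked in the diversity analysis."""
    return sum(abs(a - b) for a, b in zip(h1_preds, h2_preds)) / len(h1_preds)
```

Tracking this value per iteration makes the reported behavior observable: it is largest before co-training and shrinks toward convergence.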
Although simply assembling the two regressors together brings the accuracy back (see Table 1), some improvement could still be made in this respect, to obtain an even better initial point at the beginning of the SSL process.

## 6.4 Qualitative Case Study

To get a more intuitive sense of how the algorithms work in different cases, we pick two users and two items from the dataset and look at their predictions. Let u1 be an active user and u2 an inactive user, i1 a popular item and i2 an unpopular item. Four examples x_11, x_12, x_21, x_22 are drawn from the dataset with the given users and items, where x_kj is the rating given by u_k to i_j. This way, x_11 is a rating by an active user of a popular item, whereas x_22 is a rating by an inactive user of an unpopular item, etc., and the performance of the algorithms on each case can be observed. In this case study we pick four examples that have the real ratings r_11, r_12, r_21, r_22. The results are given in Table 5, showing that providing accurate predictions also helps with personalization: recommending items that the end user likes but that are not generally popular, which can be regarded as the user's niche tastes. By providing more accurate predictions for these items, examples such as

x_12 or x_22 can be identified and recommended to the users who like them; thus the users' personalized tastes can be served.

## 7. Conclusion

This paper resorts to semi-supervised learning methods to solve the cold-start problem in recommender systems. First, to compensate for the lack of ratings, we propose to combine the contexts of users and items into the model. Second, a semi-supervised co-training framework is proposed to incorporate the unlabeled examples. In order to perform the co-training process and let the models teach each other, a method for constructing the teaching sets is introduced. The empirical results show that our strategy improves overall system performance and makes significant progress in solving the cold-start problem.

As future work, many directions are worth trying. For example, our framework provides the possibility of accommodating any kind of regressor; one interesting problem is how to choose the right regressors for a specific recommendation task. Meanwhile, this semi-supervised strategy can easily be expanded to include more than two regressors: for example, three regressors can be constructed and a semi-supervised tri-training performed on them. Finally, it is also intriguing to further consider rich social context [28] in the recommendation task.

**Acknowledgements.** This work is partially supported by the National Natural Science Foundation of China (603078), the National Basic Research Program of China (202CB36006, 204CB340500), NSFC (622222), and the Fundamental Research Funds for the Central Universities in China and the Shanghai Leading Academic Discipline Project (B4). We also thank Prof. Zhi-Hua Zhou for his valuable suggestions.

## 8. References

[1] T. Back. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, 1996.
[2] R. Bell, Y. Koren, and C. Volinsky.
Modeling relationships at multiple scales to improve accuracy of large recommender systems. In KDD'07, pages 95-104, 2007.
[3] A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In COLT'98, 1998.
[4] M. Degemmis, P. Lops, and G. Semeraro. A content-collaborative recommender that exploits WordNet-based user profiles for neighborhood formation. User Modeling and User-Adapted Interaction, 17(3):217-255, 2007.
[5] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004.
[6] T. G. Dietterich. Ensemble methods in machine learning. In Workshop on Multiple Classifier Systems, pages 1-15, 2000.
[7] G. Dror, N. Koenigstein, and Y. Koren. Yahoo! Music recommendations: Modeling music ratings with temporal dynamics and item taxonomy. In RecSys'11, pages 165-172, 2011.
[8] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4(2):133-151, 2001.
[9] N. Good, J. Schafer, J. Konstan, et al. Combining collaborative filtering with personal agents for better recommendations. In AAAI'99, 1999.
[10] G. Guo. Improving the performance of recommender systems by alleviating the data sparsity and cold start problems. In IJCAI'13, 2013.
[11] J. Herlocker, J. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In CSCW'00, 2000.
[12] P. B. Kantor, F. Ricci, L. Rokach, and B. Shapira. Recommender Systems Handbook. Springer, 2010.
[13] I. Konstas, V. Stathopoulos, and J. M. Jose. On social networks and collaborative recommendation. In SIGIR'09, 2009.
[14] Y. Koren. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In KDD'08, 2008.
[15] Y. Koren. The BellKor solution to the Netflix Grand Prize. Technical report, 2009.
[16] J. Li and O. Zaiane. Combining usage, content, and structure data to improve web site recommendation. In EC-Web'04, 2004.
[17] J. Lin, K. Sugiyama, M.-Y. Kan, and T.-S. Chua.
Addressing cold-start in app recommendation: Latent user models constructed from twitter followers. In SIGIR 3, pages , 203. [8] P. Melville, R. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In AAAI 02, pages 87 92, [9] S.-T. Park and W. Chu. Pairwise preference regression for cold-start recommendation. In RecSys 09, pages 2 28, [20] L. Rokach. Ensemble-based classifiers. Artificial Intelligence Review, 33(): 39, 200. [2] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Using filtering agents to improve prediction quality in the grouplens research collaborative filtering system. In CSCW 98, pages , 998. [22] A. Schein, A. Popescul, L. Ungar, and D. Pennock. Methods and metrics for cold-start recommendations. In SIGIR 02, pages , [23] I. Soboroff and C. Nicholas. Combining content and collaboration in text filtering. In IJCAI Workshop on Machine Learning for Information Filtering, pages 86 9, 999. [24] M. Sun, F. Li, J. Lee, K. Zhou, G. Lebanon, and H. Zha. Learning multiple-question decision trees for cold-start recommendation. In WSDM 3, pages , 203. [25] G. Takacs, I. Pilaszy, B. Nemeth, and D. Tikk. Scalable collaborative filtering approaches for large recommender systems. Journal of Machine Learning Research, 0: , [26] J. Tang, S. Wu, J. Sun, and H. Su. Cross-domain collaboration recommendation. In KDD 2, pages , 202. [27] A. Toscher and M. Jahrer. The bigchaos solution to the netflix prize Technical Report, [28] Z. Yang, K. Cai, J. Tang, L. Zhang, Z. Su, and J. Li. Social context summarization. In SIGIR, pages , 20. [29] K. Zhou, S.-H. Yang, and H. Zha. Functional matrix factorizations for cold-start recommendation. In SIGIR, pages , 20. [30] Z.-H. Zhou and M. Li. Semi-supervised regression with co-training. In IJCAI 05, pages , [3] Z.-H. Zhou and M. Li. Semi-supervised regression with co-training style algorithms. IEEE Transactions on Knowledge and Data Engineering, 9(): , 2007.
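To make the co-training loop of Algorithm 1 concrete, the sketch below mirrors its three steps on a toy 1-D regression task. This is an illustrative sketch only, not the paper's implementation: the nearest-neighbor "regressor", the distance-based confidence function (standing in for Eq. 10), and the averaging assembler are simplified placeholders chosen so the example is self-contained.

```python
import random

random.seed(7)  # deterministic toy run

def fit(labeled):
    """Toy stand-in regressor: predict the label of the nearest example."""
    def h(x):
        return min(labeled, key=lambda ex: abs(ex[0] - x))[1]
    return h

def roulette(pool, prob, n):
    """Roulette-wheel selection: draw n distinct items with probability
    proportional to prob(item), mirroring Pr(x_ui, j) in Algorithm 1."""
    pool, chosen = list(pool), []
    for _ in range(min(n, len(pool))):
        total = sum(prob(x) for x in pool)
        r = random.uniform(0.0, total)
        acc = 0.0
        for x in pool:
            acc += prob(x)
            if acc >= r:
                chosen.append(x)
                pool.remove(x)
                break
    return chosen

def csel(L1, L2, U, rounds=3, n_teach=3):
    """Step 2 of CSEL: each regressor pseudo-labels confident unlabeled
    examples for its peer; Step 3 assembles the results by averaging."""
    h1, h2 = fit(L1), fit(L2)
    U = list(U)
    for _ in range(rounds):
        for h, L_peer in ((h1, L2), (h2, L1)):
            if not U:
                break
            # Confidence stand-in for Eq. 10: unlabeled points near
            # already-labeled ones get higher confidence.
            def conf(x):
                return 1.0 / (1.0 + min(abs(x - ex[0]) for ex in L_peer))
            T = roulette(U, conf, n_teach)
            for x in T:
                U.remove(x)
                L_peer.append((x, h(x)))  # teach the peer with a pseudo-label
        h1, h2 = fit(L1), fit(L2)  # update the two regressors
    return lambda x: 0.5 * (h1(x) + h2(x))  # assemble the two predictions

# Toy data: learn y = 2x from two small labeled "views" plus unlabeled points.
L1 = [(x, 2.0 * x) for x in (0.0, 3.0, 6.0, 9.0)]
L2 = [(x, 2.0 * x) for x in (1.0, 4.0, 7.0, 10.0)]
U = [i + 0.5 for i in range(10)]
h = csel(L1, L2, U)
```

After the loop, `h` is the assembled predictor h(u, i) of Step 3; here `h(5.0)` lands near the true value 10 because both regressors have absorbed pseudo-labels covering the unlabeled region between their seed points.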
