arxiv: v1 [cs.ir] 12 Jun 2015
Reducing offline evaluation bias of collaborative filtering algorithms

Arnaud de Myttenaere 1,2, Boris Golden 1, Bénédicte Le Grand 3 & Fabrice Rossi 2

1 - Viadeo, 30 rue de la Victoire, Paris - France
2 - Université Paris 1 Panthéon - Sorbonne - SAMM EA, rue de Tolbiac, Paris - France
3 - Université Paris 1 Panthéon - Sorbonne - Centre de Recherche en Informatique, 90 rue de Tolbiac, Paris - France

Abstract. Recommendation systems have been integrated into the majority of large online systems to filter and rank information according to user profiles. They thus influence the way users interact with the system and, as a consequence, bias the evaluation of the performance of a recommendation algorithm computed using historical data (via offline evaluation). This paper presents a new application of a weighted offline evaluation to reduce this bias for collaborative filtering algorithms.

1 Introduction

Recommendation systems have been studied very frequently in the literature. They aim to provide a user with a set of possibly ranked items that are supposed to match the interests of the user [5]. Applications of such systems are ubiquitous on the Internet (e-commerce, online advertising, social networks, ...), and can be seen as a way to adapt a system to a user.

Obviously, recommendation algorithms must be evaluated before and during their active use in order to ensure their performance. Live monitoring is generally achieved using online performance metrics (e.g. click-through rate of displayed ads), whereas offline evaluation is computed using historical data. Offline evaluation makes it possible to quickly test several strategies without having to wait for real metrics to be collected and without impacting the performance of the online system.

One of the main strategies of offline evaluation consists in simulating a recommendation by removing a confirmation action (click, purchase, etc.) from a user profile and testing whether the item associated to this action would have been recommended based on the rest of the profile [7]. As presented in [3, 1], this scheme ignores various factors that have influenced historical data, such as the recommendation algorithms previously used, promotional offers on some specific products, etc. Even though limits of evaluation strategies for recommendation algorithms have been identified ([2, 4, 6]), this protocol is still intensively used in practice.

We study in this paper the general principle of instance weighting proposed in [1] and show its practical relevance beyond the simple case of constant recommendation (i.e. when recommendations are the same for every user). In addition to its good performance, this method is more realistic than the solutions proposed in [2, 4], for which a data collection phase based on random recommendations has to be performed. While this phase allows one to build a bias-free evaluation data set, it also has adverse effects in terms of e.g. public image or business performance when used on a live system.

The rest of the paper is organized as follows. Section 2 describes in detail the setting and the problem. Section 3 introduces the weighting scheme proposed to reduce the evaluation bias. Section 4 demonstrates the practical relevance of our method on real world data extracted from Viadeo (professional social network¹).

2 Problem formulation

2.1 Notations and setting

We denote by $U$ the set of users, $I$ the set of items and $D_t$ the historical data available at time $t$. A recommendation algorithm is a function $g$ from $U \times D_t$ to some set built from $I$. We will denote by $g_t(u) = g(u, D_t)$ the recommendation computed by $g$ at instant $t$ for user $u$. We assume given a quality function $l$ from the product of the result space of $g$ and $I$ to $\mathbb{R}^+$ that measures to what extent an item $i$ is correctly recommended by $g$ at time $t$ via $l(g_t(u), i)$.

We denote by $I_u$ the items associated to a user $u$. Offline evaluation is based on the possibility of removing any item $i$ from a user profile. The result is denoted $u_{-i}$, and $g_t(u_{-i})$ is the recommendation obtained at instant $t$ when $i$ has been removed from the profile of user $u$.

Finally, offline evaluation follows a general scheme in which a user is chosen according to some probability on users $P(u)$, which might reflect the business importance of the users. Given a user, an item $i$ is chosen among the items associated to its profile, according to some conditional probability on items $P(i|u)$. When an item $i$ is not associated to a user $u$ (that is, $i \notin I_u$), $P(i|u) = 0$. A very common choice for $P(u)$ is the uniform probability on $U$, and it is also very common to use a uniform probability for $P(i|u)$ (another strategy could favor items recently associated to a profile).
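The sampling scheme just described can be illustrated with a minimal sketch, assuming profiles are stored as a dictionary from user to item set. The function name `sample_holdout`, the toy profiles and the choice to skip users with empty profiles are assumptions made for illustration; they are not taken from the paper.

```python
import random

def sample_holdout(profiles, rng):
    """Draw (u, i) with P(u) uniform on users and P(i|u) uniform on I_u,
    then return the reduced profile u_{-i} together with the held-out item."""
    # Assumption: users with an empty profile are skipped, since no item can be removed.
    users = sorted(u for u, items in profiles.items() if items)
    u = rng.choice(users)                  # P(u): uniform over users
    i = rng.choice(sorted(profiles[u]))    # P(i|u): uniform over I_u, zero elsewhere
    reduced = set(profiles[u]) - {i}       # profile u_{-i}
    return u, i, reduced

# Hypothetical toy data: three users and their skills.
profiles = {
    "alice": {"python", "sql"},
    "bob": {"java"},
    "carol": {"sql", "excel", "r"},
}
u, i, reduced = sample_holdout(profiles, random.Random(0))
print(u, i, sorted(reduced))
```

A recommendation algorithm would then be run on `reduced`, and the quality function would compare its output with the held-out item `i`.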
As the system evolves over time, $P(u)$ and $P(i)$ depend on $t$. The two distributions $P(u)$ and $P(i|u)$ lead to a joint distribution $P(u, i) = P(i|u)P(u)$ on $U \times I$.

2.2 Origin of the bias in offline evaluation

The classic offline evaluation procedure consists in calculating the quality of the recommendation algorithm $g$ at instant $t$ as $L_t(g) = E(l(g_t(u_{-i}), i))$, where the expectation is taken with respect to the joint distribution:

$$L_t(g) = \sum_{(u,i) \in U \times I} P_t(i|u) P_t(u)\, l(g_t(u_{-i}), i). \quad (1)$$

Then, if two algorithms are evaluated at two different moments, their qualities are not directly comparable. In an online system, $P(i|u)$ evolves over time². Moreover, once a recommendation algorithm is chosen based on a given state of the system, it starts influencing the state of the system when put in production, inducing an increasing distance between its evaluation environment (i.e. the initial state of the system) and the evolving state of the system. This influence is responsible for a bias in offline evaluation, as the latter relies on historical data. A naive solution to this bias would be to compare algorithms only with respect to the original database at $t_0$, but it would discard natural evolutions of user profiles.

¹ See for more information about Viadeo.

3 Reducing the evaluation bias

3.1 A suggested method to reduce the bias

For a constant algorithm $g$, the recommendation $g_t(u_{-i}) = g_t$ does not depend on the user, so summing $P_t(i|u)P_t(u)$ over $u$ in equation (1) yields the marginal $P_t(i)$ and

$$L_t(g) = \sum_{i \in I} P_t(i)\, l(g_t, i).$$

As a consequence, a way to guarantee a stationary evaluation framework for a constant algorithm is to have constant values for the marginal distribution of items, $P_t(i)$. A natural solution would be to record those probabilities at $t_0$ and use them as the probability to select an item in offline evaluation at $t_1 > t_0$. However, as the selection of users and items leads to a joint distribution, this would require reversing the way offline evaluation is done: first select an item, then select a user having this item with a certain probability $\pi_t(u|i)$, leading to a different probability of user selection. This process leads to a similar problem on users and, as in most systems $\#U > \#I$, it is more efficient to follow the classical evaluation protocol.

Moreover, we will see that the recalibration of every item is not necessary to reduce the main part of the bias. Indeed, in practice, a few items concentrate most of the recommendations most of the time (very popular items, discounts on selected products, ...). Thus one can reduce the major part of the bias by optimizing the weights of the $p$ items for which the deviation $|P_{t_0}(i) - P_{t_1}(i)|$ is the largest.
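As an illustration of this selection step, the following minimal sketch ranks items by the absolute deviation between two marginal distributions and keeps the $p$ largest. The function name `top_deviation_items`, the toy probabilities, the alphabetical tie-breaking and the treatment of missing items as probability zero are all assumptions made for this example.

```python
def top_deviation_items(p_ref, p_now, p):
    """Return the p items with the largest |P_t0(i) - P_t1(i)|.

    p_ref and p_now map items to marginal probabilities at t0 and t1.
    Assumption: an item missing from one distribution has probability 0 there.
    """
    items = set(p_ref) | set(p_now)
    deviation = {i: abs(p_ref.get(i, 0.0) - p_now.get(i, 0.0)) for i in items}
    # Largest deviation first; ties are broken alphabetically for reproducibility.
    return sorted(items, key=lambda i: (-deviation[i], i))[:p]

# Hypothetical marginals before (t0) and after (t1) a campaign pushing "excel".
p_t0 = {"python": 0.40, "sql": 0.30, "java": 0.20, "excel": 0.10}
p_t1 = {"python": 0.25, "sql": 0.30, "java": 0.15, "excel": 0.30}
selected = top_deviation_items(p_t0, p_t1, p=2)
print(selected)
```

Only the weights of the selected items would then be optimized, which keeps the optimization small when a few items carry most of the drift.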
In practice $p$ is chosen according to practical constraints (time) or business constraints. Thus the weighting strategy that we described in [1] consists in keeping the classical choice for $P_t(u)$ and weighting $P_t(i|u)$, departing from the classical values for $P_t(i|u)$ (such as a uniform probability) in order to mimic static values for $P_{t_0}(i)$:

$$P_t(i|u, \omega) = \frac{\omega_i P_t(i|u)}{\sum_{j \in I_t} \omega_j P_t(j|u)}. \quad (2)$$

These weighted conditional probabilities lead to weighted item probabilities defined by:

$$P_t(i|\omega) = \sum_{u \in U} P_t(i|u, \omega) P_t(u). \quad (3)$$

² Even if $P(u)$ could also evolve over time, we do not consider the effects of such evolution in the present article.
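Equations (2) and (3) can be sketched in a few lines, assuming the distributions are stored as nested dictionaries. The function names, the toy data and the convention that items without an explicit weight get $\omega_i = 1$ are assumptions of this sketch, not details given in the paper.

```python
def weighted_conditional(p_cond_u, omega):
    """Equation (2): reweight one user's P_t(.|u) by omega and renormalize.

    Assumption: items absent from omega keep weight 1, and weights are positive.
    """
    raw = {i: omega.get(i, 1.0) * p for i, p in p_cond_u.items()}
    total = sum(raw.values())
    return {i: v / total for i, v in raw.items()}

def weighted_marginal(p_cond, p_user, omega):
    """Equation (3): P_t(i|omega) = sum over u of P_t(i|u, omega) * P_t(u)."""
    marginal = {}
    for u, p_cond_u in p_cond.items():
        for i, p in weighted_conditional(p_cond_u, omega).items():
            marginal[i] = marginal.get(i, 0.0) + p * p_user[u]
    return marginal

# Hypothetical example: two users, uniform P(u) and uniform P(i|u).
p_cond = {"alice": {"python": 0.5, "sql": 0.5}, "bob": {"sql": 1.0}}
p_user = {"alice": 0.5, "bob": 0.5}
marginal = weighted_marginal(p_cond, p_user, omega={"python": 3.0})
print(marginal)
```

Raising the weight of one item increases its share inside every profile that contains it, and the renormalization in equation (2) keeps each conditional distribution summing to one.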
Then we reduce the distance between $P_{t_1}(i|\omega)$ and $P_{t_0}(i)$ by minimizing the Kullback-Leibler divergence, defined by:

$$D(\omega) = \sum_{i \in I_{t_0}} P_{t_0}(i) \log \frac{P_{t_0}(i)}{P_{t_1}(i|\omega)}$$

where $I_{t_0}$ represents the set of items present at $t_0$. The asymmetric nature of this distance is useful in our context to consider time $t_0$ as a reference. Moreover, this asymmetry reduces the influence of items that were rare at time $t_0$ (as they were not very important in the calculation of $L_{t_0}(g)$).

3.2 Previous results

As described in [1], the score that the classical offline evaluation assigns to an algorithm in production tends to increase over time. More generally, the classical offline evaluation tends to overestimate (resp. underestimate) the unbiased score of an algorithm similar (resp. orthogonal) to the one in production.

We have also shown in [1] that the suggested weighting strategy perfectly recalibrates the score obtained by the classical offline evaluation for constant algorithms and high values of $p$. Thus, this method seems to reduce the bias for the very simple class of constant algorithms. In the next part we apply this method to collaborative filtering algorithms.

4 Experiments on collaborative filtering

4.1 Data and metrics

We consider real world data extracted from Viadeo, where skills are attached to a user's profile. The objective of the recommendation system is to suggest new skills to users. The dataset contains users and 180 items (skills), leading to couples (u, i). Both probabilities $P_t(u)$ and $P_t(i|u)$ are uniform, and the quality function $l$ is given by $l(g_t(u_{-i}), i) = \mathbb{1}_{i \in g_t(u_{-i})}$, where $g_t(u_{-i})$ is a set of 5 items. The quality of a recommendation algorithm, $L_t(g)$, is estimated via stochastic sampling in order to simulate what could be done on a larger data set than the one used for this illustration.
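A stochastic estimate of $L_t(g)$ with this indicator quality function can be sketched as follows. The popularity-based recommender is only a hypothetical stand-in for $g_t$ (it is not one of the algorithms evaluated in the paper), and the function names, the toy profiles and the default weight of 1 for items without an explicit $\omega_i$ are assumptions of this example.

```python
import random
from collections import Counter

def popularity_recommender(profiles, u, reduced, k=5):
    """Hypothetical stand-in for g_t(u_{-i}): the k most frequent items,
    counted without the held-out association, that u does not already have."""
    counts = Counter(i for v, items in profiles.items() if v != u for i in items)
    counts.update(reduced)
    ranked = sorted(counts, key=lambda i: (-counts[i], i))
    return set([i for i in ranked if i not in reduced][:k])

def estimate_score(profiles, recommend, n_samples, rng, omega=None, k=5):
    """Monte Carlo estimate of L_t(g) with l = 1 if the held-out item is recommended."""
    omega = omega or {}
    users = sorted(u for u, items in profiles.items() if items)
    hits = 0
    for _ in range(n_samples):
        u = rng.choice(users)                          # uniform P_t(u)
        items = sorted(profiles[u])
        weights = [omega.get(i, 1.0) for i in items]   # uniform P_t(i|u) scaled by omega_i
        i = rng.choices(items, weights=weights, k=1)[0]
        reduced = set(items) - {i}                     # profile u_{-i}
        hits += i in recommend(profiles, u, reduced, k)
    return hits / n_samples

# Hypothetical toy profiles.
profiles = {
    "alice": {"python", "sql", "excel"},
    "bob": {"python", "java"},
    "carol": {"sql", "excel", "r", "scala", "go", "rust"},
}
score = estimate_score(profiles, popularity_recommender, 1000, random.Random(0))
print(score)
```

Passing `omega=None` gives the classical uniform protocol, while a non-empty `omega` changes only how the held-out item is drawn inside each profile.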
We repeatedly selected couples (user, item) (first we select a user $u$ uniformly, then an item according to $P_t(i|u, \omega)$).

4.2 Collaborative filtering algorithms

Let $X_{u,t}$ be the vector of items of user $u$ at time $t$ ($X_{u,t} \in \{0, 1\}^{\#I}$). $X_{u,t}$ is a sparse vector, as most users are associated to only a few items. The objective of collaborative filtering algorithms is to estimate $X_{u,t'}$ for $t' > t$ using the information known on other users. In this paper we present two different collaborative filtering algorithms:

a) $\hat{X}_{u,t} = \sum_{v \in U \setminus \{u\}} \frac{\langle X_{u,t}, X_{v,t} \rangle}{\|X_{u,t}\| \, \|X_{v,t}\|} X_{v,t}$

b) $\hat{X}_{u,t}(i) = \max_{j \in I_u(t)} \frac{\#(U_i \cap U_j)}{\#U_j}$

Equation a) is known as collaborative filtering with cosine similarity, whereas equation b) computes the proportion of users associated to item $i$ among the ones associated to items possessed by $u$. We will call algorithm b) naive CF (Collaborative Filtering). Finally, the recommendation strategy consists in recommending the $k$ items having the highest values in $\hat{X}_{u,t}$.

4.3 Results

We apply the method described in Section 3 to compute optimal weights at different instants and for several values of the parameter $p$. The collaborative filtering algorithms are the ones presented in Section 4.2. Results are summarized in Figure 1.

[Figure 1: two panels, (a) cosine similarity and (b) naive CF, each plotting score against time for p = 0, 10, 25, 50, 100 and 150.]

Fig. 1: Results on the collaborative filtering with cosine similarity and naive CF, respectively defined by equations a) and b) in Section 4.2, for several values of $p$ (the number of weights optimized).

The analysis is conducted on a 201-day period, from day 300 to day 500, where day 0 corresponds to the launch date of the skill feature. It is important to notice that two recommendation campaigns were conducted by Viadeo during this period, at $t = 330$ and $t = 430$ respectively.

As we can see in Figure 1, the scores strongly decrease after the first recommendation campaign ($t = 330$). Thus those campaigns have strongly biased the collected data, leading to a significant bias in the offline evaluation score.

Figure 1 shows the influence of the value of $p$: the higher $p$ is, the more weights are optimized and the more the bias is corrected. However, the efficiency of the recalibration depends on the algorithm. The results show that the weighting protocol reduces the impact of recommendation campaigns on offline evaluation results, as intended. However, it does not make the score of collaborative filtering algorithms stationary (whereas it leads to constant scores for constant algorithms). This can be explained by the nature of collaborative filtering: we cannot expect the score to be constant for such an algorithm, as it depends on the correlations between users, which have been modified by the recommendation campaigns.

5 Conclusion

Various factors influence historical data and bias the score obtained by the classical offline evaluation strategy. Indeed, as recommendations influence users, a recommendation algorithm in production tends to be favored by offline evaluation. We have presented a new application of the item weighting strategy inspired by techniques designed for tackling the covariate shift problem. Whereas our previous results established the efficiency of this method for constant algorithms, we have shown that this method also reduces the bias for more elaborate algorithms. However, the efficiency of this approach depends on the algorithm, as a recommendation campaign also introduces bias in the correlations between users. Thus the presented strategy removes part of the bias, and future work will focus on the structural bias introduced by recommendation campaigns.

References

[1] A. De Myttenaere, B. Golden, B. Le Grand, and F. Rossi. Reducing offline evaluation bias in recommendation systems. In B. Frénay, M. Verleysen, and P. Dupont, editors, Proceedings of 23rd annual Belgian-Dutch Conference on Machine Learning (Benelearn 2014), pages 55-62, Brussels (Belgium).
[2] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1):5-53, January.
[3] L. Li, W. Chu, J. Langford, and X. Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In Proceedings of the fourth ACM international conference on Web search and data mining. ACM.
[4] S. M. McNee, J. Riedl, and J. A. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI '06 extended abstracts on Human factors in computing systems. ACM.
[5] D. H. Park, H. K. Kim, I. Y. Choi, and J. K. Kim. A literature review and classification of recommender systems research. Expert Systems with Applications, 39(11).
[6] A. Said, B. Fields, B. J. Jain, and S. Albayrak. User-centric evaluation of a k-furthest neighbor collaborative filtering recommender algorithm. In Proceedings of the 2013 conference on Computer supported cooperative work. ACM.
[7] G. Shani and A. Gunawardana. Evaluating recommendation systems. In P. B. Kantor, L. Rokach, F. Ricci, and B. Shapira, editors, Recommender systems handbook. Springer, 2011.
Predict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
Accurate is not always good: How Accuracy Metrics have hurt Recommender Systems
Accurate is not always good: How Accuracy Metrics have hurt Recommender Systems Sean M. McNee [email protected] John Riedl [email protected] Joseph A. Konstan [email protected] Copyright is held by the
Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.
Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing
Collaborative Filtering. Radek Pelánek
Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains
The Need for Training in Big Data: Experiences and Case Studies
The Need for Training in Big Data: Experiences and Case Studies Guy Lebanon Amazon Background and Disclaimer All opinions are mine; other perspectives are legitimate. Based on my experience as a professor
Understanding the Impact of Weights Constraints in Portfolio Theory
Understanding the Impact of Weights Constraints in Portfolio Theory Thierry Roncalli Research & Development Lyxor Asset Management, Paris [email protected] January 2010 Abstract In this article,
Contact Recommendations from Aggegrated On-Line Activity
Contact Recommendations from Aggegrated On-Line Activity Abigail Gertner, Justin Richer, and Thomas Bartee The MITRE Corporation 202 Burlington Road, Bedford, MA 01730 {gertner,jricher,tbartee}@mitre.org
Identification of Demand through Statistical Distribution Modeling for Improved Demand Forecasting
Identification of Demand through Statistical Distribution Modeling for Improved Demand Forecasting Murphy Choy Michelle L.F. Cheong School of Information Systems, Singapore Management University, 80, Stamford
Best Usage Context Prediction for Music Tracks
Best Usage Context Prediction for Music Tracks Linas Baltrunas [email protected] Lior Rokach Ben-Gurion University of the Negev, P.O.B. 653, Beer-Sheva, Israel [email protected] Marius Kaminskas [email protected]
Identifying Market Price Levels using Differential Evolution
Identifying Market Price Levels using Differential Evolution Michael Mayo University of Waikato, Hamilton, New Zealand [email protected] WWW home page: http://www.cs.waikato.ac.nz/~mmayo/ Abstract. Evolutionary
Chapter 4 Software Lifecycle and Performance Analysis
Chapter 4 Software Lifecycle and Performance Analysis This chapter is aimed at illustrating performance modeling and analysis issues within the software lifecycle. After having introduced software and
Imputing Missing Data using SAS
ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are
171:290 Model Selection Lecture II: The Akaike Information Criterion
171:290 Model Selection Lecture II: The Akaike Information Criterion Department of Biostatistics Department of Statistics and Actuarial Science August 28, 2012 Introduction AIC, the Akaike Information
Introduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
A simple analysis of the TV game WHO WANTS TO BE A MILLIONAIRE? R
A simple analysis of the TV game WHO WANTS TO BE A MILLIONAIRE? R Federico Perea Justo Puerto MaMaEuSch Management Mathematics for European Schools 94342 - CP - 1-2001 - DE - COMENIUS - C21 University
Dimensioning an inbound call center using constraint programming
Dimensioning an inbound call center using constraint programming Cyril Canon 1,2, Jean-Charles Billaut 2, and Jean-Louis Bouquard 2 1 Vitalicom, 643 avenue du grain d or, 41350 Vineuil, France [email protected]
Prediction of Software Development Modication Eort Enhanced by a Genetic Algorithm
Prediction of Software Development Modication Eort Enhanced by a Genetic Algorithm Gerg Balogh, Ádám Zoltán Végh, and Árpád Beszédes Department of Software Engineering University of Szeged, Szeged, Hungary
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Healthcare data analytics. Da-Wei Wang Institute of Information Science [email protected]
Healthcare data analytics Da-Wei Wang Institute of Information Science [email protected] Outline Data Science Enabling technologies Grand goals Issues Google flu trend Privacy Conclusion Analytics
A NURSING CARE PLAN RECOMMENDER SYSTEM USING A DATA MINING APPROACH
Proceedings of the 3 rd INFORMS Workshop on Data Mining and Health Informatics (DM-HI 8) J. Li, D. Aleman, R. Sikora, eds. A NURSING CARE PLAN RECOMMENDER SYSTEM USING A DATA MINING APPROACH Lian Duan
Classification by Pairwise Coupling
Classification by Pairwise Coupling TREVOR HASTIE * Stanford University and ROBERT TIBSHIRANI t University of Toronto Abstract We discuss a strategy for polychotomous classification that involves estimating
BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE
BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining [email protected] Outline Predictive modeling methodology k-nearest Neighbor
1 o Semestre 2007/2008
Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction
Genetic algorithms for solving portfolio allocation models based on relative-entropy, mean and variance
Journal of Scientific Research and Development 2 (12): 7-12, 2015 Available online at www.jsrad.org ISSN 1115-7569 2015 JSRAD Genetic algorithms for solving portfolio allocation models based on relative-entropy,
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek
Recommender Systems: Content-based, Knowledge-based, Hybrid Radek Pelánek 2015 Today lecture, basic principles: content-based knowledge-based hybrid, choice of approach,... critiquing, explanations,...
FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL
FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL STATIsTICs 4 IV. RANDOm VECTORs 1. JOINTLY DIsTRIBUTED RANDOm VARIABLEs If are two rom variables defined on the same sample space we define the joint
Comparison of Request Admission Based Performance Isolation Approaches in Multi-tenant SaaS Applications
Comparison of Request Admission Based Performance Isolation Approaches in Multi-tenant SaaS Applications Rouven Kreb 1 and Manuel Loesch 2 1 SAP AG, Walldorf, Germany 2 FZI Research Center for Information
Supply planning for two-level assembly systems with stochastic component delivery times: trade-off between holding cost and service level
Supply planning for two-level assembly systems with stochastic component delivery times: trade-off between holding cost and service level Faicel Hnaien, Xavier Delorme 2, and Alexandre Dolgui 2 LIMOS,
Numerical Methods for Option Pricing
Chapter 9 Numerical Methods for Option Pricing Equation (8.26) provides a way to evaluate option prices. For some simple options, such as the European call and put options, one can integrate (8.26) directly
Language Modeling. Chapter 1. 1.1 Introduction
Chapter 1 Language Modeling (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction In this chapter we will consider the the problem of constructing a language model from a set
Up/Down Analysis of Stock Index by Using Bayesian Network
Engineering Management Research; Vol. 1, No. 2; 2012 ISSN 1927-7318 E-ISSN 1927-7326 Published by Canadian Center of Science and Education Up/Down Analysis of Stock Index by Using Bayesian Network Yi Zuo
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
The Adomaton Prototype: Automated Online Advertising Campaign Monitoring and Optimization
: Automated Online Advertising Campaign Monitoring and Optimization 8 th Ad Auctions Workshop, EC 12 Kyriakos Liakopoulos 1, Stamatina Thomaidou 1, Michalis Vazirgiannis 1,2 1 : Athens University of Economics
Automated Collaborative Filtering Applications for Online Recruitment Services
Automated Collaborative Filtering Applications for Online Recruitment Services Rachael Rafter, Keith Bradley, Barry Smyth Smart Media Institute, Department of Computer Science, University College Dublin,
Hybrid model rating prediction with Linked Open Data for Recommender Systems
Hybrid model rating prediction with Linked Open Data for Recommender Systems Andrés Moreno 12 Christian Ariza-Porras 1, Paula Lago 1, Claudia Jiménez-Guarín 1, Harold Castro 1, and Michel Riveill 2 1 School
Cross-Validation. Synonyms Rotation estimation
Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical
Improved Software Testing Using McCabe IQ Coverage Analysis
White Paper Table of Contents Introduction...1 What is Coverage Analysis?...2 The McCabe IQ Approach to Coverage Analysis...3 The Importance of Coverage Analysis...4 Where Coverage Analysis Fits into your
PDF hosted at the Radboud Repository of the Radboud University Nijmegen
PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is an author's version which may differ from the publisher's version. For additional information about this
OUTLIER ANALYSIS. Data Mining 1
OUTLIER ANALYSIS Data Mining 1 What Are Outliers? Outlier: A data object that deviates significantly from the normal objects as if it were generated by a different mechanism Ex.: Unusual credit card purchase,
Traffic Behavior Analysis with Poisson Sampling on High-speed Network 1
Traffic Behavior Analysis with Poisson Sampling on High-speed etwork Guang Cheng Jian Gong (Computer Department of Southeast University anjing 0096, P.R.China) Abstract: With the subsequent increasing
Role of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
The BBP Algorithm for Pi
The BBP Algorithm for Pi David H. Bailey September 17, 2006 1. Introduction The Bailey-Borwein-Plouffe (BBP) algorithm for π is based on the BBP formula for π, which was discovered in 1995 and published
1 Teaching notes on GMM 1.
Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in
An Environment Model for N onstationary Reinforcement Learning
An Environment Model for N onstationary Reinforcement Learning Samuel P. M. Choi Dit-Yan Yeung Nevin L. Zhang pmchoi~cs.ust.hk dyyeung~cs.ust.hk lzhang~cs.ust.hk Department of Computer Science, Hong Kong
An Empirical Analysis of Sponsored Search Performance in Search Engine Advertising. Anindya Ghose Sha Yang
An Empirical Analysis of Sponsored Search Performance in Search Engine Advertising Anindya Ghose Sha Yang Stern School of Business New York University Outline Background Research Question and Summary of
Offline sorting buffers on Line
Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: [email protected] 2 IBM India Research Lab, New Delhi. email: [email protected]
Supervised and unsupervised learning - 1
Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in
A New Quantitative Behavioral Model for Financial Prediction
2011 3rd International Conference on Information and Financial Engineering IPEDR vol.12 (2011) (2011) IACSIT Press, Singapore A New Quantitative Behavioral Model for Financial Prediction Thimmaraya Ramesh
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
Unemployment Insurance Generosity: A Trans-Atlantic Comparison
Unemployment Insurance Generosity: A Trans-Atlantic Comparison Stéphane Pallage (Département des sciences économiques, Université du Québec à Montréal) Lyle Scruggs (Department of Political Science, University
Testing Carlo Cipolla's Laws of Human Stupidity with Agent Based Modeling
Testing Carlo Cipolla's Laws of Human Stupidity with Agent Based Modeling Andrea G. B. Tettamanzi and Célia da Costa Pereira Université Nice Sophia Antipolis, I3S, UMR 7271 06900 Sophia Antipolis, France
Linear programming approach for online advertising
Linear programming approach for online advertising Igor Trajkovski Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Rugjer Boshkovikj 16, P.O. Box 393, 1000 Skopje,
Geography 4203 / 5203. GIS Modeling. Class (Block) 9: Variogram & Kriging
Geography 4203 / 5203 GIS Modeling Class (Block) 9: Variogram & Kriging Some Updates Today class + one proposal presentation Feb 22 Proposal Presentations Feb 25 Readings discussion (Interpolation) Last
Least Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
Load Balancing Algorithm Based on Services
Journal of Information & Computational Science 10:11 (2013) 3305 3312 July 20, 2013 Available at http://www.joics.com Load Balancing Algorithm Based on Services Yufang Zhang a, Qinlei Wei a,, Ying Zhao
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its
Strategic Online Advertising: Modeling Internet User Behavior with
2 Strategic Online Advertising: Modeling Internet User Behavior with Patrick Johnston, Nicholas Kristoff, Heather McGinness, Phuong Vu, Nathaniel Wong, Jason Wright with William T. Scherer and Matthew
Extreme Value Modeling for Detection and Attribution of Climate Extremes
Extreme Value Modeling for Detection and Attribution of Climate Extremes Jun Yan, Yujing Jiang Joint work with Zhuo Wang, Xuebin Zhang Department of Statistics, University of Connecticut February 2, 2016
Exploring Big Data in Social Networks
Exploring Big Data in Social Networks [email protected] ([email protected]) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about
Downloaded from UvA-DARE, the institutional repository of the University of Amsterdam (UvA) http://hdl.handle.net/11245/2.122992
Downloaded from UvA-DARE, the institutional repository of the University of Amsterdam (UvA) http://hdl.handle.net/11245/2.122992 File ID Filename Version uvapub:122992 1: Introduction unknown SOURCE (OR
PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA
PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA ABSTRACT The decision of whether to use PLS instead of a covariance
A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE
STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIX, Number 1, 2014 A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE DIANA HALIŢĂ AND DARIUS BUFNEA Abstract. Then
Myth or Fact: The Diminishing Marginal Returns of Variable Creation in Data Mining Solutions
Myth or Fact: The Diminishing Marginal Returns of Variable Creation in Data Mining Solutions Data Mining practitioners will tell you that much of the real value of their work is the ability to derive and create
Credit Risk Models: An Overview
Credit Risk Models: An Overview Paul Embrechts, Rüdiger Frey, Alexander McNeil ETH Zürich © 2003 (Embrechts, Frey, McNeil) A. Multivariate Models for Portfolio Credit Risk 1. Modelling Dependent Defaults:
itesla Project Innovative Tools for Electrical System Security within Large Areas
itesla Project Innovative Tools for Electrical System Security within Large Areas Samir ISSAD RTE France [email protected] PSCC 2014 Panel Session 22/08/2014 Advanced data-driven modeling techniques
OPTIMAL DESIGN OF A MULTITIER REWARD SCHEME. Amir Gandomi *, Saeed Zolfaghari **
OPTIMAL DESIGN OF A MULTITIER REWARD SCHEME Amir Gandomi *, Saeed Zolfaghari ** Department of Mechanical and Industrial Engineering, Ryerson University, Toronto, Ontario * Tel.: + 46 979 5000x7702, Email:
Time series analysis of the dynamics of news websites
Time series analysis of the dynamics of news websites Maria Carla Calzarossa Dipartimento di Ingegneria Industriale e Informazione Università di Pavia via Ferrata 1 I-27100 Pavia, Italy [email protected] Daniele
Rating music: Accounting for rating preferences Manish Gupte Purdue University ([email protected])
Rating music: Accounting for rating preferences Manish Gupte Purdue University ([email protected]) This paper investigates how consumers' likes and dislikes influence their decision to rate a song. Yahoo!'s survey
Random Forest Based Imbalanced Data Cleaning and Classification
Random Forest Based Imbalanced Data Cleaning and Classification Jie Gu Software School of Tsinghua University, China Abstract. The given task of PAKDD 2007 data mining competition is a typical problem
Traffic Prediction in Wireless Mesh Networks Using Process Mining Algorithms
Traffic Prediction in Wireless Mesh Networks Using Process Mining Algorithms Kirill Krinkin Open Source and Linux lab Saint Petersburg, Russia [email protected] Eugene Kalishenko Saint Petersburg
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
On the Traffic Capacity of Cellular Data Networks. 1 Introduction. T. Bonald 1,2, A. Proutière 1,2
On the Traffic Capacity of Cellular Data Networks T. Bonald 1,2, A. Proutière 1,2 1 France Telecom Division R&D, 38-40 rue du Général Leclerc, 92794 Issy-les-Moulineaux, France {thomas.bonald, alexandre.proutiere}@francetelecom.com
Probability and Random Variables. Generation of random variables (r.v.)
Probability and Random Variables Method for generating random variables with a specified probability distribution function. Gaussian And Markov Processes Characterization of Stationary Random Process Linearly
Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data
Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.
Email Marketing for Success. A practical guide to growing your customer base, nurturing leads, and building trust throughout the purchase process
Email Marketing for Success A practical guide to growing your customer base, nurturing leads, and building trust throughout the purchase process Email Marketing The Email Marketer's Challenge The Email
PRODUCTS AND SERVICES RECOMMENDATION SYSTEMS IN E-COMMERCE. RECOMMENDATION METHODS, ALGORITHMS, AND MEASURES OF THEIR EFFECTIVENESS
INFORMATYKA EKONOMICZNA BUSINESS INFORMATICS 1(31) 2014 ISSN 1507-3858 Mirosława Lasek, Dominik Kosieradzki University of Warsaw PRODUCTS AND SERVICES RECOMMENDATION SYSTEMS IN E-COMMERCE. RECOMMENDATION
A Logistic Regression Approach to Ad Click Prediction
A Logistic Regression Approach to Ad Click Prediction Gouthami Kondakindi [email protected] Satakshi Rana [email protected] Aswin Rajkumar [email protected] Sai Kaushik Ponnekanti [email protected] Vinit Parakh
Recommender Systems for Large-scale E-Commerce: Scalable Neighborhood Formation Using Clustering
Recommender Systems for Large-scale E-Commerce: Scalable Neighborhood Formation Using Clustering Badrul M. Sarwar, George Karypis, Joseph Konstan, and John Riedl {sarwar, karypis, konstan, riedl}@cs.umn.edu
Enhancing Data Security in Cloud Storage Auditing With Key Abstraction
Enhancing Data Security in Cloud Storage Auditing With Key Abstraction 1 Priyadharshni.A, 2 Geo Jenefer.G 1 Master of engineering in computer science, Ponjesly College of Engineering 2 Assistant Professor,
MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS
MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS IN COMMERCIAL BANKING CS 229 Project: Final Report Oleksandra Onosova INTRODUCTION Recent innovations in cloud computing and unified communications have made a
Moving Least Squares Approximation
Chapter 7 Moving Least Squares Approximation An alternative to radial basis function interpolation and approximation is the so-called moving least squares method. As we will see below, in this method the
Introducing diversity among the models of multi-label classification ensemble
Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and
Two Correlated Proportions (McNemar Test)
Chapter 50 Two Correlated Proportions (McNemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
The primary goal of this thesis was to understand how the spatial dependence of
5 General discussion 5.1 Introduction The primary goal of this thesis was to understand how the spatial dependence of consumer attitudes can be modeled, what additional benefits the recovering of spatial
Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) IACSIT Press, Singapore Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems
Understanding the popularity of reporters and assignees in the Github
Understanding the popularity of reporters and assignees in the Github Joicy Xavier, Autran Macedo, Marcelo de A. Maia Computer Science Department Federal University of Uberlândia Uberlândia, Minas Gerais,
Part 1: Link Analysis & Page Rank
Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets 1 Exam on the 5th of February, 2016, 14.00 to 16.00. If you wish to attend, please
An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset
An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang
Standard Deviation Estimator
NCSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of
