# Silence is Also Evidence: Interpreting Dwell Time for Recommendation from Psychological Perspective

Save this PDF as:

Size: px
Start display at page:

Download "Silence is Also Evidence: Interpreting Dwell Time for Recommendation from Psychological Perspective"

## Transcription

4 Dwell Time (s) Like eutral umber of words Figure 3: Trend of dwell time w.r.t item length is reasonable because longer texts take more time for people to read. Another interesting observation is that the blue solid line is always above the green dashed line, suggesting that the average dwell time for positively voted items is longer than that of the neutral ones when the item length is similar. Recall that in Figure 2 we found positively voted item has a higher quality than neutral. This suggests that people tend to spend more time on the items of higher quality. This observation makes sense because in reality when a person meets a good item, e.g., a really funny joke, she is more likely to read it multiple times, resulting in longer dwell time. Finally we explore the distribution of dwell time. As shown in Figure 3, the average dwell time correlates with the story s length. Therefore we first group all stories according to their length. Then we select the group of items with the same length which has most number of views and count the frequency of the dwell time. Figure 4a shows the frequency for stories whose length ranges from 36 to 46. ote that the x-axis is in log-scale. The statistical result indicates that the dwell time may satisfy a log-gaussian distribution. To further prove this hypothesis, we compare the log value of dwell time sample to the standard Gaussian distribution and plot the quantile-quantile curve in Figure 4b. As can be seen, the blue markers closely lie on the red straight line, indicating that the two distributions are linearly related. Therefore it is proper to use a log-gaussian distribution to model the dwell time. Freqency Dwell Time log scale Standard ormal Quantiles (a) Raw frequency (b) Q-Q plot Figure 4: Characteristics of dwell time In the end of this section, we summarize our observations into principles that will guide the design of our model. Firstly, people tend to have multiple action bounds which differ from person to person (see Figure 2). Secondly, the length of a textual item determines the expected dwell time, which at the mean time is affected by the difference between the item s quality and the user s action bound (see Figure 3). Finally, it is proper to choose log-gaussian distribution to model the dwell time (see Figure 4). Quantiles of dwell time sample VIEWIG-VOTIG MODEL Table 2: List of Symbols Symbol u b π r α v t q l p Meaning the user latent action bound (LAB) probability of LAB selection reading speed quality sensitivity vote dwell time the item quality the item length the Bernoulli parameter In this section we discuss the details of the model. Table 2 lists the symbols we used in the work. We assume that each user has several latent action bounds (LABs). When viewing an item, the user will randomly select a LAB, which, together with the quality of the item, jointly affects the user s voting behavior, i.e., to leave a vote or not. Generally, if the selected LAB is bigger than quality, the user is less likely to leave a vote and vice versa. The LABs can be treated as the expectation of the person to the quality of the item. The higher the user s expectation is, the harder the item can entertain her, and vice versa. However, different from item s quality that can either be measured or reflected (by people s votes), the expectation is hard to measure. Therefore we model it as a hidden parameter. Besides the voting, another value, the dwell time, which is corresponding to the viewing behavior, is also generated. This is a metric that measures how long the user would stay on this item before moving forward. In general, it is affected by LAB and quality in a way that is similar to how the vote is produced. For big LAB and small quality, the dwell time tends to be short since the item can hardly entertain the user. For small LAB and big quality, the dwell time would be long as the user is extremely attracted to the item. r u Viewing behavior t b l q p Voting behavior Figure 5: The graphical model. The shaded circles stand for observable values while the unshaded ones for latent variables. The graphical model is shown in Figure 5. Each user u has a set of LABs. Each item consists of two components, i.e., property l (in our case it is the length of the joke) and quality q. The quality q and bound b jointly generate the vote v, where 1 means positive vote and no vote. The dwell time t is determined by two groups of factors, (i) user-related and (ii) item-related. As the name suggests, the former one stands for the personalized factors that affect the dwell time. The latter one, on the other hand, may be different as the item differs. Specifically in the graph model, the t is generated by five parameters, among which LAB b, reading speed r and quality sensitivity α are user-related v 992

5 while the other two, namely q and l are item-related. The speed r measures how quickly a person can read while the sensitivity α reflects how sensitive the person is to items of diversified quality. The details will be discussed later in Section 3.2. The item-related factors have different semantics with regarding to the different item types. In this work, we focus our attention on the textual item and discuss these two factors in the context of an article. In the graphical model, the q represents the quality of the item and l represents the length of the article. Algorithm 1 Generation process 1. Choose a LAB b Multinomial(π i), 1 i k 2. Choose a parameter p Be(q, b) 3. Choose a vote v Bernoulli(p) 4. Choose a time log t (μ, σ 2 ) The generation process is summarized in Algorithm 1. The user first selects one of k LABs, where the probability of selecting i th bound b i is π i.afterthelabb is generated, it goes along with the item s quality q to determine the vote, which is modeled as a Bernoulli process and the parameter p satisfies a Beta distribution with parameters (q, b). We will discuss its details in Section 3.1. Finally, a dwell time t is generated by a log-gaussian distribution whose mean is μ and variance is σ 2, as suggested in Figure 4. Details of the dwell time model as well as these parameters are discussed in Section Model of Voting Behavior Here we discuss the model of voting. As described earlier, it is a Bernoulli process, where the probability of positive vote is p while the neutral is 1 p. The parameter p is produced by a Beta distribution Be(q, b) defined in Equation (2). Γ(q + b) Be(p; q, b) Γ(q)Γ(b) pq 1 (1 p) b 1 pq 1 (1 p) b 1 (2) B(q, b) where q is the item s quality and b is the user s LAB. In probability theory, Beta distribution is often used to describe a prior distribution of a parameter for some distribution, e.g., Bernoulli distribution. Specifically, the q and b jointly determine the probability distribution of p. When q>b 1, the value of p is more likely to be large. In case of 1 q<b, the generated p is closer to. That means, the user is more likely to give a positive vote if the quality of the item is bigger than the LAB b. If not, a silent viewing may possibly happen. To sum up, given a selected LAB b andanitemwhose quality is q, the probability of a positive vote is given in Equation (3). 1 P (v 1 b, q) P (v 1,p b, q)dp 1 p Be(p; q, b)dp q (3) q + b Similarly, the probability of no vote (v ) is shown below: P (v b, q) 1 P (v 1 b, q) 1 q q + b b q + b (4) Finally, Equation (3) and (4) can be unified as below. P (v b, q) (1 v) b + v q q + b 3.2 Model of Dwell Time In this section we discuss the model of dwell time. Specifically, we consider the following three cases with regarding to selected LAB b and item quality q: b>q. The item is not good enough to motivate the user to vote. In this case, the user would not spend too much time on this item and she would even quit before reaching the end of the passage. b q. The item s quality is very close to the user s LAB, which may lead to the user s hesitance of voting. Therefore, the user may continue to read the item to its end. And the time is related to the item s property, i.e., the length of the article. b<q. The item s quality goes beyond the user s expectation and naturally the user may spend much more time reading this item and may remain on it after finishing reading, making the time longer than expected. Based on the result of statistical analysis obtained in Section 2, we use log-gaussian distribution to model the dwell time. The mean value μ is determined by the item s length l, quality q, theselectedlabb as well as two personalized parameters reading speed r and quality sensitivity α. Recallin Figure 3 we found that the average dwell time is linearly correlated to the item s length and thus l/r measures how long the person would spend reading a whole article of length l. The sensitivity α> determines the influence of the difference between quality and LAB, i.e., q b on the dwell time. Given the item s length l, quality q and the user s LAB b as well as r and α, the probability that the user would spend time t on this item is given in Equation (6). P (t l, q, b, r, α) (log t; μ(r, l, α, q, b),σ 2 ) 1 (log t μ)2 e 2σ 2 2πσ 2 where μ(r, l, α, q, b) log(l/r)+α(q b) ote that if q>b, the mean dwell time would increase and vice versa. The sensitivity α determines the time growthdecay proportion and is different from person to person. Big α may indicate a picky person whose dwell time is heavily quality-dependent, i.e., spending little time on trash but much time on high-quality items. The small α, on the other hand, suggests a tolerant person whose dwell time does not vary too much on items of diversified quality. 3.3 Model Learning Based on earlier discussion, the model parameters are LAB distribution π i, 1 i k, the series of LABs b i,reading speed r, quality sensitivity α and dwell time variance σ 2. Let θ denote all of these parameters. The learning is a process to determine proper values for θ in order to maximize the probability of observed data, i.e., the item s quality q and length l, theuser svotev {, 1} as well as her dwell time t. Let (Q {q 1,,q n}, L {l 1,,l n}) denotethen items accessed by the user while V {v 1,,v n} and T {t 1,,t n} respectively stand for the vote and dwell time of the user on the corresponding item. Finally B are the (5) (6) 993

6 unobserved LABs that affect the user s behavior when viewing the item. A loss function with constraint k i1 πi 1is defined in Equation (7). k L(θ; Q, L, V, T, B) logp (V, T, B Q, L,θ)+λ( π i 1) i1 n k k τ ij log P (v i,t i,b j q i,l i,θ)+λ( π i 1) i1 j1 i1 n k τ ij (log P (v i q i,θ j )+logp (t i l i,q i,θ j )+logπ j ) i1 j1 k + λ( π i 1) i1 where λ is a Lagrange multiplier and τ ij is a judge function (7) defined in Equation (8). { 1; if LAB b j is selected for item i τ ij (8) ; otherwise We use EM (expectation-maximum) algorithm [8] to solve the problem. In the following, θ s denotes the parameter values for the s th iteration. E-step. E s+1 (τ ij )P (b s j v i,t i,q i,l i,r s,α s ) P (v i b s j,q i)p (t i l i,q i,b s j,rs,α s )P (b s j ) k m1 P (v i b s m,q i)p (t i l i,q i,b s m,rs,α s )P (b s m ) P (v i b s j,q i)p (t i l i,q i,b s j,rs,α s )π α j k m1 P (v i b s m,q i )P (t i l i,q i,b s m,r s,α s )π α m The definition of P (v i b j,q i)andp (t i l i,q i,b j,r,α)arein Equation (5) and (6). M-step. π s+1 j n i1 Es+1 (τ ij ) n i1 j1 Es+1 (τ ij ) i1 Es+1 (τ ij ) n ( ) (9) (1) n E s+1 1 v i 1 (τ ij ) i1 (1 v i )b s+1 j + v i q i b s+1 j + q i ( n α s E s+1 (log t i log(l i /r s )+α s (b s+1 ) j q i )) (τ ij ) (σ i1 s ) 2 (11) σ s+1 i1 j1 Es+1 (τ ij )(log t i μ s ij )2 i1 j1 Es+1 (τ ij ) (12) n k i1 j1 Es+1 (τ ij )(log t i μ s ij )2 n log r s+1 i1 j1 Es+1 (τ ij )(log l i log t i α s (b s j q i)) i1 j1 Es+1 (τ ij ) (13) α s+1 i1 j1 Es+1 (τ ij )(log(l i /r s ) log t i )(b s j q i) i1 j1 Es+1 (τ ij )(b s j q i) 2 (14) All parameters except b j can be directly computed while b j can be obtained by finding the root for Equation (11) with ewton s method. 4. RECOMMEDATIO Here we introduce how to use the proposed model to interpret the silent viewing behavior for recommendation. Specifically, we split the task into three phases. In phase I, given a user s dwell time on an item, the model is applied to find her LAB that is most likely to be selected to guide the viewing behavior, referred to as LAB choice estimation. In phase II, namely vote prediction, the probability is computed that how likely the user would vote for the item given this estimated LAB. These pseudo votes are used to enrich the original sparse user-vote matrix. Finally in phase III, conventional rating-based recommendation techniques, e.g., CF, is applied to make recommendation based on both the actual and pseudo votes. Phase I. Formally, for a particular item o, letq o and l o respectively denote its quality and length. Suppose the dwell time of i th user on the item is t o i. Let u i { π ij,b ij,σ i,r i,α i 1 j k} denote the i th user s profile, which is obtained after model training. LAB choice estimation is to find a LAB b i o that is most likely to be selected to generate the observed dwell time, as shown in Equation (15). b i o argmax bij P (t o i π i j,b ij,σ i,r i,α i ) argmax bij π ij (log t o i ; μ(r i,l o,α,q o,b ij ),σ 2 i ) (15) Phase II. After a proper estimated LAB b i o is found, we can interpret the user u i s dwell time to her possible preference on that item, which is modeled as the expected vote as given in Equation (16). E(v o i qo,b i o )1 q o q o + b i o + b i o q o + b i o q o q o + b i o (16) Phase III. After the first two phases, the pseudo votes from the VV model enrich the original sparse voting matrix, on which conventional recommendation techniques can be applied. 5. EVALUATIO For evaluation we use the real log from JokeBox as introduced in Section 2. Although using multiple data would strengthen the work, it is really hard to obtain such viewvote data from another domain. In our experiments, we only keep the users who have viewed no less than 2 items. As a result, we obtain 96 users and 19,196 items, among which there are 2,53 votes and 33,158 silent views. As the default in our experimental setting, the number of LABs is set to 5 unless noted explicitly. For evaluation, we split the vote data into four parts and conduct 4-fold cross-validation. For every round, three parts of the data are used as the training data to obtain user profiles with Equation (1) to Equation (14). Then the remaining part is used as test data. Two metrics are exploited in the evaluation. The first one is referred to as hit ratio. For each target user, a list of candidate items are ranked and recommended. Among them some do receive positive vote from the target user, treated 994

8 Hit 2 Hit A UCF UCF VV UCF umber of cross validation (a) Hit ratio of UCF umber of cross validation A SVD++ SVD++ VV SVD++ Hit A UCF UCF VV UCF umber of cross validation (b) Hit rank of UCF umber of cross validation (c) Hit ratio of SVD (d) Hit rank of SVD Figure 7: Evaluation on the impact of data sparsity Hit A SVD++ SVD++ VV SVD++ and Equation (18), we can see that compared with hit ratio, hit rank focus more on the ranks of the recommended item. When the training data increases, the test data decreases, leading to more noise in candidate item. However, we can see that VV-SVD++ suffers less degrade compared to baseline, demonstrating its accurate interpretation of dwell time. Hit Hit k1 k5 k1 k2 k5 k (a) Hit ratio of VV-UCF k1 k5 k1 k2 k5 k Hit k1 k5 k1 k2 k5 k (b) Hit rank of VV-UCF (c) Hit ratio of VV-SVD++ (d) Hit rank of VV-SVD++ Figure 8: Evaluation on the impact of k In the final experiment, we test the impact of LAB size lab. Figure 8 shows the performance of UCF and SVD++ that are combined with VV model w.r.t different LAB size. As can be seen, generally smaller k achieves better performance. Particularly when >5 for hit ratio and >1 for hit rank, VV model with single LAB display best performance. This scenario suggests that for this joke-reading dataset, people are quite consistent in terms of expectation and small number of LAB fits the dataset better than large one. A further check of learned user profiles reveals that when k>1, the learned dwell time variance σ 2 for all users tend to be small. But when k 1, for a small number of people, their learned dwell time variance σ 2 is extremely huge, which means their time spent on items vary a lot and Hit k1 k5 k1 k2 k5 k1 the prediction of their dwell time would be quite inaccurate. This observation provides such a possibility that besides the values of LAB, the number k might be another personalized parameter, which will be explored in future work. 6. RELATED WORK Our work is related to three areas, i) data sparsity in recommendation, ii) implicit user feedback modeling and iii) diffusion model in psychology. Data Sparsity in Recommendation. Data sparsity is a really big issue that impacts the performance of collaborative filtering in recommendation. Different approaches have been developed to incorporate various features on users and items into recommendation to tackle this problem. To build the connections among users, new similarity measurement were proposed to find the similar users in [3, 5]. To create new connections among items, different item-based similarity measures are proposed for different types of items, such as textual items [23, 28], POIs (point-of-interest) [32], movies [24] etc. Furthermore, Matrix factorization (MF) [17] is proposed to flexibly embed various features. For instance, in [22, 21], social network is imported to constrain the factorization process in such a way that socially connected people tend to have similar user feature vectors. Also in [1], Agarwal et. al. proposed flda, a hybrid recommendation method that combines matrix factorization with LDA, aiming to exploit items with rich textual information to improve the rate prediction. To sum up, these works are all based on the condition that users provide explicit opinions. Our work tries to throw away this burden, and tackle the sparsity problem from a new angle, i.e., predicting user s interest in items according to her dwell time. Study on Implicit Feedbacks. There are some previous works focusing on different types of implicit feedbacks, e.g., playcount of music tracks or albums [14], frequency of visits to a content or category [25] and so on. Some methods are also proposed to integrate implicit feedbacks to the rating-based solution [19]. We believe that our model is complement to these works. There are a few existing works to study dwell time as implicit feedback of users. In [2, 4, 6], dwell time is treated as an extra feature to rank the relevance of retrieved web page to user s query. In [15, 3], the correlation between display time and document usefulness/relevance is studied in information-seeking task. In [2], a system BrowseRank was developed which makes use of user s browsing behavior to rank the importance of web page. Other works [13, 31] try to make use of people s watching time for recommending TV shows or programs. Recently in [25], a positive correlation is observed between display time of picture and user s high ratings. Although Liu et. al. [18] used Weibull distribution to model people s dwell time on web pages, we found that a log-gaussian distribution is more proper for the dwell time on jokes, as shown in Figure 4. Furthermore, unlike previous works that simply interpret longer dwell time to positive feedback, we explore the latent human factors that determine the dwell time by borrowing the concepts from Psychology, which to our best knowledge is the first work. Diffusion Model in 2-AFC Task. Psychologist proposed a diffusion model [27] to simulate a person s response time when facing a two-alternative forced choice (2-AFC) task. The participant is required to make a judgement on 996

### 2. EXPLICIT AND IMPLICIT FEEDBACK

Comparison of Implicit and Explicit Feedback from an Online Music Recommendation Service Gawesh Jawaheer Gawesh.Jawaheer.1@city.ac.uk Martin Szomszor Martin.Szomszor.1@city.ac.uk Patty Kostkova Patty@soi.city.ac.uk

### Pharos: Social Map-Based Recommendation for Content-Centric Social Websites

Pharos: Social Map-Based Recommendation for Content-Centric Social Websites Wentao Zheng Michelle Zhou Shiwan Zhao Quan Yuan Xiatian Zhang Changyan Chi IBM Research China IBM Research Almaden ABSTRACT

### arxiv:1505.07900v1 [cs.ir] 29 May 2015

A Faster Algorithm to Build New Users Similarity List in Neighbourhood-based Collaborative Filtering Zhigang Lu and Hong Shen arxiv:1505.07900v1 [cs.ir] 29 May 2015 School of Computer Science, The University

Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains

### Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

### Recommendation Tool Using Collaborative Filtering

Recommendation Tool Using Collaborative Filtering Aditya Mandhare 1, Soniya Nemade 2, M.Kiruthika 3 Student, Computer Engineering Department, FCRIT, Vashi, India 1 Student, Computer Engineering Department,

### A survey on click modeling in web search

A survey on click modeling in web search Lianghao Li Hong Kong University of Science and Technology Outline 1 An overview of web search marketing 2 An overview of click modeling 3 A survey on click models

### Subordinating to the Majority: Factoid Question Answering over CQA Sites

Journal of Computational Information Systems 9: 16 (2013) 6409 6416 Available at http://www.jofcis.com Subordinating to the Majority: Factoid Question Answering over CQA Sites Xin LIAN, Xiaojie YUAN, Haiwei

### Achieve Better Ranking Accuracy Using CloudRank Framework for Cloud Services

Achieve Better Ranking Accuracy Using CloudRank Framework for Cloud Services Ms. M. Subha #1, Mr. K. Saravanan *2 # Student, * Assistant Professor Department of Computer Science and Engineering Regional

### IPTV Recommender Systems. Paolo Cremonesi

IPTV Recommender Systems Paolo Cremonesi Agenda 2 IPTV architecture Recommender algorithms Evaluation of different algorithms Multi-model systems Valentino Rossi 3 IPTV architecture 4 Live TV Set-top-box

### Cross-Domain Collaborative Recommendation in a Cold-Start Context: The Impact of User Profile Size on the Quality of Recommendation

Cross-Domain Collaborative Recommendation in a Cold-Start Context: The Impact of User Profile Size on the Quality of Recommendation Shaghayegh Sahebi and Peter Brusilovsky Intelligent Systems Program University

### EVALUATING THE IMPACT OF DEMOGRAPHIC DATA ON A HYBRID RECOMMENDER MODEL

Vol. 12, No. 2, pp. 149-167 ISSN: 1645-7641 EVALUATING THE IMPACT OF DEMOGRAPHIC DATA ON A HYBRID RECOMMENDER MODEL Edson B. Santos Junior. Institute of Mathematics and Computer Science Sao Paulo University

### Super-Agent Based Reputation Management with a Practical Reward Mechanism in Decentralized Systems

Super-Agent Based Reputation Management with a Practical Reward Mechanism in Decentralized Systems Yao Wang, Jie Zhang, and Julita Vassileva Department of Computer Science, University of Saskatchewan,

### Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### The Impact of YouTube Recommendation System on Video Views

The Impact of YouTube Recommendation System on Video Views Renjie Zhou, Samamon Khemmarat, Lixin Gao College of Computer Science and Technology Department of Electrical and Computer Engineering Harbin

### The Application Research of Ant Colony Algorithm in Search Engine Jian Lan Liu1, a, Li Zhu2,b

3rd International Conference on Materials Engineering, Manufacturing Technology and Control (ICMEMTC 2016) The Application Research of Ant Colony Algorithm in Search Engine Jian Lan Liu1, a, Li Zhu2,b

### Addressing Cold Start in Recommender Systems: A Semi-supervised Co-training Algorithm

Addressing Cold Start in Recommender Systems: A Semi-supervised Co-training Algorithm Mi Zhang,2 Jie Tang 3 Xuchen Zhang,2 Xiangyang Xue,2 School of Computer Science, Fudan University 2 Shanghai Key Laboratory

### The Need for Training in Big Data: Experiences and Case Studies

The Need for Training in Big Data: Experiences and Case Studies Guy Lebanon Amazon Background and Disclaimer All opinions are mine; other perspectives are legitimate. Based on my experience as a professor

### Importance of Online Product Reviews from a Consumer s Perspective

Advances in Economics and Business 1(1): 1-5, 2013 DOI: 10.13189/aeb.2013.010101 http://www.hrpub.org Importance of Online Product Reviews from a Consumer s Perspective Georg Lackermair 1,2, Daniel Kailer

### Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10

### RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS

ISBN: 978-972-8924-93-5 2009 IADIS RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS Ben Choi & Sumit Tyagi Computer Science, Louisiana Tech University, USA ABSTRACT In this paper we propose new methods for

### Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

### An Adaptive Index Structure for High-Dimensional Similarity Search

An Adaptive Index Structure for High-Dimensional Similarity Search Abstract A practical method for creating a high dimensional index structure that adapts to the data distribution and scales well with

### Diagnosis of Students Online Learning Portfolios

Diagnosis of Students Online Learning Portfolios Chien-Ming Chen 1, Chao-Yi Li 2, Te-Yi Chan 3, Bin-Shyan Jong 4, and Tsong-Wuu Lin 5 Abstract - Online learning is different from the instruction provided

### Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

### Learning to Rank for Personalised Fashion Recommender Systems via Implicit Feedback

Learning to Rank for Personalised Fashion Recommender Systems via Implicit Feedback Hai Thanh Nguyen 1, Thomas Almenningen 2, Martin Havig 2, Herman Schistad 2, Anders Kofod-Petersen 1,2, Helge Langseth

### Response prediction using collaborative filtering with hierarchies and side-information

Response prediction using collaborative filtering with hierarchies and side-information Aditya Krishna Menon 1 Krishna-Prasad Chitrapura 2 Sachin Garg 2 Deepak Agarwal 3 Nagaraj Kota 2 1 UC San Diego 2

### A Lightweight Solution to the Educational Data Mining Challenge

A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

### From Entities to Geometry: Towards exploiting Multiple Sources to Predict Relevance

From Entities to Geometry: Towards exploiting Multiple Sources to Predict Relevance Emanuele Di Buccio Department of Information Engineering University of Padua, Italy dibuccio@dei.unipd.it Mounia Lalmas

### Moral Hazard. Itay Goldstein. Wharton School, University of Pennsylvania

Moral Hazard Itay Goldstein Wharton School, University of Pennsylvania 1 Principal-Agent Problem Basic problem in corporate finance: separation of ownership and control: o The owners of the firm are typically

### A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster

, pp.11-20 http://dx.doi.org/10.14257/ ijgdc.2014.7.2.02 A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster Kehe Wu 1, Long Chen 2, Shichao Ye 2 and Yi Li 2 1 Beijing

### Topical Authority Identification in Community Question Answering

Topical Authority Identification in Community Question Answering Guangyou Zhou, Kang Liu, and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences 95

### Designing Ranking Systems for Consumer Reviews: The Impact of Review Subjectivity on Product Sales and Review Quality

Designing Ranking Systems for Consumer Reviews: The Impact of Review Subjectivity on Product Sales and Review Quality Anindya Ghose, Panagiotis G. Ipeirotis {aghose, panos}@stern.nyu.edu Department of

### Predict the Popularity of YouTube Videos Using Early View Data

000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

### Personalizing Image Search from the Photo Sharing Websites

Personalizing Image Search from the Photo Sharing Websites Swetha.P.C, Department of CSE, Atria IT, Bangalore swethapc.reddy@gmail.com Aishwarya.P Professor, Dept.of CSE, Atria IT, Bangalore aishwarya_p27@yahoo.co.in

### So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

### Customer Satisfaction Feedback in an IT Outsourcing Company: A Case Study on the Insigma Hengtian Company

Customer Satisfaction Feedback in an IT Outsourcing Company: A Case Study on the Insigma Hengtian Company Xin Xia 1, David Lo 2, Jingfan Tang 3, and Shanping Li 1 1 College of Computer Science and Technology,

### Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

### Towards Inferring Web Page Relevance An Eye-Tracking Study

Towards Inferring Web Page Relevance An Eye-Tracking Study 1, iconf2015@gwizdka.com Yinglong Zhang 1, ylzhang@utexas.edu 1 The University of Texas at Austin Abstract We present initial results from a project,

### Automated Collaborative Filtering Applications for Online Recruitment Services

Automated Collaborative Filtering Applications for Online Recruitment Services Rachael Rafter, Keith Bradley, Barry Smyth Smart Media Institute, Department of Computer Science, University College Dublin,

### Model-Based Cluster Analysis for Web Users Sessions

Model-Based Cluster Analysis for Web Users Sessions George Pallis, Lefteris Angelis, and Athena Vakali Department of Informatics, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece gpallis@ccf.auth.gr

### Viewability Prediction for Online Display Ads

Viewability Prediction for Online Display Ads Chong Wang Information Systems New Jersey Institute of Technology Newark, NJ 07003, USA cw87@njit.edu Achir Kalra Forbes Media 499 Washington Blvd Jersey City,

### Big Data Analytics. Lucas Rego Drumond

Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Going For Large Scale Application Scenario: Recommender

### Comparing Recommendations Made by Online Systems and Friends

Comparing Recommendations Made by Online Systems and Friends Rashmi Sinha and Kirsten Swearingen SIMS, University of California Berkeley, CA 94720 {sinha, kirstens}@sims.berkeley.edu Abstract: The quality

### EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

### Recommending News Articles using Cosine Similarity Function Rajendra LVN 1, Qing Wang 2 and John Dilip Raj 1

Paper 1886-2014 Recommending News s using Cosine Similarity Function Rajendra LVN 1, Qing Wang 2 and John Dilip Raj 1 1 GE Capital Retail Finance, 2 Warwick Business School ABSTRACT Predicting news articles

### REVIEW ON QUERY CLUSTERING ALGORITHMS FOR SEARCH ENGINE OPTIMIZATION

Volume 2, Issue 2, February 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A REVIEW ON QUERY CLUSTERING

### Identifying Market Price Levels using Differential Evolution

Identifying Market Price Levels using Differential Evolution Michael Mayo University of Waikato, Hamilton, New Zealand mmayo@waikato.ac.nz WWW home page: http://www.cs.waikato.ac.nz/~mmayo/ Abstract. Evolutionary

### A Web Recommender System for Recommending, Predicting and Personalizing Music Playlists

A Web Recommender System for Recommending, Predicting and Personalizing Music Playlists Zeina Chedrawy 1, Syed Sibte Raza Abidi 1 1 Faculty of Computer Science, Dalhousie University, Halifax, Canada {chedrawy,

### A Social Network-Based Recommender System (SNRS)

A Social Network-Based Recommender System (SNRS) Jianming He and Wesley W. Chu Computer Science Department University of California, Los Angeles, CA 90095 jmhek@cs.ucla.edu, wwc@cs.ucla.edu Abstract. Social

### Web Database Integration

Web Database Integration Wei Liu School of Information Renmin University of China Beijing, 100872, China gue2@ruc.edu.cn Xiaofeng Meng School of Information Renmin University of China Beijing, 100872,

### Dynamical Clustering of Personalized Web Search Results

Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked

### Draft Martin Doerr ICS-FORTH, Heraklion, Crete Oct 4, 2001

A comparison of the OpenGIS TM Abstract Specification with the CIDOC CRM 3.2 Draft Martin Doerr ICS-FORTH, Heraklion, Crete Oct 4, 2001 1 Introduction This Mapping has the purpose to identify, if the OpenGIS

### MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of

### On Top-k Recommendation using Social Networks

On Top-k Recommendation using Social Networks Xiwang Yang, Harald Steck,Yang Guo and Yong Liu Polytechnic Institute of NYU, Brooklyn, NY, USA 1121 Bell Labs, Alcatel-Lucent, New Jersey Email: xyang1@students.poly.edu,

### ! E6893 Big Data Analytics Lecture 5:! Big Data Analytics Algorithms -- II

! E6893 Big Data Analytics Lecture 5:! Big Data Analytics Algorithms -- II Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and

### Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek

Recommender Systems: Content-based, Knowledge-based, Hybrid Radek Pelánek 2015 Today lecture, basic principles: content-based knowledge-based hybrid, choice of approach,... critiquing, explanations,...

### Contact Recommendations from Aggegrated On-Line Activity

Contact Recommendations from Aggegrated On-Line Activity Abigail Gertner, Justin Richer, and Thomas Bartee The MITRE Corporation 202 Burlington Road, Bedford, MA 01730 {gertner,jricher,tbartee}@mitre.org

IADIS International Journal on WWW/Internet Vol. 12, No. 1, pp. 52-64 ISSN: 1645-7641 USER INTENT PREDICTION FROM ACCESS LOG IN ONLINE SHOP Hidekazu Yanagimoto. Osaka Prefecture University. 1-1, Gakuen-cho,

### Content-Boosted Collaborative Filtering for Improved Recommendations

Proceedings of the Eighteenth National Conference on Artificial Intelligence(AAAI-2002), pp. 187-192, Edmonton, Canada, July 2002 Content-Boosted Collaborative Filtering for Improved Recommendations Prem

### The Expectation Maximization Algorithm A short tutorial

The Expectation Maximiation Algorithm A short tutorial Sean Borman Comments and corrections to: em-tut at seanborman dot com July 8 2004 Last updated January 09, 2009 Revision history 2009-0-09 Corrected

### Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and

### Random forest algorithm in big data environment

Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest

### Identifying Best Bet Web Search Results by Mining Past User Behavior

Identifying Best Bet Web Search Results by Mining Past User Behavior Eugene Agichtein Microsoft Research Redmond, WA, USA eugeneag@microsoft.com Zijian Zheng Microsoft Corporation Redmond, WA, USA zijianz@microsoft.com

### Comparing IPL2 and Yahoo! Answers: A Case Study of Digital Reference and Community Based Question Answering

Comparing and : A Case Study of Digital Reference and Community Based Answering Dan Wu 1 and Daqing He 1 School of Information Management, Wuhan University School of Information Sciences, University of

### User Data Analytics and Recommender System for Discovery Engine

User Data Analytics and Recommender System for Discovery Engine Yu Wang Master of Science Thesis Stockholm, Sweden 2013 TRITA- ICT- EX- 2013: 88 User Data Analytics and Recommender System for Discovery

### Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Michael Bauer, Srinivasan Ravichandran University of Wisconsin-Madison Department of Computer Sciences {bauer, srini}@cs.wisc.edu

### Measuring Lift Quality in Database Marketing

Measuring Lift Quality in Database Marketing Gregory Piatetsky-Shapiro Xchange Inc. One Lincoln Plaza, 89 South Street Boston, MA 2111 gps@xchange.com Sam Steingold Xchange Inc. One Lincoln Plaza, 89 South

### 1 Maximum likelihood estimation

COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

### Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

### arxiv:1506.04135v1 [cs.ir] 12 Jun 2015

Reducing offline evaluation bias of collaborative filtering algorithms Arnaud de Myttenaere 1,2, Boris Golden 1, Bénédicte Le Grand 3 & Fabrice Rossi 2 arxiv:1506.04135v1 [cs.ir] 12 Jun 2015 1 - Viadeo

### SOFTWARE REQUIREMENTS SPECIFICATION DOCUMENT FOR MUSIC RECOMMENDER SYSTEM

SOFTWARE REQUIREMENTS SPECIFICATION DOCUMENT FOR MUSIC RECOMMENDER SYSTEM METU DEPARTMENT OF COMPUTER ENGINEERING MEMORYLEAK 2013 VERSION 1.1 1 Table of Contents 1. Introduction... 4 1.1 Problem Definition...

### Linear programming approach for online advertising

Linear programming approach for online advertising Igor Trajkovski Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Rugjer Boshkovikj 16, P.O. Box 393, 1000 Skopje,

### GAZETRACKERrM: SOFTWARE DESIGNED TO FACILITATE EYE MOVEMENT ANALYSIS

GAZETRACKERrM: SOFTWARE DESIGNED TO FACILITATE EYE MOVEMENT ANALYSIS Chris kankford Dept. of Systems Engineering Olsson Hall, University of Virginia Charlottesville, VA 22903 804-296-3846 cpl2b@virginia.edu

### DYNAMIC RANGE IMPROVEMENT THROUGH MULTIPLE EXPOSURES. Mark A. Robertson, Sean Borman, and Robert L. Stevenson

c 1999 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or

### BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE

BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining alin@intelligentmining.com Outline Predictive modeling methodology k-nearest Neighbor

### II. RELATED WORK. Sentiment Mining

Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract

### CS229 Project Report Automated Stock Trading Using Machine Learning Algorithms

CS229 roject Report Automated Stock Trading Using Machine Learning Algorithms Tianxin Dai tianxind@stanford.edu Arpan Shah ashah29@stanford.edu Hongxia Zhong hongxia.zhong@stanford.edu 1. Introduction

### Chapter 21: The Discounted Utility Model

Chapter 21: The Discounted Utility Model 21.1: Introduction This is an important chapter in that it introduces, and explores the implications of, an empirically relevant utility function representing intertemporal

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

### Hybrid model rating prediction with Linked Open Data for Recommender Systems

Hybrid model rating prediction with Linked Open Data for Recommender Systems Andrés Moreno 12 Christian Ariza-Porras 1, Paula Lago 1, Claudia Jiménez-Guarín 1, Harold Castro 1, and Michel Riveill 2 1 School

### Predicting the Stock Market with News Articles

Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

### Semantically Enhanced Web Personalization Approaches and Techniques

Semantically Enhanced Web Personalization Approaches and Techniques Dario Vuljani, Lidia Rovan, Mirta Baranovi Faculty of Electrical Engineering and Computing, University of Zagreb Unska 3, HR-10000 Zagreb,

### A Model For Revelation Of Data Leakage In Data Distribution

A Model For Revelation Of Data Leakage In Data Distribution Saranya.R Assistant Professor, Department Of Computer Science and Engineering Lord Jegannath college of Engineering and Technology Nagercoil,

### Visualizing FRBR. Tanja Merčun 1 Maja Žumer 2

Visualizing FRBR Tanja Merčun 1 Maja Žumer 2 Abstract FRBR is a model with great potential for our catalogues and bibliographic databases, especially for organizing and displaying search results, improving

### Big & Personal: data and models behind Netflix recommendations

Big & Personal: data and models behind Netflix recommendations Xavier Amatriain Netflix xavier@netflix.com ABSTRACT Since the Netflix \$1 million Prize, announced in 2006, our company has been known to

### COMP 551 Applied Machine Learning Lecture 6: Performance evaluation. Model assessment and selection.

COMP 551 Applied Machine Learning Lecture 6: Performance evaluation. Model assessment and selection. Instructor: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise

### Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

### A General Approach to Incorporate Data Quality Matrices into Data Mining Algorithms

A General Approach to Incorporate Data Quality Matrices into Data Mining Algorithms Ian Davidson 1st author's affiliation 1st line of address 2nd line of address Telephone number, incl country code 1st

### KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,

### Financial Risk Forecasting Chapter 8 Backtesting and stresstesting

Financial Risk Forecasting Chapter 8 Backtesting and stresstesting Jon Danielsson London School of Economics 2015 To accompany Financial Risk Forecasting http://www.financialriskforecasting.com/ Published

### Probabilistic topic models for sentiment analysis on the Web

University of Exeter Department of Computer Science Probabilistic topic models for sentiment analysis on the Web Chenghua Lin September 2011 Submitted by Chenghua Lin, to the the University of Exeter as

### lop Building Machine Learning Systems with Python en source

Building Machine Learning Systems with Python Master the art of machine learning with Python and build effective machine learning systems with this intensive handson guide Willi Richert Luis Pedro Coelho

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Mining Navigation Histories for User Need Recognition

Mining Navigation Histories for User Need Recognition Fabio Gasparetti and Alessandro Micarelli and Giuseppe Sansonetti Roma Tre University, Via della Vasca Navale 79, Rome, 00146 Italy {gaspare,micarel,gsansone}@dia.uniroma3.it