Subjective Review-based Reputation

Gianpiero Costantino, Charles Morisset, and Marinella Petrocchi
IIT-CNR, Via Moruzzi 1, Pisa, Italy
name.surname@iit.cnr.it

ABSTRACT

The choice of a product or a service is often influenced by its reputation, which is usually calculated from existing reviews of that product or service. A review can be either objective, for instance when it refers to concrete features of a product, or subjective, for instance when it refers to the reviewer's feelings about one aspect. Subjective reviews are potentially biased by the characteristics of their reviewers, and therefore two subjective reviews should not be treated equally. We propose in this paper a reputation model that compensates for the subjective bias of different categories of reviewers. We first calculate this bias by analysing the ratio between reviews coming from different categories, and then we project a subjective reputation for a given category of reviewer. We demonstrate the accuracy of our bias calculation with an experiment on public hotel reviews and two specific categories of users.

Keywords: Reputation, Compensation, Review

[The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant no. 257930 (Aniketos) and under grant no. 256980 (NESSoS). SAC '12, March 25-29, 2012, Riva del Garda, Italy. Copyright 2011 ACM 978-1-4503-0857-1/12/03.]

1. INTRODUCTION

The choice of a service (or product) is often guided by the reputation associated with this service, which can be provided by different sources on the Internet. These sources usually offer a mechanism for previous users to submit a review concerning the service, and aggregate these reviews in order to calculate a reputation score, which in turn provides a ranking allowing a user to select the best service. A review can consist of multiple parameters: popular traveler-advice websites, such as Booking.com, offer the possibility to review hotels based, for instance, on cleanliness, staff, or location. The target of the reviews then obtains an overall score derived from the reviews of the single parameters.

Depending on the parameters involved, a review can be either objective or subjective. An objective review usually concerns facts or features of the service that are observed by the reviewer and are not subject to interpretation. For instance, a review concerning an airline can include the number of flight delays the reviewer went through. Whether the reviewer is a businessman or a backpack traveler, the number of delays this reviewer has suffered is the same. A user consulting the reviews for this airline can then estimate, statistically speaking, the probability that she will experience a flight delay if she flies with this airline. On the other hand, a subjective review contains the reviewer's appreciation of some features of the service. The major difference from an objective review is that a subjective review can vary according to the typology of the reviewer.
For instance, on a scale of five stars, a businessman used to traveling with high-standard airlines, with as few delays as possible, might rate the punctuality of an airline with one flight in ten delayed as two stars. However, a backpack traveler used to traveling with budget airlines might give four stars for the punctuality of the same airline. In other words, for the same observation, two different reviewers might give two different reviews. When a user consults the global score for this airline, she is given the average of the two previous subjective reviews, and so a punctuality score of three stars. However, this value is not necessarily meaningful: if the user is a backpack traveler, she will rather be interested in the review given by the backpack traveler, and similarly if she is a businesswoman. Most reputation providers allow browsing reputation scores by category of reviewer; however, in some cases, there is no review from a given category. For instance, consider another airline for which there are only reviews from businessmen: if a backpack traveler consults the reputation of such an airline, then she knows there might be a bias.

Two questions arise from this simple example. The first is whether it is possible to calculate this bias automatically, using previous reviews for similar services: given, on the one hand, the reviews of business travelers and backpack travelers on one airline, and on the other hand only the reviews of business travelers for another airline, can we automatically calculate what the review of a backpack traveler for this airline would be? The second is whether we can extend this calculation to define a notion of subjective reputation, that is, for a given category of users, compensating all the reviews submitted by reviewers from other categories in order to obtain a more accurate reputation.

The main contribution of this paper is to introduce a model for subjective review-based reputation, which automatically calculates the bias between reviews from different categories, based on the analysis of existing reviews. This model is illustrated and validated on the example of hotel reviews, by considering two different categories of users, and we show how to project a review from one category based on a review from the other. We demonstrate that our projections bring more accuracy than relying only on the existing reviews.

This paper is structured as follows. Section 2 recalls existing reputation models. In Section 3, we formalise the proposed prediction model. Section 4 presents our experimental approach, consisting of a first phase for computing the compensation parameters and a second phase aiming at validating them. Section 5 discusses the validity of our proposal. Finally, Section 6 concludes the paper.

2. EXISTING REPUTATION MODELS

Online reputation systems have been adopted in the past few years as a means for decision making on the Web. Parties in a community may decide to choose a specific object (e.g., a product or a service) based on its reputation score. Usually, a reputation system computes and makes public such scores, based on a collection of reviews that the community gives about that object. The reputation can be dynamically updated according to various kinds of algorithms, generally built on the principle that the updated reputation is a function of the old reputation and the most recent feedback. Indeed, Jøsang et al. note that, from the relying party's point of view, reputation scores can be computed based on a combination of personal experience (represented by the last new feedback) and second-hand referrals (represented by past feedback) [12].

Simple reputation models, such as the one adopted by eBay prior to May 2008, exploit a plain combination of past and new feedback. In particular, eBay (www.ebay.com) recorded the feedback obtained after a business transaction had been carried out between a seller and a buyer. The reputation of a user was thus computed by summing the numbers of positive and negative ratings separately, and keeping a total score equal to the positive score minus the negative score. From May 2008, eBay started considering only the percentage of positive ratings over the last twelve months. The authors of [19] consider a simple scheme whose principle has been adopted by popular e-commerce websites such as Amazon (www.amazon.com) and Epinions (www.epinions.com): the reputation score is the average of all the ratings, so past and new feedback contribute equally to the calculation of the user's reputation.

More complex reputation models combine the old reputation value and the new feedback into a new value computed as a weighted mean. Several proposals exist in the literature on how to determine the weights. For example, the works in [3, 5, 23] consider the trustworthiness of the reviewer, while the authors of [11] evaluate the different degrees of the user's satisfaction for a set of parameters characterising the object. The work in [24] prioritises reviews by their origin, such that reviews posted by users with an established attendance in the system are weighted more than reviews given by beginners.
Similarly, different priorities may be put on reviews posted by different categories of users, for instance to give more weight to reviews from professionals and less weight to reviews from regular users [21, 4]. The work in [1] considers the evolution of sellers' reputation in electronic marketplaces as an aggregation of past and recent transactions, and proposes an optimisation of the Window Aggregation Mechanism, in which the seller's score is the average of the n most recent ratings. Also, the work in [2] considers the Weighted Aggregation Mechanism, where the seller's score is a weighted average of past ratings, optimal with respect to incentivising the seller to be truthful. Fan et al. [8] propose to achieve a similar goal by adopting exponential smoothing to aggregate ratings, and evaluate it through simulations. While this is not the focus of our paper, we acknowledge research work in the area of immunising reviewing systems against unfair (or incomplete) ratings [6, 22, 25, 7, 10].

Our work is closely related to the paradigm of Collaborative Filtering systems, see, e.g., [18, 13, 20, 15, 9]. Like reputation systems, they collect feedback from users about products or services. The intuition behind such systems is that tastes vary from person to person, and people review objects according to their subjective attitudes and preferences. Collaborative Filtering aims at creating categories of users, also called neighbourhoods, grouped by similarity of tastes and of the ways in which they rate and choose objects, so that it becomes possible to recommend objects that one user likes to members of her neighbourhood [17]. This idea finds its implementation in the so-called Recommender Systems, see, e.g., [16, 26, 14]. To some extent, we propose here an inter-neighbourhood collaborative filtering technique: we present a formalisation and an experimental approach to predict a review from one neighbourhood of users based on a review from another neighbourhood.

3. GENERAL MODEL

We present in this section our subjective review-based reputation model. Intuitively, we first calculate the distance between the reviews from two categories of users, using a set of objects as references. This distance is then used to define a compensation function which, given a review from one category of users, predicts the score for another category. The subjective reputation score for one category is finally obtained by compensating all the reviews from the other categories.

3.1 Compensation function

Roughly speaking, a review is given by a user over an object. Hence, we consider a set of users $U$, a set of objects $O$ and a set of atomic scores $S$, and we write $R_u = U \times O \times S$ for the set of all possible user reviews. Users can often be associated with categories, and we therefore introduce a set $C$ of categories. We assume here that atomic scores belong to a domain providing the usual arithmetic operations, so that it is possible to calculate the average of a set of scores. Given a function $f : U \to C$ associating each user with a category, we introduce the function $\gamma_f : O \times C \to S$, which gives the score of an object $o$ for a certain category of users, and is defined as:

$$\forall o \in O,\ \forall c \in C:\quad \gamma_f(o, c) = \mathrm{avg}(\{\, s \mid (u, o, s) \in R_u \wedge f(u) = c \,\})$$

where $\mathrm{avg}$ denotes the function returning the average of a set of atomic scores.

In practice, it can be observed that users from different categories can give different reviews for the same object. For instance, some tourists from a given country might have higher standards than tourists from other countries, and thus give lower scores. Similarly, people traveling alone for business purposes might not attach the same importance to the different features of a hotel as, for instance, more senior travelers. Following this intuition, we claim that the ratio between the scores of two different categories can be analysed and predicted. More formally, given two categories $c_1$ and $c_2$ and a set of objects $X$, we define the function $\Delta$, which computes the average ratio between the scores of the two categories for the objects in $X$:

$$\Delta(c_1, c_2, X) = \mathrm{avg}\left(\left\{\, \frac{\gamma_f(o, c_2)}{\gamma_f(o, c_1)} \;\middle|\; o \in X \,\right\}\right)$$

Assuming a fixed set $X$ used to calculate $\Delta$, we can now estimate the score of an object for a given category, based on the score of another category and the average ratio. Such an estimation is done with the function $\pi : O \times C \times C \to S$, defined by:

$$\forall o \in O \setminus X,\ \forall c_1, c_2 \in C:\quad \pi(o, c_1, c_2) = \gamma_f(o, c_2) \cdot \Delta(c_2, c_1, X)$$

In order to make a clear distinction between the objects used to define $\Delta$ and the objects for which we want to predict the score, the function $\pi$ is not defined on the set $X$. With this definition, $\pi(o, c_1, c_2)$ can be read as the score that the category $c_1$ could have given, using the score of the category $c_2$ and the average ratio between the scores of $c_2$ and $c_1$.

Consider a user of the category $c_1$ who wants to read a review for an object $o$. She potentially has the choice between three different scores: the one given by users of the same category, $\gamma_f(o, c_1)$; the one given by users of another category $c_2$, $\gamma_f(o, c_2)$; or the projection of the score of the users of the category $c_2$ adjusted for the category $c_1$, $\pi(o, c_1, c_2)$. Clearly, if it exists, the score $\gamma_f(o, c_1)$ is probably the best adapted; however, in some situations, such a score might not exist, or might not be accurate enough due to a small number of reviews for the category $c_1$. We claim, and demonstrate in Section 4 for a specific example, that the score given by $\pi(o, c_1, c_2)$ is more accurate than $\gamma_f(o, c_2)$, that is, it is usually preferable for a user of the category $c_1$ to use the projection rather than the raw score of another category.

3.2 Subjective Reputation

In its simplest form, the reputation score of an object is calculated by averaging all the reviews from the different categories, that is, the reputation is given by a function $R : O \to S$ defined as:

$$R(o) = \mathrm{avg}(\{\, s \mid (u, o, s) \in R_u \,\})$$

We propose here to use the compensation function introduced in the previous section to define a reputation subjectively tailored to a given category of users, where all reviews are compensated towards this category. Hence, given a category $c$ and an object $o$, we can calculate the overall subjective reputation $R^s_c$ of $o$ for the category $c$ as:

$$R^s_c(o) = \mathrm{avg}(\{\, \pi(o, c, c') \mid c' \in C \setminus \{c\} \,\} \cup \{\gamma_f(o, c)\})$$

This subjective reputation allows a user from a category $c$ to obtain the review of the object $o$ as if all reviews had been submitted by reviewers from that category, even if there is no review from this category, i.e., even if $\{\gamma_f(o, c)\} = \emptyset$.
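To make these definitions concrete, the following is a minimal sketch of the model in Python. It is not the authors' implementation: the review data, the category names, and the helper names (gamma, delta, pi, subjective_reputation) are illustrative assumptions.

```python
from statistics import mean

# A review is a (user, object, score) triple, i.e., an element of R_u = U x O x S.
reviews = [
    ("alice", "hotel_a", 8.0), ("bob", "hotel_a", 9.0),
    ("alice", "hotel_b", 7.0), ("bob", "hotel_b", 8.0),
    ("alice", "hotel_c", 6.5),   # hotel_c has no review from bob's category
]
f = {"alice": "solo", "bob": "couple"}   # f : U -> C

def gamma(o, c):
    """gamma_f(o, c): average score given to object o by category c, or None."""
    scores = [s for (u, obj, s) in reviews if obj == o and f[u] == c]
    return mean(scores) if scores else None

def delta(c1, c2, X):
    """Delta(c1, c2, X): average ratio gamma_f(o, c2) / gamma_f(o, c1) over X."""
    return mean(gamma(o, c2) / gamma(o, c1) for o in X)

def pi(o, c1, c2, X):
    """pi(o, c1, c2): projected score of o for category c1, from c2's score (o not in X)."""
    return gamma(o, c2) * delta(c2, c1, X)

def subjective_reputation(o, c, categories, X):
    """R^s_c(o): average of the other categories' scores compensated towards c,
    together with c's own score when it exists."""
    scores = [pi(o, c, c2, X) for c2 in categories
              if c2 != c and gamma(o, c2) is not None]
    if gamma(o, c) is not None:
        scores.append(gamma(o, c))
    return mean(scores)

X = ["hotel_a", "hotel_b"]   # reference set used to fit the ratio
print(pi("hotel_c", "couple", "solo", X))                         # ~7.37
print(subjective_reputation("hotel_c", "couple", ["solo", "couple"], X))
```

In this toy run, hotel_c has no review from the couple category, so its couple score is obtained entirely by projecting the solo score through the fitted ratio, exactly as the model prescribes.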
Since the calculation of each $\pi$ is directly based on the original scores of the other categories, it is possible to weigh these scores, and therefore other reputation models can be straightforwardly included, such as different weights for different reviewer attributes.

4. EXPERIMENTAL APPROACH

We carry out an analysis of existing reviews from categories of users. Reviews are taken from the Booking.com website, a popular website specialised in publishing travelers' advice. For this particular experiment, we focus on four-star hotels, and we collect reviews given by two categories of users, solo travelers (st) and mature couples (mc), and we show how to compensate the review from a mature couple in order to obtain the review from a solo traveler. Each hotel review consists of an individual review for five different parameters: Clean, Comfort, Services, Staff, and Quality/Price; each individual review is a numerical value between 1 and 10, and expresses the degree of satisfaction of the user who has experienced that hotel. In order to show website visitors a comprehensive score for a hotel, Booking.com presents an overall reputation score consisting of the mean of the five reviews.

We consider here a reviewed object to be a pair (hotel, subservice), and we directly extract for each hotel and each subservice the review for each category of users. For instance, $\gamma_f((h, \mathit{Clean}), st)$ stands for the review given by the solo travelers for the cleanliness of the hotel $h$. By extension, we also consider the implicit review for a hotel, which is given by the average of the reviews for each subservice, and we write directly $\gamma_f(h, c)$ for such a review.

Our experimentation consists of two phases: the calculation phase, where we analyse the reviews in order to define the ratio function $\Delta$ (and thus the projection function $\pi$), and the validation phase, where we check whether this function is accurate enough.

4.1 Calculation Phase

In this phase, we pick a set $X$ of fifty hotels in New York City, and we calculate for each subservice $sub$ the ratio $\Delta(mc, st, (X, sub))$, as presented in Table 1, in order to define the projection function $\pi$. We also show for each ratio the corresponding variance (i.e., the mean of the squared deviations).

                Clean   Comf.   Serv.   Staff   Q/P
Δ(mc, st, X)    0.972   0.976   0.962   0.962   0.972
Variance        0.009   0.010   0.015   0.006   0.011

Table 1: Ratio and Variance

We can observe that the ratio is always below 1, meaning that solo travelers give, in general, lower reviews than mature couples. Moreover, the variance is quite small, which indicates a narrow range for the ratio. Note that we do not calculate the ratio for the overall score directly, as it is calculated from the scores of the subservices. The interest of using the subservices, instead of directly calculating the ratio for the overall score, is to allow a user to ignore some subservices.

                Clean   Comf.   Serv.   Staff   Q/P    Score
γ_f(h, mc)      8.2     8.0     7.7     8.4     6.8    7.82
γ_f(h, st)      7.9     7.8     7.1     7.6     6.4    7.36
π(h, st, mc)    8.0     7.8     7.4     8.1     6.6    7.57

Table 2: Scores and projection

4.2 Validation Phase

The second phase aims at validating the compensation ratio calculated in the first phase. To do this, we collect three sets of data from Booking.com, related to hotels in Paris, Madrid, and New York City (this last set is different from the one used for the calculation phase). Each set considers fifty hotels in the four-star category. The validation phase consists, for each hotel in the validation set, in:

(i) calculating the compensated score for each subservice of the mature couples' review, to obtain the compensated review for solo travelers;
(ii) calculating the distance between the overall score of the compensated review and the overall score of the real review by solo travelers, in order to obtain our error margin;
(iii) checking that the distance between the compensated review and the real solo traveler review is smaller than the distance between the mature couples' review and the real solo traveler review, in order to see whether the compensated review is more accurate than the mature couples' one.

4.2.1 Validation Case: New York City

In our first survey, we focus on fifty hotels in NYC belonging to the four-star category (this set is different from the one considered in Section 4.1). We collect all reviews from mature couples, and then we apply our compensation function to them. To estimate how accurate the prediction is, we calculate the distance between the compensated reviews and the real reviews by solo travelers. For instance, consider a hotel $h$, such that the reviews given by mature couples and solo travelers are given in Table 2 by $\gamma_f(h, mc)$ and $\gamma_f(h, st)$, respectively. The last row of Table 2 gives the compensated score $\pi(h, st, mc)$, calculated using $\Delta(mc, st, X)$ from Section 4.1 (e.g., for Clean, 8.2 × 0.972 ≈ 8.0).

Two points can be observed in this example. The first is that the accuracy of the compensation is not the same for each subservice: for instance, the comfort has been correctly projected, while there is a significant difference between the projected score for Staff (8.1) and the actual score (7.6). The second is that each projected score is lower than the corresponding score given by mature couples (which is consistent with the fact that each ratio in Table 1 is below 1), and above the actual score given by solo travelers. Hence, we can see that in this example our projection is closer to the actual review than the review from mature couples is.

The distance between the compensated overall score and the real score changes from one hotel to another. For instance, for the hotel $h$, we can observe that the distance between $\gamma_f(h, st)$ and $\pi(h, st, mc)$ is equal to 0.21. Figure 1 shows the distribution of the distance between the compensated overall score and the real score for the NYC validation set.

[Figure 1: NYC 4 star hotels: Validation results (histogram of Error (%) by Distance, bins from 0.1 to 2)]

This graph can be read as follows: in a bit more than 80% of the cases, the error margin is greater than or equal to 0.1 (leftmost bar), while in only 10% of the cases the error margin is greater than or equal to 1.
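As an aside, steps (i)-(iii) of the validation phase can be illustrated with a short sketch that replays the hotel h of Table 2 against the ratios of Table 1. This is an illustrative reconstruction, not the authors' code; the dictionary layout is an assumption.

```python
from statistics import mean

# Per-subservice ratios Delta(mc, st, (X, sub)) from Table 1.
RATIO = {"Clean": 0.972, "Comfort": 0.976, "Services": 0.962,
         "Staff": 0.962, "Quality/Price": 0.972}

# The example hotel h of Table 2: mature couples (mc) and solo travelers (st).
mc = {"Clean": 8.2, "Comfort": 8.0, "Services": 7.7, "Staff": 8.4, "Quality/Price": 6.8}
st = {"Clean": 7.9, "Comfort": 7.8, "Services": 7.1, "Staff": 7.6, "Quality/Price": 6.4}

def overall(review):
    """Booking.com-style overall score: mean of the five subservice scores."""
    return mean(review.values())

# (i) Compensate each subservice of the mc review towards st.
compensated = {sub: mc[sub] * RATIO[sub] for sub in RATIO}

# (ii) Error margin: distance between compensated and real st overall scores.
error = abs(overall(compensated) - overall(st))

# (iii) Is the compensated review closer to st than the raw mc review is?
better = error < abs(overall(mc) - overall(st))

# Prints 7.58, 0.22, True (Table 2 reports 7.57 and 0.21 after rounding
# each subservice first).
print(round(overall(compensated), 2), round(error, 2), better)
```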
It is interesting to observe that in only 30% of the cases is the distance greater than or equal to 0.4, which is quite small considering that the overall score is a numerical value between 1 and 10. The distances shown in Figure 1 are absolute distances, as they only consider the distance between the compensated review and the real one. We focus in Section 4.2.4 on relative distances, that is, on comparing the distance between the compensated review and the real review with the distance between the mature couples' review and the real review.

4.2.2 Validation Case: Paris

In the second survey, we collect reviews of mature couples for 50 hotels in Paris. Figure 2 shows the trend of the distance between the real and the compensated reviews.

[Figure 2: Paris 4 star hotels: Validation results (histogram of Error (%) by Distance, bins from 0.1 to 2)]

We obtain better results for the Paris dataset than for the NYC dataset. Indeed, we are able to predict a compensated review with a distance to the real review smaller than 0.1 in 40% of the cases (while for NYC the same distance was achieved in only 20% of the cases). The probability of making an error in the prediction decreases further when considering larger, but still acceptable, distances: in only 10% of the cases is the distance greater than or equal to 0.3.

4.2.3 Validation Case: Madrid

For the last survey, we consider 50 hotels located in Madrid.

[Figure 3: Madrid 4 star hotels: Validation results (histogram of Error (%) by Distance, bins from 0.1 to 2)]

Figure 3 shows that in 80% of our predictions, the distance between the compensated review and the real one is greater than or equal to 0.1. However, in only about 15% of the cases is the distance greater than or equal to 0.3. We notice that, in this scenario too, the compensation function provides reviews that are close to those from solo travelers, even if the best result of our validation was obtained on the dataset of hotels located in Paris.

4.2.4 Compensated vs. Mature Couples

The previous sections show the absolute distance between our compensated values and the actual values. Another important measure of accuracy for our approach is the comparison of, on the one hand, the distance between the original mature couples' review and the actual solo traveler review, and, on the other hand, the distance between the compensated review and the actual solo traveler review. For instance, with the values of Table 2, we can see that for the hotel $h$ the distance between $\gamma_f(h, mc)$ and $\gamma_f(h, st)$ is equal to 0.46, while the distance between $\gamma_f(h, st)$ and $\pi(h, st, mc)$ is equal to 0.21. In other words, our prediction is closer to the actual review than the review from mature couples is. It is therefore better for a solo traveler to consult our prediction than the mature couples' review.

In Table 3, we summarise the results obtained using our compensation function. For each dataset, we compare the compensated review with the review given by mature couples, and we count how many compensated values are closer to the real solo travelers' reviews. This case is labelled Better in the table, since it means that it is better for a solo traveler to consider the compensated review than the mature couples' one, and therefore that our method is better than doing nothing. Similarly, Worse indicates the situation in which a mature couples' score is closer to a solo traveler's than our compensated value, that is, when it would be better for a solo traveler to consider the mature couples' review instead of the compensated one.

        NYC              Paris            Madrid
        Better  Worse    Better  Worse    Better  Worse
        72%     28%      74%     26%      56%     44%

Table 3: Prediction statistics

For instance, for 72% of the hotels in the NYC validation set, the compensated review is closer to the solo travelers' review than the mature couples' review is. Similar findings were obtained for the dataset of hotels located in Paris. However, in the last test, carried out with the hotels in Madrid, the prediction is better in only 56% of the cases, even though we obtain the smallest distances for this dataset, as shown in Figure 3. It is interesting to observe that for this particular dataset the reviews given by mature couples and solo travelers were quite similar, which explains why our compensation function provides less of an advantage there. From the analysis of all the validation tests that we have carried out, we can conclude that, in the absence of reviews posted by solo travelers, people belonging to that category can derive a more accurate review by compensating the reviews of mature couples, rather than relying on the latter directly.
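The Better/Worse tallies of Table 3 amount to a simple count over each validation set. A sketch under the same assumptions, reusing overall and RATIO from the previous snippet (the per-hotel list of review pairs is hypothetical):

```python
def better_worse(hotels, ratio):
    """Fraction of hotels for which the compensated mc review is closer to the
    real st review than the raw mc review is (the Better column of Table 3)."""
    better = 0
    for mc, st in hotels:   # (mc, st) pairs of subservice -> score dicts
        compensated = {sub: mc[sub] * ratio[sub] for sub in ratio}
        if abs(overall(compensated) - overall(st)) < abs(overall(mc) - overall(st)):
            better += 1
    return better / len(hotels), 1 - better / len(hotels)

# e.g., better_worse(nyc_hotels, RATIO) would yield (0.72, 0.28) on the NYC set,
# where nyc_hotels is the hypothetical list of fifty (mc, st) review pairs.
```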
5. DISCUSSION

In the previous section, we have shown that the difference between reviews from solo travelers and mature couples concerning four-star hotels can be calculated with good accuracy: for each validation set, our compensated mature couples' review was on average closer to the solo travelers' review than the original mature couples' review was. Since we chose the categories and the hotel sets randomly, we can reasonably expect such behaviour to be reproducible for other categories and other hotel sets.

An implicit assumption of our model is that all users from one category behave similarly, and that the difference between the reviews from two different categories is significant enough. Indeed, as the Madrid example demonstrated, our approach is less accurate when the reviews from one category are not very different from the reviews from another category. Note that the subjective reputation score itself is in practice hard to evaluate: we would need a way to verify that the global score corresponds to what a user expects. Our experimentation was designed to validate our model, rather than to establish concrete distance values between mature couples and solo travelers. Clearly, a real-world implementation of this model would require considering more hotels and more categories. In particular, an interesting point to consider is the combination of compensated reviews. For instance, consider three categories of users A, B, and C, such that we have the compensation functions from A to C and from B to C. Based on the validation phase, the accuracy of these functions can be established, and when computing the subjective reputation for the category C, this accuracy can be used to weight the compensated reviews.

In order to further validate our model, we would need to consider other reviewing systems, to make sure that the results we have found are not valid only in the context of hotel reviewing. A particularly interesting use case would be the reviewing system for scientific conferences. Informally, some reviewers might in general be stricter than the average and tend to reject more papers, while other reviewers accept more papers than the average. In this case, for each reviewer, we would consider two categories: the category containing only that reviewer, and the category containing all the other reviewers. By calculating the distance between these two categories, we could find the compensation function for each reviewer. The review for each paper could then be compensated, leading to a normalised review for each paper.

6. CONCLUSION AND FUTURE WORK

We have presented in this paper a subjective review-based reputation model, which calculates the distance between reviews coming from two different categories of reviewers in order to compensate for the differences and integrate the subjectivity of each review. The model is formally introduced in Section 3, and we demonstrate its applicability in Section 4 by calculating the distance between two categories of users for hotel reviews, and by showing that our compensated reviews are more accurate than non-compensated reviews.

One of the strengths of this model resides in its transparency to the user: the compensation functions are calculated automatically from existing reviews, and therefore do not require any extra input from the user (apart from her category). Note that the validation can also be done automatically, in order to establish the accuracy of the compensation functions. Moreover, this approach is compatible with most existing reputation models, as the subjective reputation score is simply a reputation score.

The main limitation of our approach is the lack of uniformity between the different reviews from one service to another. Indeed, we do not analyse why there exists a difference between the reviews of two different categories; we simply observe it and reproduce it for new inputs. In other words, as is often the case with empirical approaches, we cannot guarantee that for a specific hotel our compensated review will be more accurate than the non-compensated one. We can only limit this absence of guarantee by specifying the accuracy of our compensation function over previous datasets.

There are three different paths to extend this work. As mentioned in Section 5, it would be interesting to study the behaviour of our model in another context, for instance the reviewing process of a scientific conference. Another extension is to study more refined strategies for defining the compensation function, in order to increase its accuracy: for instance, instead of using a simple ratio, we could use a probability distribution function giving the probability of the ratio to belong to a particular interval. Finally, this model could allow detecting fake reviews by reversing the approach. Indeed, it might be hard to detect a fake review if it is the only review for a given category; however, we could compensate it into another category with more reviews, and analyse the difference between the compensated review and the other reviews.

7. REFERENCES

[1] C. Aperjis and R. Johari. Designing aggregation mechanisms for reputation systems in online marketplaces. SIGecom Exchanges, 9:3:1-3:4, 2010.
[2] C. Aperjis and R. Johari. Optimal windows for aggregating ratings in electronic marketplaces. Management Science, 56(5):864-880, 2010.
[3] S. Buchegger and J. Le Boudec. A robust reputation system for mobile ad-hoc networks. Technical report IC/2003/50, EPFL-IC-LCA, 2003.
[4] W. Chen, Q. Zeng, and L. Wenyin. A user reputation model for a user-interactive question answering system. In SKG '06, page 40. IEEE, 2006.
[5] F. Cornelli, E. Damiani, S. D. C. di Vimercati, S. Paraboschi, and P. Samarati. Choosing reputable servents in a P2P network. In WWW '02, pages 376-386. ACM, 2002.
[6] C. Dellarocas. Immunizing online reputation reporting systems against unfair ratings and discriminatory behavior. In ACM Conference on Electronic Commerce, pages 150-157, 2000.
[7] C. Dellarocas and C. A. Wood. The sound of silence in online feedback: Estimating trading risks in the presence of reporting bias. Management Science, 54(3):460-476, 2008.
[8] M. Fan, Y. Tan, and A. B. Whinston. Evaluation and design of online cooperative feedback mechanisms for reputation management. IEEE Transactions on Knowledge and Data Engineering, 17(2):244-254, 2005.
[9] S. Gong. A collaborative filtering recommendation algorithm based on user clustering and item clustering. Journal of Software, 5(7), 2010.
[10] J. Gorner, J. Zhang, and R. Cohen. Improving the use of advisor networks for multi-agent trust modelling. In Privacy, Security and Trust, pages 71-78, 2011.
[11] N. Griffiths. Task delegation using experience-based multi-dimensional trust. In AAMAS '05, pages 489-496. ACM, 2005.
[12] A. Jøsang, R. Ismail, and C. Boyd. A survey of trust and reputation systems for online service provision. Decision Support Systems, 43:618-644, 2007.
[13] Y. Koren. Factor in the neighbors: Scalable and accurate collaborative filtering. ACM Transactions on Knowledge Discovery from Data, 4:1:1-1:24, 2010.
[14] P. Massa and P. Avesani. Trust-aware recommender systems. In Recommender Systems, pages 17-24. ACM, 2007.
[15] M. C. Pham, Y. Cao, R. Klamma, and M. Jarke. A clustering approach for collaborative filtering recommendation using social network analysis. Journal of Universal Computer Science, 17(4):583-604, 2011.
[16] F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.
[17] A. Saric, M. Hadzikadic, and D. Wilson. Alternative formulas for rating prediction using collaborative filtering. In ISMIS '09, pages 301-310. Springer, 2009.
[18] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In WWW '01, pages 285-295. ACM, 2001.
[19] J. Schneider, G. Kortuem, J. Jager, S. Fickas, and Z. Segall. Disseminating trust information in wearable communities. Personal and Ubiquitous Computing, 4:245-248, 2000.
[20] X. Su and T. M. Khoshgoftaar. A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009.
[21] T. van Deursen, P. Koster, and M. Petkovic. Hedaquin: A reputation-based health data quality indicator. Electronic Notes in Theoretical Computer Science, 197(2):159-167, 2008.
[22] A. Whitby, A. Jøsang, and J. Indulska. Filtering out unfair ratings in Bayesian reputation systems. In Workshop on Trust in Agent Societies, 2004.
[23] B. Yu and M. P. Singh. Detecting deception in reputation management. In AAMAS '03, pages 73-80. ACM, 2003.
[24] G. Zacharia, A. Moukas, and P. Maes. Collaborative reputation mechanisms in electronic marketplaces. In HICSS, 1999.
[25] J. Zhang and R. Cohen. Trusting advice from other buyers in e-marketplaces: the problem of unfair ratings. In ICEC, pages 225-234, 2006.
[26] C.-N. Ziegler and G. Lausen. Making product recommendations more diverse. IEEE Data Engineering Bulletin, 32(4):23-32, 2009.