# Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services

Save this PDF as:

Size: px
Start display at page:

## Transcription

4 Step 1 Expertise(u 1 ) u 1 Step 2 Step 3 Step 4 Estimated Expertise(u n+1 ) w 1, w 2, w 4, w 6 WordLevel(w 1 ) w 1, w 3, w 4, w 5 u n+1 Expertise(u 2 ) u 2 Vocabulary w 2, w 4, w 6, w 7 WordLevel(w 2 ) w 3, w 4, w 5, w 8 Estimated Expertise(u n+2 ) u n+2. Expertise(u n ) u n w 1, w 3, w 6, w 8 WordLevel(w 3 ). WordLevel(w n ) w 2, w 3, w 6, w 8. Estimated Expertise(u n+k ) u n+k WordLevel(w i ) Heavy Users U H Light Users U L Figure 3: The overall procedure of predicting the expertise of a light user. Measurement of Availability The second component, availability, is simply the number of a user s answers with their importance proportional to their recency. Figure 4 intuitively describes this concept: each bar represents an answer, and the height indicates its importance. Thus, the availability of a light user u l, denoted by Availability(u l ), should have the property in Property 4. Importance Availability(u a ) < Availability(u b ) u b u a Answers Timeline Present Figure 4: An intuitive illustration of availability. Property 4. The availability of a user becomes higher as the user wrote more answers at the time closer to the present. Detailed Methodology In this section, we formalize our methodology informally presented in the previous section. Table 2 lists the measures that will be formally defined in this section. Table 2: The notation used throughout this paper. Notation Description Affordance(u l ) Expertise(u h ) EstimatedExpertise(u l ) W ordlevel(w i) Availability(u l ) the answer affordance of a light user the expertise of a heavy user the estimation of the expertise of a light user the word level of a usable word the availability of a light user Answer Affordance The answer affordance of a light user u l is defined as Eq. (1), reflecting Property 1. Here, w a (0 w a 1) is a balancing factor, and its suitable value depends on the context and data set. EstimatedExpertise(u l ) is defined in the subsection Expertise, and Availability(u l ) in the subsection Availability. Affordance(u l ) =w a EstimatedExpertise(u l )+ (1 w a) Availability(u l ) Expertise As shown in Figure 3, EstimatedExpertise(u l ) is obtained through four steps. All measures defined here will be made to have the range [0, 1] by min-max normalization unless a measure inherently has that range. Step 1: The expertise of a heavy user u h is defined through Eqs. (2) (4), reflecting Property 2. Here, w e (0 w e 1) is a balancing factor. Expertise(u h ) = w e Entropy(u h ) + (1 w e) Rating(u h ) (2) First, Entropy(u h ) is Eq. (3), which is basically the entropy of a user s interests. The more concentrated a user s answers are, the lower the entropy is. Here, we assume that there are only two categories the target category and one including all others since we are not interested in other than the target category. The entropy E(u h ) starts to decrease at P (c u h ) = 0.5. Hence, the entropy curve for P (c u h ) > 0.5 is flipped vertically such that Entropy(u h ) monotonically increases as P (c u h ) increases. (1 + (1 E(u h ))) if P (c target u h ) > 0.5 Entropy(u h ) = 2 E(u h ) otherwise 2 (3) E(u h ) = P (c u h ) log 2P (c u h ) c {c target,c others } (1) 605

6 We crawled only the question-answer pairs whose answer was selected by the questioner. Unselected answers often have poor quality or may include spam, advertisements, or jokes from abusers. There were about 11 million registered questions in the computers category as of October 2012, and we obtained 3,926,794 question-answer pairs with a selected answer. For the travel category, we gathered 585,316 question-answer pairs from 1.4 million questions. Since our methodology exploits the vocabulary, it is required to refine the set of words used in the answers. We corrected spelling as well as spacing using publicly available software. In addition, we removed the words that appeared only once in the data set since those words are useless in our methodology. The total number of usable words is 422,400; the number of usable words is 191,502 in the computers category and 232,076 in the travel category. During the period of ten years, the number of the users who answered at least once is 465,711 in the computers category and 106,194 in the travel category. However, we did not use all of them since the users with few answers could bring the possibility of distorted results owing to the characteristics of the entropy. Thus, we filtered out the users who answered less than five times in each category. As a result, we obtained 228,369 users in the computers category and 44,866 users in the travel category. Table 3 shows the statistics of the data set for the two categories after preprocessing. Table 3: The statistics of the data set. Computers Travel # of answers 3,926, ,316 # of words 191, ,076 # of users 228,369 44,866 Period Division We divided the entire period into three periods the resource, training, and test periods as shown in Figure 5 to separate the users. In the experiments, the set U H of heavy users were those who joined during the resource period. Because the resource period is very long (almost seven years), many users in that period responded to many questions, though some users did not. On the other hand, the set U L of light users were those who joined during the training period. Because the training period is short (only one year), many users in that period did not have enough answers for us to judge their expertise. the future and used for evaluating the effectiveness of our methodology. Parameter Configuration Our methodology has two balancing factors w a in Eq. (1) and w e in Eq. (2) as well as a scaling factor α in the sigmoid function. The proper values for these parameters depend on the context and data set. Thus, these parameter values were configured empirically. w a and w e were set to be 0.6 and 0.3, respectively. α was set to be 0.01 in Eq. (4) and 0.1 in Eq. (7). Accuracy of Expertise Prediction In this subsection, we show that EstimatedExpertise() precisely predicts the expertise of a light user. Preliminary Test In KiN, the users are allowed to designate their main interest on their own in order to expose their expertise on a specific subject. Using this information, we examined the ratio of the light users who selected the target category as their main interest. Overall, the ratio was 10.5% in the computers category and 24.5% in the travel category. That is, the ratio was low. However, the ratio increases significantly if we confine this test to expert users. To this end, we sorted the light users in the decreasing order of EstimatedExpertise() for each category. Figure 6 shows the ratio of such self-declared experts in the top-k users. When k was set to be 100, 93% of the light users in the computers category set their main interest to the target category, and 90% of those in the travel category did. The top users (i.e., experts) more tend to choose the target category as their main interest. This result implies that our approach accurately measures the expertise of a light user. Ratio of users Top (k x 100) users (a) Computers category. Ratio of users Top (k x 100) users (b) Travel category. Figure 6: The ratio of users who expressed their interests. Resource Period Training Period Test Period Users QA Data Figure 5: Division of the entire period into three periods. Please note that we assume that the present time is the end of the training period. The test period is assumed to be Evaluation Method We sorted the light users in the decreasing order of EstimatedExpertise() and obtained a set E training of the top-k users. In addition, we obtained another set E test of the top-k users from the test period by KiN s ranking scheme and regarded E test as the ground truth. We compared the users in E training with those in E test. This comparison shows how many potential experts selected by our approach will become real experts in the near future. There is no definite way to obtain the ground truth for the expertise in CQA services. We decided to imitate KiN s ranking scheme for this purpose since Naver must have 607

7 elaborated the scheme. KiN s ranking scheme is using a weighted sum of the selection count and the selection ratio. The detailed explanation is omitted here because of the lack of space. We believe that KiN s scheme should work well only if we have a sufficient amount of information. One might think that we could use KiN s scheme also for ranking the light users in the training period. However, the user profiles such as the selection count and the selection ratio used in KiN s scheme are not very reliable when there is no sufficient information, as will be shown later. The test period is long enough (almost three years) to apply KiN s scheme to the users in that period. EstimatedExpertise() is the approach used in our methodology for generating the set E training. In addition to EstimatedExpertise(), we generated the sets of potential experts by using three different alternatives. That is, we sorted the light users in the training period in the decreasing order of the following values. We normalized all these values by using the number of the answers of the corresponding user during the test period. Expertise(): the way of ranking heavy users rather than light users in our methodology SelCount(): the selection count RecommCount(): the recommendation count We used two metrics, and R-precision, to measure the precision of prediction. The metrics are popular in information retrieval and natural language processing. Precision is more important in our evaluation than recall since we are considering only light users. the precision evaluated at a given cut-off rank, considering only the topmost k results R-precision: the precision at R-th position in the ranking of results for a query that has R relevant documents Evaluation Results Tables 4 and 5 show the results of the precision performance. EstimatedExpertise() is what we propose in our methodology. The number of experts selected was 200 in the computers category and 100 in the travel category. In Table 4, EstimatedExpertise() is shown to outperform other alternatives significantly (by up to about 3 times) for the computers category, meaning that the other alternatives based on user profiles are not robust when there is no sufficient information. In contrast, in Table 5, our approach is not superior to the other alternatives, though its performance is reasonably good. The main reason is that our approach exploits the entropy of a user, and this result conforms to the findings by Adamic et al. (Adamic et al. 2008). If we used another technique for calculating Expertise(), we would be able to make our approach outperform others for subjective opinions. We leave it as a topic of future research. Test with Very Small Data Although we chose a oneyear period to obtain a sufficient number of light users, it is worthwhile to further reduce the amount of available information. With this motivation in mind, we selected the users who answered to more than 30 questions in the computers category (1,043 users) and 15 questions in the travel category (313 users). Then, we increased the size of the question-answer set for each user from 5 to 30 (or 15), which was used for estimating his/her expertise. The correlation coefficient between the estimated expert rankings and the ground truth is shown in Figure 7. We found out that the correlation coefficient was sufficiently high (about 0.7) even when only five answers were used. This result implies that our approach indeed mitigates the cold-start problem. Correlation coefficient Computers Travel all The number of QA pairs Figure 7: The performance with very small data. Effectiveness of Answer Affordance Finally, in this subsection, we show that the light users selected by Af f ordance() are likely to become contributive users in the future, proving the primary goal of this paper. Evaluation Method We sorted the light users in the decreasing order of Affordance() and obtained a set E ours of the top-k users. Because KiN manages the rankings of expert users for each category, we could obtain another set E KiN of the top-k users based on the rankings provided by KiN. Since KiN provides only the current rankings, we need to make the time frame comparable. Thus, we calculated Affordance() using the data for the past one year from the time of doing this experiment, i.e., from July 2011 to July k was set to be 200 in the travel category and doubled for the computers category owing to its larger size. Definitions 1 and 2 introduce the metrics used for this experiment. These two metrics were captured for the two sets E ours and E KiN during a four-week period. Definition 1. The user availability of a set U E of users on a given day is defined as the ratio of the number of the users in U E who appeared on the day to the total number of users who appeared on that day. A user is regarded to appear on the day if he/she left at least one answer. Definition 2. The answer possession of a set U E of users on a given day is defined as the ratio of the number of the answers posted by the users in U E on the day to the total number of answers posted on that day. The answering activity of the users selected by our methodology diminishes quickly unless they are encouraged to stay (Nam, Ackerman, and Adamic 2009). However, we are not able to encourage such users since we are not the service provider. That is, the effectiveness of the answer affordance cannot be verified exactly as what we suggested. Instead, we decided to update the set E ours every week to simulate motivation and encouragement. Thus, we recalculated 608

8 Table 4: The precision performance for the computers category. Target Method R-Precision EstimatedExpertise() Only Light Users Expertise() (# of Experts: 200) SelCount() RecommCount() Table 5: The precision performance for the travel category. Target Method R-Precision EstimatedExpertise() Only Light Users Expertise() (# of Experts: 100) SelCount() RecommCount() Af f ordance() with the one-year period shifted towards the present by one week. Meanwhile, the rankings managed by KiN respect the users who have shown steady activity for many years and built their valuable identity. For a fair comparison, KiN s rankings were also updated whenever the answer affordance was recalculated. Evaluation Results Figure 8 shows the user availability of E ours (our method) and E KiN (KiN s method) from July 28, 2012 to August 24, The user availability was consistently much higher in our method than in KiN s method. Since KiN s method is the only one published to date, this comparison is reasonable although it does not seriously consider availability. In Figure 8(a), the average was in our method and in KiN s method. In Figure 8(b), the average was in our method and in KiN s method. Our method outperformed KiN s method by a larger margin in the computers category than in the travel category. User availability Our method KiN s method Days towards the future (a) Computers category. User availability Our method KiN s method Days towards the future (b) Travel category. Figure 8: The result of the user availability. Figure 9 shows the answer possession for the same period. We observed the same trend as in Figure 8. In Figure 9(a), the average was in our method and in KiN s method. In Figure 9(b), the average was in our method and in KiN s method. The gap between the two methods was also larger in the computers category. Overall, these results show that the light users selected by our methodology contribute to the services more than the top rankers selected by the service provider. These light users Answer possession Our method KiN s method Days towards the future (a) Computers category. Answer possession Our method KiN s method Days towards the future (b) Travel category. Figure 9: The result of the answer possession. are expert in an area as well as eager to answer many questions at the beginning. Therefore, they are likely to become contributive users if they get motivated properly. Related Work Expert Finding and Question Routing The work on expert finding and question routing is the closest to our work. Many algorithms have been proposed in the literature: for example, graph-based algorithms (Dom et al. 2003; Zhang, Ackerman, and Adamic 2007) and language-based algorithms (Liu, Croft, and Koll 2005; Zhou et al. 2009). These algorithms were shown to successfully find the experts when there is an abundant amount of information. However, most of them did not pay much attention on the cold-start problem. Pal et al. (Pal and Konstan 2010) tackled the cold-start problem with an observation that experts have a biased pattern when selecting a question to answer. Their model simply distinguishes experts from normal users, as ours provides a sophisticated measure indicating the degree of expertise. In addition, many studies along this direction have neglected the availability of users as opposed to our methodology. Nevertheless, the availability is important since questions should be routed to the users who have spare time. As far as we know, the QuME system (Zhang et al. 2007) first considered the recency of answerers in computing an 609

### Subordinating to the Majority: Factoid Question Answering over CQA Sites

Journal of Computational Information Systems 9: 16 (2013) 6409 6416 Available at http://www.jofcis.com Subordinating to the Majority: Factoid Question Answering over CQA Sites Xin LIAN, Xiaojie YUAN, Haiwei

### Evolution of Experts in Question Answering Communities

Evolution of Experts in Question Answering Communities Aditya Pal, Shuo Chang and Joseph A. Konstan Department of Computer Science University of Minnesota Minneapolis, MN 55455, USA {apal,schang,konstan}@cs.umn.edu

### Joint Relevance and Answer Quality Learning for Question Routing in Community QA

Joint Relevance and Answer Quality Learning for Question Routing in Community QA Guangyou Zhou, Kang Liu, and Jun Zhao National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy

### Question Routing by Modeling User Expertise and Activity in cqa services

Question Routing by Modeling User Expertise and Activity in cqa services Liang-Cheng Lai and Hung-Yu Kao Department of Computer Science and Information Engineering National Cheng Kung University, Tainan,

### LDA Based Security in Personalized Web Search

LDA Based Security in Personalized Web Search R. Dhivya 1 / PG Scholar, B. Vinodhini 2 /Assistant Professor, S. Karthik 3 /Prof & Dean Department of Computer Science & Engineering SNS College of Technology

### Web Content Summarization Using Social Bookmarking Service

ISSN 1346-5597 NII Technical Report Web Content Summarization Using Social Bookmarking Service Jaehui Park, Tomohiro Fukuhara, Ikki Ohmukai, and Hideaki Takeda NII-2008-006E Apr. 2008 Jaehui Park School

### Measurement and Metrics Fundamentals. SE 350 Software Process & Product Quality

Measurement and Metrics Fundamentals Lecture Objectives Provide some basic concepts of metrics Quality attribute metrics and measurements Reliability, validity, error Correlation and causation Discuss

### RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS

ISBN: 978-972-8924-93-5 2009 IADIS RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS Ben Choi & Sumit Tyagi Computer Science, Louisiana Tech University, USA ABSTRACT In this paper we propose new methods for

### To determine vertical angular frequency, we need to express vertical viewing angle in terms of and. 2tan. (degree). (1 pt)

Polytechnic University, Dept. Electrical and Computer Engineering EL6123 --- Video Processing, S12 (Prof. Yao Wang) Solution to Midterm Exam Closed Book, 1 sheet of notes (double sided) allowed 1. (5 pt)

Routing Questions for Collaborative Answering in Community Question Answering Shuo Chang Dept. of Computer Science University of Minnesota Email: schang@cs.umn.edu Aditya Pal IBM Research Email: apal@us.ibm.com

### Procedia Computer Science 00 (2012) 1 21. Trieu Minh Nhut Le, Jinli Cao, and Zhen He. trieule@sgu.edu.vn, j.cao@latrobe.edu.au, z.he@latrobe.edu.

Procedia Computer Science 00 (2012) 1 21 Procedia Computer Science Top-k best probability queries and semantics ranking properties on probabilistic databases Trieu Minh Nhut Le, Jinli Cao, and Zhen He

### Simulating a File-Sharing P2P Network

Simulating a File-Sharing P2P Network Mario T. Schlosser, Tyson E. Condie, and Sepandar D. Kamvar Department of Computer Science Stanford University, Stanford, CA 94305, USA Abstract. Assessing the performance

### So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

### Healthcare Measurement Analysis Using Data mining Techniques

www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

### Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and

### Comparing Recommendations Made by Online Systems and Friends

Comparing Recommendations Made by Online Systems and Friends Rashmi Sinha and Kirsten Swearingen SIMS, University of California Berkeley, CA 94720 {sinha, kirstens}@sims.berkeley.edu Abstract: The quality

### International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.

REVIEW ARTICLE ISSN: 2321-7758 UPS EFFICIENT SEARCH ENGINE BASED ON WEB-SNIPPET HIERARCHICAL CLUSTERING MS.MANISHA DESHMUKH, PROF. UMESH KULKARNI Department of Computer Engineering, ARMIET, Department

### Prediction of Stock Performance Using Analytical Techniques

136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

### CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS

CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS In today's scenario data warehouse plays a crucial role in order to perform important operations. Different indexing techniques has been used and analyzed using

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### WHITE PAPER OCTOBER 2014. Unified Monitoring. A Business Perspective

WHITE PAPER OCTOBER 2014 Unified Monitoring A Business Perspective 2 WHITE PAPER: UNIFIED MONITORING ca.com Table of Contents Introduction 3 Section 1: Today s Emerging Computing Environments 4 Section

### Web Search Engines: Solutions

Web Search Engines: Solutions Problem 1: A. How can the owner of a web site design a spider trap? Answer: He can set up his web server so that, whenever a client requests a URL in a particular directory

### Personalization of Web Search With Protected Privacy

Personalization of Web Search With Protected Privacy S.S DIVYA, R.RUBINI,P.EZHIL Final year, Information Technology,KarpagaVinayaga College Engineering and Technology, Kanchipuram [D.t] Final year, Information

### Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,

### SEO 360: The Essentials of Search Engine Optimization INTRODUCTION CONTENTS. By Chris Adams, Director of Online Marketing & Research

SEO 360: The Essentials of Search Engine Optimization By Chris Adams, Director of Online Marketing & Research INTRODUCTION Effective Search Engine Optimization is not a highly technical or complex task,

### Comparing IPL2 and Yahoo! Answers: A Case Study of Digital Reference and Community Based Question Answering

Comparing and : A Case Study of Digital Reference and Community Based Answering Dan Wu 1 and Daqing He 1 School of Information Management, Wuhan University School of Information Sciences, University of

### Deposit Identification Utility and Visualization Tool

Deposit Identification Utility and Visualization Tool Colorado School of Mines Field Session Summer 2014 David Alexander Jeremy Kerr Luke McPherson Introduction Newmont Mining Corporation was founded in

### What is the current state of mobile recruitment?

What is the current state of mobile recruitment? Colofon December 13, 2013 Mieke Berkhout Kristin Mellink Julia Paskaleva Geert-Jan Waasdorp Tom Wentholt Intelligence Group Maxlead Maaskade 119 Rhijngeesterstraatweg

### The Application Research of Ant Colony Algorithm in Search Engine Jian Lan Liu1, a, Li Zhu2,b

3rd International Conference on Materials Engineering, Manufacturing Technology and Control (ICMEMTC 2016) The Application Research of Ant Colony Algorithm in Search Engine Jian Lan Liu1, a, Li Zhu2,b

American Journal of Engineering Research (AJER) 2013 American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-2, Issue-4, pp-39-43 www.ajer.us Research Paper Open Access

### arxiv:1112.0829v1 [math.pr] 5 Dec 2011

How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly

### Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

### Collecting Polish German Parallel Corpora in the Internet

Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska

### Early Detection of Potential Experts in Question Answering Communities

Early Detection of Potential Experts in Question Answering Communities Aditya Pal 1, Rosta Farzan 2, Joseph A. Konstan 1, and Robert Kraut 2 1 Dept. of Computer Science and Engineering, University of Minnesota

### A Novel Method to Detect Fraud Ranking For Mobile Apps

A Novel Method to Detect Fraud Ranking For Mobile Apps Geedikapally Rajesh Kumar M.Tech Student Department of CSE St.Peter's Engineering College. Abstract: Ranking fraud in the mobile App market refers

### The People Factor in Change Management

The People Factor in Change Management Content Introduction... 3 Know your audience... 3 Address your audience...4 Train your audience... 5 Support your audience... 7 Future trends...8 2 Introduction Have

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

### Comparing Tag Clouds, Term Histograms, and Term Lists for Enhancing Personalized Web Search

Comparing Tag Clouds, Term Histograms, and Term Lists for Enhancing Personalized Web Search Orland Hoeber and Hanze Liu Department of Computer Science, Memorial University St. John s, NL, Canada A1B 3X5

### Automated Collaborative Filtering Applications for Online Recruitment Services

Automated Collaborative Filtering Applications for Online Recruitment Services Rachael Rafter, Keith Bradley, Barry Smyth Smart Media Institute, Department of Computer Science, University College Dublin,

### A Comparison of General Approaches to Multiprocessor Scheduling

A Comparison of General Approaches to Multiprocessor Scheduling Jing-Chiou Liou AT&T Laboratories Middletown, NJ 0778, USA jing@jolt.mt.att.com Michael A. Palis Department of Computer Science Rutgers University

### Performance evaluation of Web Information Retrieval Systems and its application to e-business

Performance evaluation of Web Information Retrieval Systems and its application to e-business Fidel Cacheda, Angel Viña Departament of Information and Comunications Technologies Facultad de Informática,

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Creating Synthetic Temporal Document Collections for Web Archive Benchmarking

Creating Synthetic Temporal Document Collections for Web Archive Benchmarking Kjetil Nørvåg and Albert Overskeid Nybø Norwegian University of Science and Technology 7491 Trondheim, Norway Abstract. In

### Super-Agent Based Reputation Management with a Practical Reward Mechanism in Decentralized Systems

Super-Agent Based Reputation Management with a Practical Reward Mechanism in Decentralized Systems Yao Wang, Jie Zhang, and Julita Vassileva Department of Computer Science, University of Saskatchewan,

### Building a Data Quality Scorecard for Operational Data Governance

Building a Data Quality Scorecard for Operational Data Governance A White Paper by David Loshin WHITE PAPER Table of Contents Introduction.... 1 Establishing Business Objectives.... 1 Business Drivers...

### Personalizing Image Search from the Photo Sharing Websites

Personalizing Image Search from the Photo Sharing Websites Swetha.P.C, Department of CSE, Atria IT, Bangalore swethapc.reddy@gmail.com Aishwarya.P Professor, Dept.of CSE, Atria IT, Bangalore aishwarya_p27@yahoo.co.in

### SEO Techniques for various Applications - A Comparative Analyses and Evaluation

IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727 PP 20-24 www.iosrjournals.org SEO Techniques for various Applications - A Comparative Analyses and Evaluation Sandhya

### >> BEYOND OUR CONTROL? KINGSLEY WHITE PAPER

>> BEYOND OUR CONTROL? KINGSLEY WHITE PAPER AUTHOR: Phil Mobley Kingsley Associates December 16, 2010 Beyond Our Control? Wrestling with Online Apartment Ratings Services Today's consumer marketplace

### First Steps towards a Frequent Pattern Mining with Nephrology Data in the Medical Domain. - Extended Abstract -

First Steps towards a Frequent Pattern Mining with Nephrology Data in the Medical Domain - Extended Abstract - Matthias Niemann 1, Danilo Schmidt 2, Gabriela Lindemann von Trzebiatowski 3, Carl Hinrichs

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### SEGMENTATION IN WEB ANALYTICS

Online Intelligence Solutions SEGMENTATION IN WEB ANALYTICS A fundamental approach By Jacques Warren WHITE PAPER WHITE PAPER ABOUT JACQUES WARREN Jacques Warren has been working in the online marketing

### Accelerating Audits with Automation: Who s Accessing Your Unstructured Data?

Contents of This Paper Available Tools... 2 The Solution... 2 A Closer Look... 2 Accessing Folders... 3 Who s Authorizing Users... 4 Tracking Key Files... 5 Tracking the Last User of a File... 5 Matching

### Performance Metrics for Graph Mining Tasks

Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical

### Acquiring new customers is 6x- 7x more expensive than retaining existing customers

Automated Retention Marketing Enter ecommerce s new best friend. Retention Science offers a platform that leverages big data and machine learning algorithms to maximize customer lifetime value. We automatically

### Incorporating Participant Reputation in Community-driven Question Answering Systems

Incorporating Participant Reputation in Community-driven Question Answering Systems Liangjie Hong, Zaihan Yang and Brian D. Davison Department of Computer Science and Engineering Lehigh University, Bethlehem,

### A Web Recommender System for Recommending, Predicting and Personalizing Music Playlists

A Web Recommender System for Recommending, Predicting and Personalizing Music Playlists Zeina Chedrawy 1, Syed Sibte Raza Abidi 1 1 Faculty of Computer Science, Dalhousie University, Halifax, Canada {chedrawy,

### Reputation Network Analysis for Email Filtering

Reputation Network Analysis for Email Filtering Jennifer Golbeck, James Hendler University of Maryland, College Park MINDSWAP 8400 Baltimore Avenue College Park, MD 20742 {golbeck, hendler}@cs.umd.edu

### Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION

Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,

### REVIEW ON QUERY CLUSTERING ALGORITHMS FOR SEARCH ENGINE OPTIMIZATION

Volume 2, Issue 2, February 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A REVIEW ON QUERY CLUSTERING

### Cross-Domain Collaborative Recommendation in a Cold-Start Context: The Impact of User Profile Size on the Quality of Recommendation

Cross-Domain Collaborative Recommendation in a Cold-Start Context: The Impact of User Profile Size on the Quality of Recommendation Shaghayegh Sahebi and Peter Brusilovsky Intelligent Systems Program University

### Parallel Computing of Kernel Density Estimates with MPI

Parallel Computing of Kernel Density Estimates with MPI Szymon Lukasik Department of Automatic Control, Cracow University of Technology, ul. Warszawska 24, 31-155 Cracow, Poland Szymon.Lukasik@pk.edu.pl

### Explode Six Direct Marketing Myths

White Paper Explode Six Direct Marketing Myths Maximize Campaign Effectiveness with Analytic Technology Table of contents Introduction: Achieve high-precision marketing with analytics... 2 Myth #1: Simple

### Machine Learning Final Project Spam Email Filtering

Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE

### Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems

215 IEEE International Conference on Big Data (Big Data) Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems Guoxin Liu and Haiying Shen and Haoyu Wang Department of Electrical

### Facilitating Knowledge Intelligence Using ANTOM with a Case Study of Learning Religion

Facilitating Knowledge Intelligence Using ANTOM with a Case Study of Learning Religion Herbert Y.C. Lee 1, Kim Man Lui 1 and Eric Tsui 2 1 Marvel Digital Ltd., Hong Kong {Herbert.lee,kimman.lui}@marvel.com.hk

### Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

### SECURITY METRICS: MEASUREMENTS TO SUPPORT THE CONTINUED DEVELOPMENT OF INFORMATION SECURITY TECHNOLOGY

SECURITY METRICS: MEASUREMENTS TO SUPPORT THE CONTINUED DEVELOPMENT OF INFORMATION SECURITY TECHNOLOGY Shirley Radack, Editor Computer Security Division Information Technology Laboratory National Institute

### Quality-Aware Collaborative Question Answering: Methods and Evaluation

Quality-Aware Collaborative Question Answering: Methods and Evaluation ABSTRACT Maggy Anastasia Suryanto School of Computer Engineering Nanyang Technological University magg0002@ntu.edu.sg Aixin Sun School

### MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of

### Building a Database to Predict Customer Needs

INFORMATION TECHNOLOGY TopicalNet, Inc (formerly Continuum Software, Inc.) Building a Database to Predict Customer Needs Since the early 1990s, organizations have used data warehouses and data-mining tools

### A discussion of Statistical Mechanics of Complex Networks P. Part I

A discussion of Statistical Mechanics of Complex Networks Part I Review of Modern Physics, Vol. 74, 2002 Small Word Networks Clustering Coefficient Scale-Free Networks Erdös-Rényi model cover only parts

### Research of Postal Data mining system based on big data

3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication

### Application Performance Testing Basics

Application Performance Testing Basics ABSTRACT Todays the web is playing a critical role in all the business domains such as entertainment, finance, healthcare etc. It is much important to ensure hassle-free

### Automatic Web Page Classification

Automatic Web Page Classification Yasser Ganjisaffar 84802416 yganjisa@uci.edu 1 Introduction To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory

### A Comparative Approach to Search Engine Ranking Strategies

26 A Comparative Approach to Search Engine Ranking Strategies Dharminder Singh 1, Ashwani Sethi 2 Guru Gobind Singh Collage of Engineering & Technology Guru Kashi University Talwandi Sabo, Bathinda, Punjab

### A Case Study of Value-Added Tax Issues on Customer Loyalty Programs in Korea

International Journal of Economics and Finance; Vol. 6, No. 6; 2014 ISSN 1916-971X E-ISSN 1916-9728 Published by Canadian Center of Science and Education A Case Study of Value-Added Tax Issues on Customer

### The Importance of Cyber Threat Intelligence to a Strong Security Posture

The Importance of Cyber Threat Intelligence to a Strong Security Posture Sponsored by Webroot Independently conducted by Ponemon Institute LLC Publication Date: March 2015 Ponemon Institute Research Report

### Software Engineering 4C03 SPAM

Software Engineering 4C03 SPAM Introduction As the commercialization of the Internet continues, unsolicited bulk email has reached epidemic proportions as more and more marketers turn to bulk email as

### Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering

Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering Guangyou Zhou, Kang Liu and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese

### Keywords the Most Important Item in SEO

This is one of several guides and instructionals that we ll be sending you through the course of our Management Service. Please read these instructionals so that you can better understand what you can

### Interpreting Web Analytics Data

Interpreting Web Analytics Data Whitepaper 8650 Commerce Park Place, Suite G Indianapolis, Indiana 46268 (317) 875-0910 info@pentera.com www.pentera.com Interpreting Web Analytics Data At some point in

### Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement

Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement Jiang Bian College of Computing Georgia Institute of Technology jbian3@mail.gatech.edu Eugene Agichtein

### A UPS Framework for Providing Privacy Protection in Personalized Web Search

A UPS Framework for Providing Privacy Protection in Personalized Web Search V. Sai kumar 1, P.N.V.S. Pavan Kumar 2 PG Scholar, Dept. of CSE, G Pulla Reddy Engineering College, Kurnool, Andhra Pradesh,

### Identifying User Behavior in domainspecific

Identifying User Behavior in domainspecific Repositories Wilko VAN HOEK a,1, Wei SHEN a and Philipp MAYR a a GESIS Leibniz Institute for the Social Sciences, Germany Abstract. This paper presents an analysis

### A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data

A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data Athanasius Zakhary, Neamat El Gayar Faculty of Computers and Information Cairo University, Giza, Egypt

### Automated classification of A/E/C web content

Automated classification of A/E/C web content R. Amor & K. Xu Department of Computer Science, University of Auckland, Auckland, New Zealand ABSTRACT: The amount of useful information available on the web

### How to Develop a Research Protocol

How to Develop a Research Protocol Goals & Objectives: To explain the theory of science To explain the theory of research To list the steps involved in developing and conducting a research protocol Outline:

### FROM THE C ALL CENTER TO THE EXECUTIVE SUITE:

FROM THE C ALL CENTER TO THE EXECUTIVE SUITE: THE VALUE OF CUSTOMER OPERATIONS PERFORMANCE MANAGEMENT Winter, 2008 Enterprise Applications Consulting Joshua Greenbaum, Principal EAC 2303 Spaulding Avenue

### Key Factors for Payers in Fraud and Abuse Prevention. Protect against fraud and abuse with a multi-layered approach to claims management.

White Paper Protect against fraud and abuse with a multi-layered approach to claims management. October 2012 Whether an act is technically labeled health insurance fraud or health insurance abuse, the

IBM Software Business Analytics Social Analytics Social Business Analytics Gaining business value from social media 2 Social Business Analytics Contents 2 Overview 3 Analytics as a competitive advantage

### PPC Campaigns Ad Text Ranking vs CTR

PPC Campaigns Ad Text Ranking vs CTR The analysis of ad texts efficiency in PPC campaigns executed by Sembox agency. Lodz, Poland; July 2010 r. Content Main Conclusions... 3 About Sembox... 6 The relation

### SPATIAL DATA CLASSIFICATION AND DATA MINING

, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 raedamin@just.edu.jo Qutaibah Althebyan +962796536277 qaalthebyan@just.edu.jo Baraq Ghalib & Mohammed

### Automatic Synthesis of Trading Systems

Automatic Synthesis of Trading Systems Michael Harris www.priceactionlab.com The process of developing mechanical trading systems often leads to frustration and to a financial disaster if the systems do