# E-commerce in Your Inbox: Product Recommendations at Scale

Save this PDF as:

Size: px
Start display at page:

## Transcription

5 in the following we present non-bagged version of the language model, however we note that it is straightforward to extend the presented methodology to use the bagged version. More specifically, objective of user2vec is to maximize the log-likelihood over the set S of all purchase sequences, L = ( log P(u n p n1 : p nun ) s S u n s + ) (3.5) log P(p ni p n,i c : p n,i+c, u n) p ni u n where c is the length of the context for products in purchase sequence of the n-th user. The probability P(p ni p n,i c : p n,i+c, u n) is defined using a soft-max function, P(p ni p n,i c : p n,i+c, u n) = exp( v v p ni ) V p=1 exp( v v p), (3.6) where v p ni is the output vector representation of p ni, and v is averaged vector representation of the product context including corresponding u n, defined as v = 1 2c + 1 (vun + v pn,i+j ), (3.7) c j c,j 0 where v p is the input vector representation of p. Similarly, the probability P(u n p n1 : p nun ) is defined as P(u n p n1 : p nun ) = exp( v n v u n ) V p=1 exp( v n v p), (3.8) where v u n is the output vector representation of u n, and v n is averaged input vector representation of all the products purchased by user u n, v n = 1 U n U n v pni. (3.9) One of the main advantages of the user2vec model is that the product recommendations are specifically tailored for that user based on his purchase history. However, disadvantage is that the model would need to be updated very frequently. Unlike product-to-product recommendations, which may be relevant for longer time periods, user-to-product recommendations need to change often to account for the most recent user purchases. 4. EXPERIMENTS The experimental section is organized as follows. We first describe data set used in development of our product-toproduct and user-to-product predictors. Next we present valuable insights regarding purchase behavior of different age and gender groups in different US states. This information can be leveraged to improve demo- and geo-targeting of user groups. This is followed by a section on effectiveness of recommendation of popular products in different user cohorts, including age, gender, and US state. Finally, we show comparative results of various baseline recommendation algorithms. We conclude with the description of the system implementation at a large scale and bucket results that preceded our product launch. 4.1 Data set Our data sets included receipts sent to users who voluntarily opted-in for such studies, where we anonymized i=1 Percentage Male users Female users Age buckets (a) Percentage of purchasing users among all online users Average product price Male users Female users Age buckets (b) Average product price Figure 5: Purchasing habits for different demographics user IDs. Message bodies were analyzed by automated systems. Product names and prices were extracted from messages using an in-house extraction tool. Training data set collected for purposes of developing product prediction models comprised of more than million purchases made by N = 29 million users from 172 commercial websites. The product vocabulary included P = 2.1 million most frequently purchased products priced over \$5. More formally, data set D p = {(u n, s n), n = 1,..., N} was derived by forming receipt sequences s n for each user u n, along with their timestamps. Specifically, s n = {(e n1, t n1),..., (e nmn, t nmn )}, where e nm is a receipt comprising one or more purchased products, and t nm is receipt timestamp. Predictions were evaluated on a held-out month of user purchases Dp ts, formed in the same manner as D p. We measured the effectiveness of recommendations using prediction accuracy. In particular, we measured the number of product purchases that were correctly predicted, divided by the total number of purchases. For all models, the accuracy was measured separately on each day, based on the recommendations calculated using prior days. We set a daily budget of distinct recommendations each algorithm is allowed to make for a user to K = 20, based on an estimate of optimal number of ads users can effectively perceive daily [7]. 4.2 Insights from purchase data To get deeper insights into behavior of online users based on their demographic background and geographic location, we segregated users into cohorts based on their age, gender, and location, and looked at their purchasing habits. Such information can be very valuable to marketers when considering how and where to spend their campaign budget. We only considered users from the US, and computed statistics per each state separately. First, we were interested in differences between male and female shopping behavior for different age ranges. In addition, we were interested in the following aspects of purchasing behavior: 1) percentage of online users who shop online; 2) average number of products bought; 4) average spent per user; and 4) average price of a bought item. Figures 5 and 6 illustrate the results. In Figure 5 for each gender and age group we show the percentage of online users who shop online, as well as the average prices of bought items. Expectedly, we see that male and female demographics exhibit different shopping patterns. Throughout the different age buckets percentage of female shoppers is consistently higher than for males, peaking in the age range. On the other hand, percentage of male shoppers reaches its maximum in the 21-24

6 (a) ages (b) ages (c) ages (d) ages (e) ages (f) ages (g) ages (h) ages Figure 6: Purchasing behavior for different cohorts, dark color encodes higher values: (a, b) Percentage of shoppers among online users; (c, d) Average number of purchases per user; (e, f) Average amount spent per user; (g, h) Average product price bucket, and from there drops steadily. In addition, we observe that male users buy more expensive items on average. Furthermore, in Figure 6 we show per-state results, where darker color indicates higher values. Note that we only show results for different locations and age ranges, as we did not see significant differences between male and female purchasing behavior across states. First, in Figures 6(a) and 6(b) we show the percent of online shoppers in each state, for younger (18-20) and medium-age (40-44) populations, where we can see there exist significant differences between the states. In Figures 6(c) and 6(d) we show the average number of purchased products per state. Here, for both age buckets we see that the southern states purchase the least number of items. However, there is a significant difference between younger and older populations, as in northern states younger populations purchase more products, while in the northeast and the western states this holds true for the older populations. If we take a look at Figures 6(e) and 6(f) where we compare the average amount of money spent per capita, we can observe similar patterns, where in states like Alaska and North Dakota younger populations spend more money than peers from other states, and for older populations states with highest spending per user are California, Washington, and again Alaska. Interestingly, users from Alaska are the biggest spenders, irrespective of the age or metric used. Similar holds when we consider average price of a purchased item, shown in Figures 6(g) and 6(h). 4.3 Recommending popular products In this section we evaluate predictive properties of popular products. Recommending popular products to global population is a common baseline due to its strong performance, especially during holidays (e.g., Christmas, Thanksgiving). Such recommendations are intuitive, easy to calculate and implement, and may serve in a cold start scenarios for newly registered users when we have limited information. We considered how far back we need to look when calculating popularity and how long popular products stay relevant. Accuracy (%) Lookback of 5 days Lookback of 10 days Lookback of 20 days Lookback of 30 days Lookahead days Figure 7: Prediction accuracy and longevity of popular products with different lookbacks In Figure 7 we give results for popular products calculated in the previous 5, 10, 20, and 30 days of training data D p, evaluated on the first 1, 3, 7, 15, and 30 days of test data Dp ts. To account for ever-changing user purchase tastes, the results indicate that popular products need to be recalculated at least every 3 days with lookback of at most 5 days. Next, we evaluated the prediction accuracy of popular products computed for different user cohorts, and compared to the accuracy of globally popular products. In Figure 8 we compare accuracies of popular products in different user cohorts, with the lookback fixed at 5 days. We can observe that popular products in gender cohorts give bigger lift in prediction accuracy than popular products in age or state cohorts. Further, jointly considering age and gender when calculating popular products outperforms the popular gender products. Finally, including geographic dimension contributes to further accuracy lift. Overall, the results indicate that in the cold start scenario the best choice of which popular products to recommend are the ones from the user s (age,

7 Accuracy (%) Global Gender Age Age + gender State State + age + gender accuracy Accuracy (%),=1,=0.9,=0.7,= Lookahead days Figure 8: Prediction accuracy of popular products for different user cohorts gender, location) cohort. The results also suggest that popular products should be recalculated often to avoid decrease in accuracy with every passing day. 4.4 Recommending predicted products In this section we experiment with recommending products to users based on neural language models described in Section 3. Specifically, we compare the following algorithms: 1) prod2vec-topk was trained using data set D p, where product vectors were learned by maximizing log-likelihood of observing other products from sequences s, as proposed in (3.1). Recommendations for a given product p i were given by selecting the top K most similar products based on the cosine similarity in the resulting vector space. 2) bagged-prod2vec-topk was trained using D p, where product vectors were learned by maximizing log-likelihood of observing other products from sequences s as proposed in (3.3). Recommendations for a given product p i were given by selecting the top K most similar products based on the cosine similarity in the resulting vector space. 3) bagged-prod2vec-cluster was trained similarly to the bagged-prod2vec model, followed by clustering the product vectors into C clusters and calculating transition probabilities between them. After identifying which cluster p i belongs to (e.g., p i c i), we rank all clusters by their transition probabilities with respect to c i. Then, the products from top clusters are sorted by cosine similarity to p i, where top K c from each cluster are used as recommendations ( K c = K). Example predictions of bagged-prod2vec-cluster compared to predictions of bagged-prod2vec are shown in Table 2. It can be observed that predictions based on the clustering approach are more diverse. 4) user2vec was trained using data set D p where product vectors and user vectors were learned by maximizing log-likelihood proposed in (3.5) (see Figure 4). Recommendations for a given user u n were given by calculating cosine similarity between the user vector u n and all product vectors, and retrieving the top K nearest products. 5) co-purchase. For each product pair (p i, p j) we calculated frequency F (pi,p j ), i = 1,..., P, j = 1,..., P, with which product p j was purchased immediately after product p i. Then, recommendations for a product p i were given by sorting the frequencies F (pi,p j ), j = 1,..., P, and retrieving the top K products lookahead Lookahead days Figure 9: prod2vec accuracy with different decay values Since user u n may have multiple products purchased prior to day t d, separate predictions need to reach a consensus in order to choose the best K products to be shown on that day. To achieve this we propose time-decayed scoring of recommendations, followed by choice of top K products with the highest score. More specifically, given user s products purchased prior to t d along with their timestamps, {(p 1, t 1),... (p Un, t Un )}, for each product we retrieve top K recommendations along with their similarity scores, resulting in the set {(p j, sim j), j = 1,..., KU n}, where sim denotes cosine similarity. Next, we calculate a decayed score for every recommended product, d j = sim j α (t d t i ), (4.1) where (t d t i) is a difference in days between current day t d and the purchase time of product that led to recommendation of p j, and α is a decay factor. Finally, the decayed scores are sorted in descending order and the top K products are chosen as recommendations for day t d. Training details. Neural language models were trained using a machine with 96GB of RAM memory and 24 cores. Dimensionality of the embedding space was set to d = 300, context neighborhood size for all models was set to 5. Finally, we used 10 negative samples in each vector update. Similarly to the approach in [24], most frequent products and users were subsampled during training. To illustrate the performance of the language models, in Table 1 we give examples of product-to-product recommendations computed using bagged-prod2vec, where we see that the neighboring products are highly relevant to the query product (e.g., for despicable me the model retrieved similar cartoons). Evaluation details. Similarly to how popular product accuracy was measured, we assumed a daily budget of K = 20 distinct product recommendations per user. Predictions for day t d are based on prior purchases from previous days, and we did not consider updating predictions for day t d using purchases that happened during that day. Results. In the first set of experiments we evaluated performance of prod2vec for different values of decay factors. In Figure 9 we show prediction accuracy on test data Dp ts when looking 1, 3, 7, 15, and 30 days ahead. Initial prod2vec predictions were based on the last user purchase in the training data set D p. The results show that discounting of old predictions leads to improved recommendation accuracy, with decay factor of α = 0.9 being an optimal choice.

### Invited Applications Paper

Invited Applications Paper - - Thore Graepel Joaquin Quiñonero Candela Thomas Borchert Ralf Herbrich Microsoft Research Ltd., 7 J J Thomson Avenue, Cambridge CB3 0FB, UK THOREG@MICROSOFT.COM JOAQUINC@MICROSOFT.COM

### Online Content Optimization Using Hadoop. Jyoti Ahuja Dec 20 2011

Online Content Optimization Using Hadoop Jyoti Ahuja Dec 20 2011 What do we do? Deliver right CONTENT to the right USER at the right TIME o Effectively and pro-actively learn from user interactions with

### The Need for Training in Big Data: Experiences and Case Studies

The Need for Training in Big Data: Experiences and Case Studies Guy Lebanon Amazon Background and Disclaimer All opinions are mine; other perspectives are legitimate. Based on my experience as a professor

### Tensor Factorization for Multi-Relational Learning

Tensor Factorization for Multi-Relational Learning Maximilian Nickel 1 and Volker Tresp 2 1 Ludwig Maximilian University, Oettingenstr. 67, Munich, Germany nickel@dbs.ifi.lmu.de 2 Siemens AG, Corporate

### Linear programming approach for online advertising

Linear programming approach for online advertising Igor Trajkovski Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Rugjer Boshkovikj 16, P.O. Box 393, 1000 Skopje,

### A survey on click modeling in web search

A survey on click modeling in web search Lianghao Li Hong Kong University of Science and Technology Outline 1 An overview of web search marketing 2 An overview of click modeling 3 A survey on click models

### A Near Real-Time Personalization for ecommerce Platform Amit Rustagi arustagi@ebay.com

A Near Real-Time Personalization for ecommerce Platform Amit Rustagi arustagi@ebay.com Abstract. In today's competitive environment, you only have a few seconds to help site visitors understand that you

### Driving Results with. Dynamic Creative

Driving Results with Dynamic Creative Introduction In the world of digital advertising, reaching people is no longer the number one problem brands face when they plan a campaign. Ad networks and servers,

### Web Analytics Definitions Approved August 16, 2007

Web Analytics Definitions Approved August 16, 2007 Web Analytics Association 2300 M Street, Suite 800 Washington DC 20037 standards@webanalyticsassociation.org 1-800-349-1070 Licensed under a Creative

### The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of

### Search Result Optimization using Annotators

Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

### Driving Results with. Dynamic Creative Optimization

Driving Results with Dynamic Creative Optimization Introduction In the world of digital advertising, reaching people is no longer the number one problem brands face when they plan a campaign. Ad networks

### Importance of Online Product Reviews from a Consumer s Perspective

Advances in Economics and Business 1(1): 1-5, 2013 DOI: 10.13189/aeb.2013.010101 http://www.hrpub.org Importance of Online Product Reviews from a Consumer s Perspective Georg Lackermair 1,2, Daniel Kailer

### Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,

### REVIEW ON QUERY CLUSTERING ALGORITHMS FOR SEARCH ENGINE OPTIMIZATION

Volume 2, Issue 2, February 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A REVIEW ON QUERY CLUSTERING

### A/B SPLIT TESTING GUIDE How to Best Test for Conversion Success

A/B SPLIT TESTING GUIDE How to Best Test for Conversion Success CONVERSION OPTIMIZATION WHITE PAPER BY STEELHOUSE Q2 2012 TABLE OF CONTENTS Executive Summary...3 Importance of Conversion...4 What is A/B

### Chapter 5 Web Advertising Personalization using Web Data Mining

Chapter 5 Web Advertising Personalization using Web Data Mining 5.1 Introduction The internet has enabled businesses to advertise on web sites and helps to get to customers worldwide. In e-commerce, users

### Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Michael Bauer, Srinivasan Ravichandran University of Wisconsin-Madison Department of Computer Sciences {bauer, srini}@cs.wisc.edu

### Web Advertising Personalization using Web Content Mining and Web Usage Mining Combination

8 Web Advertising Personalization using Web Content Mining and Web Usage Mining Combination Ketul B. Patel 1, Dr. A.R. Patel 2, Natvar S. Patel 3 1 Research Scholar, Hemchandracharya North Gujarat University,

### Product recommendations and promotions (couponing and discounts) Cross-sell and Upsell strategies

WHITEPAPER Today, leading companies are looking to improve business performance via faster, better decision making by applying advanced predictive modeling to their vast and growing volumes of data. Business

### Nine Common Types of Data Mining Techniques Used in Predictive Analytics

1 Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better

### BUY. Mobile Retargeting for Retailers: Recapturing Customers Cross-Platform. February 2014. 1.877.AMPUSH.1 info@ampush.com

BUY Mobile Retargeting for Retailers: Recapturing Customers Cross-Platform February 2014 1.877.AMPUSH.1 info@ampush.com TABLE OF CONTENTS Executive Summary 3 The Mobile Advertising Opportunity 4 E-Commerce

### Data Mining Applications in Higher Education

Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

### Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

### Figure 1. The cloud scales: Amazon EC2 growth [2].

- Chung-Cheng Li and Kuochen Wang Department of Computer Science National Chiao Tung University Hsinchu, Taiwan 300 shinji10343@hotmail.com, kwang@cs.nctu.edu.tw Abstract One of the most important issues

### Similarity Search in a Very Large Scale Using Hadoop and HBase

Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France

### Parallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014

Parallel Data Mining Team 2 Flash Coders Team Research Investigation Presentation 2 Foundations of Parallel Computing Oct 2014 Agenda Overview of topic Analysis of research papers Software design Overview

### So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

### Mining Signatures in Healthcare Data Based on Event Sequences and its Applications

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Siddhanth Gokarapu 1, J. Laxmi Narayana 2 1 Student, Computer Science & Engineering-Department, JNTU Hyderabad India 1

### Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and

### Web Browsing Quality of Experience Score

Web Browsing Quality of Experience Score A Sandvine Technology Showcase Contents Executive Summary... 1 Introduction to Web QoE... 2 Sandvine s Web Browsing QoE Metric... 3 Maintaining a Web Page Library...

### Introduction. A. Bellaachia Page: 1

Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

### A High Scale Deep Learning Pipeline for Identifying Similarities in Online Communities Robert Elwell, Tristan Chong, John Kuner, Wikia, Inc.

A High Scale Deep Learning Pipeline for Identifying Similarities in Online Communities Robert Elwell, Tristan Chong, John Kuner, Wikia, Inc. Abstract: We outline a multi step text processing pipeline employed

### Customer Analytics. Turn Big Data into Big Value

Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data

### Introduction to Web Analytics Terms

Introduction to Web Analytics Terms N10014 Introduction N40002 This glossary provides definitions for common web analytics terms and discusses their use in Unica Web Analytics. The definitions are arranged

### The 8 Key Metrics That Define Your AdWords Performance. A WordStream Guide

The 8 Key Metrics That Define Your AdWords Performance A WordStream Guide The 8 Key Metrics That Define Your Adwords Performance WordStream Customer Success As anyone who has ever managed a Google AdWords

### Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

### The Evolvement of Big Data Systems

The Evolvement of Big Data Systems From the Perspective of an Information Security Application 2015 by Gang Chen, Sai Wu, Yuan Wang presented by Slavik Derevyanko Outline Authors and Netease Introduction

### Business Intelligence Solutions for Gaming and Hospitality

Business Intelligence Solutions for Gaming and Hospitality Prepared by: Mario Perkins Qualex Consulting Services, Inc. Suzanne Fiero SAS Objective Summary 2 Objective Summary The rise in popularity and

### A Web Prefetching Model Based on Content Analysis

A Web Prefetching Model Based on Content Analysis O Kit Hong, Fiona Robert P. Biuk-Aghai Faculty of Science and Technology University of Macau {csb.fiona fst.robert}@umac.mo Abstract Web-accessible resources

### A QoS-Aware Web Service Selection Based on Clustering

International Journal of Scientific and Research Publications, Volume 4, Issue 2, February 2014 1 A QoS-Aware Web Service Selection Based on Clustering R.Karthiban PG scholar, Computer Science and Engineering,

Volume 4, Issue 6, June 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Improving Web Image

### A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1

A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1 Yannis Stavrakas Vassilis Plachouras IMIS / RC ATHENA Athens, Greece {yannis, vplachouras}@imis.athena-innovation.gr Abstract.

### Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

### WEB PAGE CATEGORISATION BASED ON NEURONS

WEB PAGE CATEGORISATION BASED ON NEURONS Shikha Batra Abstract: Contemporary web is comprised of trillions of pages and everyday tremendous amount of requests are made to put more web pages on the WWW.

### Successful Steps and Simple Ideas to Maximise your Direct Marketing Return On Investment

Successful Steps and Simple Ideas to Maximise your Direct Marketing Return On Investment By German Sacristan, X1 Head of Marketing and Customer Experience, UK and author of The Digital & Direct Marketing

### Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies

Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,

### Distributed Framework for Data Mining As a Service on Private Cloud

RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &

### DIGITAL MARKETING. be seen. CHANNELADVISOR DIGITAL MARKETING ENABLES RETAILERS TO: And many more

DATA SHEET be seen. DIGITAL MARKETING With more and more digital marketing channels introduced each year, it can be hard to keep track of every place you re advertising let alone how each of your products

### Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment

Analysing Large Files in a Hadoop Distributed Cluster Environment S Saravanan, B Uma Maheswari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham,

### Machine Learning using MapReduce

Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

### STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT

### Collaborative Filtering. Radek Pelánek

Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains

### Microblog Sentiment Analysis with Emoticon Space Model

Microblog Sentiment Analysis with Emoticon Space Model Fei Jiang, Yiqun Liu, Huanbo Luan, Min Zhang, and Shaoping Ma State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory

### Recommendations in Mobile Environments. Professor Hui Xiong Rutgers Business School Rutgers University. Rutgers, the State University of New Jersey

1 Recommendations in Mobile Environments Professor Hui Xiong Rutgers Business School Rutgers University ADMA-2014 Rutgers, the State University of New Jersey Big Data 3 Big Data Application Requirements

### Architectural Design

Software Engineering Architectural Design 1 Software architecture The design process for identifying the sub-systems making up a system and the framework for sub-system control and communication is architectural

### to Boost SEO Growth Services To learn more, go to: teletech.com

10 Killer Tips to Boost SEO Brands that want to improve their online marketing performance must move away from old school SEO and adopt next-generation optimization strategies and techniques. We offer

### INTERNET MARKETING. SEO Course Syllabus Modules includes: COURSE BROCHURE

AWA offers a wide-ranging yet comprehensive overview into the world of Internet Marketing and Social Networking, examining the most effective methods for utilizing the power of the internet to conduct

### Big Data Big Deal? Salford Systems www.salford-systems.com

Big Data Big Deal? Salford Systems www.salford-systems.com 2015 Copyright Salford Systems 2010-2015 Big Data Is The New In Thing Google trends as of September 24, 2015 Difficult to read trade press without

### The Scientific Data Mining Process

Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

### Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

### Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

### Achieve Better Ranking Accuracy Using CloudRank Framework for Cloud Services

Achieve Better Ranking Accuracy Using CloudRank Framework for Cloud Services Ms. M. Subha #1, Mr. K. Saravanan *2 # Student, * Assistant Professor Department of Computer Science and Engineering Regional

### Predict the Popularity of YouTube Videos Using Early View Data

000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

### Response prediction using collaborative filtering with hierarchies and side-information

Response prediction using collaborative filtering with hierarchies and side-information Aditya Krishna Menon 1 Krishna-Prasad Chitrapura 2 Sachin Garg 2 Deepak Agarwal 3 Nagaraj Kota 2 1 UC San Diego 2

### Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

### USER INTENT PREDICTION FROM ACCESS LOG IN ONLINE SHOP

IADIS International Journal on WWW/Internet Vol. 12, No. 1, pp. 52-64 ISSN: 1645-7641 USER INTENT PREDICTION FROM ACCESS LOG IN ONLINE SHOP Hidekazu Yanagimoto. Osaka Prefecture University. 1-1, Gakuen-cho,

### Computational Advertising Andrei Broder Yahoo! Research. SCECR, May 30, 2009

Computational Advertising Andrei Broder Yahoo! Research SCECR, May 30, 2009 Disclaimers This talk presents the opinions of the author. It does not necessarily reflect the views of Yahoo! Inc or any other

### Automated Collaborative Filtering Applications for Online Recruitment Services

Automated Collaborative Filtering Applications for Online Recruitment Services Rachael Rafter, Keith Bradley, Barry Smyth Smart Media Institute, Department of Computer Science, University College Dublin,

### Towards a Transparent Proactive User Interface for a Shopping Assistant

Towards a Transparent Proactive User Interface for a Shopping Assistant Michael Schneider Department of Computer Science, Saarland University, Stuhlsatzenhausweg, Bau 36.1, 66123 Saarbrücken, Germany mschneid@cs.uni-sb.de

### Concept and Project Objectives

3.1 Publishable summary Concept and Project Objectives Proactive and dynamic QoS management, network intrusion detection and early detection of network congestion problems among other applications in the

### SPATIAL DATA CLASSIFICATION AND DATA MINING

, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

### Email Spam Detection Using Customized SimHash Function

International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email

### Large-Scale Data Sets Clustering Based on MapReduce and Hadoop

Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE

### Clustering Technique in Data Mining for Text Documents

Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor

### Query Recommendation employing Query Logs in Search Optimization

1917 Query Recommendation employing Query Logs in Search Optimization Neha Singh Department of Computer Science, Shri Siddhi Vinayak Group of Institutions, Bareilly Email: singh26.neha@gmail.com Dr Manish

### Our SEO services use only ethical search engine optimization techniques. We use only practices that turn out into lasting results in search engines.

Scope of work We will bring the information about your services to the target audience. We provide the fullest possible range of web promotion services like search engine optimization, PPC management,

### RANDOM PROJECTIONS FOR SEARCH AND MACHINE LEARNING

= + RANDOM PROJECTIONS FOR SEARCH AND MACHINE LEARNING Stefan Savev Berlin Buzzwords June 2015 KEYWORD-BASED SEARCH Document Data 300 unique words per document 300 000 words in vocabulary Data sparsity:

### Profile Based Personalized Web Search and Download Blocker

Profile Based Personalized Web Search and Download Blocker 1 K.Sheeba, 2 G.Kalaiarasi Dhanalakshmi Srinivasan College of Engineering and Technology, Mamallapuram, Chennai, Tamil nadu, India Email: 1 sheebaoec@gmail.com,

### InfiniteGraph: The Distributed Graph Database

A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086

### DISPLAY ADVERTISING: WHAT YOU RE MISSING. Written by: Darryl Chenoweth, Digital Marketing Expert

Display Advertising: What You re // 1 2 4.ch DISPLAY ADVERTISING: WHAT YOU RE MISSING Written by: Darryl Chenoweth, Digital Marketing Expert Display Advertising: What You re // 2 Table of Contents Introduction...

### The objective setting phase will then help you define other aspects of the project including:

Web design At the start of a new or redesign web project, an important first step is to define the objectives for the web site. What actions do you want visitors to take when they land on the web site?

### International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.

REVIEW ARTICLE ISSN: 2321-7758 UPS EFFICIENT SEARCH ENGINE BASED ON WEB-SNIPPET HIERARCHICAL CLUSTERING MS.MANISHA DESHMUKH, PROF. UMESH KULKARNI Department of Computer Engineering, ARMIET, Department

### It is clear the postal mail is still very relevant in today's marketing environment.

Email and Mobile Digital channels have many strengths, but they also have weaknesses. For example, many companies routinely send out emails as a part of their marketing campaigns. But people receive hundreds

### Mining Navigation Histories for User Need Recognition

Mining Navigation Histories for User Need Recognition Fabio Gasparetti and Alessandro Micarelli and Giuseppe Sansonetti Roma Tre University, Via della Vasca Navale 79, Rome, 00146 Italy {gaspare,micarel,gsansone}@dia.uniroma3.it

### Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

### The. biddible. Guide to AdWords at Christmas

The biddible. Guide to AdWords at Christmas CONTENTS. Page 2 Important Dates Page 3 & 4 Search Campaigns Page 5 Shopping Campaigns Page 6 Display Campaigns Page 7 & 8 Remarketing Campaigns Page 9 About

### Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining

### IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS

IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS V.Sudhakar 1 and G. Draksha 2 Abstract:- Collective behavior refers to the behaviors of individuals

### "SEO vs. PPC The Final Round"

"SEO vs. PPC The Final Round" A Research Study by Engine Ready, Inc. Examining The Role Traffic Source Plays in Visitor Purchase Behavior January 2008 Table of Contents Introduction 3 Definitions 4 Methodology

### Automatic Web Page Classification

Automatic Web Page Classification Yasser Ganjisaffar 84802416 yganjisa@uci.edu 1 Introduction To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory

### Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications

Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input

### A Comparative Study on Sentiment Classification and Ranking on Product Reviews

A Comparative Study on Sentiment Classification and Ranking on Product Reviews C.EMELDA Research Scholar, PG and Research Department of Computer Science, Nehru Memorial College, Putthanampatti, Bharathidasan

Web News Apps Videos Images More Search Tools How to Use Google AdWords A Beginner s Guide to PPC Advertising How to Use Google AdWords offers.hubspot.com/google-adwords-ppc Learn how to use Google AdWords

### Analyzing Big Data with AWS

Analyzing Big Data with AWS Peter Sirota, General Manager, Amazon Elastic MapReduce @petersirota What is Big Data? Computer generated data Application server logs (web sites, games) Sensor data (weather,

### Dynamical Clustering of Personalized Web Search Results

Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked

### Web analytics: Data Collected via the Internet

Database Marketing Fall 2016 Web analytics (incl real-time data) Collaborative filtering Facebook advertising Mobile marketing Slide set 8 1 Web analytics: Data Collected via the Internet Customers can

### Using In-Memory Computing to Simplify Big Data Analytics

SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed