BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE

1 BUILDING A PREDICTIVE MODEL: AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE
Alex Lin, Senior Architect, Intelligent Mining

2 Outline
- Predictive modeling methodology
- k-nearest Neighbor (kNN) algorithm
- Singular value decomposition (SVD) method for dimensionality reduction
- Using a synthetic data set to test and improve your model
- Experiment and results

3 The Business Problem
Design a product recommender solution that will increase revenue. $$

4 How Do We Increase Revenue?
Increase Revenue
- Increase Conversion
- Increase Avg. Order Value
  - Increase Unit Price
  - Increase Units / Order

5 Example
Is this recommendation effective?
- Increase Unit Price
- Increase Units / Order

6 What am I going to do?

7 Predictive Model Framework
Data -> Features -> ML Algorithm -> Prediction Output
- Data: What data?
- Features: What feature?
- ML Algorithm: Which algorithm?
- Prediction Output: Cross-sell & Up-sell Recommendation

8 What Data to Use?
Explicit data:
- Ratings
- Comments
Implicit data:
- Order history / Return history
- Cart events
- Page views
- Click-thru
- Search log
In today's talk we only use Order history and Cart events.

9 Predictive Model
Data -> Features -> ML Algorithm -> Prediction Output
- Data: Order History, Cart Events
- Features: What feature?
- ML Algorithm: Which algorithm?
- Prediction Output: Cross-sell & Up-sell Recommendation

10 What Features to Use?
We know that a given product tends to get purchased by customers with similar tastes or needs.
Use user engagement data to describe a product: each item gets a user engagement vector with one entry per user (n users).

11 Data Representation / Features
When we merge every item's user engagement vector, we get an m x n item-user matrix (m items as rows, n users as columns).
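The merge step on this slide can be sketched in C as follows. This is a minimal illustration, not the speaker's code: the `struct event` layout, the `EVENT_ORDER` / `EVENT_CART` names, and the weights 1.0 and 0.5 are all assumptions made for the example, and a dense matrix is used only for readability.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types; the slides only name order history and cart events. */
enum event_type { EVENT_ORDER, EVENT_CART };

struct event {
    size_t user_id;        /* column index, 0 .. n_users-1 */
    size_t item_id;        /* row index,    0 .. n_items-1 */
    enum event_type type;
};

/* Assumed weights: an order counts more than a cart event. */
static double event_weight(enum event_type type)
{
    return type == EVENT_ORDER ? 1.0 : 0.5;
}

/* Fill a row-major n_items x n_users matrix (allocated and zeroed by the
   caller), keeping the strongest engagement seen for each (item, user). */
void build_item_user_matrix(double *matrix, size_t n_items, size_t n_users,
                            const struct event *events, size_t n_events)
{
    for (size_t e = 0; e < n_events; e++) {
        assert(events[e].item_id < n_items);
        assert(events[e].user_id < n_users);
        double w = event_weight(events[e].type);
        double *cell = &matrix[events[e].item_id * n_users + events[e].user_id];
        if (w > *cell)
            *cell = w;
    }
}
```

Row i of the resulting matrix is item i's user engagement vector. At real catalogue sizes a sparse representation would be needed instead of a dense array.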

12 Data Normalization
Ensure the magnitudes of the entries in the dataset matrix are appropriate.
Remove the column average so frequent buyers don't dominate the model.

13 Data Normalization
Different engagement data points (Order / Cart / Page View) should have different weights.
Common normalization strategies:
- Remove column average
- Remove row average
- Remove global mean
- Z-score
- Fill in the null values
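As an illustration of the first strategy, removing the column average, here is a small C sketch. It assumes a dense row-major matrix with items as rows and users as columns, and it assumes the average is taken over every item in the column, zeros included; the slides do not say how empty cells are treated.

```c
#include <assert.h>
#include <stddef.h>

/* Subtract each column's (user's) mean from every entry in that column.
   matrix is row-major, n_items rows by n_users columns.
   Assumption: the mean is taken over all items, zeros included. */
void remove_column_average(double *matrix, size_t n_items, size_t n_users)
{
    assert(n_items > 0);
    for (size_t u = 0; u < n_users; u++) {
        double sum = 0.0;
        for (size_t i = 0; i < n_items; i++)
            sum += matrix[i * n_users + u];
        double mean = sum / (double)n_items;
        for (size_t i = 0; i < n_items; i++)
            matrix[i * n_users + u] -= mean;
    }
}
```

After this step a user who engages with many items contributes smaller values to each item vector, which is the effect the slide describes for frequent buyers.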

14 Predictive Model
Data -> Features -> ML Algorithm -> Prediction Output
- Data: Order History, Cart Events
- Features: User engagement vector
- ML Algorithm: Which algorithm?
- Prediction Output: Cross-sell & Up-sell Recommendation
- Additional step: Data Normalization

15 Which Algorithm?
How do we find the items that have similar user engagement data?
We can find the items that have similar user engagement vectors with the kNN algorithm.

16 k-nearest Neighbor (kNN)
Find the k items that have the most similar user engagement vectors.
Nearest Neighbors of Item 4 = [2, 3, 1]

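17 Similarity Measure for kNN
Jaccard coefficient:
$$\mathrm{sim}(a,b) = \frac{|a \cap b|}{|a \cup b|}$$
Cosine similarity:
$$\mathrm{sim}(a,b) = \cos(a,b) = \frac{a \cdot b}{\|a\|_2 \, \|b\|_2}$$
Pearson correlation:
$$\mathrm{corr}(a,b) = \frac{\sum_i (r_{ai} - \bar{r}_a)(r_{bi} - \bar{r}_b)}{\sqrt{\sum_i (r_{ai} - \bar{r}_a)^2}\,\sqrt{\sum_i (r_{bi} - \bar{r}_b)^2}} = \frac{m \sum_i a_i b_i - \sum_i a_i \sum_i b_i}{\sqrt{m \sum_i a_i^2 - \left(\sum_i a_i\right)^2}\,\sqrt{m \sum_i b_i^2 - \left(\sum_i b_i\right)^2}}$$
In code terms, with match_cols = m:
corr(a,b) = (match_cols * Dotprod(a,b) - sum(a) * sum(b)) / (sqrt(match_cols * sum(a^2) - (sum(a))^2) * sqrt(match_cols * sum(b^2) - (sum(b))^2))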

18 k-nearest Neighbor (kNN)
Items are points in the feature space; the similarity measure is cosine similarity.
kNN with k=5: Nearest Neighbors(8) = [9, 6, 3, 1, 2]

19 Predictive Model Ver. 1: kNN
Data -> Features -> ML Algorithm -> Prediction Output
- Data: Order History, Cart Events
- Features: User engagement vector
- ML Algorithm: k-nearest Neighbor (kNN)
- Prediction Output: Cross-sell & Up-sell Recommendation
- Additional step: Data Normalization

20 Cosine Similarity
Code fragment:

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const size_t i_cnt = 100000;   /* number of items: 100K */
        const size_t u_cnt = 2000000;  /* number of users: 2M */

        /* 100K by 2M dataset matrix, row-major; in reality it needs to be
           malloc-allocated (dense, it is about 1.6 TB) */
        double *data = malloc(i_cnt * u_cnt * sizeof *data);
        double *norm = malloc(i_cnt * sizeof *norm);
        if (data == NULL || norm == NULL) {
            free(data);
            free(norm);
            return EXIT_FAILURE;
        }

        /* assume the data matrix is loaded here */

        /* calculate the vector norm of each user engagement vector */
        for (size_t i = 0; i < i_cnt; i++) {
            norm[i] = 0;
            for (size_t f = 0; f < u_cnt; f++)
                norm[i] += data[i * u_cnt + f] * data[i * u_cnt + f];
            norm[i] = sqrt(norm[i]);
        }

        /* cosine similarity calculation */
        for (size_t i = 0; i < i_cnt; i++) {            /* loop thru 100K items */
            for (size_t j = 0; j < i_cnt; j++) {        /* loop thru 100K items */
                double dot_product = 0;
                for (size_t f = 0; f < u_cnt; f++)      /* loop thru entire user space, 2M */
                    dot_product += data[i * u_cnt + f] * data[j * u_cnt + f];
                printf("%zu %zu %lf\n", i, j, dot_product / (norm[i] * norm[j]));
            }
            /* find the Top K nearest neighbors here */
        }

        free(norm);
        free(data);
        return 0;
    }

Problems:
1. 100K rows x 100K rows x 2M features --> scalability problem.
   Options: kd-tree, locality sensitive hashing, MapReduce/Hadoop, multicore/threading, stream processors.
2. data[i] is high-dimensional and sparse, so similarity measures are not reliable --> accuracy problem.
This leads us to SVD dimensionality reduction!

21 Singular Value Decomposition (SVD)
$$A = U S V^T$$
- A: m x n matrix (items x users)
- U: m x r matrix
- S: r x r matrix
- V^T: r x n matrix
Rank-k approximation, with k < r:
$$A_k = U_k S_k V_k^T$$
- Low rank approx. item profile: $U_k \sqrt{S_k}$
- Low rank approx. user profile: $\sqrt{S_k} V_k^T$
- Low rank approx. item-user matrix: $U_k \sqrt{S_k} \cdot \sqrt{S_k} V_k^T = A_k$

22 Reduced SVD
$$A_k = U_k S_k V_k^T$$
- A_k: 100K x 2M matrix (items x users)
- U_k: 100K x 3 matrix
- S_k: 3 x 3 matrix, descending singular values on the diagonal
- V_k^T: 3 x 2M matrix
- rank = 3
Low rank approx. item profile: $U_k \sqrt{S_k}$

23 SVD Factor Interpretation
[Plot: singular values (rank=52), in descending order]
- S: 3 x 3 matrix, descending singular values
- Larger singular values: more significant latent factors
- Smaller singular values: less significant, noise + others

24 SVD Dimensionality Reduction
Item profile $U_k \sqrt{S_k}$: one row per item and one column per latent factor (rank 3 in the example), in place of one column per user.
Need to find the optimal low rank!

25 Missing values
- Difference between 0 and unknown
- Missing values do NOT appear randomly.
Value = (Preference Factors) + (Availability) (Purchased elsewhere) (Navigation inefficiency) etc.
Approx. Value = (Preference Factors) +/- (Noise)
Modeling missing values correctly will help us make good recommendations, especially when working with an extremely sparse data set.

26 Singular Value Decomposition (SVD)
- Use SVD to reduce dimensionality, so neighborhood formation happens in the reduced user space.
- SVD helps the model find the low rank approx. dataset matrix, while retaining the critical latent factors and ignoring noise.
- The optimal low rank needs to be tuned.
- SVD is computationally expensive.
SVD libraries:
- Matlab: [U, S, V] = svds(a,256);
- SVDPACKC
- SVDLIBC
- GHAPACK

27 Predictive Model Ver. 2: SVD+kNN
Data -> Features -> ML Algorithm -> Prediction Output
- Data: Order History, Cart Events
- Features: User engagement vector
- ML Algorithm: k-nearest Neighbors (kNN) in reduced space
- Prediction Output: Cross-sell & Up-sell Recommendation
- Additional steps: Data Normalization, SVD

28 Synthetic Data Set
Why do we use a synthetic data set?
So we can test our new model in a controlled environment.

29 Synthetic Data Set
6 latent factors synthetic e-commerce data set
- Dimension: 1,000 (items) by 20,000 (users)
- 6 user preference factors
- 6 item property factors (non-negative)
- Txn Set: n = 55,360, sparsity = %
- Txn+Cart Set: n = 92,985, sparsity = 99.03%
- Download:
- Record format: user_id item_id type

30 Synthetic Data Set
- Item property factors: 1K x 6 matrix, one row (a, b, c, ...) per item
- User preference factors: 6 x 20K matrix, one column (x, y, z, ...) per user
- Purchase likelihood score: 1K x 20K matrix X (items x users), the product of the two factor matrices
$$X_{32} = (a, b, c) \cdot (x, y, z) = a x + b y + c z$$
$X_{32}$ = likelihood of Item 3 being purchased by User 2
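The score for one (item, user) pair is a dot product over the latent factors, which can be sketched in C as below. The storage layout (both factor matrices row-major) and the function name are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Purchase likelihood of `item` for `user`: the dot product of the item's
   property factors and the user's preference factors.
   item_factors is n_items x k, row-major (one row per item).
   user_factors is k x n_users, row-major (one column per user). */
double purchase_likelihood(const double *item_factors, const double *user_factors,
                           size_t k, size_t n_users, size_t item, size_t user)
{
    assert(user < n_users);
    double score = 0.0;
    for (size_t f = 0; f < k; f++)
        score += item_factors[item * k + f] * user_factors[f * n_users + user];
    return score;
}
```

With item factors (1, 2, 3) and user factors (4, 5, 6), the score is 1*4 + 2*5 + 3*6 = 32.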

31 Synthetic Data Set
For each user:
1. Based on the distribution, pre-determine the number of items purchased by the user (here, # of items = 2).
2. Sort the items by purchase likelihood score.
3. From the top, select and skip certain items to create data sparsity.
Example result: the user purchased Item 4 and Item 3.

32 Experiment Setup
Each model (Random / kNN / SVD+kNN) will generate top 20 recommendations for each item.
Compare the model output to the actual top 20 provided by the synthetic data set.
Evaluation metrics:
- Precision %: overlap of the top 20 between model output and actual (higher is better)
$$\mathrm{Precision} = \frac{|\{\mathrm{Found\_Top20\_items}\} \cap \{\mathrm{Actual\_Top20\_items}\}|}{|\{\mathrm{Found\_Top20\_items}\}|}$$
- Quality metric: average of the actual ranking in the model output (lower is better)
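The two metrics can be sketched in C as follows. The precision function follows the formula on the slide. The quality function is one reading of "average of the actual ranking in the model output": it assumes the synthetic data provides a full best-first ranking of items and averages the 1-based position of each recommended item in that ranking. Function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

static int contains(const size_t *set, size_t n, size_t x)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == x)
            return 1;
    return 0;
}

/* Fraction of the model's recommendations that appear in the actual top N. */
double precision_at_n(const size_t *found, size_t n_found,
                      const size_t *actual, size_t n_actual)
{
    assert(n_found > 0);
    size_t hits = 0;
    for (size_t i = 0; i < n_found; i++)
        hits += (size_t)contains(actual, n_actual, found[i]);
    return (double)hits / (double)n_found;
}

/* Mean 1-based position, in the full actual ranking (best first), of each
   recommended item. Assumes every recommended item occurs in the ranking. */
double quality(const size_t *found, size_t n_found,
               const size_t *actual_ranking, size_t n_ranked)
{
    assert(n_found > 0);
    double total = 0.0;
    for (size_t i = 0; i < n_found; i++) {
        size_t pos = 0;
        while (pos < n_ranked && actual_ranking[pos] != found[i])
            pos++;
        assert(pos < n_ranked);
        total += (double)(pos + 1);
    }
    return total / (double)n_found;
}
```

For the experiment on the slide, both the found and the actual lists would have 20 entries; passing the first 20 entries of the full ranking as `actual` gives the precision.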

33 Experimental Result: kNN vs. Random (Control)
[Plots: Precision % (higher is better); Quality (lower is better)]

34 Experimental Result: Precision % of SVD+kNN
[Plot: Recall % (higher is better) against SVD rank, with the improvement marked]

35 Experimental Result: Quality of SVD+kNN
[Plot: Quality (lower is better) against SVD rank, with the improvement marked]

36 Experimental Result: The effect of using Cart data
[Plot: Precision % (higher is better) against SVD rank]

37 Experimental Result: The effect of using Cart data
[Plot: Quality (lower is better) against SVD rank]

38 Outline
- Predictive modeling methodology
- k-nearest Neighbor (kNN) algorithm
- Singular value decomposition (SVD) method for dimensionality reduction
- Using a synthetic data set to test and improve your model
- Experiment and results

40 Thank you
Any questions or comments?


More information

ISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies

ISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Paper / Case Study Available online at: www.ijarcsms.com

More information

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015 W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction

More information

Clustering Big Data. Efficient Data Mining Technologies. J Singh and Teresa Brooks. June 4, 2015

Clustering Big Data. Efficient Data Mining Technologies. J Singh and Teresa Brooks. June 4, 2015 Clustering Big Data Efficient Data Mining Technologies J Singh and Teresa Brooks June 4, 2015 Hello Bulgaria (http://hello.bg/) A website with thousands of pages... Some pages identical to other pages

More information

Ordering Sentences According to Topicality

Ordering Sentences According to Topicality Ordering Sentences According to Topicality Ilana Bromberg The Ohio State University bromberg@ling.ohio-state.edu Abstract This paper addresses the problem of finding or producing the best ordering of the

More information

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer. REVIEW ARTICLE ISSN: 2321-7758 UPS EFFICIENT SEARCH ENGINE BASED ON WEB-SNIPPET HIERARCHICAL CLUSTERING MS.MANISHA DESHMUKH, PROF. UMESH KULKARNI Department of Computer Engineering, ARMIET, Department

More information

Lecture 5: Singular Value Decomposition SVD (1)

Lecture 5: Singular Value Decomposition SVD (1) EEM3L1: Numerical and Analytical Techniques Lecture 5: Singular Value Decomposition SVD (1) EE3L1, slide 1, Version 4: 25-Sep-02 Motivation for SVD (1) SVD = Singular Value Decomposition Consider the system

More information

WEEK #3, Lecture 1: Sparse Systems, MATLAB Graphics

WEEK #3, Lecture 1: Sparse Systems, MATLAB Graphics WEEK #3, Lecture 1: Sparse Systems, MATLAB Graphics Visualization of Matrices Good visuals anchor any presentation. MATLAB has a wide variety of ways to display data and calculation results that can be

More information

Recommender System for Online Dating Service

Recommender System for Online Dating Service Recommender System for Online Dating Service Lukáš Brožovský 1 and Václav Petříček 1 KSI MFF UK Malostranské nám. 25, Prague 1, Czech Republic lbrozovsky@centrum.cz, petricek@acm.org Abstract. Users of

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

A Novel Classification Framework for Evaluating Individual and Aggregate Diversity in Top-N Recommendations

A Novel Classification Framework for Evaluating Individual and Aggregate Diversity in Top-N Recommendations A Novel Classification Framework for Evaluating Individual and Aggregate Diversity in Top-N Recommendations JENNIFER MOODY, University of Ulster DAVID H. GLASS, University of Ulster The primary goal of

More information

Search Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc

Search Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,

More information

CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka

CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training

More information

! E6893 Big Data Analytics:! Demo Session II: Mahout working with Eclipse and Maven for Collaborative Filtering

! E6893 Big Data Analytics:! Demo Session II: Mahout working with Eclipse and Maven for Collaborative Filtering E6893 Big Data Analytics: Demo Session II: Mahout working with Eclipse and Maven for Collaborative Filtering Aonan Zhang Dept. of Electrical Engineering 1 October 9th, 2014 Mahout Brief Review The Apache

More information

Using multiple models: Bagging, Boosting, Ensembles, Forests

Using multiple models: Bagging, Boosting, Ensembles, Forests Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or

More information

Exposing commercial value in social networks: matching online communities and businesses

Exposing commercial value in social networks: matching online communities and businesses Exposing commercial value in social networks: matching online communities and businesses Murali Narasimhan muralina Camelia Simoiu csimoiu December 13, 2014 Anthony Ward tonyward Abstract This paper explores

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

Data Science Center Eindhoven. Big Data: Challenges and Opportunities for Mathematicians. Alessandro Di Bucchianico

Data Science Center Eindhoven. Big Data: Challenges and Opportunities for Mathematicians. Alessandro Di Bucchianico Data Science Center Eindhoven Big Data: Challenges and Opportunities for Mathematicians Alessandro Di Bucchianico Dutch Mathematical Congress April 15, 2015 Contents 1. Big Data terminology 2. Various

More information

Data Mining: A Preprocessing Engine

Data Mining: A Preprocessing Engine Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,

More information

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #18: Dimensionality Reduc7on Dimensionality Reduc=on Assump=on: Data lies on or near a low d- dimensional subspace Axes of this subspace

More information

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in

More information

The Wisdom of the Few

The Wisdom of the Few The Wisdom of the Few A Collaborative Filtering Approach Based on Expert Opinions from the Web Xavier Amatriain Telefonica Research Via Augusta, 177 Barcelona 08021, Spain xar@tid.es Haewoon Kwak KAIST

More information

Mining an Online Auctions Data Warehouse

Mining an Online Auctions Data Warehouse Proceedings of MASPLAS'02 The Mid-Atlantic Student Workshop on Programming Languages and Systems Pace University, April 19, 2002 Mining an Online Auctions Data Warehouse David Ulmer Under the guidance

More information

Contact Recommendations from Aggegrated On-Line Activity

Contact Recommendations from Aggegrated On-Line Activity Contact Recommendations from Aggegrated On-Line Activity Abigail Gertner, Justin Richer, and Thomas Bartee The MITRE Corporation 202 Burlington Road, Bedford, MA 01730 {gertner,jricher,tbartee}@mitre.org

More information

Scientific Report. BIDYUT KUMAR / PATRA INDIAN VTT Technical Research Centre of Finland, Finland. Raimo / Launonen. First name / Family name

Scientific Report. BIDYUT KUMAR / PATRA INDIAN VTT Technical Research Centre of Finland, Finland. Raimo / Launonen. First name / Family name Scientific Report First name / Family name Nationality Name of the Host Organisation First Name / family name of the Scientific Coordinator BIDYUT KUMAR / PATRA INDIAN VTT Technical Research Centre of

More information

2. K-Nearest Neighbors Classifier. 1. Introduction. Paper

2. K-Nearest Neighbors Classifier. 1. Introduction. Paper Paper A k-nearest Neighbors Method for Classifying User Sessions in E-Commerce Scenario Grażyna Suchacka 1, Magdalena Skolimowska-Kulig 1, and Aneta Potempa 2 1 Institute of Mathematics and Informatics,

More information

Dr. Antony Selvadoss Thanamani, Head & Associate Professor, Department of Computer Science, NGM College, Pollachi, India.

Dr. Antony Selvadoss Thanamani, Head & Associate Professor, Department of Computer Science, NGM College, Pollachi, India. Enhanced Approach on Web Page Classification Using Machine Learning Technique S.Gowri Shanthi Research Scholar, Department of Computer Science, NGM College, Pollachi, India. Dr. Antony Selvadoss Thanamani,

More information

Big Data and Scripting Systems build on top of Hadoop

Big Data and Scripting Systems build on top of Hadoop Big Data and Scripting Systems build on top of Hadoop 1, 2, Pig/Latin high-level map reduce programming platform interactive execution of map reduce jobs Pig is the name of the system Pig Latin is the

More information

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University

More information

recommenderlab: A Framework for Developing and Testing Recommendation Algorithms

recommenderlab: A Framework for Developing and Testing Recommendation Algorithms recommenderlab: A Framework for Developing and Testing Recommendation Algorithms Michael Hahsler Southern Methodist University Abstract The problem of creating recommendations given a large data base from

More information

Model Selection. Introduction. Model Selection

Model Selection. Introduction. Model Selection Model Selection Introduction This user guide provides information about the Partek Model Selection tool. Topics covered include using a Down syndrome data set to demonstrate the usage of the Partek Model

More information

Vocabulary Problem in Internet Resource Discovery. Shih-Hao Li and Peter B. Danzig. University of Southern California. fshli, danzigg@cs.usc.

Vocabulary Problem in Internet Resource Discovery. Shih-Hao Li and Peter B. Danzig. University of Southern California. fshli, danzigg@cs.usc. Vocabulary Problem in Internet Resource Discovery Technical Report USC-CS-94-594 Shih-Hao Li and Peter B. Danzig Computer Science Department University of Southern California Los Angeles, California 90089-0781

More information

Analyzing The Role Of Dimension Arrangement For Data Visualization in Radviz

Analyzing The Role Of Dimension Arrangement For Data Visualization in Radviz Analyzing The Role Of Dimension Arrangement For Data Visualization in Radviz Luigi Di Caro 1, Vanessa Frias-Martinez 2, and Enrique Frias-Martinez 2 1 Department of Computer Science, Universita di Torino,

More information