Hybrid model rating prediction with Linked Open Data for Recommender Systems
Andrés Moreno 1,2, Christian Ariza-Porras 1, Paula Lago 1, Claudia Jiménez-Guarín 1, Harold Castro 1, and Michel Riveill 2

1 School of Engineering, Universidad de los Andes, Bogotá, Colombia
2 Univ. Nice Sophia Antipolis, CNRS, I3S, UMR 7271, Sophia Antipolis, France
{dar-more, cf.ariza975, pa.lago52, cjimenez, hcastro}@uniandes.edu.co, [email protected]

Abstract. We detail the solution of team uniandes1 to the ESWC 2014 Linked Open Data-enabled Recommender Systems Challenge, Task 1 (rating prediction in a cold-start situation). In such situations there are few ratings per item and per user, so collaborative filtering techniques may not be suitable. To be able to use a content-based solution, linked open data from DBPedia was used to obtain a set of descriptive features for each item. We compare the performance (measured as RMSE) of three models in this cold-start situation: content-based (using min-count sketches), collaborative filtering (SVD++), and rule-based switched hybrid models. Experimental results show that the hybrid system outperforms each of the models that compose it. Since features taken from DBPedia were sparse, we clustered items in order to reduce the dimensionality of the item and user profiles.

Keywords: semantic web, recommender systems

1 Introduction

Recommender systems (RS) are automatic agents that attempt to suggest new or interesting items to users. A number of algorithms have been proposed to improve the performance of recommender systems; they can be classified into two groups: collaborative filtering techniques and content-based filtering techniques. Collaborative filtering techniques (CF) rest on the premise that similar users like similar items, and thus base their predictions on the ratings provided by similar users. Content-based techniques (CB) build a profile of the user's interests from the features of the items the user has rated.
In cold-start situations, when items have few ratings, neither kind of system performs well, because neither has the amount of data needed to find true similarities among users (CF) or to construct the user profiles (CB). In these circumstances more data is needed, either to describe the items or to describe the users. Thanks to linked open data initiatives, information about items can be found on the web. The purpose of Task 1 of the Linked Open Data-enabled Recommender Systems Challenge was to predict the rating a user would give to an item in a
cold-start situation. In order to be able to use a CB solution, linked open data is used to obtain features that describe items in a machine-readable format. The paper is organized as follows: we describe the provided dataset and the performance metric used to evaluate the predictions, give an overview of the proposed solution, and discuss the obtained results.

Dataset description

The DBbook dataset contains ratings of 6166 books by 6181 users. The possible ratings that a user can assign to an item are $O = \{0, 1, 2, 3, 4, 5\}$. The ratings file has 3 fields: a user id, an item id, and the rating. Each item has been rated by at least one user, but the evaluation set includes some books not rated in the training set, representing a cold-start situation. The dataset also provides a mapping of each item id to a DBPedia URI, which gives access to a semantic description of the item. Given this description, we can define each book $i$ by a set of concepts $C_i$ taken from DBPedia. We use the following concepts to describe a book: author, categories, literary genres, and subject. Figure 1 depicts the feature extraction process. The feature space size is [...] concepts. Each book has an average of [...] features with a standard deviation (std) of [...]. Each feature appears in an average of 9.62 books with a std of 118 and a maximum of [...].

Fig. 1: Semantic Features Extraction

2 Prediction Model

Burke [1] describes different ways in which recommender system models can be combined. The switched strategy maintains different models in parallel and reports to the user the prediction of the model with the higher confidence. As base model we use a widely known CF algorithm, SVD++ (Section 2.2). However, since traditional CF systems usually make incorrect predictions when no previous
ratings about the item are known, the prediction of the switched hybrid model in cold-start situations is delegated to the CB model explained in Section 2.1. The measure used to evaluate the predictive performance of the system is the Root Mean Square Error (RMSE). Let $T$ be the rating set of a hold-out set (test set), $T_{ui}$ the rating that user $u$ gave to item $i$, and $\hat{r}_{ui}$ the model prediction. The RMSE is defined as

$$\mathrm{RMSE} := \sqrt{\frac{1}{|T|} \sum_{T_{ui} \in T} \left(\hat{r}_{ui} - T_{ui}\right)^2}.$$

In the remainder of this section, we describe the models that take part in our system.

2.1 Content Based model

In a CB model, a user $u$ has a profile with a list of non-duplicate concepts $C_u$ and a set of $|O|$ vectors $w_o \in \mathbb{R}^{|C_u|}$, $o \in O$. For each user-item interaction, every concept related to the item ($C_i$) is considered for addition to the user's list $C_u$. We use an inclusion policy built on a sliding-window min-count sketch structure [2], based on the work developed in [4]: all concepts seen by the user at least $N$ times during the window duration of the sketch are present in the user's list, and the size of the vectors $w_o$ is updated accordingly. After the list and the length of the $w_o$ vectors have been modified, the weights of the vectors are adjusted using a stochastic logistic regression strategy.

Let $r_{ui} \in O$ be the rating user $u$ gives to item $i$, and let $m_{ui} = \mathrm{meta}(C_i, C_u) \in \mathbb{R}^{|C_u|}$ be the result of a function that takes the concept set of an item and converts it into a binary vector in which coordinate $f$ is 1 if the user's $f$-th concept belongs to the item's concept set: $m_{ui}[f] = 1_{C_u[f] \in C_i}$. For each vector $w_o$, $o \in O$, we predict $\sigma(\langle w_o, m_{ui} \rangle)$ and update the vector as

$$w_o \leftarrow w_o - \gamma \left( \sigma(\langle w_o, m_{ui} \rangle) - 1_{r_{ui} = o} \right) m_{ui},$$

where $\sigma(c)$ is the sigmoid function and $\gamma$ is the step size. The rating prediction under this model is calculated as

$$\hat{r}_{ui} = \frac{\sum_{o \in O} \sigma(\langle w_o, m_{ui} \rangle)\, o}{\sum_{o \in O} \sigma(\langle w_o, m_{ui} \rangle)}.$$

Feature Generation and evaluation

We use DBPedia to retrieve book features as described in Section 1.
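The RMSE metric and the content-based update and prediction rules of Section 2.1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `UserProfile`, the default step size `gamma=0.1`, and the zero initialization of the weights are assumptions made for illustration, and the sliding-window sketch inclusion policy is replaced by the simplifying assumption that every concept is admitted the first time it is seen.

```python
import math

RATINGS = [0, 1, 2, 3, 4, 5]  # the rating set O from the dataset description


def sigmoid(c):
    return 1.0 / (1.0 + math.exp(-c))


def rmse(pairs):
    """pairs: iterable of (predicted, actual) ratings from a hold-out set."""
    pairs = list(pairs)
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))


class UserProfile:
    """Hypothetical container: concept list C_u plus one weight vector w_o per rating."""

    def __init__(self, gamma=0.1):
        self.gamma = gamma                        # step size (assumed value)
        self.concepts = []                        # C_u, no duplicates
        self.weights = {o: [] for o in RATINGS}   # each w_o has len(C_u) entries

    def add_concept(self, concept):
        # Assumption: admit every concept immediately; the paper instead
        # requires at least N sightings inside a sliding window.
        if concept not in self.concepts:
            self.concepts.append(concept)
            for w in self.weights.values():
                w.append(0.0)

    def meta(self, item_concepts):
        # m_ui[f] = 1 if the user's f-th concept belongs to the item's set.
        return [1.0 if c in item_concepts else 0.0 for c in self.concepts]

    def _score(self, o, m):
        return sigmoid(sum(w * x for w, x in zip(self.weights[o], m)))

    def update(self, item_concepts, rating):
        for c in item_concepts:
            self.add_concept(c)
        m = self.meta(item_concepts)
        for o in RATINGS:
            # Logistic-regression step: w_o <- w_o - gamma * (sigma - 1[r = o]) * m
            err = self._score(o, m) - (1.0 if rating == o else 0.0)
            self.weights[o] = [w - self.gamma * err * x
                               for w, x in zip(self.weights[o], m)]

    def predict(self, item_concepts):
        # Score-weighted average of the rating values.
        m = self.meta(item_concepts)
        scores = {o: self._score(o, m) for o in RATINGS}
        return sum(o * s for o, s in scores.items()) / sum(scores.values())
```

With all weights at zero every sigmoid equals 0.5, so an untrained profile predicts the plain mean of the rating values; each observed rating then pulls the matching `w_o` up and the others down on the concepts the item shares with the profile.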
Using all the retrieved features, the predictor's performance was lower than expected and, as shown in Figure 2, it declines as the minimum inclusion rate increases. A quick evaluation of the features shows that some of them are highly correlated, which led us to consider that clusters of features may provide more information to the predictor. We created clusters of features by co-occurrence, using k-means with cosine distance, a convergence delta of 0.01, and 200 iterations. Figure 3 depicts the dataset generation process. We vary the number of clusters ($k$) and measure the performance against the test set. Figure 4a shows that the predictor performs better with clusters than with all the extracted features. Although 23 clusters give a slightly better result against the test set, we use 50 clusters because this configuration performed better on the evaluation tool. With these 50 clusters as features, each book has an average of 2.1 features with a std of [...]. Each feature appears on average in 344 books, with a std of [...]. When trained with these new features, the predictor improves its performance notably with a minimum inclusion rate of 2, as shown in Figure 4b. The best RMSE with the content-based predictor on the evaluation tool was [...].
Fig. 2: RMSE vs. inclusion rate for book features
Fig. 3: Content Based Dataset Generation
Fig. 4: Content Based predictor performance using clusters. (a) All features vs. different cluster sizes. (b) 50 clusters with varying minimum inclusion rate.

2.2 Collaborative Filtering model

The prediction rule of the SVD++ algorithm [3] uses as model parameters the global average of ratings ($\mu$) and the bias, or deviation from the mean, of each user ($b_u$) and each item ($b_i$). To account for the user-item interaction, the SVD++ model represents each user as a vector $x_u$ and each item as a vector $y_i \in \mathbb{R}^k$. Each item also has an extra vector $z_i \in \mathbb{R}^k$, which the prediction rule uses to incorporate the items the user has rated into her profile. Let $R(u)$ be the set of items user $u$ has rated. The prediction under the SVD++ model is given by

$$\hat{r}_{ui} = \mu + b_i + b_u + y_i^T \left( x_u + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} z_j \right).$$

When an item has not been seen by the system, the prediction rule uses only the sum of the global mean and the user bias. The parameters of the model are learned using a regularized stochastic gradient descent strategy.

3 Model validation

To test the performance of the hybrid model, we generated 5 datasets, each with approximately 80% of the ratings for training and 20% for testing, and each with a different percentage of cold-start ratings, varying from 5% to 25%. The model delegates the prediction to the CB model only when it has not seen the item before. Fig. 5 shows the RMSE of the hybrid model as the number of new items in the test set increases. The results show that the hybrid model outperforms CF for a low number of cold-start items.
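As a rough illustration of Sections 2.2 and 3, the Python sketch below evaluates the SVD++ prediction rule for already-learned parameters, falls back to the global mean plus the user bias for an unseen item, and applies the switching rule that hands unseen items to the CB model. The function names, the use of `None` to mark an unseen item, and the callables `cf_predict` and `cb_predict` are assumptions for illustration; parameter learning is not shown.

```python
import math


def svdpp_predict(mu, b_u, b_i, x_u, y_i, rated_z):
    """SVD++ prediction from learned parameters.

    y_i is None when the item never appeared in training (assumed convention);
    rated_z holds the vectors z_j of the items in R(u).
    """
    if y_i is None:
        # Unseen item: only the global mean and the user bias are available.
        return mu + b_u
    profile = list(x_u)
    if rated_z:
        norm = 1.0 / math.sqrt(len(rated_z))  # |R(u)|^(-1/2)
        for z_j in rated_z:
            for d in range(len(profile)):
                profile[d] += norm * z_j[d]
    return mu + b_i + b_u + sum(y * p for y, p in zip(y_i, profile))


def hybrid_predict(item_id, seen_items, cf_predict, cb_predict):
    """Rule-based switch: CF for items seen in training, CB otherwise."""
    if item_id in seen_items:
        return cf_predict(item_id)
    return cb_predict(item_id)
```

Keeping the switch as a separate function makes the rule explicit: the only signal it consults is whether the item appeared in the training ratings, which matches the delegation criterion described above.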
Fig. 5: RMSE of the hybrid model vs. SVD++ on cold-start

4 Conclusions

We have described our approach to improving the performance of recommender systems using linked open data that is freely available on the web. Open data provides descriptions of items that help the recommender system better understand why a user likes an item (a user may like a book because of its author, its literary genre, its main subject, etc.). This approach can help alleviate the new-item cold-start problem. However, users may like items because of subjective features, such as tone, which are not provided in the open data repositories used. For this reason, we proposed a rule-based hybrid model that uses a pure collaborative approach when enough ratings are present and a content-based approach otherwise. Our model had an RMSE of [...] against the quiz set provided by the challenge. Open data, such as data from social networks, can also be used to describe users and to calculate similarities between new users. This could further improve the performance of recommender systems under a new-user cold-start problem.

References

1. Burke, R.: Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction 12(4) (Nov 2002)
2. Dimitropoulos, X., Stoecklin, M., Hurley, P., Kind, A.: The eternal sunshine of the sketch data structure. Comput. Netw. 52(17) (Dec 2008)
3. Koren, Y.: Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '08, ACM, New York, NY, USA (2008)
4. McMahan, H.B., Holt, G., Sculley, D., Young, M., Ebner, D., Grady, J., Nie, L., Phillips, T., Davydov, E., Golovin, D., Chikkerur, S., Liu, D., Wattenberg, M., Hrafnkelsson, A.M., Boulos, T., Kubica, J.: Ad click prediction: A view from the trenches.
In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA (2013)
Collaborative Filtering. Radek Pelánek
Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains
A Logistic Regression Approach to Ad Click Prediction
A Logistic Regression Approach to Ad Click Prediction Gouthami Kondakindi [email protected] Satakshi Rana [email protected] Aswin Rajkumar [email protected] Sai Kaushik Ponnekanti [email protected] Vinit Parakh
IPTV Recommender Systems. Paolo Cremonesi
IPTV Recommender Systems Paolo Cremonesi Agenda 2 IPTV architecture Recommender algorithms Evaluation of different algorithms Multi-model systems Valentino Rossi 3 IPTV architecture 4 Live TV Set-top-box
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
Advanced Ensemble Strategies for Polynomial Models
Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer
Factorization Machines
Factorization Machines Steffen Rendle Department of Reasoning for Intelligence The Institute of Scientific and Industrial Research Osaka University, Japan [email protected] Abstract In this
Big Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Going For Large Scale Application Scenario: Recommender
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
QASM: a Q&A Social Media System Based on Social Semantics
QASM: a Q&A Social Media System Based on Social Semantics Zide Meng, Fabien Gandon, Catherine Faron-Zucker To cite this version: Zide Meng, Fabien Gandon, Catherine Faron-Zucker. QASM: a Q&A Social Media
The Need for Training in Big Data: Experiences and Case Studies
The Need for Training in Big Data: Experiences and Case Studies Guy Lebanon Amazon Background and Disclaimer All opinions are mine; other perspectives are legitimate. Based on my experience as a professor
Addressing Cold Start in Recommender Systems: A Semi-supervised Co-training Algorithm
Addressing Cold Start in Recommender Systems: A Semi-supervised Co-training Algorithm Mi Zhang,2 Jie Tang 3 Xuchen Zhang,2 Xiangyang Xue,2 School of Computer Science, Fudan University 2 Shanghai Key Laboratory
Parallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014
Parallel Data Mining Team 2 Flash Coders Team Research Investigation Presentation 2 Foundations of Parallel Computing Oct 2014 Agenda Overview of topic Analysis of research papers Software design Overview
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Cross-Domain Collaborative Recommendation in a Cold-Start Context: The Impact of User Profile Size on the Quality of Recommendation
Cross-Domain Collaborative Recommendation in a Cold-Start Context: The Impact of User Profile Size on the Quality of Recommendation Shaghayegh Sahebi and Peter Brusilovsky Intelligent Systems Program University
How To Cluster On A Search Engine
Volume 2, Issue 2, February 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A REVIEW ON QUERY CLUSTERING
Bayesian Factorization Machines
Bayesian Factorization Machines Christoph Freudenthaler, Lars Schmidt-Thieme Information Systems & Machine Learning Lab University of Hildesheim 31141 Hildesheim {freudenthaler, schmidt-thieme}@ismll.de
How To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző [email protected] 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
Logistic Regression for Spam Filtering
Logistic Regression for Spam Filtering Nikhila Arkalgud February 14, 28 Abstract The goal of the spam filtering problem is to identify an email as a spam or not spam. One of the classic techniques used
Standardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
Content-Based Recommendation
Content-Based Recommendation Content-based? Item descriptions to identify items that are of particular interest to the user Example Example Comparing with Noncontent based Items User-based CF Searches
How To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE
STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIX, Number 1, 2014 A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE DIANA HALIŢĂ AND DARIUS BUFNEA Abstract. Then
Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
CS 688 Pattern Recognition Lecture 4. Linear Models for Classification
CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(
Scalable Machine Learning - or what to do with all that Big Data infrastructure
- or what to do with all that Big Data infrastructure TU Berlin blog.mikiobraun.de Strata+Hadoop World London, 2015 1 Complex Data Analysis at Scale Click-through prediction Personalized Spam Detection
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
User Behavior Analysis Based On Predictive Recommendation System for E-Learning Portal
Abstract ISSN: 2348 9510 User Behavior Analysis Based On Predictive Recommendation System for E-Learning Portal Toshi Sharma Department of CSE Truba College of Engineering & Technology Indore, India [email protected]
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
A Web Recommender System for Recommending, Predicting and Personalizing Music Playlists
A Web Recommender System for Recommending, Predicting and Personalizing Music Playlists Zeina Chedrawy 1, Syed Sibte Raza Abidi 1 1 Faculty of Computer Science, Dalhousie University, Halifax, Canada {chedrawy,
Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Utility of Distrust in Online Recommender Systems
Utility of in Online Recommender Systems Capstone Project Report Uma Nalluri Computing & Software Systems Institute of Technology Univ. of Washington, Tacoma [email protected] Committee: nkur Teredesai
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Rating Prediction with Informative Ensemble of Multi-Resolution Dynamic Models
JMLR: Workshop and Conference Proceedings 75 97 Rating Prediction with Informative Ensemble of Multi-Resolution Dynamic Models Zhao Zheng Hong Kong University of Science and Technology, Hong Kong Tianqi
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
Robust Outlier Detection Technique in Data Mining: A Univariate Approach
Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
Steven C.H. Hoi. School of Computer Engineering Nanyang Technological University Singapore
Steven C.H. Hoi School of Computer Engineering Nanyang Technological University Singapore Acknowledgments: Peilin Zhao, Jialei Wang, Hao Xia, Jing Lu, Rong Jin, Pengcheng Wu, Dayong Wang, etc. 2 Agenda
Java Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
Factorization Machines
Factorization Machines Factorized Polynomial Regression Models Christoph Freudenthaler, Lars Schmidt-Thieme and Steffen Rendle 2 Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim,
SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL
SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
Recommendation Tool Using Collaborative Filtering
Recommendation Tool Using Collaborative Filtering Aditya Mandhare 1, Soniya Nemade 2, M.Kiruthika 3 Student, Computer Engineering Department, FCRIT, Vashi, India 1 Student, Computer Engineering Department,
Designing a learning system
Lecture Designing a learning system Milos Hauskrecht [email protected] 539 Sennott Square, x4-8845 http://.cs.pitt.edu/~milos/courses/cs750/ Design of a learning system (first vie) Application or Testing
Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,
IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS
IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS V.Sudhakar 1 and G. Draksha 2 Abstract:- Collective behavior refers to the behaviors of individuals
Advances in Collaborative Filtering
Advances in Collaborative Filtering Yehuda Koren and Robert Bell Abstract The collaborative filtering (CF) approach to recommenders has recently enjoyed much interest and progress. The fact that it played
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
K-means Clustering Technique on Search Engine Dataset using Data Mining Tool
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 505-510 International Research Publications House http://www. irphouse.com /ijict.htm K-means
Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin
Data Mining for Customer Service Support Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Traditional Hotline Services Problem Traditional Customer Service Support (manufacturing)
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries Aida Mustapha *1, Farhana M. Fadzil #2 * Faculty of Computer Science and Information Technology, Universiti Tun Hussein
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
Clustering Technique in Data Mining for Text Documents
Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor
Big & Personal: data and models behind Netflix recommendations
Big & Personal: data and models behind Netflix recommendations Xavier Amatriain Netflix [email protected] ABSTRACT Since the Netflix $1 million Prize, announced in 2006, our company has been known to
arxiv:1506.04135v1 [cs.ir] 12 Jun 2015
Reducing offline evaluation bias of collaborative filtering algorithms Arnaud de Myttenaere 1,2, Boris Golden 1, Bénédicte Le Grand 3 & Fabrice Rossi 2 arxiv:1506.04135v1 [cs.ir] 12 Jun 2015 1 - Viadeo
How I won the Chess Ratings: Elo vs the rest of the world Competition
How I won the Chess Ratings: Elo vs the rest of the world Competition Yannis Sismanis November 2010 Abstract This article discusses in detail the rating system that won the kaggle competition Chess Ratings:
Data Mining in Web Search Engine Optimization and User Assisted Rank Results
Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management
Prediction Model for Crude Oil Price Using Artificial Neural Networks
Applied Mathematical Sciences, Vol. 8, 2014, no. 80, 3953-3965 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.43193 Prediction Model for Crude Oil Price Using Artificial Neural Networks
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
Spark: Cluster Computing with Working Sets
Spark: Cluster Computing with Working Sets Outline Why? Mesos Resilient Distributed Dataset Spark & Scala Examples Uses Why? MapReduce deficiencies: Standard Dataflows are Acyclic Prevents Iterative Jobs
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Linear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization
A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca [email protected] Spain Manuel Martín-Merino Universidad
Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Impact of Feature Selection on the Performance of ireless Intrusion Detection Systems
Lecture #2. Algorithms for Big Data
Additional Topics: Big Data Lecture #2 Algorithms for Big Data Joseph Bonneau [email protected] April 30, 2012 Today's topic: algorithms Do we need new algorithms? Quantity is a quality of its own Joseph
PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA
PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA Prakash Singh 1, Aarohi Surya 2 1 Department of Finance, IIM Lucknow, Lucknow, India 2 Department of Computer Science, LNMIIT, Jaipur,
ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS
