# A Logistic Regression Approach to Ad Click Prediction

Save this PDF as:

Size: px
Start display at page:

## Transcription

3 of having data because it can be easily input into any classification algorithm without any further modifications. 2.2 Classification Naive Bayes We started with a simple approach of Naive Bayes [5], to get a sense of how the data looks like. Naive Bayes, has an underlying assumption of independence between the features. This is a strong assumption and might not work for clicks data because certain features like location where ad is displayed, device on which user sees the ad are clearly related to each other. But, we wanted to implement Naive Bayes to see if our understanding about the data given to us is right. For this approach we assumed the entire data to be categorical and calculated the counts for all the unique values in each feature with respect to whether the ad is clicked or not. Once we had the counts, the probabilities for the test data were calculated as follows: P (Y = 1 X) = P (X Y = 1)P (Y = 1) P (X Y = 1) + P (X Y = 0) (1) As expected, Naive Bayes gave results which were bad. We got a log loss of However, this served as a good starting point Vowpal Wabbit Our second approach was to use the machine learning tool Vowpal Wabbit [1] from Yahoo/Microsoft Research. It is a fast and efficient online machine learning algorithm. It supports different loss functions and optimizations. We first converted the labels of the training data to {-1,1} and converted the comma-separated training file to vowpal-wabbit vw format. Except for stripping off the year, month and day information from the timestamp, the rest of the fields were left untouched. All the features were treated as categorical features. We then chose logarithmic loss function from Vowpal Wabbit. After training using our train data, we tested the model on our test data converted into vw format and obtained predictions. These predictions were sent through a sigmoid function to get the final probabilistic output, which we submitted and achieved a reasonable logloss score of Next, we did feature selection and ignored the fields which had too many unique values like device id and device ip. We trained Vowpal Wabbit with this new data and achieved a better logloss score of Next, we randomized the rows of the training data using the command line. By shuffling the data, we were able to lower the test error to This is due to the fact that Vowpal Wabbit is an online learning algorithm and since the training data was initially sorted according to time, the model was skewed. After randomization, Vowpal Wabbit was able to predict the outcome better. We tried various optimization parameters of Vowpal Wabbit like quadratic and cubic features, eliminated the constant feature and automatic feature selection. We also increased the bit precision from the default value of 18 using the -b option. We used Vowpal Wabbit utilities like vw varinfo and vw hypersearch to fine tune the model. We also randomized the initial weights and varied the learning rate and tested the output. Since clicks are of lower distribution in the data, we assigned higher weights for clicks than no clicks. But in spite of all these optimizations and experimentation, we were unable to cross the logloss barrier of that we achieved with Vowpal Wabbit. This gave us a useful insight into how a classifier can only increase accuracy upto a certain point beyond which no matter how hard we try to fine tune the hyperparameters, it will not give any significant increase in accuracy. Figure 2 shows the optimizations that we tried to do and the logloss value we achieved with Vowpal Wabbit. Logloss value does not decrease after a certain point. 3

4 Figure 2: Vowpal Wabbit Logloss Final Approach: Logistic Regression Logistic Regression states that Where P (y = 1 x; w) = σ(g(x)) (2) g(x) = d w d x d = w T x (3) 1 σ(a) = 1 + e a (4) We used L2 Regularized Logistic Regression [2]. The regularization parameter (λ) is estimated using cross validation. Figure 3 shows all the values of λ that we had experimented with and the corresponding values of Log Loss that we got. It can be seen that the value of has given us the least Log Loss and thus it is chosen as the optimum λ value for this data. It could be possible that this value of lambda is a local optimum and there might be some global optimum that could give us better results. But, this is the best value we could get by using greedy approach. Because the data is so huge, it is not possible to load the entire data into memory. So, batch gradient descent and newton method were not an option for us. Hence, we used Stochastic Gradient Descent approach for updating the weights. The learning rate η for the Stochastic Gradient Descent has also been experimented upon. We have tried a few values for the learning rate and chose the optimum value as We implemented the above approach by taking initial weights to be 0. We also tried initializing the weights with values other than 0. The reason why we tried to do this is, there might be some values in the features from test data which were never seen in the training data. Initializing all weights to 0 means that we are giving a 0 weight to these unseen instances. That might lead to classification error. But giving a small initial weight to features did not seem to improve the accuracy in anyway. So, we went ahead with initializing the weights to 0. 3 Results Figure 1 is a visualization of the clicks in training data. It can be seen that the number of training instances for class Click=0 are a lot more than the number of training instance for class Click=1. This is a useful insight into the distribution of data and the properties of clicks in real world. Click through rate is usually expected to be low In Figure 2, we plot a graph for different log-loss scores we achieved by adopting different approaches in Vowpal Wabbit. Figure 3 plots the different log-loss scores for different value of regularization parameter λ for logistic regression. We chose the value of λ that gave us the best results i.e In Table 1, we summarize the scores we achieved for the different techniques that we implemented in this project. It can be seen that logistic regression with proper data pre-processing gave us the best score and good ranking in the Kaggle competition. 4

5 Figure 3: L2 Regularisation Parameter Estimation Approach Score Naive Bayes Vowpal Wabbit 0.47 Vowpal Wabbit(Feature Selection) Vowpal Wabbit(Shuffled Data) Vowpal Wabbit(Optimization) Logistic Regression Logistic Regression (Feature Selection) Table 1: Different Approaches Implemented and Scores 4 Conclusion We investigated methods for data reduction using feature selection. We intelligently removed few fields from the data which improved our ranking in Kaggle competition by a decent range. It is evident that unnecessary features act as noise and degrade the performance of the system. Therefore, it is essential to remove these fields. We tried several supervised classification algorithms to see which one will work best for our dataset. Logistic regression with regularization gave us a logloss that took us to top 100 in Kaggle rankings. The main challenge that we faced in the project was to find ways to efficiently handle huge data with limited resources. The main advantage of using Logistic Regression with stochastic gradient was that we could do computation by reading the data one line at a time. We could not implement Support Vector Machines because the output that SVM gives is not a probability and Kaggle needs output in that form. So, we leave the investigation of converting SVM output into probability for future work. We could not implement Random Forest due to infrastructure limitations. We plan to try implementing it in the future. References [1] Vowpal Wabbit open tool. vw/. Accessed: [2] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12: , [3] Zhipeng Fang, Kun Yue, Jixian Zhang, Dehai Zhang, and Weiyi Liu. Predicting click-through rates of new advertisements based on the bayesian network. Mathematical Problems in Engineering, 2014, [4] Dejian Lai. Principal component analysis on human development indicators of china. Social Indicators Research, 61(3): , [5] David D Lewis. Naive (bayes) at forty: The independence assumption in information retrieval. In Machine learning: ECML-98, pages Springer,

6 [6] H Brendan McMahan, Gary Holt, David Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, et al. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages ACM, [7] Steffen Rendle. Scaling factorization machines to relational data. In Proceedings of the VLDB Endowment, volume 6, pages VLDB Endowment,

### CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /

### Forecasting Trade Direction and Size of Future Contracts Using Deep Belief Network

Forecasting Trade Direction and Size of Future Contracts Using Deep Belief Network Anthony Lai (aslai), MK Li (lilemon), Foon Wang Pong (ppong) Abstract Algorithmic trading, high frequency trading (HFT)

### Hybrid model rating prediction with Linked Open Data for Recommender Systems

Hybrid model rating prediction with Linked Open Data for Recommender Systems Andrés Moreno 12 Christian Ariza-Porras 1, Paula Lago 1, Claudia Jiménez-Guarín 1, Harold Castro 1, and Michel Riveill 2 1 School

### Defending Networks with Incomplete Information: A Machine Learning Approach. Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject

Defending Networks with Incomplete Information: A Machine Learning Approach Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject Agenda Security Monitoring: We are doing it wrong Machine Learning

### A Lightweight Solution to the Educational Data Mining Challenge

A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

### BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

### Invited Applications Paper

Invited Applications Paper - - Thore Graepel Joaquin Quiñonero Candela Thomas Borchert Ralf Herbrich Microsoft Research Ltd., 7 J J Thomson Avenue, Cambridge CB3 0FB, UK THOREG@MICROSOFT.COM JOAQUINC@MICROSOFT.COM

### 1 Maximum likelihood estimation

COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

Server Load Prediction Suthee Chaidaroon (unsuthee@stanford.edu) Joon Yeong Kim (kim64@stanford.edu) Jonghan Seo (jonghan@stanford.edu) Abstract Estimating server load average is one of the methods that

### Predict Influencers in the Social Network

Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

### An Introduction to Data Mining

An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

### Predicting borrowers chance of defaulting on credit loans

Predicting borrowers chance of defaulting on credit loans Junjie Liang (junjie87@stanford.edu) Abstract Credit score prediction is of great interests to banks as the outcome of the prediction algorithm

### Predict the Popularity of YouTube Videos Using Early View Data

000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### Sibyl: a system for large scale machine learning

Sibyl: a system for large scale machine learning Tushar Chandra, Eugene Ie, Kenneth Goldman, Tomas Lloret Llinares, Jim McFadden, Fernando Pereira, Joshua Redstone, Tal Shaked, Yoram Singer Machine Learning

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### Large Scale Learning to Rank

Large Scale Learning to Rank D. Sculley Google, Inc. dsculley@google.com Abstract Pairwise learning to rank methods such as RankSVM give good performance, but suffer from the computational burden of optimizing

### Vowpal Wabbit 7 Tutorial. Stephane Ross

Vowpal Wabbit 7 Tutorial Stephane Ross Binary Classification and Regression Input Format Data in text file (can be gziped), one example/line Label [weight] Namespace Feature... Feature Namespace... Namespace:

### Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

### Automatic Web Page Classification

Automatic Web Page Classification Yasser Ganjisaffar 84802416 yganjisa@uci.edu 1 Introduction To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory

### CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

### Logistic Regression for Spam Filtering

Logistic Regression for Spam Filtering Nikhila Arkalgud February 14, 28 Abstract The goal of the spam filtering problem is to identify an email as a spam or not spam. One of the classic techniques used

### Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

### Document Image Retrieval using Signatures as Queries

Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and

### T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or

### Data Mining III: Numeric Estimation

Data Mining III: Numeric Estimation Computer Science 105 Boston University David G. Sullivan, Ph.D. Review: Numeric Estimation Numeric estimation is like classification learning. it involves learning a

### Machine Learning Final Project Spam Email Filtering

Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE

### WEB PAGE CATEGORISATION BASED ON NEURONS

WEB PAGE CATEGORISATION BASED ON NEURONS Shikha Batra Abstract: Contemporary web is comprised of trillions of pages and everyday tremendous amount of requests are made to put more web pages on the WWW.

### Machine Learning Logistic Regression

Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.

### Fast Analytics on Big Data with H20

Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,

### General Naïve Bayes. CS 188: Artificial Intelligence Fall Example: Spam Filtering. Example: OCR. Generalization and Overfitting

CS 88: Artificial Intelligence Fall 7 Lecture : Perceptrons //7 General Naïve Bayes A general naive Bayes model: C x E n parameters C parameters n x E x C parameters E E C E n Dan Klein UC Berkeley We

### Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

### Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

### Predicting Flight Delays

Predicting Flight Delays Dieterich Lawson jdlawson@stanford.edu William Castillo will.castillo@stanford.edu Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### lop Building Machine Learning Systems with Python en source

Building Machine Learning Systems with Python Master the art of machine learning with Python and build effective machine learning systems with this intensive handson guide Willi Richert Luis Pedro Coelho

### Maximize Revenues on your Customer Loyalty Program using Predictive Analytics

Maximize Revenues on your Customer Loyalty Program using Predictive Analytics 27 th Feb 14 Free Webinar by Before we begin... www Q & A? Your Speakers @parikh_shachi Technical Analyst @tatvic Loves js

### E-commerce Transaction Anomaly Classification

E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce

### Application of Machine Learning to Link Prediction

Application of Machine Learning to Link Prediction Kyle Julian (kjulian3), Wayne Lu (waynelu) December 6, 6 Introduction Real-world networks evolve over time as new nodes and links are added. Link prediction

### BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an

### MHI3000 Big Data Analytics for Health Care Final Project Report

MHI3000 Big Data Analytics for Health Care Final Project Report Zhongtian Fred Qiu (1002274530) http://gallery.azureml.net/details/81ddb2ab137046d4925584b5095ec7aa 1. Data pre-processing The data given

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

### Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

### Machine learning for algo trading

Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with

### Introduction to Machine Learning Using Python. Vikram Kamath

Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

### Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

### Predicting outcome of soccer matches using machine learning

Saint-Petersburg State University Mathematics and Mechanics Faculty Albina Yezus Predicting outcome of soccer matches using machine learning Term paper Scientific adviser: Alexander Igoshkin, Yandex Mobile

### Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

### A crash course in probability and Naïve Bayes classification

Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s

### Parallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014

Parallel Data Mining Team 2 Flash Coders Team Research Investigation Presentation 2 Foundations of Parallel Computing Oct 2014 Agenda Overview of topic Analysis of research papers Software design Overview

### Football Match Winner Prediction

Football Match Winner Prediction Kushal Gevaria 1, Harshal Sanghavi 2, Saurabh Vaidya 3, Prof. Khushali Deulkar 4 Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai,

### MS1b Statistical Data Mining

MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

### Introduction to Machine Learning

Introduction to Machine Learning Prof. Alexander Ihler Prof. Max Welling icamp Tutorial July 22 What is machine learning? The ability of a machine to improve its performance based on previous results:

### The Scientific Data Mining Process

Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

### INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal

INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal Research Article ISSN 2277 9140 ABSTRACT Web page categorization based

### Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

### Financial Trading System using Combination of Textual and Numerical Data

Financial Trading System using Combination of Textual and Numerical Data Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,

### Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

### Microsoft Azure Machine learning Algorithms

Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql Tomaz.kastrun@gmail.com http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation

ADHAWK WORKS ADVERTISING ANALTICS ON A DASHBOARD Mrs. Vijayalaxmi M. 1, Anagha Kelkar 2, Neha Puthran 2, Sailee Devne 2 Vice Principal 1, B.E. Students 2, Department of Information Technology V.E.S Institute

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

### DATA ANALYTICS USING R

DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

### Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Principles of Data Mining Pham Tho Hoan hoanpt@hnue.edu.vn References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,

### Personalized Expedia Hotel Searches

Personalized Expedia Hotel Searches Xinxing Jiang, Yao Xiao, and Shunji Li Stanford University December 13, 2013 Abstract In this paper, we propose machine learning algorithms with search data of Expedia

### A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center

### A survey on click modeling in web search

A survey on click modeling in web search Lianghao Li Hong Kong University of Science and Technology Outline 1 An overview of web search marketing 2 An overview of click modeling 3 A survey on click models

### Face Recognition using SIFT Features

Face Recognition using SIFT Features Mohamed Aly CNS186 Term Project Winter 2006 Abstract Face recognition has many important practical applications, like surveillance and access control.

### Scoring the Data Using Association Rules

Scoring the Data Using Association Rules Bing Liu, Yiming Ma, and Ching Kian Wong School of Computing National University of Singapore 3 Science Drive 2, Singapore 117543 {liub, maym, wongck}@comp.nus.edu.sg

10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html

### Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model

AI TERM PROJECT GROUP 14 1 Anti-Spam Filter Based on,, and model Yun-Nung Chen, Che-An Lu, Chao-Yu Huang Abstract spam email filters are a well-known and powerful type of filters. We construct different

### Using News Articles to Predict Stock Price Movements

Using News Articles to Predict Stock Price Movements Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 9237 gyozo@cs.ucsd.edu 21, June 15,

### Data, Measurements, Features

Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

### CS229 Titanic Machine Learning From Disaster

CS229 Titanic Machine Learning From Disaster Eric Lam Stanford University Chongxuan Tang Stanford University Abstract In this project, we see how we can use machine-learning techniques to predict survivors

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### Employer Health Insurance Premium Prediction Elliott Lui

Employer Health Insurance Premium Prediction Elliott Lui 1 Introduction The US spends 15.2% of its GDP on health care, more than any other country, and the cost of health insurance is rising faster than

### Web Document Clustering

Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

### STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT

### How can we discover stocks that will

Algorithmic Trading Strategy Based On Massive Data Mining Haoming Li, Zhijun Yang and Tianlun Li Stanford University Abstract We believe that there is useful information hiding behind the noisy and massive

### Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

### CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

Optimizing Display Advertisements Based on Historic User Trails Neha Gupta, Udayan Sandeep Nawathe Khurana, Tak Yeon Lee Tumri Inc. Department of Computer San Mateo, CA Science snawathe@tumri.com University

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a

### Final Project Report

CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

### Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

### Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product

Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago sagarikaprusty@gmail.com Keywords:

### Part III: Machine Learning. CS 188: Artificial Intelligence. Machine Learning This Set of Slides. Parameter Estimation. Estimation: Smoothing

CS 188: Artificial Intelligence Lecture 20: Dynamic Bayes Nets, Naïve Bayes Pieter Abbeel UC Berkeley Slides adapted from Dan Klein. Part III: Machine Learning Up until now: how to reason in a model and

Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains

### Programming Exercise 3: Multi-class Classification and Neural Networks

Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks

### Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

### BizPro: Extracting and Categorizing Business Intelligence Factors from News

BizPro: Extracting and Categorizing Business Intelligence Factors from News Wingyan Chung, Ph.D. Institute for Simulation and Training wchung@ucf.edu Definitions and Research Highlights BI Factor: qualitative

### Research of Postal Data mining system based on big data

3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication

### Lecture 8 February 4

ICS273A: Machine Learning Winter 2008 Lecture 8 February 4 Scribe: Carlos Agell (Student) Lecturer: Deva Ramanan 8.1 Neural Nets 8.1.1 Logistic Regression Recall the logistic function: g(x) = 1 1 + e θt

### Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or