Factor Models for Gender Prediction Based on E-commerce Data


1 Factor Models for Gender Prediction Based on E-commerce Data
Data Mining Competition PAKDD 2015, Ho Chi Minh City, Vietnam

2 Outline
- Hierarchical Basket Model
- Modeling Autocorrelation
- Sequential Block Voting
- Results & Implementation

3 Outline
- Hierarchical Basket Model
  - Tree Encoding
  - Factorization Machine
- Modeling Autocorrelation
- Sequential Block Voting
- Results & Implementation

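4 Product Hierarchy
Each product is a path A/B/C/D through a four-level category tree; a record may list several products, separated by semicolons:
  u1, , , A01/B01/C01/D01/
  u2, , , A02/B02/C02/D02/;A02/B02/C03/D03/;
  u3, , , A01/B01/C01/D02/;A01/B04/C05/D98/;
[Figure: product category tree with level-A node A01; level-B nodes B01, B04; level-C nodes C01, C02, C05; level-D nodes D01, D02, D06, D22, D45, D98, D21, D89, D15]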
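
The tree can be recovered from the product fields alone. The sketch below is a minimal illustration that assumes only what the records show (slash-terminated A/B/C/D paths separated by semicolons); the function name `build_hierarchy` is hypothetical and not taken from the competition code.

```python
from collections import defaultdict


def build_hierarchy(product_fields):
    """Collect the parent -> children edges of the category tree.

    Each field is a semicolon-separated list of paths such as 'A01/B01/C01/D01/'.
    (Hypothetical helper for illustration.)
    """
    children = defaultdict(set)
    for field in product_fields:
        for path in field.split(";"):
            nodes = [node for node in path.split("/") if node]  # drop empty pieces
            for parent, child in zip(nodes, nodes[1:]):
                children[parent].add(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


# Product fields of u1, u2 and u3 above.
records = [
    "A01/B01/C01/D01/",
    "A02/B02/C02/D02/;A02/B02/C03/D03/;",
    "A01/B01/C01/D02/;A01/B04/C05/D98/;",
]
tree = build_hierarchy(records)
print(tree["A01"])  # ['B01', 'B04']
print(tree["C01"])  # ['D01', 'D02']
```

Leaf nodes (level D) never appear as keys, since they have no children.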

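5 Path Encoding
  u3, , , A01/B01/C01/D02/;A01/B04/C05/D98/;
[Figure: the product category tree from slide 4]
Each coordinate of $\mathbf{x}_i$ counts how often the corresponding tree node occurs in the record's paths; A01 lies on both of u3's paths and therefore gets the count 2:
$\mathbf{x}_i = (\underbrace{2, 0, \ldots}_{A}, \underbrace{1, 0, 0, 1, 0, \ldots}_{B}, \ldots, \underbrace{1, \ldots, 1, 0, \ldots}_{D})$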
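
A minimal sketch of this count encoding, assuming a fixed vocabulary of tree nodes ordered by level. The short `vocabulary` below is a hypothetical subset; in practice it would contain every node seen in the data, and the resulting vectors would likely be stored as sparse rows.

```python
from collections import Counter


def encode_session(product_field, vocabulary):
    """Count how often each tree node occurs across a record's product paths."""
    counts = Counter()
    for path in product_field.split(";"):
        counts.update(node for node in path.split("/") if node)
    # One coordinate per node, in the fixed vocabulary order.
    return [counts[node] for node in vocabulary]


# Hypothetical vocabulary (a subset of the tree nodes), grouped by level A, B, C, D.
vocabulary = ["A01", "B01", "B04", "C01", "C02", "C05", "D01", "D02", "D98"]
x_u3 = encode_session("A01/B01/C01/D02/;A01/B04/C05/D98/;", vocabulary)
print(x_u3)  # [2, 1, 1, 1, 0, 1, 0, 1, 1]
```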

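6 Factorization Machine
FM model of order d = 2:
$\hat{y}^{FM}(\mathbf{x}) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle \mathbf{v}_j, \mathbf{v}_{j'} \rangle$
where $w_0 \in \mathbb{R}$, $\mathbf{w} \in \mathbb{R}^p$ and $V \in \mathbb{R}^{p \times k}$ are the model parameters, $k \in \mathbb{N}$ is the size (dimensionality) of the latent space, and the model has one latent factor vector $\mathbf{v}_j$ (row $j$ of $V$) for each variable $x_j$ [Rendle, TIST 2012].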
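
A pure-Python sketch of the model equation above, evaluated two ways: directly, and with the O(kp) rewriting of the pairwise term given in Rendle's factorization machine papers. The function names and the random toy parameters are illustrative; this is not the FM implementation referenced on the final slide.

```python
import math
import random


def fm_predict(x, w0, w, V):
    """Order-2 FM score in O(k * p) via the identity
    sum_{j<j'} x_j x_j' <v_j, v_j'> = 1/2 * sum_f [(sum_j v_jf x_j)^2 - sum_j (v_jf x_j)^2].
    """
    p, k = len(x), len(V[0])
    linear = w0 + sum(w[j] * x[j] for j in range(p))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[j][f] * x[j] for j in range(p))
        s_sq = sum((V[j][f] * x[j]) ** 2 for j in range(p))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Direct O(k * p^2) evaluation of the model equation, used as a check."""
    p, k = len(x), len(V[0])
    total = w0 + sum(w[j] * x[j] for j in range(p))
    for j in range(p):
        for jp in range(j + 1, p):
            dot = sum(V[j][f] * V[jp][f] for f in range(k))
            total += x[j] * x[jp] * dot
    return total


random.seed(0)
p, k = 6, 3
x = [float(random.randint(0, 2)) for _ in range(p)]  # toy count features
w0 = 0.1
w = [random.gauss(0, 1) for _ in range(p)]
V = [[random.gauss(0, 1) for _ in range(k)] for _ in range(p)]
print(math.isclose(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V), abs_tol=1e-9))  # True
```

The reformulation is what makes FMs practical on wide, sparse inputs like the path encoding: the cost grows linearly in the number of features rather than quadratically.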

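10 Linear Part of an FM Model of Order d = 2
The linear part is $w_0 + \sum_{j=1}^{p} w_j x_j$. For a sample whose active categories are A02, B11 and D55,
$\mathbf{x}_i = (\underbrace{0, 1, \ldots, 0}_{A}, \underbrace{0, \ldots, 1, \ldots, 0}_{B}, \ldots, \underbrace{0, \ldots, 1, \ldots, 0}_{D})$
each active category contributes its own weight, so the linear part equals $w_0 + w_{A02} + w_{B11} + w_{D55}$. Informally, the evidence for $p(\text{female} \mid \mathbf{x}_i)$ is the sum of separate contributions from $p(\text{female} \mid A02)$, $p(\text{female} \mid B11)$ and $p(\text{female} \mid D55)$.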
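
For a binary $\mathbf{x}$ the linear part reduces to a lookup and a sum. A minimal sketch; the weights below are made-up values, and treating larger scores as "more female" is an assumption matching the $p(\text{female} \mid \mathbf{x}_i)$ target.

```python
def linear_part(active_features, w0, weights):
    """Linear FM term for a binary x: the bias plus the weight of every active feature.

    Features missing from `weights` contribute nothing (weight 0).
    """
    return w0 + sum(weights.get(name, 0.0) for name in active_features)


# Hypothetical learned weights; positive values push the score towards "female".
weights = {"A02": 0.4, "B11": -0.1, "D55": 0.3}
score = linear_part(["A02", "B11", "D55"], w0=0.2, weights=weights)
print(round(score, 6))  # 0.8
```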

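13 Pairwise Interactions of an FM Model of Order d = 2
The pairwise part is $\sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle \mathbf{v}_j, \mathbf{v}_{j'} \rangle$ with $V \in \mathbb{R}^{p \times k}$. For a sample whose active categories are A02, B11, D55 and D95:
$\mathbf{x}_i = (\underbrace{0, 1, \ldots, 0}_{A}, \underbrace{0, \ldots, 1, \ldots, 0}_{B}, \ldots, \underbrace{0, \ldots, 1, \ldots, 1, \ldots, 0}_{D})$
Example: if two latent dimensions of $V$ can be read as "Summer" and "Swimming", and both the row $\mathbf{v}_{D55}$ ($j = D55$) and the row $\mathbf{v}_{D95}$ ($j' = D95$) load on them, then $\langle \mathbf{v}_{D55}, \mathbf{v}_{D95} \rangle$ is large and D55 and D95 occurring together adds strongly to the prediction.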
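
For binary features, $x_j x_{j'} = 1$ exactly when both features are active, so the pairwise part is a sum of dot products over pairs of active categories. A minimal sketch with made-up two-dimensional latent vectors, where reading factor 0 as "Summer" and factor 1 as "Swimming" is purely illustrative:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pairwise_part(active_features, V):
    """Pairwise FM term for a binary x: sum of <v_j, v_j'> over all pairs of active features."""
    total = 0.0
    for a in range(len(active_features)):
        for b in range(a + 1, len(active_features)):
            total += dot(V[active_features[a]], V[active_features[b]])
    return total


# Hypothetical k = 2 latent vectors; factor 0 ~ "Summer", factor 1 ~ "Swimming".
V = {
    "A02": [0.0, 0.1],
    "B11": [0.1, 0.0],
    "D55": [0.9, 0.8],
    "D95": [0.7, 0.9],
}
print(round(dot(V["D55"], V["D95"]), 6))  # 1.35
print(round(pairwise_part(["A02", "B11", "D55", "D95"], V), 6))  # 1.68
```

Almost all of the interaction mass comes from the D55/D95 pair, because those are the only two rows that load heavily on the same factors.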

17 Outline: Modeling Autocorrelation

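18 Factoring Joint Probabilities
[Figure: autocorrelation versus lag]
We can factorize the joint probability by conditioning each label on features $\mathbf{x}_i^{r}$ that describe the samples related to sample $i$:
$p(y_0, \ldots, y_n \mid \mathbf{x}_0, \ldots, \mathbf{x}_n) := \prod_{i=0}^{n} p(y_i \mid \mathbf{x}_i^{r}, \mathbf{x}_i)$
This treats the labels as conditionally independent given each sample's own features and the features of its neighbours.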
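
The motivation is that labels of nearby samples are correlated. A sketch of the usual sample autocorrelation estimator, assuming the figure plots the autocorrelation of the gender labels taken in file order; the toy labels (1 = female, 0 = male) are hypothetical.

```python
def autocorrelation(labels, lag):
    """Sample autocorrelation of a numeric sequence at the given lag (lag >= 0)."""
    n = len(labels)
    mean = sum(labels) / n
    variance = sum((y - mean) ** 2 for y in labels)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    covariance = sum(
        (labels[i] - mean) * (labels[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Hypothetical labels in file order, with runs of equal labels.
labels = [1, 1, 1, 0, 0, 0]
print(autocorrelation(labels, 1))  # 0.5
```

A positive value at small lags means a sample's neighbours carry information about its label, which is what the relational features $\mathbf{x}_i^{r}$ exploit.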

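19 Relational Features
  u3, , , A01/B01/C05/D11/
  u4, , , A02/B01/C01/D02/;A05/B04/C05/D98/;
  u5, , , A05/B04/C05/D98/;
  u6, , , A04/B03/C06/D22/;A05/B14/C45/D68/;
  u7, , , A01/B01/C01/D03/;A01/B04/C05/D78/;
For u5, $\mathbf{x}_{a1}$ counts the level-A categories in the samples at lag 1 (u4 and u6), and $\mathbf{x}_{a1:2}$ counts them over lags 1 to 2 (u3, u4, u6 and u7); the columns correspond to A1, A2, A3, A4, A5, ...:
$\mathbf{x}_{a1} = [0, 1, 0, 1, 2, \ldots]$
$\mathbf{x}_{a1:2} = [3, 1, 0, 1, 2, \ldots]$
Combining different lags and categories, we can describe the neighborhood of a sample, e.g.
$\mathbf{x}_{u5} = [\mathbf{x}_{a1}, \mathbf{x}_{a1:2}, \mathbf{x}_{b1:3}, \mathbf{x}_{d1}]$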
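
A sketch of these lag-count features. It assumes that "lag $\ell$" covers both the $\ell$-th predecessor and the $\ell$-th successor of a sample, which reproduces the two vectors on the slide; the function names and vocabularies are hypothetical.

```python
from collections import Counter


def level_nodes(product_field, level):
    """Nodes of one tree level ('A', 'B', 'C' or 'D') in a record's product paths."""
    return [
        node
        for path in product_field.split(";")
        for node in path.split("/")
        if node.startswith(level)
    ]


def lag_counts(records, i, level, max_lag, vocabulary):
    """Count level nodes in the records at lags 1..max_lag before and after record i."""
    counts = Counter()
    for lag in range(1, max_lag + 1):
        for j in (i - lag, i + lag):
            if 0 <= j < len(records):  # neighbours past either end are skipped
                counts.update(level_nodes(records[j], level))
    return [counts[node] for node in vocabulary]


def neighbourhood_features(records, i, specs, vocabularies):
    """Concatenate lag-count blocks, e.g. specs = [("A", 1), ("A", 2), ("B", 3), ("D", 1)]."""
    features = []
    for level, max_lag in specs:
        features += lag_counts(records, i, level, max_lag, vocabularies[level])
    return features


records = [
    "A01/B01/C05/D11/",                    # u3
    "A02/B01/C01/D02/;A05/B04/C05/D98/;",  # u4
    "A05/B04/C05/D98/;",                   # u5
    "A04/B03/C06/D22/;A05/B14/C45/D68/;",  # u6
    "A01/B01/C01/D03/;A01/B04/C05/D78/;",  # u7
]
vocab_a = ["A01", "A02", "A03", "A04", "A05"]
print(lag_counts(records, 2, "A", 1, vocab_a))  # [0, 1, 0, 1, 2]
print(lag_counts(records, 2, "A", 2, vocab_a))  # [3, 1, 0, 1, 2]
```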

22 Outline: Sequential Block Voting

23 Identifying Sequential Blocks
  u1, , , A01/B01/C01/D01/
  u2, , , A02/B02/C02/D02/;A02/B02/C03/D03/;
  u3, , , A02/B02/C02/D02/;A02/B02/C03/D04/;
Consecutive samples are numbered into blocks by comparing each sample's end time with that of the previous sample:
  blockid[:] ← 0
  count ← 0
  for i ← 1, n do
    if endtime(i) ? endtime(i-1) then
      count++
    end if
    blockid[i] ← count
  end for
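
A Python rendering of the block numbering. The comparison in the `if` is not legible in this transcription, so the sketch takes it as a parameter `starts_new_block(current, previous)`; the two choices shown (`operator.lt`, a new block whenever the end time goes backwards, and `operator.ne`, a new block whenever it changes) and the integer end times are illustrative assumptions only.

```python
import operator


def assign_block_ids(end_times, starts_new_block):
    """Give consecutive samples the same block id until starts_new_block(current, previous) holds."""
    block_ids = [0] * len(end_times)
    count = 0
    for i in range(1, len(end_times)):
        if starts_new_block(end_times[i], end_times[i - 1]):
            count += 1
        block_ids[i] = count
    return block_ids


# Hypothetical end times (e.g. minutes since some origin).
print(assign_block_ids([1, 3, 5, 2, 4, 1], operator.lt))  # [0, 0, 0, 1, 1, 2]
print(assign_block_ids([5, 5, 7, 7, 9], operator.ne))     # [0, 0, 1, 1, 2]
```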

24 [Figure: number of wrong labels in a block versus block size]

25 Block-based Voting
  if blocksize(i) ? 10 AND (median(i) ? 6 OR median(i) ? 9) then
    if median(i) ? 9 then
      predict female
    else if median(i) ? 6 then
      predict male
    end if
  else (per-sample threshold)
    if y_i ? 82 then
      predict female
    else
      predict male
    end if
  end if
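
A sketch of the voting rule for a single sample. The comparison operators are not legible in this transcription, so the code assumes `>=` and `<=` and that larger values mean "more likely female" (consistent with the $p(\text{female} \mid \mathbf{x}_i)$ target). The slide does not say what `median(i)` is taken over, so the block size, block median and per-sample score are simply inputs; the thresholds default to the numbers on the slide.

```python
def block_vote(block_size, block_median, score,
               min_block_size=10, male_median=6, female_median=9,
               sample_threshold=82):
    """Block-based voting with ASSUMED operators (>= / <=); returns 'female' or 'male'."""
    if block_size >= min_block_size and (
        block_median <= male_median or block_median >= female_median
    ):
        # Large block with a decisive median: the block's label wins.
        if block_median >= female_median:
            return "female"
        return "male"  # the guard above implies block_median <= male_median here
    # Otherwise fall back to a per-sample threshold on the score y_i.
    return "female" if score >= sample_threshold else "male"


print(block_vote(block_size=12, block_median=9.5, score=10))  # female
print(block_vote(block_size=12, block_median=5, score=99))    # male
print(block_vote(block_size=12, block_median=7.5, score=90))  # female
print(block_vote(block_size=3, block_median=9.5, score=50))   # male
```

In the last two calls the block rule does not apply (an undecided median, a block that is too small), so the per-sample threshold decides.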

26 Outline: Results & Implementation

27 Results & Implementation
Final Result (Score, Place)
Full Competition Source Code:
Factorization Machine Implementation:


More information

Appendix K: Responses to Selected Survey Results by Gender

Appendix K: Responses to Selected Survey Results by Gender Appendix K: Responses to Selected Survey Results by Gender Page 2 Citywide Customer Survey Results Tables Table 1: Index Scores by Gender of Respondent...2 Table 2: Quality of Life by Gender of Respondent...2

More information

Qn: # Mark Score 1 20 2 20 3 20 4 20 5 20 Total 100

Qn: # Mark Score 1 20 2 20 3 20 4 20 5 20 Total 100 DEPARTMENT OF MATHEMATICS University of Toronto at Mississauga MAT 33Y, Test October 20, 2003 Time 6.0pm.8.00 pm Fill in the following information in INK! Last Name:. Given Name:. Student #:. Tutor s Name

More information

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM KATE GLEASON COLLEGE OF ENGINEERING John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE (KGCOE- CQAS- 747- Principles of

More information

5. Correlation. Open HeightWeight.sav. Take a moment to review the data file.

5. Correlation. Open HeightWeight.sav. Take a moment to review the data file. 5. Correlation Objectives Calculate correlations Calculate correlations for subgroups using split file Create scatterplots with lines of best fit for subgroups and multiple correlations Correlation The

More information

The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA

The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA Paper 156-2010 The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA Abstract JMP has a rich set of visual displays that can help you see the information

More information

Big Data and Marketing

Big Data and Marketing Big Data and Marketing Professor Venky Shankar Coleman Chair in Marketing Director, Center for Retailing Studies Mays Business School Texas A&M University http://www.venkyshankar.com [email protected]

More information

A New Approach for Evaluation of Data Mining Techniques

A New Approach for Evaluation of Data Mining Techniques 181 A New Approach for Evaluation of Data Mining s Moawia Elfaki Yahia 1, Murtada El-mukashfi El-taher 2 1 College of Computer Science and IT King Faisal University Saudi Arabia, Alhasa 31982 2 Faculty

More information

ANALYTICS CENTER LEARNING PROGRAM

ANALYTICS CENTER LEARNING PROGRAM Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals

More information

Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine

Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine 2 - Manova 4.3.05 25 Multivariate Analysis of Variance What Multivariate Analysis of Variance is The general purpose of multivariate analysis of variance (MANOVA) is to determine whether multiple levels

More information