Factor Models for Gender Prediction Based on E-commerce Data
- Irma Douglas
Data Mining Competition, PAKDD 2015, Ho Chi Minh City, Vietnam
Outline
- Hierarchical Basket Model
  - Tree Encoding
  - Factorization Machine
- Modeling Autocorrelation
- Sequential Block Voting
- Results & Implementation
Product Hierarchy
u1, , , A01/B01/C01/D01/
u2, , , A02/B02/C02/D02/;A02/B02/C03/D03/;
u3, , , A01/B01/C01/D02/;A01/B04/C05/D98/;
Tree nodes by level:
- A: A01
- B: B01, B04
- C: C01, C02, C05
- D: D01, D02, D06, D22, D45, D98, D21, D89, D15
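Path Encoding
u3, , , A01/B01/C01/D02/;A01/B04/C05/D98/;
Tree nodes by level: A01; B01, B04; C01, C02, C05; D01, D02, D06, D22, D45, D98, D21, D89, D15
$x_i = \{2, 0, \dots, 1, 0, 0, 1, 0, 1, \dots, 1, 0, \dots\}$, with consecutive segments of the vector belonging to the levels A, B, D.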
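The path encoding above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `encode_paths` and the small `vocabulary` list are assumptions introduced here, and the sketch simply counts how often each tree node occurs on the paths of one session, which reproduces the leading entries 2, 0 (A01 twice, A02 never) shown for u3.

```python
from collections import Counter

def encode_paths(basket, vocabulary):
    """Count how often each category node appears on the paths of one basket."""
    counts = Counter()
    for path in basket.strip(";").split(";"):
        for node in path.strip("/").split("/"):
            if node:  # skip empty fragments from trailing separators
                counts[node] += 1
    # One vector entry per node, in the fixed order given by the vocabulary.
    return [counts.get(node, 0) for node in vocabulary]

# Hypothetical vocabulary: a small subset of the tree nodes, ordered by level.
vocabulary = ["A01", "A02", "B01", "B04", "C01", "C05", "D02", "D98"]
x_u3 = encode_paths("A01/B01/C01/D02/;A01/B04/C05/D98/;", vocabulary)
print(x_u3)  # [2, 0, 1, 1, 1, 1, 1, 1]
```

In a real setting the vocabulary would cover every node of the hierarchy, so the vector is long and mostly zero.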
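Factorization Machine
FM model of order d = 2:
$$\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle$$
- $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^p$, $V \in \mathbb{R}^{p \times k}$ are the model parameters
- $k \in \mathbb{N}$ is the size (dimensionality) of the latent space
- the model has one feature vector $v_i$ for each variable $x_i$
[Rendle, TIST 2012]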
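As a rough illustration of the second-order FM score, the sketch below evaluates it twice in plain Python: once with the explicit double sum over feature pairs, and once with the linear-time reformulation described in Rendle's FM papers, $\sum_{j<j'} x_j x_{j'} \langle v_j, v_{j'} \rangle = \tfrac{1}{2} \sum_{f=1}^{k} \big[ (\sum_j v_{j,f} x_j)^2 - \sum_j v_{j,f}^2 x_j^2 \big]$. The function names and all parameter values are made up for the example; nothing here is taken from the competition code.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM score, computed with the explicit double sum over pairs."""
    p = len(x)
    linear = sum(w[j] * x[j] for j in range(p))
    pairwise = 0.0
    for j in range(p):
        for j2 in range(j + 1, p):
            dot = sum(a * b for a, b in zip(V[j], V[j2]))  # <v_j, v_j'>
            pairwise += x[j] * x[j2] * dot
    return w0 + linear + pairwise

def fm_predict_fast(x, w0, w, V):
    """Same score in O(p * k), using the sum-of-squares reformulation."""
    p, k = len(x), len(V[0])
    linear = sum(wj * xj for wj, xj in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[j][f] * x[j] for j in range(p))
        s_sq = sum((V[j][f] * x[j]) ** 2 for j in range(p))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

# Hypothetical toy parameters: p = 4 features, k = 2 latent dimensions.
x = [2.0, 0.0, 1.0, 1.0]
w0 = 0.1
w = [0.5, -0.2, 0.3, 0.0]
V = [[0.1, 0.2], [0.0, 0.4], [0.3, -0.1], [-0.2, 0.5]]
print(fm_predict(x, w0, w, V), fm_predict_fast(x, w0, w, V))
```

Both functions should agree up to floating-point error; the second form is the one that makes FMs practical for long, sparse vectors such as the path encoding.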
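Linear Part
FM model of order d = 2:
$$\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle$$
$x_i = (0, 1, \dots, 0, 0, \dots, 1, \dots, 0, 0, \dots, 1, \dots, 0)$, with the active entries A02, B11 and D55 in the segments A, B and D.
The linear part $w_0 + \sum_{j=1}^{p} w_j x_j$ combines per-category evidence additively: $p(\text{female} \mid x_i)$ is driven by $p(\text{female} \mid \text{A02}) + p(\text{female} \mid \text{B11}) + p(\text{female} \mid \text{D55})$.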
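Pairwise Interactions
FM model of order d = 2:
$$\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle$$
$x_i = (0, 1, \dots, 0, 0, \dots, 1, \dots, 0, 0, \dots, 1, \dots, 1, \dots, 0)$, with the active entries A02, B11, D55 and D95 in the segments A, B and D.
Example: $V \in \mathbb{R}^{p \times k}$ holds one row per category, e.g. row $j$ = D55 and row $j'$ = D95, whose latent dimensions are labeled "Summer" and "Swimming".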
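Factoring Joint Probabilities
[Figure: autocorrelation plotted against lag]
We can factorize the joint probability by conditioning on features that describe the related samples:
$$p(y_0, \dots, y_n \mid x_0, \dots, x_n) := \prod_{i=0}^{n} p(y_i \mid x_i^r, x_i)$$
Here $x_i^r$ denotes the relational features of sample $i$.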
Relational Features
u3, , , A01/B01/C05/D11/
u4, , , A02/B01/C01/D02/;A05/B04/C05/D98/;
u5, , , A05/B04/C05/D98/;
u6, , , A04/B03/C06/D22/;A05/B14/C45/D68/;
u7, , , A01/B01/C01/D03/;A01/B04/C05/D78/;
$x_{a1} = [0, 1, 0, 1, 2, \dots]$ (nonzero entries: A2, A4, A5)
$x_{a1:2} = [3, 1, 0, 1, 2, \dots]$ (nonzero entries: A1, A2, A4, A5)
Combining different lags and categories, we can describe the sample neighborhood with:
$x_{u5} = [x_{a1}, x_{a1:2}, x_{b1:3}, x_{d1}]$
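The two example vectors for u5 can be reproduced by counting level-A categories over the neighboring sessions, which is what the sketch below does. It is an interpretation inferred from the numbers on the slide rather than the authors' implementation: the helper `level_counts`, the two-sided window (lag steps both before and after the sample) and the exclusion of the sample itself are assumptions that happen to match $x_{a1}$ and $x_{a1:2}$.

```python
def level_counts(baskets, center, lags, level, categories):
    """Count categories of one hierarchy level in the neighbours of `center`."""
    counts = dict.fromkeys(categories, 0)
    for lag in lags:
        # Assumed two-sided neighbourhood: `lag` samples before and after.
        for idx in (center - lag, center + lag):
            if 0 <= idx < len(baskets):
                for path in baskets[idx].strip(";").split(";"):
                    node = path.strip("/").split("/")[level]
                    if node in counts:
                        counts[node] += 1
    return [counts[c] for c in categories]

baskets = [
    "A01/B01/C05/D11/",                    # u3
    "A02/B01/C01/D02/;A05/B04/C05/D98/;",  # u4
    "A05/B04/C05/D98/;",                   # u5 (the sample being described)
    "A04/B03/C06/D22/;A05/B14/C45/D68/;",  # u6
    "A01/B01/C01/D03/;A01/B04/C05/D78/;",  # u7
]
cats_a = ["A01", "A02", "A03", "A04", "A05"]
x_a1 = level_counts(baskets, center=2, lags=[1], level=0, categories=cats_a)
x_a12 = level_counts(baskets, center=2, lags=[1, 2], level=0, categories=cats_a)
print(x_a1)   # [0, 1, 0, 1, 2]
print(x_a12)  # [3, 1, 0, 1, 2]
```

Concatenating such vectors for several lag windows and hierarchy levels gives a neighborhood description of the form $x_{u5}$ shown above.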
Identifying Sequential Blocks
u1, , , A01/B01/C01/D01/
u2, , , A02/B02/C02/D02/;A02/B02/C03/D03/;
u3, , , A02/B02/C02/D02/;A02/B02/C03/D04/;
blockid[:] ← 0
count ← 0
for i ← 1, n do
    if endtime(i) … endtime(i-1) then
        count++
    end if
    blockid[i] ← count
end for
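The block-labeling loop above translates directly into Python. Because the comparison between consecutive end times did not survive in the transcript, the sketch takes it as a parameter, `starts_new_block`, which is a hypothetical name; the rule used in the demo (a new block starts where the end time decreases) and the sample end times are assumptions for illustration only.

```python
def identify_blocks(end_times, starts_new_block):
    """Assign a block id to each sample, scanning the samples in file order."""
    block_ids = [0] * len(end_times)
    count = 0
    for i in range(1, len(end_times)):
        # Open a new block whenever the supplied rule fires on consecutive samples.
        if starts_new_block(end_times[i - 1], end_times[i]):
            count += 1
        block_ids[i] = count
    return block_ids

# Hypothetical end times and an ASSUMED rule: the end time drops at a block boundary.
end_times = [10, 12, 15, 3, 7, 9, 1]
print(identify_blocks(end_times, lambda prev, cur: cur < prev))
# [0, 0, 0, 1, 1, 1, 2]
```

Keeping the rule as an argument makes it easy to try other boundary conditions without touching the loop.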
[Figure: number of wrong labels in a block plotted against block size]
Block-based Voting
if blocksize(i) … 10 AND (median(i) … 6 OR median(i) … 9) then
    if median(i) … 9 then
        predict female
    else if median(i) … 6 then
        predict male
    end if
else    (per-sample threshold)
    if y_i … 82 then
        predict female
    else
        predict male
    end if
end if
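A possible reading of the voting rule is sketched below. The comparison operators and the scale of the thresholds are not recoverable from the transcript, so everything specific here is an assumption: large blocks whose median score is at or below `low` vote male, those at or above `high` vote female, and all remaining samples fall back to a per-sample threshold. The function name `block_vote`, its parameters and the demo values (which only loosely echo the digits on the slide) are hypothetical.

```python
from statistics import median

def block_vote(scores, block_ids, min_size, low, high, sample_threshold):
    """Predict per sample: confident blocks vote by median, others use their own score."""
    blocks = {}
    for s, b in zip(scores, block_ids):
        blocks.setdefault(b, []).append(s)
    predictions = []
    for s, b in zip(scores, block_ids):
        members = blocks[b]
        med = median(members)
        # ASSUMED operators: >= for block size and the high cut, <= for the low cut.
        if len(members) >= min_size and (med <= low or med >= high):
            predictions.append("female" if med >= high else "male")
        else:
            # Per-sample threshold for small or undecided blocks.
            predictions.append("female" if s >= sample_threshold else "male")
    return predictions

# Hypothetical scores (estimated probability of "female") and block ids.
scores = [0.95, 0.92, 0.40, 0.97, 0.50, 0.90]
block_ids = [0, 0, 0, 0, 1, 1]
print(block_vote(scores, block_ids, min_size=3, low=0.6, high=0.9, sample_threshold=0.82))
# ['female', 'female', 'female', 'female', 'male', 'female']
```

In the demo, the third sample has a low individual score but is overruled by its block, which is the effect the voting step is meant to achieve.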
Results & Implementation
Score and place are reported for two rows: Final Result and Full Competition.
Source Code:
Factorization Machine Implementation:
