IT Applications in Business Analytics SS2016 / Lecture 07 Use Case 1 (Two Class Classification) Thomas Zeutschler
Transcription
1 Hochschule Düsseldorf University of Applied Sciences Fachbereich Wirtschaftswissenschaften Business Analytics (M.Sc.) IT in Business Analytics IT Applications in Business Analytics SS2016 / Lecture 07 Use Case 1 (Two Class Classification) SS IT Applications in Business Analytics - 6. Analytical Use Case 1 1
2 Let's get started: be a business analytics consultant!
3 Case 1: Bike Sales
4 Point of Departure: 2016 Polygon. Whether you're making a go at XC mountain bike racing or simply looking to upgrade your confidence level on the trail, the Polygon hardtail mountain bike proves to be the perfect choice. The Polygon features our race-proven 29er geometry with a low-slung bottom bracket and incredibly short chainstays for a planted sensation, snappy handling, and efficient power transfer. It's the obvious mountain bike for anyone who demands speed and reliability.
5 Point of Departure: Bike Shop. We run a bike shop, both brick-and-mortar and online. Through an online competition we collected a set of new customer records. We want to send an e-mail to the most promising new customers to advertise our new 2016 mountain bike model, the Polygon. Who are they?
6 The best team will win. 4x Teams volunteer to deliver the best proposal for the campaign. Main deliverable: a proposal for the list of new customers to send an e-mail to. Evaluate the best prediction model: use the ROC AUC (area under the curve) value. Present your results (next week): What have you done and why? (Use your Knime workflows to explain.) What is your conclusion and proposal? Compile a few slides; max. 10 minutes presentation.
7 CRISP-DM Phases and Tasks
- Business Understanding
  - Determine Business Objectives: Background. Business Objectives. Business Success Criteria.
  - Assess Situation: Inventory of Resources. Requirements, Assumptions and Constraints. Risks and Contingencies. Terminology. Costs and Benefits.
  - Determine Data Mining Goals: Data Mining Goals. Data Mining Success Criteria.
  - Produce Project Plan: Project Plan. Initial Assessment of Tools and Techniques.
- Data Understanding
  - Collect Initial Data: Initial Data Collection Report.
  - Describe Data: Data Description Report.
  - Explore Data: Data Exploration Report.
  - Verify Data Quality: Data Quality Report.
- Data Preparation
  - Select Data: Rationale for Inclusion/Exclusion.
  - Clean Data: Data Cleaning Report.
  - Construct Data: Derived Attributes. Generated Records.
  - Integrate Data: Merged Data.
  - Format Data: Reformatted Data.
  - Dataset: Dataset Description.
- Modelling
  - Select Modelling Technique: Modelling Technique. Modelling Assumptions.
  - Generate Test Design: Test Design.
  - Build Model: Parameter Settings. Models. Model Description.
  - Assess Model: Model Assessment. Revised Parameter Settings.
- Evaluation
  - Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria. Approved Models.
  - Review Process: Review of Process.
  - Determine Next Steps: List of Possible Actions. Decision.
- Deployment
  - Plan Deployment: Deployment Plan.
  - Plan Monitoring and Maintenance: Monitoring and Maintenance Plan.
  - Produce Final Report: Final Report. Final Presentation.
  - Review Project: Experience Documentation.
8 Available Data. Sheet: ExistingCustomers >>> Use for model training and test. Sheet: NewCustomers >>> Select promising e-mail receivers.
9 Knime Sample Implementation. Beat the teacher. Area Under Curve = 0.756 chler/seiten/default.aspx The Receiver Operating Characteristic (ROC) is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied.
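To make the AUC value on this slide more concrete, here is a minimal Python sketch that computes the area under the ROC curve outside Knime. It relies on the fact that the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count half). The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not taken from the lecture data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: iterable of 0/1 (1 = positive class, e.g. "bike buyer")
    scores: iterable of predicted scores, higher = more likely positive
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count (positive, negative) pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 3 buyers, 3 non-buyers.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a value such as 0.756 lies between the two. The pairwise loop is quadratic in the number of records, which is fine for a sketch but slow for large data sets.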
10 Want to beat your teacher? (AUC 0.756) Do you have a full understanding of the business problem? What about data quality? Do we need further data preparation? What is the class of the problem to solve (tip: cheat-sheet)? How do you select the right / best prediction model?
11 Cheating
12 Two Class Classification
13 Two Class Classification: Introduction. Also called Binary Classification. Statistical problem: classify the elements of a given set into two groups by applying a certain classification method. Application in economics: customer selection, e.g. whom to send an e-mail? Portfolio decisions, e.g. what stocks or products to buy? Any kind of Yes/No assignment. Application in medical testing: does a patient have a certain disease or not?
14 Two Class Classification: Similar Problems. Super-problem: Statistical Classification. One-Class (unary) Classification: identify specific elements among others. Application: outlier detection, anomaly detection, novelty detection. Multi-Class (multinomial) Classification: classify the elements of a given set into more than two groups by applying a certain classification method. Application: clustering, attribute assignment, just more than 2 classes.
15 Two Class Classification: Confusion Matrix. Purpose: evaluate the performance of a certain classification algorithm. [Confusion matrix: Bike Buyer? Predicted Class (Yes / No) vs. Actual Class (Yes / No)]
16 Two Class Classification: Confusion Matrix. Purpose: evaluate the performance of a certain classification algorithm. Bike Buyer?
- Predicted Yes, Actual Yes: true positives (correct)
- Predicted Yes, Actual No: false positives (error)
- Predicted No, Actual Yes: false negatives (error)
- Predicted No, Actual No: true negatives (correct)
17 Two Class Classification: Confusion Matrix. Purpose: evaluate the performance of a certain classification algorithm. [Confusion matrix: Bike Buyer? Predicted Class (Yes / No) vs. Actual Class (Yes / No); Population = ]
18 Two Class Classification: Confusion Matrix. Purpose: evaluate the performance of a certain classification algorithm.
Confusion matrix over the total population (predicted condition vs. real condition):
- Predicted positive, real positive: true positive
- Predicted positive, real negative: false positive (type I error)
- Predicted negative, real positive: false negative (type II error)
- Predicted negative, real negative: true negative
Derived measures:
- Prevalence = Σ Condition positive / Σ Total population
- True Positive Rate (TPR) = Σ True positive / Σ Condition positive (also called Sensitivity, Recall)
- False Positive Rate (FPR) = Σ False positive / Σ Condition negative (also called Fall-out)
- False Negative Rate (FNR) = Σ False negative / Σ Condition positive (also called Miss rate)
- True Negative Rate (TNR) = Σ True negative / Σ Condition negative (also called Specificity, SPC)
- Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population
- Positive Predictive Value (PPV) = Σ True positive / Σ Test outcome positive (also called Precision)
- False Discovery Rate (FDR) = Σ False positive / Σ Test outcome positive
- False Omission Rate (FOR) = Σ False negative / Σ Test outcome negative
- Negative Predictive Value (NPV) = Σ True negative / Σ Test outcome negative
- Positive Likelihood Ratio (LR+) = TPR / FPR
- Negative Likelihood Ratio (LR−) = FNR / TNR
- Diagnostic Odds Ratio (DOR) = LR+ / LR−
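As a worked illustration of the measures on this slide, the following Python sketch counts the four confusion-matrix cells from actual and predicted labels and derives a few of the rates. The function names and the toy label vectors are hypothetical, and the sketch assumes every denominator is non-zero (at least one actual positive, one actual negative, one predicted positive and one predicted negative).

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for 0/1 labels, where 1 = positive ("Yes")."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1      # predicted Yes, actually Yes
        elif p == 1 and a == 0:
            fp += 1      # predicted Yes, actually No (type I error)
        elif p == 0 and a == 1:
            fn += 1      # predicted No, actually Yes (type II error)
        else:
            tn += 1      # predicted No, actually No
    return tp, fp, fn, tn


def derived_measures(tp, fp, fn, tn):
    """A subset of the slide's measures; assumes no denominator is zero."""
    total = tp + fp + fn + tn
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return {
        "prevalence": (tp + fn) / total,
        "accuracy": (tp + tn) / total,
        "tpr": tpr,                    # sensitivity, recall
        "fpr": fpr,                    # fall-out
        "fnr": fn / (tp + fn),         # miss rate
        "tnr": tn / (fp + tn),         # specificity
        "ppv": tp / (tp + fp),         # precision
        "npv": tn / (tn + fn),
    }


# Hypothetical labels for eight customers (1 = bike buyer).
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 1, 0, 0, 0, 1]
counts = confusion_counts(actual, predicted)
measures = derived_measures(*counts)
print(counts)
print(measures)
```

With a very low prevalence, accuracy alone can look good even when few buyers are found, which is one reason to also inspect TPR and PPV.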
19 Classification Method Comparison Linearly separable pattern: Binary (2-classes) classification
20 Classification Method Comparison Linearly inseparable pattern: Binary Classification for a simple XOR pattern
21 Classification Method Comparison Linearly separable pattern: 3-classes classification
22 Classification Method Comparison Linearly inseparable pattern: Binary Classification for a complex XOR pattern
23 Classification Method Comparison 4-classes classification for a complex pattern
24 Classification Method Comparison. Try to understand the pattern of the data by applying visual data analysis and pairwise comparison of attributes. Is your data linearly separable?
- Yes: Logistic Regression or Neural Networks; be cautious with Decision Tree or Random Forest.
- No: Random Forest or SVM.
- Not sure (???): Random Forest offers a good balance of generalization and accuracy, and its computational cost is relatively low.
- But: Neural Networks can (but need not) be the best solution; they are not easy to tune to deliver good results (many parameters).
25 Decision Tree Learning
26 Decision Tree Learning. A supervised learning method. Purpose: predict the value of a certain target variable of an item based on observations of other variables from other items. If the target variable takes its values from a finite set, we call it a classification tree; otherwise, a regression tree. Leaves represent class labels, whereas branches represent conjunctions of features (variables) that lead to those class labels. [Figure: Decision Tree (partial) for Bike Sales Sample]
27 Decision Tree Learning. A decision tree describes data, not decisions. A decision tree can be used as input for decision making, e.g. a prediction. Computation: Recursive Partitioning. Recursively split the data set into subsets based on an attribute-value test. The recursion is completed when all items in the subset at a node have the same value of the target variable, or when splitting no longer adds value to the predictions. This approach is called top-down induction of decision trees. Different algorithms and metrics have been developed to solve the core problem in decision tree generation: which variable best splits the set of items at each step? Greedy algorithm: making the locally optimal choice at each stage of the recursive process.
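The recursive partitioning described on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm Knime implements: it handles only nominal attributes, uses Gini impurity as the split metric, does no pruning, and all names (`build_tree`, `best_attribute`, the `commute` and `cars` attributes, the toy records) are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Greedy step: the attribute whose split gives the lowest weighted impurity.

    Returns None when no attribute improves on the unsplit node.
    """
    best, best_score = None, gini(labels)
    for attr in attributes:
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if score < best_score - 1e-12:
            best, best_score = attr, score
    return best


def build_tree(rows, labels, attributes):
    """Top-down induction: returns a class label (leaf) or (attribute, children)."""
    if len(set(labels)) == 1:            # all items share the same target value
        return labels[0]
    attr = best_attribute(rows, labels, attributes)
    if attr is None:                     # splitting no longer adds value
        return Counter(labels).most_common(1)[0][0]
    remaining = [a for a in attributes if a != attr]
    children = {}
    for value in sorted({row[attr] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        children[value] = build_tree([r for r, _ in subset],
                                     [l for _, l in subset],
                                     remaining)
    return (attr, children)


def predict(tree, row):
    """Follow the branches until a leaf (class label) is reached."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]       # raises KeyError for unseen values
    return tree


# Hypothetical toy records.
rows = [
    {"commute": "short", "cars": "0"},
    {"commute": "short", "cars": "2"},
    {"commute": "long", "cars": "0"},
    {"commute": "long", "cars": "2"},
]
labels = ["yes", "yes", "yes", "no"]
tree = build_tree(rows, labels, ["commute", "cars"])
print(tree)
print(predict(tree, {"commute": "long", "cars": "2"}))
```

Because each split is chosen only by its immediate impurity reduction, the resulting tree is locally optimal at every node but not guaranteed to be the smallest or best tree overall.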
28 Decision Tree Learning in Knime. Metric (quality measure) for splitting:
- Gini Index or Gini Impurity: given m class values v_1, ..., v_m, let f_i be the fraction of items labeled with the value v_i. Then $I_G(f) = \sum_{i=1}^{m} f_i (1 - f_i) = 1 - \sum_{i=1}^{m} f_i^2$.
- Information Gain Ratio: based on the entropy* of the information. Information Gain = Entropy(parent) - weighted sum of Entropy(children).
*the expected value of the information.
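The two split metrics can be checked by hand with a short Python sketch. It is illustrative only: the function names and the example split are assumptions, and the gain ratio is computed in the usual C4.5 style (information gain divided by the entropy of the split sizes), which may differ in detail from what Knime does internally.

```python
from collections import Counter
from math import log2


def class_fractions(labels):
    """Fractions f_i of items carrying each class value."""
    n = len(labels)
    return [c / n for c in Counter(labels).values()]


def gini_impurity(labels):
    """1 - sum of squared class fractions."""
    return 1.0 - sum(f * f for f in class_fractions(labels))


def entropy(labels):
    """Shannon entropy in bits."""
    return sum(-f * log2(f) for f in class_fractions(labels))


def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted sum of Entropy(children)."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted


def gain_ratio(parent, children):
    """Information gain normalised by the entropy of the split sizes."""
    n = len(parent)
    split_info = sum(-len(c) / n * log2(len(c) / n) for c in children if c)
    return information_gain(parent, children) / split_info if split_info > 0 else 0.0


# Hypothetical split of 10 customers into two child nodes.
parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(gini_impurity(parent))
print(entropy(parent))
print(information_gain(parent, children))
print(gain_ratio(parent, children))
```

The normalisation in the gain ratio penalises attributes that split the data into many small subsets, which plain information gain tends to favour.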
29 Decision Tree Learning in Knime.
- Pruning Method: Pruning reduces tree size and avoids overfitting, which improves generalization performance and thus prediction quality. "Minimal Description Length" (MDL) pruning is available, or pruning can be switched off.
- Reduced Error Pruning: Just relevant if execution speed matters. Otherwise switch it off.
- Skip nominal columns without domain information: Always switch on. This ensures that columns with too many nominal values (e.g. the customer name in the bike sales sample) are automatically skipped.
30 Bike Sales Solutions
31 Bike Sales using Decision Tree
32 Bike Sales using Optimized Random Forest
33 Result Comparison: Decision Tree vs. Optimized Random Forest
34 Bike Sales: reevaluation by common sense. Just 2000 new customers? Let's send everyone an e-mail.
35 Lecture Summary & Homework
36 Lessons Learned. Try to understand the business problem end-to-end. Try to think beyond the scope of your current knowledge and work. That's analytical thinking. Even simple-looking analytical problems may get tricky. You must follow multiple analytical paths to find the best solution.
37 Homework. Read the post "Classification performance comparison". Read the article "Predicting Good Probabilities With Supervised Learning" scu-mizilc05.pdf
38 Any Questions?
Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 15 - ROC, AUC & Lift Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk [email protected] Tom Kelsey ID5059-17-AUC
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
Data Mining for Knowledge Management. Classification
1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
Data mining techniques: decision trees
Data mining techniques: decision trees 1/39 Agenda Rule systems Building rule systems vs rule systems Quick reference 2/39 1 Agenda Rule systems Building rule systems vs rule systems Quick reference 3/39
Classification and Prediction
Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
Data Mining with R. Decision Trees and Random Forests. Hugh Murrell
Data Mining with R Decision Trees and Random Forests Hugh Murrell reference books These slides are based on a book by Graham Williams: Data Mining with Rattle and R, The Art of Excavating Data for Knowledge
Performance Measures in Data Mining
Performance Measures in Data Mining Common Performance Measures used in Data Mining and Machine Learning Approaches L. Richter J.M. Cejuela Department of Computer Science Technische Universität München
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data
Introduction to Data Mining
Introduction to Data Mining a.j.m.m. (ton) weijters (slides are partially based on an introduction of Gregory Piatetsky-Shapiro) Overview Why data mining (data cascade) Application examples Data Mining
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Lecture 10: Regression Trees
Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models
Predictive Data modeling for health care: Comparative performance study of different prediction models
Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath [email protected] National Institute of Industrial Engineering (NITIE) Vihar
Decision-Tree Learning
Decision-Tree Learning Introduction ID3 Attribute selection Entropy, Information, Information Gain Gain Ratio C4.5 Decision Trees TDIDT: Top-Down Induction of Decision Trees Numeric Values Missing Values
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Decision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Data Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario [email protected]
Data Mining in CRM & Direct Marketing Jun Du The University of Western Ontario [email protected] Outline Why CRM & Marketing Goals in CRM & Marketing Models and Methodologies Case Study: Response Model Case
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
CRISP-DM: The life cicle of a data mining project. KDD Process
CRISP-DM: The life cicle of a data mining project KDD Process Business understanding the project objectives and requirements from a business perspective. then converting this knowledge into a data mining
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
MACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
Beating the MLB Moneyline
Beating the MLB Moneyline Leland Chen [email protected] Andrew He [email protected] 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series
Didacticiel Études de cas
1 Theme Data Mining with R The rattle package. R (http://www.r project.org/) is one of the most exciting free data mining software projects of these last years. Its popularity is completely justified (see
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
In this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
Data Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
MHI3000 Big Data Analytics for Health Care Final Project Report
MHI3000 Big Data Analytics for Health Care Final Project Report Zhongtian Fred Qiu (1002274530) http://gallery.azureml.net/details/81ddb2ab137046d4925584b5095ec7aa 1. Data pre-processing The data given
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
Microsoft Azure Machine learning Algorithms
Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql [email protected] http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation
New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
Introduction to Learning & Decision Trees
Artificial Intelligence: Representation and Problem Solving 5-38 April 0, 2007 Introduction to Learning & Decision Trees Learning and Decision Trees to learning What is learning? - more than just memorizing
Predicting earning potential on Adult Dataset
MSc in Computing, Business Intelligence and Data Mining stream. Business Intelligence and Data Mining Applications Project Report. Predicting earning potential on Adult Dataset Submitted by: xxxxxxx Supervisor:
Better credit models benefit us all
Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis
W6.B.1. FAQs CS535 BIG DATA W6.B.3. 4. If the distance of the point is additionally less than the tight distance T 2, remove it from the original set
http://wwwcscolostateedu/~cs535 W6B W6B2 CS535 BIG DAA FAQs Please prepare for the last minute rush Store your output files safely Partial score will be given for the output from less than 50GB input Computer
BOR 6335 Data Mining. Course Description. Course Bibliography and Required Readings. Prerequisites
BOR 6335 Data Mining Course Description This course provides an overview of data mining and fundamentals of using RapidMiner and OpenOffice open access software packages to develop data mining models.
Performance Metrics for Graph Mining Tasks
Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical
Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
Big Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
Data Mining Methods: Applications for Institutional Research
Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"!"#"$%&#'()*+',$$-.&#',/"-0%.12'32./4'5,5'6/%&)$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:
Professor Anita Wasilewska. Classification Lecture Notes
Professor Anita Wasilewska Classification Lecture Notes Classification (Data Mining Book Chapters 5 and 7) PART ONE: Supervised learning and Classification Data format: training and test data Concept,
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
How To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang
Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
Maschinelles Lernen mit MATLAB
Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS
DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.
Implementation of Data Mining Techniques to Perform Market Analysis
Implementation of Data Mining Techniques to Perform Market Analysis B.Sabitha 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, P.Balasubramanian 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics
An Overview of Predictive Analytics for Practitioners Dean Abbott, Abbott Analytics Thank You Sponsors Empower users with new insights through familiar tools while balancing the need for IT to monitor
Data Mining - The Next Mining Boom?
Howard Ong Principal Consultant Aurora Consulting Pty Ltd Abstract This paper introduces Data Mining to its audience by explaining Data Mining in the context of Corporate and Business Intelligence Reporting.
Data Mining for Business Analytics
Data Mining for Business Analytics Lecture 2: Introduction to Predictive Modeling Stern School of Business New York University Spring 2014 MegaTelCo: Predicting Customer Churn You just landed a great analytical
Evaluation & Validation: Credibility: Evaluating what has been learned
Evaluation & Validation: Credibility: Evaluating what has been learned How predictive is a learned model? How can we evaluate a model Test the model Statistical tests Considerations in evaluating a Model
Predicting Flight Delays
Predicting Flight Delays Dieterich Lawson [email protected] William Castillo [email protected] Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing
Data Preprocessing. Week 2
Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227-240, pp. 250-250, and pp. 259-263 of the text book.
Data Mining: Foundation, Techniques and Applications
Data Mining: Foundation, Techniques and Applications Lesson 1b: A Quick Overview of Data Mining Li Cuiping (李翠平) School of Information Renmin University of China Anthony Tung (鄧锦浩) School of Computing
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]
Fast Analytics on Big Data with H2O
Fast Analytics on Big Data with H2O 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in-memory predictive analytics and machine learning Pure Java,
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface Acknowledgments Chapter 1 Fraud: Detection, Prevention, and Analytics! Introduction Fraud! Fraud Detection and Prevention Big Data for
1. Classification problems
Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification
Advanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University
Machine Learning Capacity and Performance Analysis and R
Machine Learning and R May 3, 11
CRISP - DM. Data Mining Process. Process Standardization. Why Should There be a Standard Process? Cross-Industry Standard Process for Data Mining
Data Mining Process CRISP-DM Cross-Industry Standard Process for Data Mining (CRISP-DM) European Community funded effort to develop framework for data mining tasks Goals: Cross-Industry Standard Process for Data Mining
CS590D: Data Mining Chris Clifton
CS590D: Data Mining Chris Clifton March 10, 2004 Data Mining Process Reminder: Midterm tonight, 19:00-20:30, CS G066. Open book/notes. Thanks to Laura Squier, SPSS for some of the material used How to
USING THE PREDICTIVE ANALYTICS FOR EFFECTIVE CROSS-SELLING
USING THE PREDICTIVE ANALYTICS FOR EFFECTIVE CROSS-SELLING Michael Combopiano Northwestern University [email protected] Sunil Kakade Northwestern University [email protected] Abstract--The
Role of Customer Response Models in Customer Solicitation Center's Direct Marketing Campaign
Role of Customer Response Models in Customer Solicitation Center's Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr. Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct
Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,
Analysis Tools and Libraries for BigData
Analysis Tools and Libraries for BigData Lecture 02 Abhijit Bendale Office Hours: Terry Boult (Waiting to Confirm); Abhijit Bendale (Tue 2:45 to 4:45 pm). Best if you email me in advance, but I
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
S03-2008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY
S03-2008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY ABSTRACT Predictive modeling includes regression, both logistic and linear,
An Overview of Data Mining: Predictive Modeling for IR in the 21st Century
An Overview of Data Mining: Predictive Modeling for IR in the 21st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO
MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS
MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS IN COMMERCIAL BANKING CS 229 Project: Final Report Oleksandra Onosova INTRODUCTION Recent innovations in cloud computing and unified communications have made a
Analytics on Big Data
Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis
Course Syllabus. Purposes of Course:
Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building
WEKA Explorer User Guide for Version 3-4-3
WEKA Explorer User Guide for Version 3-4-3 Richard Kirkby Eibe Frank November 9, 2004 © 2002, 2004 University of Waikato Contents 1 Launching WEKA 2 The WEKA Explorer Section Tabs
Machine Learning Logistic Regression
Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.
