IT Applications in Business Analytics SS2016 / Lecture 07 Use Case 1 (Two Class Classification) Thomas Zeutschler
- Annis Craig
- 7 years ago
Transcription
1 Hochschule Düsseldorf University of Applied Sciences, Fachbereich Wirtschaftswissenschaften, Business Analytics (M.Sc.), IT in Business Analytics. IT Applications in Business Analytics SS2016 / Lecture 07 Use Case 1 (Two Class Classification) SS IT Applications in Business Analytics - 6. Analytical Use Case 1 1
2 Let's get started: be a business analytics consultant!
3 Case 1: Bike Sales
4 Point of Departure: 2016 Polygon
Whether you're making a go at XC mountain bike racing or simply looking to upgrade your confidence level on the trail, the Polygon hardtail mountain bike proves to be the perfect choice. The Polygon features our race-proven 29er geometry with a low-slung bottom bracket and incredibly short chainstays for a planted sensation, snappy handling, and efficient power transfer. It's the obvious mountain bike for anyone who demands speed and reliability.
5 Point of Departure: Bike Shop
We run a bike shop, both stationary and online. Based on an online competition, we collected a couple of new customer records. We want to send an e-mail to the most promising new customers to advertise our new 2016 mountain bike model, the Polygon. Who are they?
6 The best team will win
- 4 teams volunteer to deliver the best proposal for the campaign.
- Main deliverable: a proposal for the list of new customers to send an e-mail to.
- Evaluate the best prediction model: use the ROC AUC (area under the curve) value.
- Present your results (next week): What have you done and why? (Use your Knime workflows to explain.) What is your conclusion and proposal?
- Compile a few slides for a presentation of max. 10 minutes.
7 CRISP-DM Phases and Tasks
Business Understanding
- Determine Business Objectives: Background. Business Objectives. Business Success Criteria.
- Assess Situation: Inventory of Resources. Requirements, Assumptions and Constraints. Risks and Contingencies. Terminology. Costs and Benefits.
- Determine Data Mining Goals: Data Mining Goals. Data Mining Success Criteria.
- Produce Project Plan: Project Plan. Initial Assessment of Tools and Techniques.
Data Understanding
- Collect Initial Data: Initial Data Collection Report.
- Describe Data: Data Description Report.
- Explore Data: Data Exploration Report.
- Verify Data Quality: Data Quality Report.
Data Preparation
- Dataset: Dataset Description.
- Select Data: Rationale for Inclusion/Exclusion.
- Clean Data: Data Cleaning Report.
- Construct Data: Derived Attributes. Generated Records.
- Integrate Data: Merged Data.
- Format Data: Reformatted Data.
Modelling
- Select Modelling Technique: Modelling Technique. Modelling Assumptions.
- Generate Test Design: Test Design.
- Build Model: Parameter Settings. Models. Model Description.
- Assess Model: Model Assessment. Revised Parameter Settings.
Evaluation
- Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria. Approved Models.
- Review Process: Review of Process.
- Determine Next Steps: List of Possible Actions. Decision.
Deployment
- Plan Deployment: Deployment Plan.
- Plan Monitoring and Maintenance: Monitoring and Maintenance Plan.
- Produce Final Report: Final Report. Final Presentation.
- Review Project: Experience Documentation.
8 Available Data
- Sheet: ExistingCustomers >>> Use for model training and test.
- Sheet: NewCustomers >>> Select promising e-mail receivers.
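Using one sheet for both training and test means holding part of it back. The following is a minimal sketch of a stratified split in plain Python, not the Knime workflow from the lecture. The record layout, the column name `bike_buyer`, the 30% test share and the function name `stratified_split` are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key, test_fraction=0.3, seed=42):
    """Split rows into (train, test), keeping the class mix similar in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    train, test = [], []
    for members in by_class.values():
        members = members[:]            # copy so the caller's list is untouched
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Hypothetical records standing in for the ExistingCustomers sheet.
customers = [{"id": i, "bike_buyer": "Yes" if i % 4 == 0 else "No"}
             for i in range(100)]
train, test = stratified_split(customers, "bike_buyer")
print(len(train), len(test))
```

Splitting per class keeps the share of buyers roughly equal in both parts, which matters when buyers are the minority.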
9 Knime Sample Implementation
Beat the teacher. Area Under Curve = 0.756 chler/seiten/default.aspx
Receiver Operating Characteristic (ROC) is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied.
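To make the ROC definition concrete, here is a small sketch that varies the threshold over a list of predicted scores and integrates the resulting curve. It is an illustration in plain Python, not the Knime ROC node. The function names and the six sample scores are hypothetical, and the code assumes labels are 1 (buyer) or 0 (non-buyer) with both classes present.

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR) obtained by lowering the threshold step by step."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied scores together so a tie forms one diagonal segment.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels and model scores for six customers.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(y_true, scores)), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so the teacher's 0.756 sits between the two.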
10 Want to beat your teacher? (AUC 0.756)
- Do you have a full understanding of the business problem?
- What about data quality? Do we need further data preparation?
- What is the class of the problem to solve (tip: cheat-sheet)?
- How to select the right / best prediction model?
11 Cheating
12 Two Class Classification
13 Two Class Classification: Introduction
- Also called Binary Classification.
- Statistical problem: Classify the elements of a given set into two groups by applying a certain classification method.
- Application in economics: customer selection, e.g. whom to send an e-mail? Portfolio decisions, e.g. what stocks or products to buy? Any kind of Yes/No assignment.
- Application in medical testing: Does a patient have a certain disease or not?
14 Two Class Classification: Similar Problems
Super-problem: Statistical Classification
- One Class (unary) Classification: Identify specific elements among others. Application: outlier detection, anomaly detection, novelty detection.
- Multi-Class (multinomial) Classification: Classify the elements of a given set into more than two groups by applying a certain classification method. Application: clustering, attribute assignment, just more than 2 classes.
15 Two Class Classification: Confusion Matrix
Purpose: Evaluate the performance of a certain classification algorithm.
Bike Buyer? Predicted class (Yes / No) against actual class (Yes / No).
16 Two Class Classification: Confusion Matrix
Purpose: Evaluate the performance of a certain classification algorithm.
Bike Buyer? Predicted class (Yes / No) against actual class (Yes / No):
- Predicted Yes, actual Yes: true positives (correct)
- Predicted Yes, actual No: false positives (error)
- Predicted No, actual Yes: false negatives (error)
- Predicted No, actual No: true negatives (correct)
17 Two Class Classification: Confusion Matrix
Purpose: Evaluate the performance of a certain classification algorithm.
Bike Buyer? Predicted class (Yes / No) against actual class (Yes / No).
Population =
18 Two Class Classification: Confusion Matrix
Purpose: Evaluate the performance of a certain classification algorithm.
Cells (predicted condition against real condition, over the total population):
- Predicted positive, real positive: true positive
- Predicted positive, real negative: false positive (type I error)
- Predicted negative, real positive: false negative (type II error)
- Predicted negative, real negative: true negative
Measures:
- Prevalence = Σ Condition positive / Σ Total population
- True Positive Rate (TPR) = Σ True positive / Σ Condition positive (also called Sensitivity, Recall)
- False Positive Rate (FPR) = Σ False positive / Σ Condition negative (also called Fall-out)
- False Negative Rate (FNR) = Σ False negative / Σ Condition positive (also called Miss rate)
- True Negative Rate (TNR) = Σ True negative / Σ Condition negative (also called Specificity (SPC))
- Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population
- Positive Predictive Value (PPV) = Σ True positive / Σ Test outcome positive (also called Precision)
- False Discovery Rate (FDR) = Σ False positive / Σ Test outcome positive
- False Omission Rate (FOR) = Σ False negative / Σ Test outcome negative
- Negative Predictive Value (NPV) = Σ True negative / Σ Test outcome negative
- Positive Likelihood Ratio (LR+) = TPR / FPR
- Negative Likelihood Ratio (LR-) = FNR / TNR
- Diagnostic Odds Ratio (DOR) = LR+ / LR-
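The measures above all follow from the four cell counts. The sketch below computes them in plain Python. The function name `confusion_metrics` and the example counts (40, 10, 20, 130) are made up for illustration, and the code assumes no denominator is zero.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive the slide's measures from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    tpr = tp / (tp + fn)          # sensitivity, recall
    fpr = fp / (fp + tn)          # fall-out
    fnr = fn / (tp + fn)          # miss rate
    tnr = tn / (fp + tn)          # specificity
    lr_plus = tpr / fpr
    lr_minus = fnr / tnr
    return {
        "prevalence": (tp + fn) / total,
        "TPR": tpr, "FPR": fpr, "FNR": fnr, "TNR": tnr,
        "ACC": (tp + tn) / total,
        "PPV": tp / (tp + fp),    # precision
        "FDR": fp / (tp + fp),
        "FOR": fn / (fn + tn),
        "NPV": tn / (fn + tn),
        "LR+": lr_plus,
        "LR-": lr_minus,
        "DOR": lr_plus / lr_minus,
    }

# Hypothetical counts for 200 scored customers.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the rates in each pair sum to one (TPR + FNR, FPR + TNR, PPV + FDR, NPV + FOR), which is a quick sanity check on any hand-built matrix.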
19 Classification Method Comparison Linearly separable pattern: Binary (2-classes) classification
20 Classification Method Comparison Linearly inseparable pattern: Binary Classification for a simple XOR pattern
21 Classification Method Comparison Linearly separable pattern: 3-classes classification
22 Classification Method Comparison Linearly inseparable pattern: Binary Classification for a complex XOR pattern
23 Classification Method Comparison 4-classes classification for a complex pattern
24 Classification Method Comparison
Try to understand the pattern of the data...
- by applying visual data analysis
- by applying pairwise comparison of attributes
Is your data linearly separable?
- Yes: Logistic Regression or Neural Networks; be cautious with Decision Tree or Random Forest.
- No: Random Forest or SVM.
- ???: Random Forest offers a good balance of generalization and accuracy, and its computational cost is relatively low.
But: Neural Networks can (not must) be the best solution, but it's not easy to tune them to deliver good results (many parameters).
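One rough way to probe the "linearly separable?" question on a small sample is to train a perceptron and see whether it ever completes a pass without mistakes. This is a sketch under assumptions, not a method from the lecture: the function name `perceptron_converges`, the epoch limit and the four-point AND/XOR patterns are chosen only for illustration, and labels must be +1 or -1.

```python
def perceptron_converges(points, labels, max_epochs=100):
    """Train a perceptron; return True if one epoch passes with no mistakes."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):            # y is +1 or -1
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            predicted = 1 if score > 0 else -1
            if predicted != y:
                # Classic perceptron update: move the line toward the point.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return True       # a separating line was found
    return False              # none found within max_epochs

corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [-1, -1, -1, 1]   # linearly separable pattern
xor_labels = [-1, 1, 1, -1]    # simple XOR pattern
print(perceptron_converges(corners, and_labels))
print(perceptron_converges(corners, xor_labels))
```

A `True` result proves separability of the sample, while `False` only says that no separating line was found within the epoch limit. For the XOR pattern no such line exists, which is why a linear model cannot fit it.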
25 Decision Tree Learning
26 Decision Tree Learning
- A supervised learning method.
- Purpose: Predict the value of a certain target variable of an item based on observations on other variables from other items.
- If the target variable is from a finite set of values, then we call it a classification tree. Otherwise it is a regression tree.
- Leaves represent class labels, whereas branches represent conjunctions of features (variables) that lead to those class labels.
Figure: Decision Tree (partial) for Bike Sales Sample
27 Decision Tree Learning
- A decision tree describes data, not decisions.
- A decision tree can be used as input for decision making, e.g. a prediction.
- Computation: Recursive Partitioning. Recursively split the data set into subsets based on an attribute-value test (greedy algorithm).
- The recursion is completed when the subset at a node has all the same value of the target variable, or when splitting no longer adds value to the predictions.
- This approach is called top-down induction of decision trees.
- Different algorithms and metrics have been developed to solve the core problem in decision tree generation: What is the right variable at each step that best splits the set of items?
- Greedy algorithm: making the locally optimal choice at each stage of the recursive process.
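The recursive partitioning described above can be sketched in a few lines of Python. This is a simplified illustration for nominal attributes only, using Gini impurity as the split metric. It is not the Knime Decision Tree Learner, and the attribute names (`commute`, `owns_car`) and the six sample rows are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, attributes):
    """Greedy step: pick the attribute whose split gives the lowest weighted Gini."""
    best_attr, best_score = None, gini(labels)
    for attr in attributes:
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if score < best_score - 1e-12:      # must strictly improve
            best_attr, best_score = attr, score
    return best_attr

def build_tree(rows, labels, attributes):
    """Top-down induction: split until a node is pure or no split helps."""
    attr = best_split(rows, labels, attributes) if len(set(labels)) > 1 else None
    if attr is None:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    remaining = [a for a in attributes if a != attr]
    children = {}
    for value in set(row[attr] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        sub_rows, sub_labels = zip(*subset)
        children[value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return (attr, children)

def predict(tree, row):
    """Follow branches until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return tree

# Hypothetical training rows: did the customer buy a bike?
rows = [
    {"commute": "near", "owns_car": "no"},
    {"commute": "near", "owns_car": "yes"},
    {"commute": "far", "owns_car": "no"},
    {"commute": "far", "owns_car": "yes"},
    {"commute": "far", "owns_car": "yes"},
    {"commute": "near", "owns_car": "no"},
]
labels = ["Yes", "Yes", "Yes", "No", "No", "Yes"]
tree = build_tree(rows, labels, ["commute", "owns_car"])
print(tree)
print(predict(tree, {"commute": "far", "owns_car": "yes"}))
```

The sketch omits numeric attributes, pruning and handling of attribute values not seen in training (`predict` would raise a `KeyError` for those), all of which a production learner has to address.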
28 Decision Tree Learning in Knime
Metric (quality measure) for splitting:
- Gini Index or Gini Impurity: Given m possible target values $v_i$ with $i \in \{1, 2, \dots, m\}$, let $f_i$ be the fraction of items labeled with the value $v_i$. The Gini impurity is then $I_G(f) = \sum_{i=1}^{m} f_i (1 - f_i) = 1 - \sum_{i=1}^{m} f_i^2$.
- Information Gain Ratio: Based on the entropy* of the information. Information Gain = Entropy(parent) - weighted sum of Entropy(children).
*Entropy: the expected value of the information.
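Both split metrics can be computed directly from the class fractions. The sketch below is a plain Python illustration with hypothetical function names and a made-up four-item label list. It shows the information gain itself, not the gain ratio, which additionally normalizes the gain.

```python
import math
from collections import Counter

def class_fractions(labels):
    """Fraction f_i of items carrying each target value."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini_impurity(labels):
    """1 - sum of squared class fractions."""
    return 1.0 - sum(f * f for f in class_fractions(labels))

def entropy(labels):
    """Expected information in bits: sum of f_i * log2(1 / f_i)."""
    return sum(f * math.log2(1 / f) for f in class_fractions(labels))

def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted sum of the children's entropy."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

parent = ["Yes", "Yes", "No", "No"]
print(gini_impurity(parent))
print(entropy(parent))
print(information_gain(parent, [["Yes", "Yes"], ["No", "No"]]))   # perfect split
print(information_gain(parent, [["Yes", "No"], ["Yes", "No"]]))   # useless split
```

A split that separates the classes completely recovers the full parent entropy as gain, while a split that leaves each child as mixed as the parent gains nothing.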
29 Decision Tree Learning in Knime
- Pruning Method: Pruning reduces tree size and avoids overfitting, which increases the generalization performance and thus the prediction quality. "Minimal Description Length" (MDL) pruning is available, or pruning can be switched off.
- Reduced Error Pruning: Only relevant if execution speed matters. Otherwise switch it off.
- Skip nominal columns without domain information: Always switch on. This ensures that columns with too many nominal values (e.g. the customer name in the bike sales sample) are automatically skipped.
30 Bike Sales Solutions
31 Bike Sales using Decision Tree
32 Bike Sales using Optimized Random Forest
33 Result Comparison: Decision Tree, Optimized Random Forest
34 Bike Sales: reevaluation by common sense
Just 2000 new customers? Let's send everyone an e-mail.
35 Lecture Summary & Homework
36 Lessons Learned
- Try to understand the business problem end-to-end.
- Try to think beyond the scope of your current knowledge and work. That's analytical thinking.
- Even simple-looking analytical problems may get tricky. You must follow multiple analytical paths to find the best solution.
37 Homework
- Read the post "Classification performance comparison".
- Read the article "Predicting Good Probabilities With Supervised Learning" scu-mizilc05.pdf
38 Any Questions?
More informationSupervised Learning Evaluation (via Sentiment Analysis)!
Supervised Learning Evaluation (via Sentiment Analysis)! Why Analyze Sentiment? Sentiment Analysis (Opinion Mining) Automatically label documents with their sentiment Toward a topic Aggregated over documents
More informationData Mining: Foundation, Techniques and Applications
Data Mining: Foundation, Techniques and Applications Lesson 1b :A Quick Overview of Data Mining Li Cuiping( 李 翠 平 ) School of Information Renmin University of China Anthony Tung( 鄧 锦 浩 ) School of Computing
More informationSTATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
More informationPerformance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com
More informationFast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
More informationDATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
More informationCOPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
More information1. Classification problems
Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification
More informationAdvanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
More informationAn analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University
More informationMachine Learning Capacity and Performance Analysis and R
Machine Learning and R May 3, 11 30 25 15 10 5 25 15 10 5 30 25 15 10 5 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 100 80 60 40 100 80 60 40 100 80 60 40 30 25 15 10 5 25 15 10
More informationCRISP - DM. Data Mining Process. Process Standardization. Why Should There be a Standard Process? Cross-Industry Standard Process for Data Mining
Mining Process CRISP - DM Cross-Industry Standard Process for Mining (CRISP-DM) European Community funded effort to develop framework for data mining tasks Goals: Cross-Industry Standard Process for Mining
More informationCS590D: Data Mining Chris Clifton
CS590D: Data Mining Chris Clifton March 10, 2004 Data Mining Process Reminder: Midterm tonight, 19:00-20:30, CS G066. Open book/notes. Thanks to Laura Squier, SPSS for some of the material used How to
More informationUSING THE PREDICTIVE ANALYTICS FOR EFFECTIVE CROSS-SELLING
USING THE PREDICTIVE ANALYTICS FOR EFFECTIVE CROSS-SELLING Michael Combopiano Northwestern University Michael.Comobopiano@att.net Sunil Kakade Northwestern University Sunil.kakade@gmail.com Abstract--The
More informationRole of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign
Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct
More informationData Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,
More informationAnalysis Tools and Libraries for BigData
+ Analysis Tools and Libraries for BigData Lecture 02 Abhijit Bendale + Office Hours 2 n Terry Boult (Waiting to Confirm) n Abhijit Bendale (Tue 2:45 to 4:45 pm). Best if you email me in advance, but I
More informationExperiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationS03-2008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY
S03-2008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY ABSTRACT Predictive modeling includes regression, both logistic and linear,
More informationAn Overview of Data Mining: Predictive Modeling for IR in the 21 st Century
An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO
More informationMAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS
MAXIMIZING RETURN ON DIRET MARKETING AMPAIGNS IN OMMERIAL BANKING S 229 Project: Final Report Oleksandra Onosova INTRODUTION Recent innovations in cloud computing and unified communications have made a
More informationAnalytics on Big Data
Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis
More informationCourse Syllabus. Purposes of Course:
Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building
More informationWEKA Explorer User Guide for Version 3-4-3
WEKA Explorer User Guide for Version 3-4-3 Richard Kirkby Eibe Frank November 9, 2004 c 2002, 2004 University of Waikato Contents 1 Launching WEKA 2 2 The WEKA Explorer 2 Section Tabs................................
More informationMachine Learning Logistic Regression
Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.
More information