Predictive Analytics: Modeling the World. Richard D. De Veaux Professor of Statistics, Williams College January 28, 2005 OR/MS Seminar
2 Getting to Know Your Customers: 50 years ago this was easy. Customer data base could fit in one person's head. Retention of customers depended on ability to do so.
3 21st Century Data Bases: Ability to anticipate customers' needs crucial for retention. Even Sam Walton didn't know all his customers' preferences. Amazon.com, "Earth's biggest selection": $390,000 diamond necklace; world's biggest book; yak cheese from Tibet. No one can do this without help. Well, almost no one!
4 Direct Marketing Example: Paralyzed Veterans of America (PVA), KDD Cup 1998. Mailing list of 3.5 million potential donors. Lapsed donors: made their last donation to PVA 13 to 24 months prior to June ,000 (training and test sets). Who should get the current mailing? Cost-effective strategy.
5 Why is this Hard? Amount of information: 481 predictors, 2 responses. Cross tabs / OLAP: how many combinations? What to focus on? Data preparation: this alone can be 60-95% of the effort. Categorical vs. quantitative.
6 What's Hard? Example
7 T-Code
8 So, what does it mean? T-Code and title:
0 _; 1 MR.; 2 MRS.; 3 MISS; 4 DR.; 5 MADAME; 6 SERGEANT; 9 RABBI; 10 PROFESSOR; 11 ADMIRAL; 12 GENERAL; 13 COLONEL; 14 CAPTAIN; 15 COMMANDER; 16 DEAN; 17 JUDGE; 18 MAJOR; 19 SENATOR; 20 GOVERNOR; 24 LIEUTENANT; 26 MONSIGNOR; 27 REVEREND; 28 MS.; 29 BISHOP; 31 AMBASSADOR; 33 CANTOR; 36 BROTHER; 37 SIR; 38 COMMODORE; 40 FATHER; 42 SISTER; 43 PRESIDENT; 44 MASTER; 46 MOTHER; 47 CHAPLAIN; 48 CORPORAL; 50 ELDER; 56 MAYOR; 62 LORD; 63 CARDINAL; 64 FRIEND; 65 FRIENDS; 68 ARCHDEACON; 69 CANON; 70 BISHOP; 73 PASTOR; 75 ARCHBISHOP; 85 SPECIALIST; 87 PRIVATE; 89 SEAMAN; 90 AIRMAN; 91 JUSTICE; 92 MR. JUSTICE; 104 CHANCELLOR; 118 SRTA.; 123 HER HIGHNESS; 124 COUNT; 126 PRINCE; 128 CHIEF; 4002 DR. & MRS.
Titles whose codes did not survive extraction: LIC.; SA; MESSRS; JUDGE & MRS.; DA; MR. & MRS.; LIEUTENANT & MRS; SR.; MAJOR & MRS.; SRA; MESDAMES; YOUR MAJESTY; MISSES; SERGEANT & MRS.; HIS HIGHNESS; COLNEL & MRS.; DOCTORS; LADY; REVEREND & MRS.; PRINCESS; MSS.; BARON; PROFESSOR & MRS.; SHEIK; PROFESSORS; AMBASSADOR & MRS; PRINCE AND PRINCESS; YOUR IMPERIAL MAJEST; ADMIRAL & MRS.; M. ET MME.; PROF; GENERAL & MRS.; M.; MLLE; COLONEL & MRS.; REPRESENTATIVE; CAPTAIN & MRS.; SECRETARY; LT. GOVERNOR; COMMANDER & MRS.
9 Results for PVA Data Set: If the entire list (100,000 donors) is mailed, the net donation is $10,500. Using data mining techniques, this was increased by 41.37%.
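The arithmetic behind the slide above can be checked in a few lines. This is only a sketch that takes the slide's two figures at face value; the function name `net_after_uplift` is illustrative and not part of the talk.

```python
def net_after_uplift(baseline: float, pct_increase: float) -> float:
    """Return the net donation after a percentage increase over the baseline."""
    return baseline * (1 + pct_increase / 100)


baseline_net = 10_500.00  # net donation when the whole list is mailed (from the slide)
uplift_pct = 41.37        # reported improvement from data mining (from the slide)

improved_net = net_after_uplift(baseline_net, uplift_pct)
print(f"Net donation with targeted mailing: ${improved_net:,.2f}")
```

Under these figures the targeted mailing nets a little under $15,000.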
10 KDD CUP 98 Results
11 KDD CUP 98 Results 2
12 Data Mining Is: "the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Fayyad); "finding interesting structure (patterns, statistical models, relationships) in data bases" (Fayyad, Chaudhuri and Bradley); "a knowledge discovery process of extracting previously unknown, actionable information from very large data bases" (Zornes); "a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions" (Edelstein).
13 Data Mining Is
14 Case Study I: Ingot Cracking. ,000 lb. ingots; 20% cracking rate; $30,000 per recast. 90 potential explanatory variables: water composition, metal composition, process variables, other environmental variables. Can we predict under what conditions ingots will crack?
15 Case Study II: Car Insurance. Mature policies; 65 potential predictors. Can we find a pattern for the unprofitable policies?
16 Case Study III: Breast Cancer Diagnosis. Mammograms used as screening instrument. Radiologist read is expensive and inaccurate: false positive and false negative rates over 25%; over a decade, nearly 100% false positive rate. Can we do better? Automatically read by a scanning algorithm; automatically diagnosed by a model.
17 Why not Queries? Queries describe; models promote understanding. Models can be assessed both by their understanding and their predictions. "It's difficult to predict, especially the future." Queries are event driven; models are phenomenon driven. Queries are reactive; models are proactive.
18 What Happened on the Titanic? [Figure: breakdown by class: Crew, First, Second, Third]
19 Mosaic Plot [Figure]
20 Models: Powerful predictors for optimizing performance. Powerful summaries for understanding. Used to explore a data set. Are not perfect. "All models are wrong, but some are useful." "Statisticians, like artists, have the bad habit of falling in love with their models."
21 Tree Diagram [Figure: tree with splits on sex (M, F), age (Adult, Child) and class (1, 2, 3, Crew); leaf percentages 46%, 93%, 14%, 27%, 100%, 23%, 33%]
22 Why Models? What's interesting? Most associated variables in the census. What's associated with shampoo purchases? Beer and diapers: "In the convenience stores we looked at, on Friday nights, purchases of beer and purchases of diapers are highly associated." Conclusions? Actions?
23 Beer and Diapers [Picture from Tandem™ ad]
24 Toy Problem [Figure: scatterplots of train2$y against each predictor train2[, i]]
25 Familiar Models: Linear Regression
26 Logistic Regression
27 Linear Regression [Table: Term, Estimate, Std Error, t Ratio, Prob>|t|; values lost in extraction] R-squared: 76.1% Train, 73.3% Test
28 Stepwise Regression [Table: Term, Estimate, Std Error, t Ratio, Prob>|t|; values lost in extraction] R-squared: 76.0% Train, 73.4% Test
29 Stepwise 2nd Order Model [Table: main effects and pairwise interaction terms with Estimate, Std Error, t Ratio, Prob>|t|; values lost in extraction] R-squared: 90.0% Train, 88.5% Test
30 Next Steps: Higher order terms? When to stop? Transformations? Too simple: underfitting, bias. Too complex: inconsistent predictions, overfitting, high variance. Selecting models is Occam's razor. Keep goals of interpretation vs. prediction in mind.
31 Tree Model [Figure: regression tree with splits on x1, x2, x3, x4, x5 and x8; split values lost in extraction] R-squared: 82.3% Train, 67.2% Test
32 Feature Creation: New predictor based on original predictors. Often linear: $z = \alpha + b_1 x_1 + \cdots + b_p x_p$. Principal components. Factor analysis. Multidimensional scaling.
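As one illustration of a linear derived feature, the sketch below computes the first principal component with plain power iteration and rewrites it as an intercept plus coefficients on the original predictors. The function names and the toy data are assumptions made for this example; the talk does not show an implementation, and real work would normally use a statistics library.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (column means, unit loading vector) for the first principal component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the centered predictors.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def as_linear_feature(means, loadings):
    """Rewrite the component score as z = a + b_1 x_1 + ... + b_p x_p."""
    a = -sum(b * m for b, m in zip(loadings, means))
    return a, loadings


def feature_value(row, a, b):
    """Evaluate the derived feature z for one observation."""
    return a + sum(bj * xj for bj, xj in zip(b, row))


# Toy data (hypothetical): the second predictor is exactly twice the first.
rows = [(float(x), 2.0 * x) for x in range(5)]
means, loadings = first_principal_component(rows)
a, b = as_linear_feature(means, loadings)
z_values = [feature_value(r, a, b) for r in rows]
print(a, b, z_values)
```

Because the two toy predictors are perfectly collinear, a single derived feature carries all of their variation, which is the point of replacing many original predictors with a few constructed ones.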
33 Neural Nets: Don't resemble the brain. Are just a statistical model.
34 A Single Neuron [Figure: inputs x0 to x5 feed a single node that computes z1 and outputs s(z1)] z1 = x1 + .7x2 - .2x3 + .4x4 - .5x5
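The single neuron on the slide above is a weighted sum passed through a squashing function. In the sketch below the weights on x2 to x5 come from the slide, while the bias, the weight on x1, and the choice of the logistic function for s are assumptions, since the transcription does not preserve them.

```python
import math


def neuron(x, weights, bias):
    """Compute s(z) where z = bias + sum(w_i * x_i) and s is the logistic function."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Weights for x2..x5 are taken from the slide; the first weight is an assumed placeholder.
weights = [1.0, 0.7, -0.2, 0.4, -0.5]
bias = 0.0  # assumed: the intercept (x0) term is not legible in the transcription

all_zero = neuron([0, 0, 0, 0, 0], weights, bias)
all_one = neuron([1, 1, 1, 1, 1], weights, bias)
print(all_zero, all_one)
```

A hidden layer is several such neurons side by side, with their outputs fed into another weighted sum.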
35 More Exotic Neural Networks [Figure: input layer (x1, x2), hidden layer (z1, z2, z3), output layer (y)]
36 Running a Neural Net
37 Predictions for Example: R-squared 92.7% Train, 90.6% Test
38 What Does This Get Us? Enormous flexibility. Ability to fit anything, including noise. Interpretation?
39 Case Study: Warranty Data. A new backpack inkjet printer is showing higher than expected warranty claims. What are the important variables? What's going on? A neural network shows that Zipcode is the most important predictor.
40 Spatial Analysis: Warranty data showing problem with inkjet printer. Use the model as a black box for variable selection.
41 MARS (Multivariate Adaptive Regression Splines): What do they do? Replace each step function in a tree model by a pair of linear functions. [Figure: plots of y against x]
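The contrast between a tree's step function and the MARS pair of linear pieces can be sketched as follows. The hinge form max(0, x - t) and max(0, t - x) is the usual MARS basis; the function names, knot, and slopes here are illustrative assumptions rather than values from the talk.

```python
def step(x, knot, left_value, right_value):
    """Tree-style split: a constant on each side of the knot."""
    return left_value if x < knot else right_value


def hinge_pair(x, knot):
    """MARS-style basis pair: one piece rises to the right of the knot, one to the left."""
    return max(0.0, x - knot), max(0.0, knot - x)


def mars_term(x, knot, intercept, slope_right, slope_left):
    """Continuous piecewise-linear fit built from one hinge pair."""
    right, left = hinge_pair(x, knot)
    return intercept + slope_right * right + slope_left * left


for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(x, step(x, 2.0, -1.0, 3.0), mars_term(x, 2.0, 1.0, 2.0, -1.0))
```

Unlike the step function, the hinge-based term has no jump at the knot, so predictions change smoothly as x crosses it.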
42 MARS Variable Importance: R-squared 95.0% Train, 94.3% Test (96.3%) (95.8%)
43 MARS Function Output
44 Collaborative Filtering: Goal: predict what movies people will like. Data: list of movies each person has watched. Lyle: Andre, Starwars. Ellen: Andre, Starwars, Coeur en Hiver. Fred: Starwars, Batman. Dean: Starwars, Batman, Rambo. Jason: Coeur en Hiver, Chocolat.
45 Data Base: Data can be represented as a sparse matrix. Columns: Andre, Starwars, Batman, Rambo, Coeur, Chocolat. Rows: Lyle y y; Ellen y y y; Fred y y; Dean y y y; Jason y y y; Karen y ? ? ? ? ?. Karen likes Andre. What else might she like? CDNow doubled responses.
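A very small version of the Karen question can be worked through with co-occurrence counts. The viewing lists below are copied from the collaborative filtering slide; the scoring rule (count how often another movie was watched by people who share Karen's likes) is one simple choice assumed for this sketch, not the method used by CDNow or described in the talk.

```python
from collections import Counter

# Viewing lists as given on the collaborative filtering slide.
watched = {
    "Lyle": {"Andre", "Starwars"},
    "Ellen": {"Andre", "Starwars", "Coeur en Hiver"},
    "Fred": {"Starwars", "Batman"},
    "Dean": {"Starwars", "Batman", "Rambo"},
    "Jason": {"Coeur en Hiver", "Chocolat"},
}


def recommend(liked, watched):
    """Rank unseen movies by how many like-minded viewers watched them."""
    scores = Counter()
    for movies in watched.values():
        overlap = len(liked & movies)  # how much this viewer shares with the target
        if overlap == 0:
            continue
        for movie in movies - liked:
            scores[movie] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


karen_suggestions = recommend({"Andre"}, watched)
print(karen_suggestions)
```

With only one known preference the ranking is crude, which is why real systems rely on a much larger and sparser matrix.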
46 How Do We Really Start? Life is not so kind. Categorical variables. Missing data. 500 variables, not variables: where to start?
47 Where to Start? EDM: Use a tree to find a smaller subset of variables to investigate. Explore this set graphically. Start the modeling process over. Build model. Compare model on small subset with full predictive model.
48 Start With a Simple Model: Maybe a Tree [Figure: tree with splits on x1, x2, x4 and x5; split values lost in extraction]
49 Automatic Models: KXEN
50 PVA Results from KXEN
51 Combining Models: Bagging. Bagging (Bootstrap Aggregation): Bootstrap a data set repeatedly. Take many versions of the same model (e.g. tree). Form a committee of models. Take majority rule of predictions.
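The four bagging steps above can be sketched with one-split trees ("stumps") on a single predictor. Everything here (function names, toy data, 25 committee members, the seed) is an assumption chosen to keep the example short; it is not the code behind the talk's results.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-split tree on 1-D data; returns (threshold, left_label, right_label)."""
    majority = Counter(ys).most_common(1)[0][0]
    best = (None, majority, majority)  # fallback: always predict the majority class
    best_errors = sum(y != majority for y in ys)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x < t else right) != y for x, y in zip(xs, ys))
            if errors < best_errors:
                best, best_errors = (t, left, right), errors
    return best


def predict_stump(stump, x):
    t, left, right = stump
    if t is None:
        return left
    return left if x < t else right


def bagged_stumps(xs, ys, n_models, rng):
    """Bootstrap the data repeatedly and fit one stump per resample."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def committee_predict(models, x):
    """Majority rule over the committee's individual predictions."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


xs = list(range(10))
ys = [0] * 5 + [1] * 5
models = bagged_stumps(xs, ys, n_models=25, rng=random.Random(0))
print([committee_predict(models, x) for x in xs])
```

Using an odd number of models avoids tied votes for a two-class problem.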
52 Combining Models: Boosting. Take the data and apply a simple classifier. Reweight the data, weighting the misclassified data much higher. Reapply the classifier. Repeat over and over. The final prediction is a combination of the output of each classifier, weighted by the overall misclassification rate. Details in Freund, Y. Boosting a weak learning algorithm by majority, Information and Computation 121(2),
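The reweighting loop described above can be sketched as follows. The slide does not say which boosting variant was used, so this example assumes the AdaBoost weighting scheme with one-split classifiers; the function names and the ten-point toy data set are likewise illustrative.

```python
import math


def fit_weighted_stump(xs, ys, weights):
    """Pick the split and orientation with the lowest weighted error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        for left in (1, -1):  # label predicted to the left of the split
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (left if x < t else -left) != y)
            if best is None or err < best[0]:
                best = (err, t, left)
    return best


def stump_predict(t, left, x):
    return left if x < t else -left


def boost(xs, ys, rounds):
    """Return a list of (alpha, threshold, left_label) weak classifiers."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, left = fit_weighted_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # classifier weight from its error rate
        ensemble.append((alpha, t, left))
        # Misclassified points get heavier, correctly classified points lighter.
        weights = [w * math.exp(-alpha * y * stump_predict(t, left, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def boosted_predict(ensemble, x):
    score = sum(alpha * stump_predict(t, left, x) for alpha, t, left in ensemble)
    return 1 if score >= 0 else -1


# Toy labels (+1 / -1) that no single split can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
one_round = boost(xs, ys, 1)
three_rounds = boost(xs, ys, 3)
errors_1 = sum(boosted_predict(one_round, x) != y for x, y in zip(xs, ys))
errors_3 = sum(boosted_predict(three_rounds, x) != y for x, y in zip(xs, ys))
print(errors_1, errors_3)
```

On this toy set a single split always misclassifies some points, while the weighted combination of three splits fits the training labels.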
53 Breast Cancer Diagnosis
54 Results from Random Forest: results from 1000 splits of training and test data
Method            False Positive Rate   False Negative Rate
Tree              32.20%                33.70%
Boosted Trees     24.90%                32.50%
Random Forest     19.30%                28.80%
Neural Network    25.50%                31.70%
Radiologists      22.40%                35.80%
55 Case Study: Ingot Failures. Ingot cracking: ,000 lb. ingots; 20% cracking rate; $30,000 per recast. 90 potential explanatory variables: water composition (reduced), metal composition, process variables, other environmental variables.
56 Model Building Process: Model building; Train; Test; Evaluate
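The train, test, evaluate cycle can be sketched with a one-predictor least-squares line and R-squared computed separately on each split, mirroring the Train/Test figures quoted on earlier slides. The data, split fraction, seed, and function names are all assumptions for illustration.

```python
import random


def train_test_split(rows, test_fraction, rng):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(model, rows):
    """Share of variance in y explained by the model on the given rows."""
    intercept, slope = model
    my = sum(y for _, y in rows) / len(rows)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in rows)
    ss_tot = sum((y - my) ** 2 for _, y in rows)
    return 1 - ss_res / ss_tot


rng = random.Random(1)
# Hypothetical noisy data around the line y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0 + rng.gauss(0, 1)) for x in range(40)]
train, test = train_test_split(data, test_fraction=0.25, rng=rng)
model = fit_line(train)
print(f"R-squared train: {r_squared(model, train):.3f}")
print(f"R-squared test:  {r_squared(model, test):.3f}")
```

A large gap between the two numbers is the usual warning sign of overfitting.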
57 Most Important Variable: Take one: here we started with trees. Alloy. "We know that." OK, take two: Yttrium. "What do you think is in the alloy?" Third time's the charm? Selenium! "OH!"
58 Case Study: Car Insurance. Now that we have mature policies, can we find other factors to price policies better? 65 potential predictors: industry, vehicle age, color, number of vehicles, usage, location, etc.
59 Fast Fail: Not every modeling effort is a success. A model search can save lots of queries. Data took 8 months to get ready. Analyst spent 2 months exploring it. A new model search program (KXEN) running for several hours found no out-of-sample predictive ability. Tree model gave similar results.
60 PVA Recap: Remember predictor variables. Need a way to trim this down. Need an exploratory model. Neural network? Tree?
61 Students in Data Mining Class: Student #1 $15,024; Student #2 $14,695; Student #3 $14,345
62 Take Home Messages: What a great time to be a statistician! Problems are exciting. Research is exciting. Success in data mining: requires teamwork; requires flexibility in modeling; means that you act on your results; depends much more on the way you mine the data than on the specific model or tool that you use. Which method to use? Yes!! Have fun!
63 Thank you!
Relationship Management Analytics What is Relationship Management? CRM is a strategy which utilises a combination of Week 13: Summary information technology policies processes, employees to develop profitable
More informationWhy Ensembles Win Data Mining Competitions
Why Ensembles Win Data Mining Competitions A Predictive Analytics Center of Excellence (PACE) Tech Talk November 14, 2012 Dean Abbott Abbott Analytics, Inc. Blog: http://abbottanalytics.blogspot.com URL:
More informationData Mining + Business Intelligence. Integration, Design and Implementation
Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution
More informationHandling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza
Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and
More informationBOR 6335 Data Mining. Course Description. Course Bibliography and Required Readings. Prerequisites
BOR 6335 Data Mining Course Description This course provides an overview of data mining and fundamentals of using RapidMiner and OpenOffice open access software packages to develop data mining models.
More information6.2.8 Neural networks for data mining
6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural
More informationData Analytics and Business Intelligence (8696/8697)
http: // togaware. com Copyright 2014, Graham.Williams@togaware.com 1/36 Data Analytics and Business Intelligence (8696/8697) Ensemble Decision Trees Graham.Williams@togaware.com Data Scientist Australian
More informationPentaho Data Mining Last Modified on January 22, 2007
Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org
More informationOverview. Data Mining. Predicting Stock Market Returns. Predicting Health Risk. Wharton Department of Statistics. Wharton
Overview Data Mining Bob Stine www-stat.wharton.upenn.edu/~bob Applications - Marketing: Direct mail advertising (Zahavi example) - Biomedical: finding predictive risk factors - Financial: predicting returns
More informationCOPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
More informationPredicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
More informationWhat is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO
What is Data Mining? Data Mining (Knowledge discovery in database) Data Mining: "The non trivial extraction of implicit, previously unknown, and potentially useful information from data" William J Frawley,
More informationModel Combination. 24 Novembre 2009
Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationDATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
More informationData Mining. Vera Goebel. Department of Informatics, University of Oslo
Data Mining Vera Goebel Department of Informatics, University of Oslo 2011 1 Lecture Contents Knowledge Discovery in Databases (KDD) Definition and Applications OLAP Architectures for OLAP and KDD KDD
More informationA Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model
A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model Twinkle Patel, Ms. Ompriya Kale Abstract: - As the usage of credit card has increased the credit card fraud has also increased
More informationImproving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationA Property & Casualty Insurance Predictive Modeling Process in SAS
Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing
More informationData Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI
Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: Jerzy.Stefanowski@cs.put.poznan.pl Data Mining a step in A KDD Process Data mining:
More information