Prediction of Car Prices of Federal Auctions
|
|
|
- Rolf Bradford
- 10 years ago
- Views:
Transcription
1 Prediction of Car Prices of Federal Auctions BUDT733- Final Project Report Tetsuya Morito Karen Pereira Jung-Fu Su Mahsa Saedirad 1
2 Executive Summary The goal of this project is to provide buyers who attend Federal government car auctions, a simple indicator of whether or not to bid for specific vehicles, thus helping to improve their decision-making process and their chances of winning a fair bid. The data was obtained from Karl Olson, an alumnus of the Data Mining course at Robert H. Smith School of Business. The original data set contained information on 136,152 cars and included both numerical and categorical predictors. After examining the original data for feasibility, in consultation with Karl, we decided to use data on two specific car manufacturers, reducing the number of records to 12,133. The final model predictors are listed in Exhibit 1 with their descriptions. After careful exploration of the data, we decided to transform and/or bin existing predictors to create new related predictors, to suit our problem-solving approach and modeling requirements. Transformations included Year and SaleDt to Age in Years, Location to State, Miles to Mileage Range, and Proceeds to Sales Category. This included creating bins, for Location based on geographical region rather than town, and for Color based on the base color instead of shades, in order to reduce the number of dummy variables. Bins were also created for Miles so that numerical ranges could be used in models such as Naïve Bayes that only accept categorical predictors. Eventually, we decided to use Age in Years, State, Model, Miles (or Mileage Range), Fuel, Color and Sale Type as our primary predictors to classify a specific car as belonging to the Average/Rough Trade-in or to the Clean Trade-in or Better Sales Category. These classes were created based on certain pre-determined formulae and values provided to us by Karl. KNN, Naïve Bayes, Logistic Regression, Classification Tree, and Discriminant Analysis were the classifiers that were applied to the data set to obtain the most accurate model(s), which could potentially be used for both classification and profiling purposes. Among the above models, the Classification Tree was determined to be most accurate for classification, since it had the lowest overall error rates and was also parsimonious. Due to its comparable overall error rates and ease of interpretation, Logistic Regression was chosen as the best model for profiling, to indicate which predictors are most significant in determining the Sales Category. Our recommendation, therefore, is to use the Classification Tree model to predict whether a new car being auctioned should be sold at the Average/Rough Trade-in range or at the Clean Trade-in or Better range, and to use the Logistic Regression model to understand which predictors determine the range of Proceeds gained when cars are auctioned. 2
3 Technical Summary Problem/Task The goal of this project is to help the decision making process in Federal used car auctions and indicates which variables are more important for prediction. Therefore, the task is to predict whether a car will fall into the category Average/Rough Trade-in or Clean Trade-in or Better To get the best prediction, we decide to use five classification methods: K Near Neighbors, Naïve Bayes, Logistic Regression, Classification Tree and Discriminant Analysis. Then we compare results and recommend the best performing model. Data Analysis The first thing we need to do is clean the data. There are more than 30 kinds of locations and colors in the data, so we create larger regions and combine sub-colors to main ones. After evaluation, only CARAVAN, IMPLALA, MALIBU and STRATUS are included in this project since their record numbers are significantly more than others. Missing data and blanks are fixed as well. After cleaning the data, we visually analyze the data to understand which variables have different patterns with two Sales Categories, Average/Rough Trade-in and Clean Trade-in or Better, and which variables have any relationships with them by using box charts and scatter plots. We can conclude that the age of vehicle has almost same pattern in both categories and has little predictive power in explaining differences between the two classes, while Miles show some interesting patterns (Exhibit 3). Further exploration showed that shorter mileage vehicles tend to be traded as Average/Rough Trade-in range while longer mileage vehicles traded as Clean Trade-in or Better. Especially Clean Trade-in or Better is mainly concentrated on 50, ,000 miles. One of the more interesting patterns was that though we transformed Proceed to Sales Category, when we look at scatter plots of two variables, Proceed and Miles, we can see how well they separate observations between the two classes. Thus we concluded that mileage contributes to explain differences between the two classes in meaningful ways (Exhibit 2). Model Comparison First we partition the data into 50%, 30% and 20% for training, validation and testing data set respectively. The reason to have testing data is to prevent over-fitting issues. To predict which Sales Category a specific car belongs, Average/Rough Trade-in or Clean Trade-in or Better, and determine which variables is the most influencing to decide category, we experimented with several different types of classifiers including Logistic Regression, Discriminant Analysis, Classification 3
4 Trees, KNN and Naïve Bayes. Then, we ran all five models using all of our variables including Age in years, State, Model, Miles (or Mileage Range), Fuel and Color after cleaning the data described above. When we run those models, we also tried several combinations of variables and pick some variables by assessing test sets error rate. However, the results from running with 4 car models aren t very satisfying. We can see from the error rate (Exhibit 4) that the naïve rule is 29.31%. In Exhibit 4, even the best performing classification tree has 21.34% error rate, which has 27.19% improvement compared to naïve rule. We don t find these performances comfortable, so we decide to dig deeper into individual car model. As each car model is tried, the best improvement we have is with CARAVAN. Exhibit 4 shows the results with CARAVAN. The naïve rate for CARAVAN is 36.29% while classification tree has an error rate of 11.62%. The improvement here is 67.98%, and it gives the prediction almost 90% accuracy rates. We try all five algorithms with each car model, but only the one with CARAVAN yields better performance. Recommendation/Conclusion As a result, we decide to recommend classification tree (Exhibit 5) with CARAVAN as the model to use in the future. There are a couple of reasons we recommend classification tree. First, it has the lowest error rate among five algorithms. The error rate is only 11.62% and the improvement is 67.98%. Second, classification tree is very intuitive and straightforward; it can be easily understood and implemented in the future. Last and the most important, classification tree is a data driven algorithm. The data we acquired from Karl Olson has thousands of records, and it is only for one year. Karl has the data for the past ten years. Therefore, the predictive power of classification tree will improve as it absorbs more data and learns through it. It is for sure that Karl will get more data as time passes, so we believe classification tree is the best fit for this project. In addition to that, Karl also expresses that he would like to know which variable is more important for prediction. For explanatory reason, we decide to run logistic regression without partitioning the data, the result is shown as Exhibit 5. The error rate of this non partitioning logistic regression is about 25.6%. P-value of the output shows that State_Northwest has the biggest impact among all coefficients. Furthermore, we can also see that there are many more significant variables than we see in classification tree. In the future, Karl may be able to use these variables to get a rough idea about how the auction will end up. To sum up, our recommendation is that Karl should use classification tree as the algorithm to predict whether an auction will fall into Average/Rough Trade-in or Clean Trade-in or Better At the same time, significant variables found in logistic regression can be used to support judgment. We believe that this will provide the best prediction to the real life usage. 4
5 Exhibit 1: Predictors and their Descriptions Predictor Description Type New? Included? Year Year in which the car was manufactured Numerical Original No Age in Years Age of car since manufacture, in years Numerical New Yes Miles Miles run by the car to date Numerical Original Yes Mileage Range Binned mileage (range of 10,000 miles for each bin) Categorical New Yes Proceeds Amount at which car was auctioned Numerical Original No SaleDt Date on which car was auctioned Numerical Original No Cylinders Number of cylinders in car Numerical Original Yes Location Location of auction Categorical Original No State Geographical region to which location belongs Categorical New Yes VIN Vehicle Identification Number Numerical Original No Make Manufacturer of car Categorical Original Yes Model Name of car model Categorical Original Yes Color Color of car Categorical Original Yes Fuel Fuel type of car (Ethanol ETH or Gas GAS) Categorical Original Yes Sale Type Live auction or auction over the Internet Categorical Original Yes Sales Category Average/Rough Trade-in or Clean Trade-in or Better Categorical New Yes Exhibit 2: Relationships between predictors, and between predictors and Proceeds 5
6 Exhibit 3: Distribution of prediction classes Age in years Miles Exhibit 4: Model Comparison Error Rate 4 car models Improvement Caravan Improvement Naïve Rule 29.31% NA 36.29% NA CT 21.34% 27.19% 11.62% 67.98% LR 21.84% 25.49% 13.46% 62.91% KNN 23.32% 20.44% 16.82% 53.65% NB 24.02% 18.05% 13.46% 62.91% DA 25.71% 12.28% 13.46% 62.91% Exhibit 5: Classification Tree and Logistic Regression Input variables Coefficient Std. Error p-value Odds Constant term * State_Mid-West State_North-East State_North-West State_South State_South-East State_South-West Age in Years Miles Color_GN Color_WH Cylinders
Understanding Characteristics of Caravan Insurance Policy Buyer
Understanding Characteristics of Caravan Insurance Policy Buyer May 10, 2007 Group 5 Chih Hau Huang Masami Mabuchi Muthita Songchitruksa Nopakoon Visitrattakul Executive Summary This report is intended
Determining Factors of a Quick Sale in Arlington's Condo Market. Team 2: Darik Gossa Roger Moncarz Jeff Robinson Chris Frohlich James Haas
Determining Factors of a Quick Sale in Arlington's Condo Market Team 2: Darik Gossa Roger Moncarz Jeff Robinson Chris Frohlich James Haas Executive Summary The real estate market for condominiums in Northern
BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
Determining optimum insurance product portfolio through predictive analytics BADM Final Project Report
2012 Determining optimum insurance product portfolio through predictive analytics BADM Final Project Report Dinesh Ganti(61310071), Gauri Singh(61310560), Ravi Shankar(61310210), Shouri Kamtala(61310215),
JetBlue Airways Stock Price Analysis and Prediction
JetBlue Airways Stock Price Analysis and Prediction Team Member: Lulu Liu, Jiaojiao Liu DSO530 Final Project JETBLUE AIRWAYS STOCK PRICE ANALYSIS AND PREDICTION 1 Motivation Started in February 2000, JetBlue
Business Analytics using Data Mining
Business Analytics using Data Mining Project Report Indian School of Business Group A6 Bhushan Khandelwal 61410182 61410806 - Mahabaleshwar Bhat Mayank Gupta 61410659 61410697 - Shikhar Angra Sujay Koparde
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
Predicting Defaults of Loans using Lending Club s Loan Data
Predicting Defaults of Loans using Lending Club s Loan Data Oleh Dubno Fall 2014 General Assembly Data Science Link to my Developer Notebook (ipynb) - http://nbviewer.ipython.org/gist/odubno/0b767a47f75adb382246
Numerical Algorithms Group
Title: Summary: Using the Component Approach to Craft Customized Data Mining Solutions One definition of data mining is the non-trivial extraction of implicit, previously unknown and potentially useful
Lecture 10: Regression Trees
Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
TECHNICAL APPENDIX ESTIMATING THE OPTIMUM MILEAGE FOR VEHICLE RETIREMENT
TECHNICAL APPENDIX ESTIMATING THE OPTIMUM MILEAGE FOR VEHICLE RETIREMENT Regression analysis calculus provide convenient tools for estimating the optimum point for retiring vehicles from a large fleet.
A Short Tour of the Predictive Modeling Process
Chapter 2 A Short Tour of the Predictive Modeling Process Before diving in to the formal components of model building, we present a simple example that illustrates the broad concepts of model building.
Lowering social cost of car accidents by predicting high-risk drivers
Lowering social cost of car accidents by predicting high-risk drivers Vannessa Peng Davin Tsai Shu-Min Yeh Why we do this? Traffic accident happened every day. In order to decrease the number of traffic
Course Syllabus. Purposes of Course:
Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building
An Application of Weibull Analysis to Determine Failure Rates in Automotive Components
An Application of Weibull Analysis to Determine Failure Rates in Automotive Components Jingshu Wu, PhD, PE, Stephen McHenry, Jeffrey Quandt National Highway Traffic Safety Administration (NHTSA) U.S. Department
Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition
Brochure More information from http://www.researchandmarkets.com/reports/2170926/ Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd
Business Analytics and Credit Scoring
Study Unit 5 Business Analytics and Credit Scoring ANL 309 Business Analytics Applications Introduction Process of credit scoring The role of business analytics in credit scoring Methods of logistic regression
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is
SOA 2013 Life & Annuity Symposium May 6-7, 2013. Session 30 PD, Predictive Modeling Applications for Life and Annuity Pricing and Underwriting
SOA 2013 Life & Annuity Symposium May 6-7, 2013 Session 30 PD, Predictive Modeling Applications for Life and Annuity Pricing and Underwriting Moderator: Barry D. Senensky, FSA, FCIA, MAAA Presenters: Jonathan
Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors
Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann
Dealing with continuous variables and geographical information in non life insurance ratemaking. Maxime Clijsters
Dealing with continuous variables and geographical information in non life insurance ratemaking Maxime Clijsters Introduction Policyholder s Vehicle type (4x4 Y/N) Kilowatt of the vehicle Age Age of the
Finding Supporters. Political Predictive Analytics Using Logistic Regression. Multivariate Solutions
Finding Supporters Political Predictive Analytics Using Logistic Regression Multivariate Solutions What is Logistic Regression? In a political application, logistic regression can describe the outcome
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
Implied Matching. Functionality. One of the more difficult challenges faced by exchanges is balancing the needs and interests of
Functionality In Futures Markets By James Overdahl One of the more difficult challenges faced by exchanges is balancing the needs and interests of different segments of the market. One example is the introduction
Data Mining Methods: Applications for Institutional Research
Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
Modeling Lifetime Value in the Insurance Industry
Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting
Multiple Linear Regression in Data Mining
Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
from Larson Text By Susan Miertschin
Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1 Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing gpromoting the Mythic World line of figurines.
Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems
Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Ran M. Bittmann School of Business Administration Ph.D. Thesis Submitted to the Senate of Bar-Ilan University Ramat-Gan,
Data Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
VEHICLE SURVIVABILITY AND TRAVEL MILEAGE SCHEDULES
DOT HS 809 952 January 2006 Technical Report VEHICLE SURVIVABILITY AND TRAVEL MILEAGE SCHEDULES Published By: NHTSA s National Center for Statistics and Analysis This document is available to the public
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
Predictive Data modeling for health care: Comparative performance study of different prediction models
Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath [email protected] National Institute of Industrial Engineering (NITIE) Vihar
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Customer and Business Analytic
Customer and Business Analytic Applied Data Mining for Business Decision Making Using R Daniel S. Putler Robert E. Krider CRC Press Taylor &. Francis Group Boca Raton London New York CRC Press is an imprint
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Nine Common Types of Data Mining Techniques Used in Predictive Analytics
1 Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY
Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship
The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Analyst @ Expedia
The Impact of Big Data on Classic Machine Learning Algorithms Thomas Jensen, Senior Business Analyst @ Expedia Who am I? Senior Business Analyst @ Expedia Working within the competitive intelligence unit
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
Compressed Natural Gas Study for Westport Light Duty, Inc. Kelley Blue Book Irvine, California April 3, 2012
Compressed Natural Gas Study for Westport Light Duty, Inc. Kelley Blue Book Irvine, California April 3, 2012 2 Overview Westport Light Duty is part of the Westport Innovations company, a leader in the
Car Insurance. Prvák, Tomi, Havri
Car Insurance Prvák, Tomi, Havri Sumo report - expectations Sumo report - reality Bc. Jan Tomášek Deeper look into data set Column approach Reminder What the hell is this competition about??? Attributes
Employer Health Insurance Premium Prediction Elliott Lui
Employer Health Insurance Premium Prediction Elliott Lui 1 Introduction The US spends 15.2% of its GDP on health care, more than any other country, and the cost of health insurance is rising faster than
A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND
Paper D02-2009 A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND ABSTRACT This paper applies a decision tree model and logistic regression
Model Selection. Introduction. Model Selection
Model Selection Introduction This user guide provides information about the Partek Model Selection tool. Topics covered include using a Down syndrome data set to demonstrate the usage of the Partek Model
Business Analytics using Data Mining Project Report. Optimizing Operation Room Utilization by Predicting Surgery Duration
Business Analytics using Data Mining Project Report Optimizing Operation Room Utilization by Predicting Surgery Duration Project Team 4 102034606 WU, CHOU-CHUN 103078508 CHEN, LI-CHAN 102077503 LI, DAI-SIN
Some Essential Statistics The Lure of Statistics
Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived
Establishing a Cost Effective Fleet Replacement Program
Establishing a Cost Effective Fleet Replacement Program Regardless of what purpose your company s fleet serves, there are certain fundamentals to keep in mind when designing and implementing a cost-effective
IBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
Chapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
Analysis of Whether the Prices of Renewable Fuel Standard RINs Have Affected Retail Gasoline Prices
Analysis of Whether the Prices of Renewable Fuel Standard RINs Have Affected Retail Gasoline Prices A Whitepaper Prepared for the Renewable Fuels Association Key Findings Changes in prices of renewable
Application of SAS! Enterprise Miner in Credit Risk Analytics. Presented by Minakshi Srivastava, VP, Bank of America
Application of SAS! Enterprise Miner in Credit Risk Analytics Presented by Minakshi Srivastava, VP, Bank of America 1 Table of Contents Credit Risk Analytics Overview Journey from DATA to DECISIONS Exploratory
A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model
A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model Twinkle Patel, Ms. Ompriya Kale Abstract: - As the usage of credit card has increased the credit card fraud has also increased
Regression 3: Logistic Regression
Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic regression Logistic regression in R Outline Logistic regression Introduction The model Looking at and comparing
Data Mining Individual Assignment report
Björn Þór Jónsson [email protected] Data Mining Individual Assignment report This report outlines the implementation and results gained from the Data Mining methods of preprocessing, supervised learning, frequent
A Property & Casualty Insurance Predictive Modeling Process in SAS
Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing
6 Steps to Data Blending for Spatial Analytics
6 Steps to Data Blending for Spatial Analytics What is Spatial Analytics? Spatial analytics goes beyond understanding the physical location of key assets on a map, enabling you to gain deep insights into
Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
Understanding Your Solar-Powered Electric Vehicle. Home of the Sunray and Kudo Solar-Powered Electric Vehicles
Understanding Your Electric Vehicle Home of the Sunray and Kudo Electric Vehicles Most people in today s age have heard about solar power. And most everyone understands the benefits of conserving energy...from
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
The Predictive Data Mining Revolution in Scorecards:
January 13, 2013 StatSoft White Paper The Predictive Data Mining Revolution in Scorecards: Accurate Risk Scoring via Ensemble Models Summary Predictive modeling methods, based on machine learning algorithms
APPLICATION OF DATA MINING TECHNIQUES FOR DIRECT MARKETING. Anatoli Nachev
86 ITHEA APPLICATION OF DATA MINING TECHNIQUES FOR DIRECT MARKETING Anatoli Nachev Abstract: This paper presents a case study of data mining modeling techniques for direct marketing. It focuses to three
Paper AA-08-2015. Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM
Paper AA-08-2015 Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM Delali Agbenyegah, Alliance Data Systems, Columbus, Ohio 0.0 ABSTRACT Traditional
A New Approach for Evaluation of Data Mining Techniques
181 A New Approach for Evaluation of Data Mining s Moawia Elfaki Yahia 1, Murtada El-mukashfi El-taher 2 1 College of Computer Science and IT King Faisal University Saudi Arabia, Alhasa 31982 2 Faculty
Point Biserial Correlation Tests
Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable
Predictive Analytics: Extracts from Red Olive foundational course
Predictive Analytics: Extracts from Red Olive foundational course For more details or to speak about a tailored course for your organisation please contact: Jefferson Lynch: [email protected]
Predicting earning potential on Adult Dataset
MSc in Computing, Business Intelligence and Data Mining stream. Business Intelligence and Data Mining Applications Project Report. Predicting earning potential on Adult Dataset Submitted by: xxxxxxx Supervisor:
These VAT rules apply whether you are a sole trader or a limited company.
Cars and Vehicles VAT You will not be able to reclaim any input VAT on the purchase of your car, (nor will you have to charge any output VAT on sale, unless you are in the very unlikely business of making
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
Binary Logistic Regression
Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including
ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS
DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.
IBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic
A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic Report prepared for Brandon Slama Department of Health Management and Informatics University of Missouri, Columbia
Guidelines for Monthly Statistics Data Collection
Guidelines for Monthly Statistics Data Collection Final version Data Expert Group 31 May 2015 1. Table of contents 1. Table of contents... 2 2. Remarks... 3 3. Introduction... 3 4. Geographical perimeter
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Demand forecasting & Aggregate planning in a Supply chain. Session Speaker Prof.P.S.Satish
Demand forecasting & Aggregate planning in a Supply chain Session Speaker Prof.P.S.Satish 1 Introduction PEMP-EMM2506 Forecasting provides an estimate of future demand Factors that influence demand and
Simple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
