A Study of Car Insurance in the Netherlands. BUDT733: Spring 2011


Vijayakumar Ayyaswamy, Logan Baranowitz, Cyrus Havewala, Stephanie Romich

Executive Summary

The project analyzes data for a car insurance firm in Holland. The firm will use the report to target the zip codes that best suit its business and to create marketing strategies that promote its insurance products in those areas. The data collected for the project include product usage and socio-demographic information for different zip codes in the country. Each record corresponds to a particular zip code and customer type, with details about the percentage of the population belonging to various demographic categories, the average contribution to other policies, and the average number of other policies held by the group.

The intent of the analysis is to profile zip code areas in order to create marketing strategies that increase buyer sensitivity towards car insurance and to avoid investment in areas that do not suit the business. The analysis will help cut costs, direct advertising expenditure at the right customers, and increase the return on investment.

It is common sense to consider the areas where car usage is higher. Contributions to other policies provide insight into residents' existing usage patterns and spending power. For example, higher contributions to bicycle or moped insurance in a zip code suggest that the population is more inclined to use bicycles or mopeds than cars. Such an area may be congested and densely populated, like a downtown, where smaller vehicles and easier accessibility are preferred. Our analysis showed exactly this pattern:

1. Areas with high contributions to bicycle, moped, fire and third party (firms) policies will not contribute to car insurance. Such areas may be densely populated, with a preference for smaller vehicles. Contributions to third party insurance for firms suggest that the area is a business center, with less parking and more crowds.
2. Areas with high contributions to social security insurance and tractor policies show potential for car insurance. These areas may be farmland or outskirts where the need for a car is greater.
3. Areas where more than 50% of the population own at least one car and hold more than 2 policies other than car insurance are conducive to the business.

Based on the analysis, we recommend the following options for targeted marketing:

1. Print advertising and direct mail marketing: advertise in local papers in the targeted zip codes and send direct mail by post. The firm should target rural communities that do not have a high concentration of alternative transportation such as mopeds or bicycles.
2. Joint marketing with car dealers in the area may prove profitable. Areas in which more than 50% of the population have at least one car seem to have a higher potential for car policies. By encouraging people to buy more cars, the company can increase the market for car policies.
3. Provide bundled products of car and tractor policies. As noted previously, rural areas appear to account for a significant portion of the company's customers. By providing a bundled tractor and car policy, the company can reach customers who may not have considered purchasing car insurance on its own.

Technical Summary

Goal Definition: The overarching goal of the project is to use a dataset containing demographic and insurance information for 9822 zip codes and to see whether this data helps explain the purchase of car insurance in these zip codes.

Data description: The insurance dataset contains 9822 records and 85 dimensions. Each record contains the common characteristics of households in a postal zip code. 42 dimensions represent different policy types: 21 indicate the average number of policies owned and the other 21 indicate the total monetary contribution to those policies for a particular zip code. The next 39 dimensions are categorical and show the percentage of total households within a zip code that fall under the dimension. For example, 5 dimensions represent household income level categories, and the records indicate what percentage of households in a zip code fall within each income level. The other major categories covered are Social Class (5), Profession (5), Religious Affiliation (4), Marital Status (4), Education (3), No. of Cars (3), Rent/Own Home (2), Children (2), Health Insurance (2), Average Income (1), Status (1), Average Age (1) and Purchasing Power (1). Finally, Number of Houses and Average Household Size are numerical dimensions. The dataset also classifies the zip codes into customer categories, indicated by the dimension Customer Type.

Data preprocessing, Exploratory Data Analysis and Choice of Variables: The raw data contained numbers that were linked to a dictionary with the actual bin definitions for each dimension. The first step was to convert the raw data into meaningful bins. For example, in the Average Age dimension, a value of 2 was converted to the corresponding age-range bin (in years). Next, dimensions representing the same major category were consolidated. For example, the 5 dimensions representing household income levels (< 30k, 30-40k, etc.), with bins showing the average percentage of households falling in each category, were consolidated into a single dimension, Household Income. The record was modified to reflect the majority value of the 5 dimensions, i.e. if 30-40k had the largest percentage of households, then it became the representative value of Household Income for that record.

Initial data exploration in Spotfire indicated that there was no meaningful relationship between the demographic data and the number of car insurance policies owned by people in a zip code. In most cases, for any demographic variable, the share of zip codes with car insurance (and any other insurance, for that matter) was about 50%. However, there was a strong relationship between a zip code's average contribution to other kinds of policies and its car insurance policies. For example, zip codes with high contributions to bicycle or tractor insurance did not have car policies. Once the contributions to different policies and the total number of policies owned in a zip code (3 or more) were identified as the important dimensions for owning car insurance, the dataset was culled by eliminating all the demographic information. At this point, the data was ready to be used in different classification models.

The variables we identified to include in further analysis consisted of the following:

Variable                                          Description
Contribution private third party insurance        Third party insurance for personal insurance
Contribution third party insurance (firms)        Third party insurance for firms
Contribution tractor policies                     Tractor policies
Contribution moped policies                       Moped policies
Contribution fire policies                        Fire policies
Contribution bicycle policies                     Bicycle policies
Contribution social security insurance policies   Social security insurance policies
Total Number of Policies (not Car) > 2            Dummy variable: 0 if the total number of policies other than car insurance is less than or equal to 2, and 1 if the count is greater than 2. Values are 0 or 1.
No Car < 50%                                      Dummy variable: 1 if the percentage of the population in the zip code with no cars is less than or equal to 50%, and 0 if the percentage is greater than 50%. Values are 0 or 1.

Example for the contribution variables: the contribution is denoted as a range (Netherlands currency): 0, 1-49, 50-99, and so on up to an open-ended top range. If the average contribution to third party insurance for individuals in a zip code is 150, the data is denoted as 4.

Choice of methods and models used: Since the goal of the project was to profile the data, the following methods were deemed appropriate: the Naïve Rule, Classification Trees and Logistic Regression. The major characteristics of each of the models are displayed in the table below:

Model                 Sensitivity  Specificity  False Positive  False Negative  Overall Error  Lift
Naïve Rule            50.88%       49.12%       0.00%           49.12%          49.12%         -
Classification Tree   54.63%       45.37%       24.18%          10.33%          34.51%         29.74%
Logistic Regression   64.07%       35.93%       27.65%          8.28%           35.93%         26.86%
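The consolidation and dummy-variable steps described in the technical summary can be sketched roughly as follows. This is a minimal illustration and not the team's actual workflow: the function names, the dictionary keys and the sample shares are all hypothetical, and only the rules themselves (majority category, more than 2 other policies, at most 50% with no car) come from the report.

```python
from typing import Dict


def majority_category(shares: Dict[str, float]) -> str:
    """Return the category that holds the largest share of households."""
    return max(shares, key=lambda category: shares[category])


def policy_count_dummy(other_policy_count: float) -> int:
    """1 if more than 2 policies other than car insurance are held, else 0."""
    return 1 if other_policy_count > 2 else 0


def no_car_dummy(pct_no_car: float) -> int:
    """1 if at most 50% of the population has no car, else 0."""
    return 1 if pct_no_car <= 50 else 0


# Hypothetical record: the labels and percentages are made up for illustration.
income_shares = {"<30k": 20.0, "30-40k": 45.0, "other": 35.0}
record = {
    "household_income": majority_category(income_shares),
    "policy_count_gt_2": policy_count_dummy(3),
    "no_car_lt_50pct": no_car_dummy(35.0),
}
print(record)
```

When two categories tie for the largest share, this sketch keeps whichever appears first; the report does not say how ties were handled.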

The Classification Tree Model (Exhibit D) had the following characteristics:

- Used the log of all individual contribution amounts, the total policies > 2 variable and the <50% have no car variable
- Tree was pruned to use only 6 decision points
- Additional contribution variables had very little effect on the overall accuracy
- Error rate is 34.51%

The Logistic Regression Model (Exhibit E) had the following characteristics:

- Started with the same variables as the Classification Tree (including all contributions)
- Narrowed the best output to a model with nine variables
- Error rate is 35.93%
- Interesting note: four of the contribution variables had negative coefficients, meaning that zip codes with higher average contributions to these policies were less likely to purchase at least one car insurance policy

Based upon the above results, we see that the classification tree and the logistic regression model provide a significant lift over the Naïve Rule. Even though the classification tree is marginally better on overall error rate and specificity, the logistic regression model is the best fit for our overall goal, since the larger number of variables in the final model gives a more complete picture of the characteristics of zip codes with car policies.
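For readers who want to reproduce this kind of comparison, the sketch below computes the usual confusion-matrix measures and the relative error reduction against the Naïve Rule. The confusion-matrix counts are hypothetical. Reading the report's Lift column as a relative reduction in overall error is an assumption on our part: it reproduces the reported 29.74% for the classification tree and comes within rounding of the reported 26.86% for logistic regression.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of actual positives that the model classifies as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of actual negatives that the model classifies as negative."""
    return tn / (tn + fp)


def overall_error(tp: int, fn: int, tn: int, fp: int) -> float:
    """Share of all records that the model misclassifies."""
    return (fp + fn) / (tp + fn + tn + fp)


def relative_error_reduction(naive_error: float, model_error: float) -> float:
    """Error reduction relative to the Naive Rule (assumed meaning of Lift)."""
    return (naive_error - model_error) / naive_error


# Hypothetical confusion-matrix counts, for illustration only.
tp, fn, tn, fp = 60, 40, 55, 45
print(sensitivity(tp, fn), specificity(tn, fp), overall_error(tp, fn, tn, fp))

# Overall error rates (in percent) taken from the report's model table.
tree_lift = relative_error_reduction(49.12, 34.51)
logit_lift = relative_error_reduction(49.12, 35.93)
print(round(100 * tree_lift, 2), round(100 * logit_lift, 2))
```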

Exhibit A: Similar Demographics for those with or without Car Policies

Exhibit B: Effect of Contribution to Moped Policies. Zip codes with contributions to moped policies have a lower % with a car policy.

Exhibit C: Effect of Contribution to Fire Policies on Car Policy Insurance. Zip codes with increasing contributions to fire policies have a higher % with a car policy.
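The percentages behind exhibits of this kind can be obtained by grouping zip codes by contribution level and computing the share that holds a car policy. The sketch below shows one way to do that in plain Python; the function name and the sample records are hypothetical, and it does not reproduce the Spotfire charts used in the project.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def car_policy_share_by_level(records: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Percent of zip codes with a car policy within each contribution level.

    Each record is (contribution_level, has_car_policy), has_car_policy in {0, 1}.
    """
    totals: Dict[int, int] = defaultdict(int)
    with_car: Dict[int, int] = defaultdict(int)
    for level, has_car in records:
        totals[level] += 1
        with_car[level] += has_car
    return {level: 100.0 * with_car[level] / totals[level] for level in sorted(totals)}


# Made-up records: (moped contribution level, car policy flag).
sample = [(0, 1), (0, 1), (0, 0), (3, 0), (3, 1), (3, 0), (3, 0)]
print(car_policy_share_by_level(sample))
```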

Exhibit D: Classification Tree (pruned tree)

Exhibit E: Logistic Regression Results. Prior class probabilities: according to relative occurrences in training data. Input variables: Constant term, Contribution_3rdParty_prvt_t, Contribution_3rdParty_firms_, Contribution_tractorpolicy_Tr, Contribution_MopedPolicy_Tr, Contribution_FirePolicy_Tran, Contribution_Bicyclepolicy_t, Contribution_ss_ins_policy_t, Policy Count > 2 (not car), <50% No Car. Reported columns: Coefficient, Std. Error, p-value, Odds. Training data scoring summary report: cutoff probability value for success, classification confusion matrix (actual class vs. predicted class), and error report (class, # cases, # errors, % error, overall).
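To relate the Coefficient and Odds columns of a logistic regression output, the sketch below uses the standard relationship odds = exp(coefficient) and shows why a negative coefficient lowers the predicted probability of holding a car policy. The intercept and coefficient values are hypothetical, since the fitted values from Exhibit E are not reproduced here.

```python
import math
from typing import Sequence


def odds_ratio(coefficient: float) -> float:
    """Multiplicative change in the odds for a one-unit increase in the input."""
    return math.exp(coefficient)


def predicted_probability(
    intercept: float, coefficients: Sequence[float], values: Sequence[float]
) -> float:
    """Logistic model probability for one record."""
    z = intercept + sum(c * v for c, v in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficient for one contribution variable.
coef = -0.5
print(odds_ratio(coef))  # below 1 because the coefficient is negative

# A higher contribution level lowers the predicted probability.
low = predicted_probability(0.2, [coef], [0])
high = predicted_probability(0.2, [coef], [4])
print(low > high)
```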


More information

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

More information

Profiles and Data Analysis. 5.1 Introduction

Profiles and Data Analysis. 5.1 Introduction Profiles and Data Analysis PROFILES AND DATA ANALYSIS 5.1 Introduction The survey of consumers numbering 617, spread across the three geographical areas, of the state of Kerala, who have given information

More information

Detecting Email Spam. MGS 8040, Data Mining. Audrey Gies Matt Labbe Tatiana Restrepo

Detecting Email Spam. MGS 8040, Data Mining. Audrey Gies Matt Labbe Tatiana Restrepo Detecting Email Spam MGS 8040, Data Mining Audrey Gies Matt Labbe Tatiana Restrepo 5 December 2011 INTRODUCTION This report describes a model that may be used to improve likelihood of recognizing undesirable

More information

Demographics of Atlanta, Georgia:

Demographics of Atlanta, Georgia: Demographics of Atlanta, Georgia: A Visual Analysis of the 2000 and 2010 Census Data 36-315 Final Project Rachel Cohen, Kathryn McKeough, Minnar Xie & David Zimmerman Ethnicities of Atlanta Figure 1: From

More information

Cluster this! June 2011

Cluster this! June 2011 Cluster this! June 2011 Agenda On the agenda today: SAS Enterprise Miner (some of the pros and cons of using) How multivariate statistics can be applied to a business problem using clustering Some cool

More information

How To Predict Diabetes In A Cost Bucket

How To Predict Diabetes In A Cost Bucket Paper PH10-2012 An Analysis of Diabetes Risk Factors Using Data Mining Approach Akkarapol Sa-ngasoongsong and Jongsawas Chongwatpol Oklahoma State University, Stillwater, OK 74078, USA ABSTRACT Preventing

More information

Programming Exercise 3: Multi-class Classification and Neural Networks

Programming Exercise 3: Multi-class Classification and Neural Networks Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks

More information

Bb 2. Targeting Segmenting and Profiling How to generate leads and get new customers I N S I G H T. Profiling. What is Segmentation?

Bb 2. Targeting Segmenting and Profiling How to generate leads and get new customers I N S I G H T. Profiling. What is Segmentation? Bb 2 ISSUE 2 Targeting Segmenting and Profiling How to generate leads and get new customers Profiling Why, what does it entail and how do you do it? What is Segmentation? The method and the benefits Targeting

More information

Small-to medium-business partnership overview. Partner with Experian to enhance your revenue by helping your clients find and acquire more customers

Small-to medium-business partnership overview. Partner with Experian to enhance your revenue by helping your clients find and acquire more customers Small-to medium-business partnership overview Partner with Experian to enhance your revenue by helping your clients find and acquire more customers By partnering with Experian, you can help your small-to

More information

Media Efficiency Panel MEP INMA Conference, Lissabon

Media Efficiency Panel MEP INMA Conference, Lissabon GfK Consumer Tracking INMA European Conference, Media Efficiency Panel October 2011 Media Efficiency Panel MEP INMA Conference, Lissabon Laurent de Groof GfK Netherlands 2 Adspend Newspapers and Magazines

More information

PASW Direct Marketing 18

PASW Direct Marketing 18 i PASW Direct Marketing 18 For more information about SPSS Inc. software products, please visit our Web site at http://www.spss.com or contact SPSS Inc. 233 South Wacker Drive, 11th Floor Chicago, IL 60606-6412

More information

MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS

MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS MAXIMIZING RETURN ON DIRET MARKETING AMPAIGNS IN OMMERIAL BANKING S 229 Project: Final Report Oleksandra Onosova INTRODUTION Recent innovations in cloud computing and unified communications have made a

More information

Dealing with continuous variables and geographical information in non life insurance ratemaking. Maxime Clijsters

Dealing with continuous variables and geographical information in non life insurance ratemaking. Maxime Clijsters Dealing with continuous variables and geographical information in non life insurance ratemaking Maxime Clijsters Introduction Policyholder s Vehicle type (4x4 Y/N) Kilowatt of the vehicle Age Age of the

More information

Lending Club Interest Rate Data Analysis

Lending Club Interest Rate Data Analysis Lending Club Interest Rate Data Analysis 1. Introduction Lending Club is an online financial community that brings together creditworthy borrowers and savvy investors so that both can benefit financially

More information

Predictive Modeling on the Cheap

Predictive Modeling on the Cheap Predictive Modeling on the Cheap ACT Enrollment Planners Conference Chicago, IL July 24, 2014 Kenton Pauls, Dean of Enrollment Management Mike Wallinga, Director of Institutional Research Northwestern

More information

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three

More information

Visual Presentation Fall 2011

Visual Presentation Fall 2011 Call Center Print House Customer Track Rapid Fresh Prospects C.R.M Exclusive Leads Custom Demographics Highest R.O.I. Local Customers Highest Rated BBB Mail House Call Center Print House Customer Track

More information

Non-Emergent Emergency Department Use among Adults with Disabilities

Non-Emergent Emergency Department Use among Adults with Disabilities Non-Emergent Emergency Department Use among Adults with Disabilities June 8, 2014 David Idala, Nancy Miller, Adele Kirk, Charles Betley, Seung Kim, Yi-An Chen, Ming Liang Dai Introduction Disparities in

More information

Role of Social Networking in Marketing using Data Mining

Role of Social Networking in Marketing using Data Mining Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:

More information

UNIVERSITY OF SOUTHERN CALIFORNIA Marshall School of Business BUAD 425 Data Analysis for Decision Making (Fall 2013) Syllabus

UNIVERSITY OF SOUTHERN CALIFORNIA Marshall School of Business BUAD 425 Data Analysis for Decision Making (Fall 2013) Syllabus UNIVERSITY OF SOUTHERN CALIFORNIA Marshall School of Business BUAD 425 Data Analysis for Decision Making (Fall 2013) Contact Information Syllabus Professor: Dr. Abbass Sharif Office: BRI 400-E Office Hours:

More information

L3: Statistical Modeling with Hadoop

L3: Statistical Modeling with Hadoop L3: Statistical Modeling with Hadoop Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: December 10, 2014 Today we are going to learn...

More information

The Artificial Prediction Market

The Artificial Prediction Market The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

More information

Learning Objectives: Quick answer key: Question # Multiple Choice True/False. 14.1 Describe the important of accounting and financial information.

Learning Objectives: Quick answer key: Question # Multiple Choice True/False. 14.1 Describe the important of accounting and financial information. 0 Learning Objectives: 14.1 Describe the important of accounting and financial information. 14.2 Differentiate between managerial and financial accounting. 14.3 Identify the six steps of the accounting

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Data Mining is the process of knowledge discovery involving finding

Data Mining is the process of knowledge discovery involving finding using analytic services data mining framework for classification predicting the enrollment of students at a university a case study Data Mining is the process of knowledge discovery involving finding hidden

More information

Addressing Analytics Challenges in the Insurance Industry. Noe Tuason California State Automobile Association

Addressing Analytics Challenges in the Insurance Industry. Noe Tuason California State Automobile Association Addressing Analytics Challenges in the Insurance Industry Noe Tuason California State Automobile Association Overview Two Challenges: 1. Identifying High/Medium Profit who are High/Low Risk of Flight Prospects

More information

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK Agenda Analytics why now? The process around data and text mining Case Studies The Value of Information

More information

1 Choosing the right data mining techniques for the job (8 minutes,

1 Choosing the right data mining techniques for the job (8 minutes, CS490D Spring 2004 Final Solutions, May 3, 2004 Prof. Chris Clifton Time will be tight. If you spend more than the recommended time on any question, go on to the next one. If you can t answer it in the

More information

A STUDY ON ASSET MANAGEMENT OF SELECTED AUTOMOBILE COMPANIES IN INDIA

A STUDY ON ASSET MANAGEMENT OF SELECTED AUTOMOBILE COMPANIES IN INDIA 51 A STUDY ON ASSET MANAGEMENT OF SELECTED AUTOMOBILE COMPANIES IN INDIA ABSTRACT DR.M. DHANABHAKYAM*; S.KAVITHA** *Assistant Professor, Department of Commerce, Bharathiar University, Coimbatore - 46.

More information

PAST PRESENT FUTURE YoU can T TEll where ThEY RE going if YoU don T know where ThEY ve been.

PAST PRESENT FUTURE YoU can T TEll where ThEY RE going if YoU don T know where ThEY ve been. PAST PRESENT FUTURE You can t tell where they re going if you don t know where they ve been. L everage the power of millions of customer transactions to maximize your share of customer travel spend. Vistrio

More information

Data Select SM Creating a search and placing an order

Data Select SM Creating a search and placing an order Data Select SM Creating a search and placing an order Introduction Explore the step-by-step procedures for creating a search and submitting an order in the ConsumerView SM database. You will learn how

More information

Debtor s Full Legal Name: Spouse s Full Legal Name: Other Names Ever Used: Email: Tel#: Cell#: Emergency Contact (name & number):

Debtor s Full Legal Name: Spouse s Full Legal Name: Other Names Ever Used: Email: Tel#: Cell#: Emergency Contact (name & number): Law Office of Jeffrey B. Kelly, P.C. Chapter 7 Chapter 13 Bankruptcy Questionnaire DEBTOR INFO: How did you first hear about my office? Office Location Debtor s Full Legal Name: SS# DOB: Spouse s Full

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information