Using Zip-Code as an Attribute in Direct Marketing Research

Similar documents
IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 22

MARK IMC Dr. Freling EXAM II REVIEW

Getting the Most from Demographics: Things to Consider for Powerful Market Analysis

Overview of MARKET SEGMENTATION. A Tool for Targeting Recruitment

Lecture 3: Models of Spatial Information

PASW Direct Marketing 18

OPTIMIZING YOUR MARKETING STRATEGY THROUGH MODELED TARGETING

Improve Marketing Campaign ROI using Uplift Modeling. Ryan Zhao

IBM SPSS Direct Marketing 19

Market Research. Module 2. Driving Revenue with an Optimal Marketing Strategy Solutions. Advancing Your Success.

CHAPTER THREE Market Segmentation and Strategic Targeting

Local outlier detection in data forensics: data mining approach to flag unusual schools

Market Segmentation and Product Differentiation. What is Market Segmentation? Market Segmentation. Lectures in Marketing 2003 Dr.

Algebra I Sample Questions. 1 Which ordered pair is not in the solution set of (1) (5,3) (2) (4,3) (3) (3,4) (4) (4,4)

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

IBM SPSS Direct Marketing 20

Segmentation: Foundation of Marketing Strategy

CLUSTER ANALYSIS FOR SEGMENTATION

Chapter Seven Customer-Driven Marketing Strategy: Creating Value for Target Customers

The Carnegie Conference Walt Disney World Resort January 24, 2013

Easily Identify the Right Customers

UNDERSTANDING CUSTOMERS - PROFILING AND SEGMENTATION

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Market Analysis: Using GIS to Analyze Areas for Business Retail Expansion

EXAMINING DIRECT & INTERACTIVE MARKETING APPLICATIONS IN A VARIETY OF SECTORS

Modeling Multi-Channel Response Behavior Richard J. Courtheoux

Spatial Data Analysis

HOW TO USE YOUR DATABASE OR DATA WAREHOUSE OF PROFITABILITY INFORMATION TO MAKE BETTER MARKETING DECISIONS

Paper AA Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM

Technical Note. Consumer Confidence Survey Technical Note February Introduction and Background

Predictive Modeling on the Cheap

DEVELOPING LISTS AND DISCOVERING MARKETS

B2C Case Study: Service Company

Insight for location-powered decision making.

Successful Direct Marketing Techniques. Presented by Richard Bufkin, VP

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

B203A Q. Week 9 Marketing Chapter 4 Chapter 6

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

Statistics, Data Analysis & Econometrics

Introduction to Data Mining

One Size Doesn t Fit All: How a Segmentation Approach Can Help Guide CE Product Strategy

Market Research CUSTOMER SEGMENTATION

Easily Identify Your Best Customers

Environmental Remote Sensing GEOG 2021

Direct Marketing of Insurance. Integration of Marketing, Pricing and Underwriting

Additional sources Compilation of sources:

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Social Studies 201 Notes for November 19, 2003

A HYBRID APPROACH FOR AUTOMATED AREA AGGREGATION

Geodemographic Segmentation: Do Birds of a Feather Flock Together?

Introduction to Machine Learning Using Python. Vikram Kamath

Modeling Lifetime Value in the Insurance Industry

37 Marketing Automation Best Practices David M. Raab Raab Associates Inc.

Understanding Your Retail Trade Area

EVALUATION AND MEASUREMENT IN MARKETING: TRENDS AND CHALLENGES

WHITE PAPER: Optimizing Employee Recognition Programs

DEPARTMENT OF AGRICULTURE. Food and Nutrition Service. Agency Information Collection Activities: Proposed Collection; Comment Request

IST 557 Final Project

Demystifying the Modeling Process An Introduction to Database Marketing Practices

BINOMIAL OPTIONS PRICING MODEL. Mark Ioffe. Abstract

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Data Mining: Overview. What is Data Mining?

Scoring of Bank Customers for a Life Insurance Campaign

FOR375 EXAM #2 STUDY SESSION SPRING Lecture 14 Exam #2 Study Session

In comparison, much less modeling has been done in Homeowners

Ira J. Haimowitz Henry Schwarz

Problem of Missing Data

Market Simulators for Conjoint Analysis

CALCULATIONS & STATISTICS

International Tourism Market Segmentation Based on Consumer Behavior

Predict the Popularity of YouTube Videos Using Early View Data

Descriptive Methods Ch. 6 and 7

MATH2740: Environmental Statistics

CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U

5.3 The Cross Product in R 3

A HYDROLOGIC NETWORK SUPPORTING SPATIALLY REFERENCED REGRESSION MODELING IN THE CHESAPEAKE BAY WATERSHED

Integrating Numeric, Statistical, and Geospatial Data Services for Graduate Students

Vermont Department of Tourism and Marketing A Geo-Demographic Analysis of the Vermont Visitor Project Report

Targeted Marketing, KDD Cup and Customer Modeling

Decision support system based on socio-demographic segmentation and distribution channel analysis in the US furniture market

Missing data: the hidden problem

Introduction to General and Generalized Linear Models

FLORIDA SOUTHERN COLLEGE

The impact of question wording reversal on probabilistic estimates of defection / loyalty for a subscription product

An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics

Analyzing CRM Results: It s Not Just About Software and Technology

Market Segmentation, Targeting, and Positioning. Leonard Walletzký

GIS I Business Exr02 (av 9-10) - Expand Market Share (v3b, Jul 2013)

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Brand Website Activity Impact Analysis:

Billing Zip Codes in Cellular Telephone Sampling

ConsumerViewSM. Insight on more than 235 million consumers and 113 million households to improve your marketing campaigns

Poisson Models for Count Data

PUBLIC DISCLOSURE COMMUNITY REINVESTMENT ACT PERFORMANCE EVALUATION. October 18, Harris Bank Elk Grove, N.A. Charter Number: 15916

Workshop Discussion Notes: Housing

The Nature and Causes of the Increase in Direct Mail Volume in the Last. Half of the Twentieth Century

Weight of Evidence Module

Obesity in America: A Growing Trend

Transcription:

Using Zip-Code as an Attribute in Direct Marketing Research Raja Sengupta Siddhartha Bhattacharya Department of Geography Department of Management Southern Illinois University Carbondale, IL 62901 sarojsen@siu.edu 618-453-7252 sidb@siu.edu 618-453-7884 Introduction The importance a locational indicator such as zip-codes in identifying potential customers is readily recognized by the overwhelming interest of marketers in the field of "geodemographics". According to Goss (1995), "geodemographics is an information technology that enables marketers to predict behavioral responses of customers based on statistical models of identity and residential location". Therefore, the rationale behind all geodemographic systems is that individuals can be divided into groups based on certain characteristics. These characteristics can include income, education, product preference and household type. Further, geodemographic systems also follow the premise that "birds of a feather flock together", or that individuals who fall into the same group live in similar neighborhoods. All geodemographic systems start with some basic unit of geography that can be used to divide the U.S. into segments for which relevant information such as income, education etc. is available. These units can be census blocks, zip-codes or individual households. The geodemographic systems then groups the basic units based on aggregates of psychographic (personality traits), consumer behavior (product loyalty etc.) and demographic information (age, income, race etc.) for the region. Each group is termed as a "segment" or "cluster". Mitchell (1995) describes a "cluster" as "a class of households with common demographic and lifestyle characteristics, designated by a label". The clusters are developed from the data attached to the basic geographic units using multivariate regression analysis. Once developed, these clusters can then be used to predict the location of potential buyers. Given the fact that location can have a direct bearing on response rate, it would only seem sensible that zip-codes should be successful in developing DM models that predict customer response to direct marketing campaigns. However, although customer's zip-codes are also found in the database, they are often not used as an attribute in developing the DM model because they do not appear to have any information about the customer's buying habits. Part of the reason for this paradox may be because a nationwide mailing may not elicit response from more than a single customer in a certain zip-code. Each zip-code then would yield very little information. An alternative is to cluster the zip-codes into groups based on spatial adjacency (i.e. zip-codes that are adjacent to one another fall into the same category) and then use the combined zip-codes as an attribute in the DM model. The assumption here is that adjacent zip-codes will have similar response rates. However, as consecutive zipcodes are not spatially adjacent to one another, traditional databases cannot be used to group the zip-codes in this http://hsb.baylor.edu/ramsower/ais.ac.97/papers/sengupta.htm (1 of 5)3/25/2004 10:34:58 AM

fashion. This problem can be overcome using a Geographic Information System(GIS). In this paper, we present one approach to clustering zip-codes using a GIS, and present some preliminary results obtained following the clustering. Methodology Silk (1979) provides two methods to cluster point patterns into groups. The first technique is termed "quadrat analysis". Using a GIS with both raster and vector capability (such as UNIX ARC/INFO), a grid can be overlain over the points distributed in space. Silk also proposed a rule of thumb for selecting the cell size for the grid, where he suggested that the cell size should approximately equal twice the mean area per point. Once a grid has been overlain over the points, all the points that fall in the same cell of the grid can be cluster together as one group. We used this technique to cluster the zip-codes into groups based on spatial adjacency. A second technique provided by Silk (1979) to cluster points uses the nearest neighbor algorithm. The nearest neighbor algorithm identifies the closest neighbors to each point in a distribution of points. Thus, a zip-code can be clustered with it's nearest neighboring zip-codes. Future work done by us will focus on the effectiveness of using the nearest neighbor algorithm instead of quadrat analysis to cluster zip-codes. To test the hypothesis that grouping spatially adjacent zip-codes will allow them to be used as an attribute in developing DM models, we used the clustered zip-codes and a real world dataset of 26,178 customers. Our objective was to develop a score indicating response rate for each cell of the grid, and use decile analysis to determine the benefits of concentrating sales to customers living in cells with a high score. The real world dataset contained information about customers who had responded (responders) or not responded (non-responders) to a direct marketing (DM) campaign. The first step in the process was to sum the responders and non-responders for each zip-code. Based on the aggregate total of responders and non-responders per zip-code in a cell of the grid, two models were developed that generated a score for each cell. The scores (WGi) was calculated as follows: 1) For Model 1 where 2) For Model 2 In both the equations above, Wz denotes the score of an individual zip-code Z in a cell, rz is the number of responders in a zip-code, nz is the number of non-responders in a zip-code, and NG i is the total number of ziphttp://hsb.baylor.edu/ramsower/ais.ac.97/papers/sengupta.htm (2 of 5)3/25/2004 10:34:58 AM

codes in cell Gi. The scores obtained from the models above were then associated with the individuals in the main database using their zip-codes. In keeping with DM industry practice, performance of the models was examined through decile analysis (David Shepard Associates, 1995) of the scores assigned to the individual customers. The results obtained are shown in Table 1 and 2. Further, another model was developed that examined the premise that the information content of the zip-codes are adequate by themselves, and aggregating adjacent zip-codes spatially using quadrat analysis yields no added benefit. The formula for this model is given below (the symbols are as defined earlier): As before, the scores Wz obtained above (for each zip-code, in this case) were associated with the individuals in the main dataset using zip-codes, and model performance examined through decile analysis(table 3). Results and Discussion A decile analysis shows individuals ranked by their respective model scores - higher scores indicating better performance -- and separated into 10 equal groups. In table 1, a typical decile analysis, the first row (top decile) indicates performance for the best 10% of the individuals as identified by the model. The Cumulative Lift (CL) provides a measure of improvement over a random mailing, and is calculated as follows: Thus, in Table 1, a cumulative lift of 208 in the top decile indicates that the first model is expected to identify 2.08 times more responders than a random mailing to 10% of the file. Similarly, if 20% of the file is mailed to, the first model is expected to perform 1.75 times better than a random mailing (no model). In Table 2, a cumulative lift of 220 and 185 was achieved for the top two decile by the second model. On the other hand, the third model (Table 3) identified only 1.27 and 1.19 times more responders than a random mailing survey for the top two deciles. These results indicate that spatial clustering of the 5-digit zip-codes from the real world dataset using quadrat analysis improves the information content of the zip-codes, and thereby allowing them to be used as an attribute along with other variables in developing DM models. Future work will focus on using nearest neighbor analysis to cluster zip-codes, as well as testing the model developed on other real world datasets. References: available on request from the authors. http://hsb.baylor.edu/ramsower/ais.ac.97/papers/sengupta.htm (3 of 5)3/25/2004 10:34:58 AM

Decile Customers top 2,618 1,572 208 2 2,619 1,081 175 3 2,619 953 159 4 2,619 901 149 5 2,618 813 140 6 2,619 740 133 7 2,619 645 126 8 2,619 560 120 9 2,619 310 111 Table 1: Model 1 decile analysis Decile Customers top 2,618 1,665 220 2 2,619 1,144 185 3 2,619 987 167 4 2,619 892 155 5 2,618 800 145 6 2,619 706 136 7 2,619 616 128 8 2,619 499 121 9 2,619 266 111 Table 2: Model 2 decile analysis http://hsb.baylor.edu/ramsower/ais.ac.97/papers/sengupta.htm (4 of 5)3/25/2004 10:34:58 AM

Decile Customers top 2,618 962 127 2 2,619 843 119 3 2,619 880 118 4 2,619 821 116 5 2,618 862 115 6 2,619 804 114 7 2,619 844 113 8 2,619 1,004 116 9 2,619 555 111 Table 3: Model 3 decile analysis http://hsb.baylor.edu/ramsower/ais.ac.97/papers/sengupta.htm (5 of 5)3/25/2004 10:34:58 AM