Predicting Car Purchase Intent Using Data Mining Approach




2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

Yap Bee Wah (1), Nor Huwaina Ismail (2)
Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia
beewah@tmsk.uitm.edu.my (1), huwaina_ismail@yahoo.com (2)

Simon Fong (3)
Department of Computer and Information Science, University of Macau, China
CCFong@umac.mo (3)

Abstract

Data mining involves the exploration and analysis of large databases to find patterns and valuable information that can aid decision making. This paper illustrates the use of a data mining approach to build predictive models of a customer's intent to purchase a car after booking it. Records show that customers who have booked a car have a tendency to cancel their booking. Three data mining predictive models, Logistic Regression (LR), Decision Tree (DT) and Neural Network (NN), were used to model the intent of purchase (IOP). The sample for this study has 1935 cases. The data was partitioned into training (70%) and validation (30%) samples. The performance of the three predictive models was compared on the validation accuracy rate, sensitivity and specificity. Results show that the validation accuracy rates of all three models are quite similar (LR = 91.79%, CART = 91.17%, NN = 91.17%), while LR has the highest sensitivity (LR = 87.77%, CART = 85.47%, NN = 85.89%). Important customer characteristics were also revealed by these models.

Keywords: logistic regression, decision tree, data mining, classification, predictive modeling

I. INTRODUCTION

Data mining is one of the stages in the overall process of Knowledge Discovery in large databases (KDD).
With the emergence of data mining software, data mining is gaining popularity among banks, telecommunication companies, insurance companies, educational institutions and business organizations as a way to extract valuable information from data that can aid decision-making. Such organizations can use data mining to find undiscovered patterns and relationships in large databases [1-5]. The goal of data mining is to find patterns in historical data that shed light on customer purchase behaviour, needs and preferences. Such information can help organizations improve their business performance and practices, such as target marketing, sales and customer management. The different stages in the data mining process have been described in [2], [3] and [5]. The kinds of information that can be discovered depend on the data mining objectives and the techniques employed.

Data mining techniques can be grouped into three categories: classification and prediction, cluster analysis, and association analysis. Classification and prediction techniques fall under predictive modeling. Predictive modeling is also known as supervised classification or supervised learning because the prediction model is constructed from data in which the target or response variable is known. Linear Discriminant Analysis and logistic regression are two popular statistical methods for constructing predictive models [6]. However, with the emergence of data mining software such as SAS Enterprise Miner and SPSS Clementine, not only the classical methods but also newer predictive modeling and classification techniques such as decision trees, neural networks, support vector machines (SVM) and k-nearest neighbours are available for practical application to real data from various disciplines. Studies in different subject areas have compared their predictive performance.
For example, the ability of neural network models was compared with conventional techniques such as discriminant analysis, probit analysis and logistic regression in evaluating credit risk in Egyptian banks [7]. Some of these data mining classification algorithms were compared in predicting breast cancer survival [8], while [9] used an integrated data mining methodology to predict graft survival for heart-lung transplantation patients. Reference [10] investigated the performance of the SVM approach in credit rating prediction in comparison with back propagation neural networks, while [11] reported that, compared with neural networks, genetic programming and decision tree classifiers, the SVM classifier achieved identical classification accuracy with relatively few input variables. The performance of these data mining techniques will continue to be compared in different areas of application.

The objective of this paper is to develop a predictive model of a customer's intent of purchase after booking a car. This study considered and compared the predictive ability of Logistic Regression (LR), decision tree (C5.0, CHAID and CART) and Neural Network (NN) models. This paper is organized as follows. Section II describes the three modeling techniques, reviews the literature on car purchase, and presents the selection of variables and the modeling process. The results are discussed in Section III. Finally, some concluding remarks are given in Section IV.

978-1-61284-181-6/11/$26.00 ©2011 IEEE

II. METHODOLOGY

A. Logistic Regression

Logistic regression is a popular non-linear statistical model that is widely applied in many fields. In contrast to the multiple regression model, the logistic regression model handles a binary or polytomous dependent variable. For a binary dependent variable, the event of interest is coded as 1 and the nonevent as 0. The logistic regression model is written as:

$$\log\left(\frac{P(Y=1)}{1-P(Y=1)}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k \qquad (1)$$

Equation (1) can be solved to obtain

$$P(Y=1) = \frac{1}{1+e^{-z}} \qquad (2)$$

where $z = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$.

The logistic regression model enables us to calculate the probability of the event Y = 1 occurring for each case. The predictors $X_k$ can be a mixture of continuous and categorical variables.

B. Decision Tree

A decision tree model consists of a set of rules for dividing a large collection of observations into smaller homogeneous groups with respect to a particular target variable. The target variable is usually categorical, and the decision tree model is used either to calculate the probability that a given record belongs to each target category or to classify the record by assigning it to the most likely category. Decision trees can also be used for a continuous target variable, although multiple linear regression models are more suitable for such variables. Given a target variable and a set of explanatory variables, decision tree algorithms automatically determine which variables are most important and subsequently sort the observations into the correct output category [12]. The common decision tree algorithms in data mining software are CHAID (Chi-Square Automatic Interaction Detector), CART (Classification and Regression Tree) and C5.0. The CART algorithm uses gini as the splitting criterion for a categorical dependent variable, while C5.0 uses entropy. Meanwhile, CHAID uses the chi-square test as the splitting criterion.
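As a small illustration of the formulas in this section, the Python sketch below evaluates equation (2) for one case and computes the gini and entropy impurity of a tree node from its class counts. It is a minimal sketch, not the implementation used in the study: the function names, coefficients and class counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def logistic_probability(alpha, betas, xs):
    """Return P(Y = 1) from equation (2): 1 / (1 + exp(-z))."""
    # z is the linear predictor alpha + beta_1*x_1 + ... + beta_k*x_k
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))

def gini(counts):
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy (in bits) of a node given its class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical coefficients and inputs (not estimates from the paper).
p = logistic_probability(alpha=-1.0, betas=[0.5, 2.0], xs=[2.0, 0.0])
print(p)                  # 0.5, because z = -1 + 0.5*2 + 2*0 = 0
print(gini([50, 50]))     # 0.5, an evenly mixed two-class node
print(entropy([50, 50]))  # 1.0
print(gini([100, 0]))     # 0.0, a pure node
```

Both impurity measures are zero for a pure node and largest for an evenly mixed one, which is why a split that lowers them produces more homogeneous groups.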
Each of these decision tree algorithms produces a tree-like diagram and a set of decision rules from which important information can be extracted.

C. Artificial Neural Networks

Artificial Neural Networks (ANNs) are seen as an attractive alternative to traditional statistical methods. They are modeled after the human brain, which can be perceived as a highly connected network of neurons (called nodes in neural network terminology). Each node in a layer receives inputs from at least one node in the previous layer, combines the inputs and generates an output to at least one node in the next layer. Generally, the independent variables comprise the input layer and the dependent variable comprises the output layer. Between the input and output layers, one or more hidden layers of nodes may exist. The multilayer perceptron (MLP) is the most widely used neural network model in data analysis. ANNs can identify and learn correlated patterns between input data sets and corresponding target values. However, ANNs have been criticized for their black box approach and interpretative difficulties. Nevertheless, they provide an alternative model to be compared with other classification techniques. After training, ANNs can be used to predict the outcome for new independent input data ([1], [4], [13], [14]).

D. Literature on Car Purchase

In building a predictive model, historical data on customers who previously purchased a car or cancelled a car booking are required. Reference [15] conducted a study of one thousand recent buyers of a new car. Among those, seventeen percent considered only the brand of their previous car before purchasing another car.
The factors that influence the consideration of a single brand are satisfaction with the previous car and dealer, socio-demographic variables (being old, with a lower education and lower income), low perceived risk, and a number of product-specific elements (owning only one car, not owning a foreign car, staying in the same product segment, having driven only 30,000 kilometers with the previous car and having owned ten cars in the past).

In predicting purchase behavior from stated intentions, [16] proposed a unified model and applied it to a survey of 2000 randomly selected households. For the automobile data, the purchase intention is defined as 1 if the consumer intends to purchase (or actually purchases) an automobile within 12 months, and as 0 if the consumer does not intend to purchase (or does not actually purchase) an automobile within 12 months. They considered variables such as occupation and education level of the household head, type of residence, income, and the number of cars and years of cars currently in the household.

According to [17], current owners of cars are more likely to repurchase the brands they currently own when they are asked intent questions. In addition, the purchase behavior of current car owners is more consistent with their brand attitudes. First-time car buyers, on the other hand, are more likely to purchase brands that have large market shares.

Reference [18] presented a model that produces simultaneous forecasts of car holding, new car purchase and scrappage. All are sensitive to changes in income and prices or car costs. The basic theoretical foundation of their model is the assumption that a potential car holder is a person between 18 and 75 years old. A car holder here means a person holding a registered car. Car holding is largely determined by income, people's expectations and car cost components. Evolution of car holding is sensitive to economic circumstances.
Nevertheless, new car purchase is much more sensitive to economic circumstances than car holding is.

Affordability, rather than attitudes and purchase intentions, is also an important predictor of purchase. That is why income is an important variable in economics and is examined extensively. Total family income (TFI) is used to segment markets, profile consumers, and provide explanations for changes in purchasing patterns [19]. Reference [20] examined the impact of gender on
car buyer satisfaction and found that the attitudes of male and female consumers toward car purchasing showed a clear difference. The study clearly showed that the price of a car is important for both male and female buyers, but for different reasons. For male buyers, paying a higher price for a car means that they can have higher expectations and impress others more, while for female buyers a higher price is more important in assuring them that their car will perform as it should. Women are becoming an increasing force in the car buyer market. Their pattern of car buying differs from that of men. Women tend to buy lower-priced cars and are strongest in the compact and subcompact segments. Hence, many car companies aim some of their advertising specifically at women.

In a forecasting model of car ownership in Sweden, income is reported as an important predictor of car ownership. Income is growing faster among women than among men, with 2 per cent growth in income for women and constant income for men. Male car ownership is forecast to grow by only 3 per cent to the year 2010, while female car ownership is forecast to grow by 70 per cent. Thus, the author suggested that female car ownership is now the strategic factor for the future development of motorization [21].

In a study on households' intention to replace the old car, the replacement intention has a positive relationship with the quality of the new car and a negative relationship with the perception of the old car. In other words, a household's intent to replace its old car is based on the total number of miles driven, the age of the car and the anticipated number of repairs [22].

E. Selection of Variables

Based on the literature review and the availability of data from the car dealer company, the variables in the dataset are described in Table 1.

TABLE I. DESCRIPTION OF VARIABLES

Name                      Role    Type         Description
Intent of Purchase (IOP)  Target  Binary       0: Purchase, 1: Cancel
age                       Input   Continuous   Age in years
Income group              Input   Categorical  Monthly income: 0: <2000, 1: 2000-4000, 2: 4000-6000, 3: 6000-8000, 4: 8000-10000, 5: >10000
Car status                Input   Categorical  Status of this car: 1: Additional car, 2: Replacement car, 3: First car
gender                    Input   Binary       Applicant is 1: Male, 2: Female
LOU                       Input   Categorical  House (1: No, 2: Yes)
Car_Price                 Input   Categorical  Price of car (RM): 1: 40000-60000, 2: 60000-80000, 3: 80000-100,000, 4: >100,000
Book_fee                  Input   Categorical  Booking fee (RM): 1: <200, 2: 200-300, 3: 300-500, 4: 500-1000
Down_pay                  Input   Categorical  Down payment (RM): 1: 0, 2: 500-25000, 3: 25000-50000, 4: 50000-75000, 5: 75000-100,000, 6: >100,000
Loan_amt                  Input   Categorical  Loan amount (RM): 1: 15000-50000, 2: 50000-100,000, 3: >100,000, 4: 0
Model type                Input   Categorical  Twelve model types

F. Modeling using Clementine

The sample data was first partitioned into a training sample (70%) and a validation sample (30%). The training sample is used to build the models, while the validation sample is used to validate them. Fig. 1 depicts the data modeling process using SPSS Clementine.

Fig. 1. Data Mining Process Flow Diagram

The pentagon-shaped nodes show the construction of the models using logistic regression, decision trees (CART) and neural network. The diamond-shaped nodes show the outputs of the respective models. For the logistic regression model, four selection methods (ENTER, STEPWISE, FORWARDS, BACKWARDS) were compared using the Analysis and Evaluation nodes. For decision trees, the C5.0, CHAID and CART models were generated and compared.
Then the three predictive models (stepwise logistic regression, CART and neural network) are connected to the Analysis node, which computes the accuracy rates, and to the Evaluation node, which produces the lift charts.
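The 70/30 partition described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study used the partitioning facilities of SPSS Clementine, whose exact sampling procedure is not described, so the `partition` function, the fixed seed and the stand-in record IDs below are hypothetical.

```python
import random

def partition(records, train_fraction=0.7, seed=42):
    """Randomly split records into a training list and a validation list."""
    shuffled = list(records)                # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in record IDs; the study's sample had 1935 cases.
cases = list(range(1935))
train, valid = partition(cases)
print(len(train), len(valid))  # 1354 581
```

Models are fitted on `train` only, and the accuracy, sensitivity and specificity reported for the validation sample are computed on `valid`, which the models have not seen.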

III. RESULTS

In this section the results of the predictive models are presented.

A. Logistic Regression Results

For the Enter method, all variables are significant predictors except for gender. Meanwhile, the Forward, Backward and Stepwise models selected the same significant predictors. Table 3 summarizes the logistic regression results using the Enter and Stepwise selection methods. Based on the results in Table 2, the validation accuracy rates for the Enter and Stepwise models are the same (91.79%). However, the Stepwise model has a higher validation sensitivity (87.77%).

TABLE II. ACCURACY RATE

Model     Sample      Accuracy rate  Sensitivity  Specificity
Enter     Training    0.9208         0.8985       0.9327
          Validation  0.9179         0.8734       0.9466
Stepwise  Training    0.9177         0.8962       0.9292
          Validation  0.9179         0.8777       0.9438

The results in Table 3 show that those without LOU, those with low income (< RM2000) and those with a low booking fee are more likely to cancel their booking. Cancellation is also more likely for those who are purchasing a first car. Further crosstabulation results revealed that cancellation was more frequent for models 9 and 4. Meanwhile, model 12 has the lowest cancellation rate.

TABLE III. STEPWISE LOGISTIC REGRESSION RESULTS

Variable           B (Enter)   B (Stepwise)
Constant           -4.246**    -5.262**
Age                -0.046**    -0.046**
Gender = F         0.12
LOU = N            7.162**     7.189**
Booking_Fee = 1    -3.79**     -3.783**
Booking_Fee = 2    -3.618**    -3.595**
Booking_Fee = 3    -0.257      -0.208
Car_Price = 1      -0.41
Car_Price = 2      -0.79
Car_Price = 3      -1.057
Income_Group = 1   3.377*      3.449*
Income_Group = 2   3.384**     3.444**
Income_Group = 3   2.293**     2.343**
Income_Group = 4   1.738**     1.772**
Income_Group = 5   1.735**     1.791**
Model_Type = 1     2.938**     3.043**
Model_Type = 2     3.402*      4.332**
Model_Type = 3     -0.082      0.479
Model_Type = 4     5.747**     5.959**
Model_Type = 5     4.32**      4.929**
Model_Type = 6     0.477       0.626
Model_Type = 7     2.862**     2.748**
Model_Type = 8     4.738**     5.691**
Model_Type = 9     4.161**     4.329**
Model_Type = 10    5.324**     6.283**
Model_Type = 11    2.373       2.937**
Car_Status = 1     -0.134      -0.133
Car_Status = 2     -2.72**     -2.729**
Chi-Square         1185.134**  1182.89**
-2LL               485.222     487.466
Nagelkerke R-Sq    0.824       0.823

B. Decision Tree Model Results

Decision trees are easy to understand and can be easily converted to a set of rules. Moreover, they can classify both categorical and numerical data and require no a priori assumptions about the data. Because of these advantages, the decision tree approach is extensively used for both classification and prediction. The CART model finds four variables to be influential on the intent of purchase (LOU, booking fee, model type and car status). The decision tree rules are listed in Table 4, while Fig. 2 shows the CART model.

TABLE IV. CART RULES

CANCEL
replacement car. Car model: 2, 4, 6, 8, 9, 10 or 11.
replacement car. Car model: 1, 3, 5, 7 or 12. Booking fees are RM200-RM300, RM300-RM500 or RM500-RM1000.
first car or additional car. Income groups are <RM2000, RM2000-RM4000 or RM4000-RM6000.

PURCHASE
Customers have letter of undertaking (LOU).
replacement car. Car model: 1, 3, 5, 7 or 12. Booking fee is <RM200.
first car or additional car. Income groups are RM6000-RM8000, RM8000-RM10,000 or >RM10,000.
Ages of customers are more than 43 years old. Car model: 1, 5, 11 or 12.

Table 5 displays the sensitivity, specificity and classification accuracy for each decision tree model. The sensitivity is the true positive rate (the percentage of customers who cancelled their booking that were predicted correctly), while the specificity is the true negative rate (the percentage of those who purchased that were predicted correctly). The performances of all three models are quite similar. The CART model produces simple rules and hence was chosen to be compared with the LR and NN models.
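The three rates defined above follow directly from the counts in a confusion matrix. The Python sketch below shows the arithmetic, treating a cancelled booking as the positive class as in the text. The function name and the counts are hypothetical and are not taken from the study's results.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp: cancellations predicted as cancel    fn: cancellations predicted as purchase
    tn: purchases predicted as purchase      fp: purchases predicted as cancel
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity

# Hypothetical counts, chosen only for illustration.
acc, sens, spec = classification_rates(tp=170, fn=30, tn=360, fp=40)
print(round(acc, 4), round(sens, 4), round(spec, 4))  # 0.8833 0.85 0.9
```

Because purchases outnumber cancellations in this hypothetical sample, the overall accuracy sits closer to the specificity than to the sensitivity, which is one reason to report all three rates rather than accuracy alone.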

TABLE V. ACCURACY RATE, SENSITIVITY AND SPECIFICITY

Model  Sample      Accuracy rate  Sensitivity  Specificity
C5.0   Training    0.9273         0.9058       0.9389
       Validation  0.9017         0.8632       0.9262
CHAID  Training    0.9056         0.8972       0.9101
       Validation  0.9000         0.8846       0.9098
CART   Training    0.9139         0.8865       0.9286
       Validation  0.9117         0.8547       0.9481

C. Neural Network Model

The neural network (NN) model has 34 neurons in the input layer, 3 neurons in the hidden layer and 2 neurons in the output layer. Table 6 shows the importance of the input variables in descending order. The five most important input variables, in descending order of importance, are letter of undertaking, income group, model type, car status and car price. The estimated accuracy rate of the neural network model is 90.79%. This is based on the correct classification rate in the training sample.

TABLE VI. RELATIVE IMPORTANCE OF INPUT VARIABLES

Input variable         Importance value
Letter of Undertaking  0.531
Income Group           0.139
Model Type             0.0760
Car Status             0.075
Car Price              0.07
Booking Fee            0.063
Age                    0.035
Gender                 0.009

D. Model Comparisons

The LR, CART and NN models were compared to determine the best model. The accuracy rates for the training and validation samples are given in Table 7. The predictive accuracy of all three models is quite comparable, with the Logistic Regression model having a slightly higher sensitivity.

TABLE VII. ACCURACY RATE

Model                Sample      Accuracy rate  Sensitivity  Specificity
Logistic Regression  Training    0.9177         0.8962       0.9292
                     Validation  0.9179         0.8777       0.9438
CART                 Training    0.9139         0.8865       0.9286
                     Validation  0.9117         0.8547       0.9481
Neural Network       Training    0.9079         0.8737       0.9263
                     Validation  0.9117         0.8589       0.9454

IV. CONCLUSION

There has been rapid growth of data mining in business applications and in social and medical research. Logistic regression is the most popular statistical model to predict the probability of an event happening.
With the emergence of data mining, nontraditional statistical methods such as neural networks, support vector machines and decision trees are gaining popularity in the search for a good predictive model. Data mining usually involves modeling large volumes of data, and the focus is on the practical importance of the information or knowledge gained from the models. This study illustrated the construction and evaluation of three predictive models (logistic regression, decision tree and neural network) to predict the intent of purchase of a car. Results revealed that no model clearly outperforms the others, but important characteristics of customers were obtained from the logistic regression and CART models. Work is in progress to cover other classification techniques such as SVM and Bayesian classification. The performance of predictive models depends on the data structure, data quality and variable selection. With the availability of data mining software, data mining models are easy to construct and apply in the business industry. However, a successful data mining project requires the involvement of experts in data mining, subject area experts and people in the business organization.

REFERENCES

[1] M. J. A. Berry and G. S. Linoff, Data Mining Techniques: For Marketing, Sales, and Customer Support. New York: John Wiley & Sons, Inc, 2004.
[2] H. Jiawei and K. Micheline, Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2006.
[3] A. Feelders, H. Daniels and M. Holsheimer, Methodological and Practical Aspects of Data Mining, Information & Management, 271-281, 2000.
[4] G. Paolo, Applied Data Mining for Business and Industry, John Wiley & Sons, 2003.
[5] K. J. Cios and L. A. Kurgan, Trends in data mining and knowledge discovery, Advanced Information and Knowledge Processing, 1-26, 2005.
[6] D. J. Hand and W. E. Henley, Statistical classification methods in consumer credit scoring: A review. Journal of the Royal Statistical Society: Series A (Statistics in Society), 160(3), 523-541, 1997.
[7] H. Abdou, J. Pointon and A. El-Masry, Neural nets versus conventional techniques in credit scoring in Egyptian banking. Expert Systems with Applications, 35, 1275-1292, 2008.
[8] A. Endo, T. Shibata and H. Tanaka, Comparisons of seven algorithms to predict breast cancer survival, Biomedical Soft Computing and Human Sciences, Vol 13, No. 2, 11-16, 2008.
[9] A. Oztekin, D. Delen and Z. Kong, Predicting the graft survival for heart-lung transplantation patients: An integrated data mining methodology, International Journal of Medical Informatics, 78(12), e84-e96, 2009.
[10] Z. Huang, H. Chen, C-J. Hsu, W-H. Chen and S. Wu, Credit rating analysis with support vector machines and neural networks: a market comparative study. Decision Support Systems, 37, 543-558, 2004.
[11] C-L. Huang, M-C. Chen and C-J. Wang, Credit scoring with a data mining approach based on support vector machines. Expert Systems with Applications, 37, 847-856, 2007.
[12] D. Olson and S. Yong, Introduction to Business Data Mining. McGraw Hill International Edition, 2006.
[13] J. D. Olden and D. A. Jackson, Illuminating the black box: a randomization approach for understanding variable contributions in artificial neural networks, Ecological Modeling, 154, 135-150, 2002.
[14] C. K. Hian and K. L. Chan, Going concern prediction using data mining techniques, Managerial Auditing Journal, Vol 19, No 3, 462-476, 2004.
[15] E. Lapersonne, G. Laurent and J-J. Le Goff, Consideration sets of size one: An empirical investigation of automobile purchases. International Journal of Research in Marketing, 12, 55-66, 1995.
[16] B. Sun and V. G. Morwitz, Stated intentions and purchase behavior: A unified model. International Journal of Research in Marketing, 27(4), 356-366, 2010.
[17] G. J. Fitzsimons and V. G. Morwitz, The effect of measuring intent on brand-level purchase behavior. Journal of Consumer Research, 23, 1-11, 1996.
[18] F. Jorgensen and T. Wentzel-Larsen, Forecasting car holding, scrappage and new car purchase in Norway, Journal of Transport Economics and Policy, 24(2), 139-156, 1990.
[19] A. S. Notani, Perceptions of affordability: Their role in predicting purchase intent and purchase. Journal of Economic Psychology, 18, 525-54, 1995.
[20] L. Moutinho, F. Davies and B. Curry, The impact of gender on car buyer satisfaction and loyalty. Journal of Retailing and Consumer Services, 3(3), 135-144, 1996.
[21] J. O. Jansson, Car demand modeling and forecasting: a new approach. Journal of Transport Economics and Policy, 23(2), 125-140, 1989.
[22] A. Marell, P. Davidsson, T. Garling and T. Laitila, Direct and indirect effects on households' intentions to replace the old car. Journal of Retailing and Consumer Services, 11, 1-8, 2004.