



CERTIFICATE

This is to certify that the thesis titled "Data Mining in Retailing in India: A Model Based Approach", submitted by Ruchi Mittal to Maharishi Markandeshwar University, Mullana (Ambala) for the award of the degree of Doctor of Philosophy in Computer Science, is a bona fide record of original work done under my supervision and guidance. The work contained in this thesis has not been submitted to any other University or Institute for the award of any other degree or diploma.

Dr. NAVEETA MEHTA
Associate Professor
M.M. Institute of Computer Technology & Business Management
M.M. University, Mullana (Ambala)
Haryana (India)

ACKNOWLEDGEMENT

I express my gratitude to my supervisor, Dr. Naveeta Mehta, Associate Professor, Maharishi Markandeshwar Institute of Computer Technology & Business Management, M.M. University, Mullana, whose untiring guidance has made it possible for me to complete this work. Her dedication to academic life, her discipline, and her straightforward approach have had a great impact on my professional and personal life. She is humane, willing to help others, caring towards everyone, and always concerned about progress. With these rare qualities, I found in her not merely a supervisor but a noble soul, a Guru.

I express my gratitude to Dr. Dimple Juneja, Principal and Professor, Maharishi Markandeshwar Institute of Computer Technology & Business Management, M.M. University, Mullana, for her mentorship and guidance at various stages of this research.

I am indebted to all my colleagues at MAIMT, Jagadhri, especially my Director, Dr. Raj Kumar, for their constant support and cooperation throughout my thesis.

I would also like to express, though it is difficult to put into words, my gratitude to Dr. Anil Kapil, Professor and Head, Computer Science and Technology, Haryana Institute of Engineering & Technology, Kaithal, and Dr. Sangeeta Gupta, Director & Professor, Om Institute of Technology and Management (Mgt), Juglan, Hisar, who have always been there to extend moral support and professional mentoring.

I would also like to thank my mother, Smt. Raj Aggarwal, who has been more a friend than a mother to me and has stood by me through thick and thin. I am also thankful to my better half, Dr. Amit Mittal, for his personal and academic support, and to my child, Yash, to whom I dedicate this thesis. I am especially thankful to my brother and Prime Minister awardee, Ishan Aggarwal, who, through his academic achievements, has raised the bar of academic excellence in the family. I am also thankful to my father, Shri Ishwar Aggarwal, my father-in-law, Shri Sat Paul Mittal, and my mother-in-law, Smt. Trishla Mittal, for showing faith in me and for ensuring a conducive environment for my professional pursuits.

I offer my regards to all those whom I have not mentioned but who supported or inspired me in any respect during the course of my work.

Last but not least, I thank almighty GOD for always being there and for seeing me through the tough times.

RUCHI MITTAL

"If I have seen further, it is by standing on the shoulders of giants." - Isaac Newton

ABSTRACT

Data mining is an emerging interdisciplinary field that focuses on extracting information useful for high-level decisions. It draws on machine learning, statistics and probability, online analytical processing, data visualization, information science, high-performance computing and related areas. Data mining enables business executives to manage their data and to make relevant decisions. Simply stated, data mining refers to extracting, or mining, knowledge from large amounts of data.

Retail is amongst the major fields of application of data mining technology. It is India's largest industry, accounting for over 10 per cent of GDP and 8 per cent of employment. In India, the industry is facing the new millennium, and the models of the past are not sufficient to ensure tomorrow's successes. Firms are increasingly relying on data mining techniques, which use existing databases to devise new strategies for growth, profitability and customer loyalty.

The thesis starts with a discussion of the concepts of database management systems, data warehousing and then data mining. It traces the historical development of data mining and of retailing in India, which provides the background material for the research problem. The objectives, scope and significance of the study are also outlined.

The Review of Literature then provides the theoretical and conceptual framework of the research. It reviews the work done in the identified field of interest since 1983. This period was purposively selected to ensure that the technology under review, data mining, has had sufficient time to prove its usefulness in prediction and to show that its use brings positive results to organizations. The major concepts and technologies reviewed are: Data Mining and Business Intelligence; Customer Segmentation and Profiling; Store Image/Attributes; Predictive Modeling through Data Mining; Cluster Analysis; Factor Analysis; and Multiple Regression Analysis. This section also identifies the important variables and the gaps in the research done in the field, both in India and abroad.

The next section discusses the various data mining concepts, functionalities, tools and techniques. The disciplines of statistics and data mining are also discussed to show that these areas are highly interrelated and share a symbiotic relationship. This section helps to build an understanding of the various data mining algorithms and of the ways they can be used in business applications and in descriptive and predictive data mining modeling.

The research design is then discussed in terms of the type of research, the sampling plan, and the design of the survey instrument (questionnaire). This section also gives a detailed description of the data mining techniques that have been used to achieve the research objectives.

The next section presents the analysis of the data collected through the survey, together with interpretations, so that meaningful recommendations and conclusions can be drawn. The analysis was performed using the following data mining techniques:

(1) Two-step cluster analysis: this technique identifies homogeneous groups (clusters) of customers within an otherwise heterogeneous customer database.
(2) Chi-Square test: this tests how likely it is that an observed distribution is due to chance. It is also called a "goodness of fit" statistic.
(3) Factor analysis: this technique is used for data preprocessing and for reducing the data to a manageable level that can be used for further analysis, such as modeling and interpretation.
(4) Multiple regression analysis: this predictive data mining modeling technique is used to predict the dependent variable (in this case, Store Loyalty) on the basis of the independent variables (in this case, Store Image dimensions/attributes).

Finally, the thesis provides the findings, recommendations and future scope of the study. The customer groups identified are store-loyals and store non-loyals. The non-loyals form a significantly large group, and retailers need to understand the typical profile of such customers so that suitable strategies can be formulated to target them. The importance of various customer variables has also been identified. The six salient store attribute dimensions that emerged are discussed, and suggestions are put forth for the benefit of retailers and for future research.
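The Chi-Square test mentioned in the abstract can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from the survey data of this thesis; the function name `chi_square_2x2` and the 2x2 layout (cluster membership against gender) are assumptions made purely for illustration. The sketch computes the Pearson chi-square statistic without a continuity correction and obtains the p-value from the closed form that holds for one degree of freedom.

```python
import math


def chi_square_2x2(table):
    """Return (chi2, p_value) for a 2x2 contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom; for that case the upper-tail
    # probability of the chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts (illustration only, not thesis data):
# rows = store-loyals, store non-loyals; columns = male, female.
counts = [[30, 20],
          [20, 30]]

chi2, p_value = chi_square_2x2(counts)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest that cluster membership and the demographic variable are unlikely to be independent. For larger tables the degrees of freedom change, so the closed-form p-value used here no longer applies.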

LIST OF ABBREVIATIONS

AI        Artificial Intelligence
ANOVA     Analysis of Variance
BI        Business Intelligence
CART      Classification and Regression Tree
CHAID     Chi-Square Automatic Interaction Detection
CRIS      Consumer Image of Retail Stores
CRM       Customer Relationship Management
DBMS      Database Management System
DM        Data Mining
EDI       Electronic Data Interchange
EIS       Executive Information System
FDI       Foreign Direct Investment
GRDI      Global Retail Development Index
IR        Information Retrieval
KDD       Knowledge Discovery in Databases
MANOVA    Multivariate Analysis of Variance
MHI       Monthly Household Income
OLAP      Online Analytical Processing
PCA       Principal Component Analysis
PLs       Private Labels
RDBMS     Relational Database Management System
RFID      Radio Frequency Identification Device
RFM       Recency, Frequency, Monetary
RIS       Retail Information System
SPSS      Statistical Package for Social Science
SQL       Structured Query Language
VAT       Value Added Tax
VSM       Vector Space Model

LIST OF FIGURES

Figure No.     Figure Description
Figure 3.1     Steps of Knowledge Discovery in Databases
Figure 3.2     Phases of Data Mining Life Cycle
Figure 3.3     Predictive Modeling through Linear Regression
Figure 3.4     Nearest Neighbors for Three Unclassified Records
Figure 3.5     Discovering Clusters and Descriptions in a Database
Figure 3.6     Hierarchical Clustering
Figure 3.7     Decision Tree for Cellular Telephone Industry
Figure 3.8     Structure of a Neural Network
Figure 3.9     A Simplified View of Neural Network
Figure 3.10    Neural Network for Prediction of Loyalty
Figure 4.1     Steps in Factor Analysis
Figure 4.2     Steps for Multiple Regression Analysis
Figure 5.1     Graphical Representation of Cluster Distribution
Figure 5.2     Within Cluster Percentage of Gender
Figure 5.3     Chi-Square - Gender
Figure 5.4     Within Cluster Percentage of Age
Figure 5.5     Chi-Square - Age
Figure 5.6     Within Cluster Percentage of Occupation
Figure 5.7     Chi-Square - Occupation
Figure 5.8     Within Cluster Percentage of Education
Figure 5.9     Chi-Square - Education
Figure 5.10    Within Cluster Percentage of MHI
Figure 5.11    Chi-Square - Income
Figure 5.12    Within Cluster Percentage of Shop-with
Figure 5.13    Chi-Square - Shop-with
Figure 5.14    Within Cluster Percentage of Spend
Figure 5.15    Chi-Square - Spend
Figure 5.16    Within Cluster Percentage of Trips
Figure 5.17    Chi-Square - Trips
Figure 5.18    Relative Importance of Demographic and Behavioral Variables
Figure 5.19    Scree Plot
Figure 5.20    Component Plot in Rotated Space
Figure 5.21    Figurative Description of Store Loyalty - Predictive Model

LIST OF TABLES

Table No.     Table Detail
Table 1.1     Steps in the Evolution of Data Mining
Table 1.2     KDnuggets Polls: Data Mining Software (May 2008)
Table 4.1     Questions to Measure Loyalty
Table 4.2     Summarized Sample Statistics
Table 4.3     Sample Descriptive with Coding
Table 4.4     Chi-Square Test Illustration
Table 4.5     Color Preference by Customers for Car Dealership
Table 4.6     Directions for Setting up Worksheet for Chi-Square
Table 5.1     Auto-Clustering
Table 5.2     Cluster Distribution
Table 5.3     Store Loyalty amongst Surveyed Customers
Table 5.4     Profiling of Cluster by Gender
Table 5.5     Profiling of Cluster by Age
Table 5.6     Profiling of Cluster by Occupation
Table 5.7     Profiling of Cluster by Education
Table 5.8     Profiling of Cluster by Income
Table 5.9     Profiling of Cluster by Shop-with
Table 5.10    Profiling of Cluster by Expenditure
Table 5.11    Profiling of Cluster by Trips
Table 5.12    Summary of Demographic/Behavioral Variables: Sample Distribution and Cluster Membership
Table 5.13    Descriptive Statistics
Table 5.14    Correlation Matrix
Table 5.15    Anti-Image Matrix
Table 5.16    KMO and Bartlett's Test
Table 5.17    Communalities
Table 5.18    Total Variance Explained
Table 5.19    Component Matrix
Table 5.20    Rotated Component Matrix
Table 5.21    Short-Listed Attributes (Factor Loadings above .40)
Table 5.22    Component Transformation Matrix
Table 5.23    Factor Score Coefficient Matrix
Table 5.24    Factor Analysis of Grocery Store Attribute: Interpretation of Factors
Table 5.25    Reliability Analysis of Factors
Table 5.26    Variables Entered/Removed
Table 5.27    Model Summary
Table 5.28    ANOVA
Table 5.29    Regression Coefficients