CONTENTS

PREFACE

1 INTRODUCTION
  1.1 Overview
  1.2 Definition
  1.3 Preparation
    1.3.1 Overview
    1.3.2 Accessing Tabular Data
    1.3.3 Accessing Unstructured Data
    1.3.4 Understanding the Variables and Observations
    1.3.5 Data Cleaning
    1.3.6 Transformation
    1.3.7 Variable Reduction
    1.3.8 Segmentation
    1.3.9 Preparing Data to Apply
  1.4 Analysis
    1.4.1 Data Mining Tasks
    1.4.2 Optimization
    1.4.3 Evaluation
    1.4.4 Model Forensics
  1.5 Deployment
  1.6 Outline of Book
    1.6.1 Overview
    1.6.2 Data Visualization
    1.6.3 Clustering
    1.6.4 Predictive Analytics
    1.6.5 Applications
    1.6.6 Software
  1.7 Summary
  1.8 Further Reading

2 DATA VISUALIZATION
  2.1 Overview
  2.2 Visualization Design Principles
    2.2.1 General Principles
    2.2.2 Graphics Design
    2.2.3 Anatomy of a Graph
  2.3 Tables
    2.3.1 Simple Tables
    2.3.2 Summary Tables
    2.3.3 Two-Way Contingency Tables
    2.3.4 Supertables
  2.4 Univariate Data Visualization
    2.4.1 Bar Chart
    2.4.2 Histograms
    2.4.3 Frequency Polygram
    2.4.4 Box Plots
    2.4.5 Dot Plot
    2.4.6 Stem-and-Leaf Plot
    2.4.7 Quantile Plot
    2.4.8 Quantile-Quantile Plot
  2.5 Bivariate Data Visualization
    2.5.1 Scatterplot
  2.6 Multivariate Data Visualization
    2.6.1 Histogram Matrix
    2.6.2 Scatterplot Matrix
    2.6.3 Multiple Box Plot
    2.6.4 Trellis Plot
  2.7 Visualizing Groups
    2.7.1 Dendrograms
    2.7.2 Decision Trees
    2.7.3 Cluster Image Maps
  2.8 Dynamic Techniques
    2.8.1 Overview
    2.8.2 Data Brushing
    2.8.3 Nearness Selection
    2.8.4 Sorting and Rearranging
    2.8.5 Searching and Filtering
  2.9 Summary
  2.10 Further Reading

3 CLUSTERING
  3.1 Overview
  3.2 Distance Measures
    3.2.1 Overview
    3.2.2 Numeric Distance Measures
    3.2.3 Binary Distance Measures
    3.2.4 Mixed Variables
    3.2.5 Other Measures
  3.3 Agglomerative Hierarchical Clustering
    3.3.1 Overview
    3.3.2 Single Linkage
    3.3.3 Complete Linkage
    3.3.4 Average Linkage
    3.3.5 Other Methods
    3.3.6 Selecting Groups
  3.4 Partitioned-Based Clustering
    3.4.1 Overview
    3.4.2 k-Means
    3.4.3 Worked Example
    3.4.4 Miscellaneous Partitioned-Based Clustering
  3.5 Fuzzy Clustering
    3.5.1 Overview
    3.5.2 Fuzzy k-Means
    3.5.3 Worked Examples
  3.6 Summary
  3.7 Further Reading

4 PREDICTIVE ANALYTICS
  4.1 Overview
    4.1.1 Predictive Modeling
    4.1.2 Testing Model Accuracy
    4.1.3 Evaluating Regression Models' Predictive Accuracy
    4.1.4 Evaluating Classification Models' Predictive Accuracy
    4.1.5 Evaluating Binary Models' Predictive Accuracy
    4.1.6 ROC Charts
    4.1.7 Lift Chart
  4.2 Principal Component Analysis
    4.2.1 Overview
    4.2.2 Principal Components
    4.2.3 Generating Principal Components
    4.2.4 Interpretation of Principal Components
  4.3 Multiple Linear Regression
    4.3.1 Overview
    4.3.2 Generating Models
    4.3.3 Prediction
    4.3.4 Analysis of Residuals
    4.3.5 Standard Error
    4.3.6 Coefficient of Multiple Determination
    4.3.7 Testing the Model Significance
    4.3.8 Selecting and Transforming Variables
  4.4 Discriminant Analysis
    4.4.1 Overview
    4.4.2 Discriminant Function
    4.4.3 Discriminant Analysis Example
  4.5 Logistic Regression
    4.5.1 Overview
    4.5.2 Logistic Regression Formula
    4.5.3 Estimating Coefficients
    4.5.4 Assessing and Optimizing Results
  4.6 Naive Bayes Classifiers
    4.6.1 Overview
    4.6.2 Bayes Theorem and the Independence Assumption
    4.6.3 Independence Assumption
    4.6.4 Classification Process
  4.7 Summary
  4.8 Further Reading

5 APPLICATIONS
  5.1 Overview
  5.2 Sales and Marketing
  5.3 Industry-Specific Data Mining
    5.3.1 Finance
    5.3.2 Insurance
    5.3.3 Retail
    5.3.4 Telecommunications
    5.3.5 Manufacturing
    5.3.6 Entertainment
    5.3.7 Government
    5.3.8 Pharmaceuticals
    5.3.9 Healthcare
  5.4 microRNA Data Analysis Case Study
    5.4.1 Defining the Problem
    5.4.2 Preparing the Data
    5.4.3 Analysis
  5.5 Credit Scoring Case Study
    5.5.1 Defining the Problem
    5.5.2 Preparing the Data
    5.5.3 Analysis
    5.5.4 Deployment
  5.6 Data Mining Nontabular Data
    5.6.1 Overview
    5.6.2 Data Mining Chemical Data
    5.6.3 Data Mining Text
  5.7 Further Reading

APPENDIX A MATRICES
  A.1 Overview of Matrices
  A.2 Matrix Addition
  A.3 Matrix Multiplication
  A.4 Transpose of a Matrix
  A.5 Inverse of a Matrix

APPENDIX B SOFTWARE
  B.1 Software Overview
    B.1.1 Software Objectives
    B.1.2 Access and Installation
    B.1.3 User Interface Overview
  B.2 Data Preparation
    B.2.1 Overview
    B.2.2 Reading in Data
    B.2.3 Searching the Data
    B.2.4 Variable Characterization
    B.2.5 Removing Observations and Variables
    B.2.6 Cleaning the Data
    B.2.7 Transforming the Data
    B.2.8 Segmentation
    B.2.9 Principal Component Analysis
  B.3 Tables and Graphs
    B.3.1 Overview
    B.3.2 Contingency Tables
    B.3.3 Summary Tables
    B.3.4 Graphs
    B.3.5 Graph Matrices
  B.4 Statistics
    B.4.1 Overview
    B.4.2 Descriptive Statistics
    B.4.3 Confidence Intervals
    B.4.4 Hypothesis Tests
    B.4.5 Chi-Square Test
    B.4.6 ANOVA
    B.4.7 Comparative Statistics
  B.5 Grouping
    B.5.1 Overview
    B.5.2 Clustering
    B.5.3 Associative Rules
    B.5.4 Decision Trees
  B.6 Prediction
    B.6.1 Overview
    B.6.2 Linear Regression
    B.6.3 Discriminant Analysis
    B.6.4 Logistic Regression
    B.6.5 Naive Bayes
    B.6.6 kNN
    B.6.7 CART
    B.6.8 Neural Networks
    B.6.9 Apply Model

BIBLIOGRAPHY

INDEX
Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R. Daniel S. Putler and Robert E. Krider.
