Practical Applications of Data Mining. Sang C. Suh, Texas A&M University-Commerce. Jones & Bartlett Learning





Contents

Preface xi
Foreword by Murat M. Tanik xvii
Foreword by John Kocur xix

Chapter 1 Introduction to Data Mining 1
  1.1 Traditional Database Management Systems 1
  1.2 Knowledge Discovery in Databases 3
    1.2.1 Pre-Processing 5
    1.2.2 Data Warehousing 6
    1.2.3 Post-Processing 6
  1.3 Data-Mining Methods 6
    1.3.1 Association Rules 7
    1.3.2 Classification Learning 8
    1.3.3 Statistical Data Mining 10
    1.3.4 Rough Sets for Data Mining 11
    1.3.5 Neural Networks for Data Mining 12
    1.3.6 Clustering for Data Mining 14
    1.3.7 Fuzzy Sets for Data Mining 16
  1.4 Integrated Framework for Intelligent Databases 17
  1.5 Practical Applications of Data Mining 20
    1.5.1 Healthcare Services 20
    1.5.2 Banking 22
    1.5.3 Supermarket Applications 23
    1.5.4 Medical Image Classification 25
  1.6 Chapter Summary 27

Chapter 2 Association Rules 29
  2.1 Introduction 29
  2.2 Mining of Association Rules in Market Basket Data 29
    2.2.1 Apriori Algorithm 30
    2.2.2 Apriori-gen( ) Function 32
    2.2.3 Apriori Example 32
    2.2.4 AprioriTid Algorithm 33
  2.3 Attribute-Oriented Rule Generalization 35
    2.3.1 Concept Hierarchies 36
    2.3.2 Basic Strategies for Attribute-Oriented Induction 38
    2.3.3 Basic Attribute-Oriented Induction Algorithm 42
    2.3.4 Generation of Discrimination Rules through Attribute-Oriented Induction 43
  2.4 Association Rules in Hypertext Databases 46
    2.4.1 Formal Model 47
    2.4.2 Algorithms for Generating Composite Association Rules 49
  2.5 Quantitative Association Rules 53
    2.5.1 Mapping of Quantitative Association Rules 53
    2.5.2 Problem Decomposition 55
    2.5.3 Partitioning of Quantitative Attributes 56
  2.6 Mining of Compact Rules 59
    2.6.1 Semantic Association Relationships 59
    2.6.2 Generalization Algorithm 60
    2.6.3 Learning Process 61
    2.6.4 Learning Algorithm 63
  2.7 Mining of Time-Constrained Association Rules 67
    2.7.1 Time-Constrained Association Rules 67
    2.7.2 Properties of Time Constraints 69
    2.7.3 Potential Applications 70
  2.8 Chapter Summary 70
  2.9 Exercises 71
  2.10 Selected Bibliographic Notes 74
  2.11 Chapter Bibliography 75

Chapter 3 Classification Learning 79
  3.1 Introduction 79
  3.2 Knowledge Representation 81
    3.2.1 Classification Rules 81
    3.2.2 Decision Trees 81
  3.3 Separate-and-Conquer Approach 82
    3.3.1 Prism 83
    3.3.2 Induct 86
    3.3.3 REP, IREP, RIPPER 97
  3.4 Divide-and-Conquer Approach 99
    3.4.1 ID3 100
    3.4.2 C4.5 and C5.0 106
  3.5 Partial Decision Tree 123
  3.6 Chapter Summary 129
  3.7 Exercises 129
  3.8 Selected Bibliographic Notes 137
  3.9 Chapter Bibliography 138

Chapter 4 Statistics for Data Mining 143
  4.1 Introduction 143
  4.2 House Sales Data 145
  4.3 Conditional Probability 146
  4.4 Equality Tests 148
  4.5 Correlation Coefficient 152
  4.6 Contingency Table and the χ² Test 157
  4.7 Linear Regression 164
  4.8 House Sales Database Revisited 172
  4.9 Chapter Summary 175
  4.10 Exercises 175
  4.11 Selected Bibliographic Notes 178
  4.12 Chapter Bibliography 178

Chapter 5 Rough Sets and Bayes' Theories 181
  5.1 Introduction 181
  5.2 Bayes' Theorem 183
  5.3 Rough Sets 184
    5.3.1 Data Analysis and Representation 184
    5.3.2 Reduction of Condition Attributes and Generation of Decision Rules 188
  5.4 Applications Based on Bayes' and Rough Sets 190
    5.4.1 Customer Tendency Analysis Using Bayes' Theory 190
    5.4.2 Contact Lens Prescription Using Rough Set Theory 190
    5.4.3 Welding Procedure Using Rough-Set Theory 195
    5.4.4 Classification of Automobiles Using Both Bayes' and Rough Set Theory 202
  5.5 Chapter Summary 212
  5.6 Exercises 213
  5.7 Selected Bibliographic Notes 220
  5.8 Chapter Bibliography 221

Chapter 6 Neural Networks 225
  6.1 Introduction 225
  6.2 Neural Computing and Databases 226
  6.3 Network Classification 228
    6.3.1 Unsupervised Learning Models 228
    6.3.2 Supervised Learning Models 230
  6.4 Parameters of the Learning Process 231
    6.4.1 Number of Hidden Layers 231
    6.4.2 Number of Hidden Nodes 232
    6.4.3 Early Stopping 232
    6.4.4 Convergence Curve (Back-Propagation Neural Network) 233
  6.5 Network Structures 234
    6.5.1 Neural Net and Traditional Classifiers 235
  6.6 Knowledge Discovery in Databases 235
    6.6.1 Normalization 236
  6.7 Backpropagation Neural Network (BPNN) Model 239
    6.7.1 Network Architecture 239
    6.7.2 Algorithm 240
    6.7.3 Example I 242
    6.7.4 Example II (Retrieval of Data Using the BPNN Model) 243
  6.8 Bidirectional Associative Memory (BAM) Model 246
    6.8.1 Network Architecture 247
    6.8.2 Algorithm 247
    6.8.3 Example with Four Training Vectors 248
  6.9 Learning Vector Quantization (LVQ) Model 250
    6.9.1 Network Architecture 251
    6.9.2 Algorithm 252
    6.9.3 Example 253
  6.10 Probabilistic Neural Network (PNN) Model 255
    6.10.1 Network Architecture 256
    6.10.2 Algorithm 259
    6.10.3 Example 260
    6.10.4 Parameter Adjustment Using a Smoothing Factor 265
  6.11 Chapter Summary 267
  6.12 Exercises 268
  6.13 Selected Bibliographic Notes 274
  6.14 Chapter Bibliography 275

Chapter 7 Clustering 279
  7.1 Introduction 279
  7.2 Definition of Clusters and Clustering 280
  7.3 Clustering Procedures 283
  7.4 Clustering Concepts 284
    7.4.1 Choosing Variables 284
    7.4.2 Similarity and Dissimilarity Measurement 285
    7.4.3 Standardization of Variables 287
    7.4.4 Weights and Threshold Values 288
    7.4.5 Association Rules 289
  7.5 Clustering Algorithms 290
    7.5.1 Hierarchical Algorithms 291
    7.5.2 Graph Theory Algorithm with the Single-link Method 304
    7.5.3 Partition Algorithms: K-means Algorithm 307
    7.5.4 Density-Search Algorithms 310
    7.5.5 Association Rule Algorithms 313
  7.6 Chapter Summary 329
  7.7 Exercises 329
  7.8 Selected Bibliographic Notes 333
  7.9 Chapter Bibliography 335

Chapter 8 Fuzzy Information Retrieval 339
  8.1 Introduction 339
  8.2 Fuzzy Set Basics 340
  8.3 Fuzzy Set Applications 341
    8.3.1 Project Management 342
    8.3.2 Data Analysis 342
    8.3.3 Nuanced Information Systems 346
  8.4 Linguistic Variables 347
  8.5 Fuzzy Query Processing 348
  8.6 Fuzzy Query Processing Using Fuzzy Tables 363
    8.6.1 Convert Raw Data to Fuzzy Member Functions 363
    8.6.2 Fuzzy Table 368
    8.6.3 Fuzzy Search Engine 369
    8.6.4 Fuzzy Table Construction 370
    8.6.5 Fuzzy Query Processing 371
  8.7 Role of Relational Division for Information Retrieval 374
    8.7.1 Information Retrieval through Relational Division 375
    8.7.2 Information Retrieval through Fuzzy Relational Division 376
  8.8 Alpha-Cut Thresholds 379
  8.9 Chapter Summary 383
  8.10 Exercises 384
  8.11 Selected Bibliographic Notes 391
  8.12 Chapter Bibliography 392

Appendix 395
Index 409