Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms




Y.Y. Yao, Y. Zhao, R.B. Maguire
Department of Computer Science, University of Regina
Regina, Saskatchewan, Canada S4S 0A2
E-mail: {yyao, yanzhao, rbm}@cs.uregina.ca

Abstract. We propose a new framework of explanation-oriented data mining by adding an explanation construction and evaluation phase to the data mining process. While traditional approaches concentrate on mining algorithms, we focus on explaining mined results. The mining task can be viewed as unsupervised learning that searches for interesting patterns. The construction and evaluation of explanations for mined patterns can be formulated as supervised learning. The proposed framework is therefore a simple combination of unsupervised and supervised learning. The basic ideas are illustrated using association mining. The notion of conditional association is used to represent plausible explanations of an association. The condition in a conditional association explicitly expresses the plausible explanations of an association.

1 Introduction

Data mining is a discipline concerning theories, methodologies, and in particular, computer systems for exploring and analyzing large amounts of data. A data mining system is designed with the objective of automatically discovering, or assisting a human expert in discovering, knowledge embedded in data [2, 6, 21]. Results, experiences and lessons from artificial intelligence, and particularly intelligent information systems, are immediately applicable to the study of data mining. By putting data mining systems in the wider context of intelligent information systems, one can easily identify certain limitations of current data mining studies. In this paper, we focus on the explanation facility of intelligent systems, which has not received much attention in the data mining community.

We present a new explanation-oriented framework for data mining by combining unsupervised and supervised learning. For clarity, we use association mining to demonstrate the basic ideas. The notion of conditional association is used to explicitly state the conditions under which the association occurs. An algorithm is suggested. Conceptually, it consists of two parts and uses two data tables. A transaction data table is used to learn an association in the first step. An explanation table is used to construct an explanation of the association in the second step.

2 Motivations

In the development of many branches of science such as mathematics, physics, chemistry, and biology, the discovery of a natural phenomenon is only the first step. The important subsequent tasks for scientists are to build a theory accounting for the phenomenon and to provide justifications, interpretations, and explanations of the theory. The interpretations and explanations enhance our understanding of the phenomenon and guide us to make rational decisions [22].

Explanation plays an important role in learning and is an important functionality of many intelligent information systems [5, 8, 9, 11, 15]. Dhaliwal and Benbasat argue that the role of constructing explanations is to clarify, teach, and convince [5]. Human experts are often asked to explain their views, recommendations, decisions or actions. Users would not accept recommendations that emerge from reasoning that they do not understand [9]. In an expert system, an explanation facility serves several purposes [17]. It makes the system more intelligible to the user, helps an expert to uncover shortcomings of the system, and helps a user to feel more assured about the recommendations and actions of the system. Typically, the system provides two basic types of explanations: the why and the how. A why type question is normally posed by a user when the system asks the user to provide some information. A how type question is posed by a user if the user wants to know how a certain conclusion is reached. Wick and Slagle [19] proposed a journalistic explanation facility which includes the six elements who, what, where, when, why, and how.

A data mining system may be viewed as an intermediate system between a database or data warehouse and an application, whose main purpose is to change data into usable knowledge [21]. To achieve this goal, the data mining system should provide necessary explanations of mined knowledge. A piece of discovered knowledge is meaningful and trustworthy only if we have an explanation. An association does not immediately offer an explanation. One needs to find explanations regarding when, where, and why an association occurs. If a data mining system is an interactive system, it must also provide explanations for its recommendations and actions. For a knowledge-based data mining system, explanation of the use of knowledge is also necessary to make the mining process more understandable to a user. The observations and results regarding explanations in expert systems are applicable to data mining systems. In order to make data mining a well-accepted technology, more attention must be paid to the needs and wishes for explanations from its end users. Without the explanation functionality, the effectiveness of data mining systems is limited.

On the other hand, studies in data mining have focused on the preparation, processing, and analysis of data. Little attention is paid to the task of explaining discovered results. There is clearly a need for the incorporation of an explanation facility into a data mining process. It is commonly accepted that a data mining process consists of the following steps: data selection, data preprocessing, data transformation, pattern discovery, and pattern evaluation [6]. Several variations have been studied by many authors [7, 10, 16]. By adding an extra step, explanation construction and evaluation, we can obtain a framework of explanation-oriented data mining. This leads to a significant step from detecting the existence of a pattern to searching for the underlying reasons that explain the existence of the pattern.

3 Explanation-oriented association mining

Association mining was first introduced using transaction databases and deals with purchasing patterns of customers [1]. A set of items are associated if they are bought together by many customers. Some authors extended the original associations to negative associations [20].

3.1 Conditional associations and explanation evaluation

The reasons for the occurrence of an association cannot be provided by the association itself. One needs to construct and represent explanations using other information. More specifically, if one can identify some conditions under which the occurrence of the association is more pronounced, the conditions may provide some explanation. By adding time, place, and customer features (profiles) together with item features as conditions, we may identify when, where, and why an association occurs, respectively. The notion of conditional associations has been discussed by many authors in different contexts [4, 14, 18]. Typically, conditions in conditional association mining are used as constraints that restrict mining to a portion of the database. For explanation-oriented association mining, we take the reverse approach. We first mine associations and then search for conditions.

We can profile transactions by customers, places, and time ranges. Domain specific knowledge is used to select a set of profiles and to form an explanation table. Different explanation tables can be constructed, which lead to different explanations. Each explanation table may or may not be able to provide a satisfactory explanation. It may also happen that a table is able to explain only some aspects of the association.

Let φ ∧ ψ denote an association discovered in a transaction table.
Let χ denote a condition expressible in the explanation table. A conditional association is written as φ ∧ ψ | χ. Suppose s is a measure that quantifies the strength of the association. An example of such measures is the support measure used in association mining [1]. Plausible explanations may be obtained by comparing the values s(φ ∧ ψ) and s(φ ∧ ψ | χ). If s(φ ∧ ψ | χ) > s(φ ∧ ψ), namely, the association φ ∧ ψ is more pronounced under the condition χ, we say that χ provides a plausible explanation for φ ∧ ψ; otherwise, χ does not. We may also introduce another measure g to quantify the quality of conditions [22]. Explanations are evaluated jointly by the two measures.

3.2 Explanation construction

Construction of explanations is equivalent to finding conditions in conditional associations from an explanation table.
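The comparison between the unconditional and the conditional strength of an association, over conditions drawn from an explanation table, can be illustrated with a minimal Python sketch. It assumes that s is the support measure, that a condition χ is a single attribute-value pair, and that the explanation table is keyed by the same transaction ids as the transaction table. The data and the helper names `support` and `conditional_support` are hypothetical and are not taken from the paper.

```python
from fractions import Fraction

# Hypothetical transaction table: transaction id -> purchased items.
transactions = {
    1: {"bread", "milk"},
    2: {"bread", "milk", "eggs"},
    3: {"bread"},
    4: {"milk", "eggs"},
    5: {"bread", "milk"},
    6: {"eggs"},
}

# Hypothetical explanation table over the same transaction ids (an assumption).
profiles = {
    1: {"day": "weekend", "store": "downtown"},
    2: {"day": "weekend", "store": "suburb"},
    3: {"day": "weekday", "store": "downtown"},
    4: {"day": "weekday", "store": "suburb"},
    5: {"day": "weekend", "store": "downtown"},
    6: {"day": "weekday", "store": "downtown"},
}


def support(itemset, tids):
    """Fraction of the given transactions that contain every item in itemset."""
    tids = list(tids)
    if not tids:
        return Fraction(0)
    hits = sum(1 for t in tids if itemset <= transactions[t])
    return Fraction(hits, len(tids))


def conditional_support(itemset, attribute, value):
    """Support of itemset among transactions whose profile satisfies the condition."""
    matching = [t for t, p in profiles.items() if p[attribute] == value]
    return support(itemset, matching)


association = {"bread", "milk"}
baseline = support(association, transactions)

# Enumerate every attribute-value pair that occurs in the explanation table.
candidate_conditions = sorted({(a, v) for p in profiles.values() for a, v in p.items()})

# A condition counts as a plausible explanation when the association
# is stronger under the condition than it is overall.
plausible = [
    (attr, val)
    for attr, val in candidate_conditions
    if conditional_support(association, attr, val) > baseline
]
```

On this made-up data the association holds in half of all transactions but in every weekend transaction, so `plausible` contains only `("day", "weekend")`. A second measure for the quality of conditions, as mentioned in the text, is not modelled here.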

Suppose φ ∧ ψ is an association of interest. We can classify transactions into two classes, those that satisfy the association, and those that do not. With this transformation, searching for conditions in conditional associations can be stated as learning classification rules in the explanation table. Any supervised learning algorithm, such as ID3 [12], its later version C4.5 [13], or PRISM [3], may be used to perform this task.

3.3 An algorithm for explanation-oriented association mining

Explanation-oriented association mining consists of two steps. In the first step, an unsupervised learning algorithm, such as Apriori [1] or a clustering algorithm, is used to discover an association. In the second step, an association of interest is used to create a label in the explanation table. Any supervised learning algorithm, such as ID3 [12] or PRISM [3], is used to learn classification rules, which are in fact conditional associations. The framework of explanation-oriented association mining is thus a simple combination of existing unsupervised and supervised learning algorithms. As an illustration, the combined Apriori-ID3 algorithm is described below:

Input: A transaction table and explanation profiles.
Output: Conditional associations (explanations).
1 Use the Apriori algorithm to generate a set of frequent itemsets in the transaction table. For each φ ∧ ψ in the set, support(φ ∧ ψ) ≥ minsup.
2 If φ ∧ ψ is interesting
  2.a Introduce a binary attribute named Decision. Given a transaction x ∈ U, its value on Decision is + if it satisfies φ ∧ ψ in the transaction table. Otherwise, its value is -.
  2.b Construct an information table by using the attribute Decision and explanation profiles. The new table is called an explanation table.
  2.c By treating Decision as the target class, we can apply the ID3 algorithm to derive classification rules of the form χ ⇒ Decision = +, which corresponds to the conditional association φ ∧ ψ | χ. The condition χ is a formula in the explanation table, which states the condition under which the association φ ∧ ψ occurs.
  2.d Evaluate conditional associations based on statistical measures.

4 Conclusion

By drawing results from artificial intelligence in general and intelligent information systems in particular, we demonstrate the need for explanations of mined results in a data mining process. We show that explanation-oriented association mining can be easily achieved by combining existing unsupervised and supervised learning methods. The main contribution is the introduction of a new point of view to data mining research. An explanation facility may greatly increase the effectiveness of data mining systems.
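As a supplementary illustration, steps 1 and 2.a to 2.c of the combined Apriori-ID3 procedure in Section 3.3 might be sketched in Python as follows. This is a small in-memory sketch under stated assumptions, not the authors' implementation: the Apriori and ID3 routines are simplified textbook versions, the data are made up, the "interesting" itemset is chosen by hand, and the function names `apriori`, `id3_rules`, and `explain` are hypothetical. Step 2.d (statistical evaluation) is left out.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical data: transaction id -> items, and transaction id -> profile.
transactions = {
    1: {"bread", "milk"}, 2: {"bread", "milk", "eggs"}, 3: {"bread"},
    4: {"milk", "eggs"}, 5: {"bread", "milk"}, 6: {"eggs"},
}
profiles = {
    1: {"day": "weekend", "store": "downtown"},
    2: {"day": "weekend", "store": "suburb"},
    3: {"day": "weekday", "store": "downtown"},
    4: {"day": "weekday", "store": "suburb"},
    5: {"day": "weekend", "store": "downtown"},
    6: {"day": "weekday", "store": "downtown"},
}


def apriori(transactions, minsup):
    """Step 1: level-wise search for itemsets with support >= minsup."""
    n = len(transactions)

    def sup(itemset):
        return sum(1 for items in transactions.values() if itemset <= items) / n

    items = sorted({i for t in transactions.values() for i in t})
    level = [frozenset([i]) for i in items if sup(frozenset([i])) >= minsup]
    frequent = {}
    while level:
        for s in level:
            frequent[s] = sup(s)
        k = len(level[0]) + 1
        # Join step: merge frequent itemsets into candidates one item larger.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must already be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
                 and sup(c) >= minsup]
    return frequent


def entropy(rows):
    counts = Counter(r["Decision"] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def id3_rules(rows, attributes, path=()):
    """Grow an ID3 tree and return its leaves as (conditions, label) rules."""
    if len({r["Decision"] for r in rows}) == 1 or not attributes:
        majority = Counter(r["Decision"] for r in rows).most_common(1)[0][0]
        return [(path, majority)]

    def remainder(attr):
        groups = Counter(r[attr] for r in rows)
        return sum(n / len(rows) * entropy([r for r in rows if r[attr] == v])
                   for v, n in groups.items())

    # Minimal remaining entropy is the same as maximal information gain.
    best = min(attributes, key=remainder)
    rest = [a for a in attributes if a != best]
    rules = []
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        rules += id3_rules(subset, rest, path + ((best, value),))
    return rules


def explain(association, transactions, profiles):
    # Steps 2.a and 2.b: add the Decision label to build the explanation table.
    rows = [{**profiles[t], "Decision": "+" if association <= items else "-"}
            for t, items in transactions.items()]
    attributes = [a for a in rows[0] if a != "Decision"]
    # Step 2.c: keep the conditions of rules that conclude Decision = +.
    return [cond for cond, label in id3_rules(rows, attributes) if label == "+"]


frequent = apriori(transactions, minsup=0.5)
association = frozenset({"bread", "milk"})  # picked by hand as "interesting"
conditions = explain(association, transactions, profiles)
```

On this toy data, `frequent` holds the three single items and the pair bread and milk, and `conditions` contains the single rule condition `(("day", "weekend"),)`, read as "day = weekend ⇒ Decision = +". Any other frequent-itemset miner or rule learner could be substituted for the two simplified routines, in line with the paper's remark that the framework combines existing algorithms.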

References

1. Agrawal, R. and Srikant, R., Fast algorithms for mining association rules in large databases, Proceedings of VLDB, 487-499, 1994.
2. Berry, M.J.A. and Linoff, G.S., Mastering Data Mining: the Art and Science of Customer Relationship Management, John Wiley & Sons, New York, 2000.
3. Cendrowska, J., PRISM: an algorithm for inducing modular rules, International Journal of Man-Machine Studies, 27, 349-370, 1987.
4. Chen, L., Discovery of Conditional Association Rules, Master thesis, Utah State University, 2001.
5. Dhaliwal, J.S. and Benbasat, I., The use and effects of knowledge-based system explanations: theoretical foundations and a framework for empirical evaluation, Information Systems Research, 7, 342-362, 1996.
6. Fayyad, U.M., Piatetsky-Shapiro, G. and Smyth, P., From data mining to knowledge discovery: an overview, in: Advances in Knowledge Discovery and Data Mining, Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P. and Uthurusamy, R. (Eds.), 1-34, AAAI/MIT Press, Menlo Park, California, 1996.
7. Han, J. and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann, Palo Alto, CA, 2000.
8. Hasling, D.W., Clancey, W.J. and Rennels, G., Strategic explanations for a diagnostic consultation system, International Journal of Man-Machine Studies, 20, 3-19, 1984.
9. Haynes, S.R., Explanation in Information Systems: A Design Rationale Approach, Ph.D. Dissertation, The London School of Economics, University of London, 2001.
10. Mannila, H., Methods and problems in data mining, Proceedings of International Conference on Database Theory, 41-55, 1997.
11. Pitt, J., Theory of Explanation, Oxford University Press, Oxford, 1988.
12. Quinlan, J.R., Learning efficient classification procedures, in: Machine Learning: An Artificial Intelligence Approach I, Michalski, R.S., Carbonell, J.G., and Mitchell, T.M. (Eds.), Morgan Kaufmann, Palo Alto, CA, 463-482, 1983.
13. Quinlan, J.R., C4.5: Programs for Machine Learning, Morgan Kaufmann, Palo Alto, CA, 1993.
14. Rauch, J., Association rules and mechanizing hypotheses formation, Proceedings of ECML workshop: machine learning as experimental philosophy of science, 2001.
15. Schank, R. and Kass, A., Explanations, machine learning, and creativity, in: Machine Learning: An Artificial Intelligence Approach III, Kodratoff, Y. and Michalski, R. (Eds.), Morgan Kaufmann, Palo Alto, CA, 31-48, 1990.
16. Simoudis, E., Reality check for data mining, IEEE Expert, 11, 1996.
17. Turban, E. and Aronson, J.E., Decision Support Systems and Intelligent Systems, Prentice Hall, New Jersey, 2001.
18. Wang, K. and He, Y., User-defined association mining, Proceedings of PAKDD, 387-399, 2001.
19. Wick, M.R. and Slagle, J.R., An explanation facility for today's expert systems, IEEE Expert, 4, 26-36, 1989.
20. Wu, X., Zhang, C. and Zhang, S., Mining both positive and negative association rules, Proceedings of ICML, 1997.
21. Yao, Y.Y., A step toward foundations of data mining, manuscript, 2003.
22. Yao, Y.Y., Zhao, Y. and Maguire, R.B., Explanation oriented association mining using rough set theory, Proceedings of International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, to appear, 2003.