Suggestions to Extract Association Rules with Multidimensional Database Prof. Dr. Hillal Hadi Salih Dr. Soukaena Hassan Hashem Shaimaa Akram

Similar documents
Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

Data Mining Solutions for the Business Environment

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

Healthcare Measurement Analysis Using Data mining Techniques

Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms

Future Trend Prediction of Indian IT Stock Market using Association Rule Mining of Transaction data

College information system research based on data mining

Data Mining Applications in Manufacturing

Selection of Optimal Discount of Retail Assortments with Data Mining Approach

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Gold. Mining for Information

Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining for Market Management

DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis

Analytics on Big Data

Data Mining to Recognize Fail Parts in Manufacturing Process

ISSN: A Review: Image Retrieval Using Web Multimedia Mining

A Knowledge Management Framework Using Business Intelligence Solutions

DATA MINING AND WAREHOUSING CONCEPTS

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS

What is Customer Relationship Management? Customer Relationship Management Analytics. Customer Life Cycle. Objectives of CRM. Three Types of CRM

FREQUENT PATTERN MINING FOR EFFICIENT LIBRARY MANAGEMENT

The Role of Visualization in Effective Data Cleaning

A RESEARCH STUDY ON DATA MINING TECHNIQUES AND ALGORTHMS

Data Mining: Concepts and Techniques

ETPL Extract, Transform, Predict and Load

A New Approach for Evaluation of Data Mining Techniques

An Overview of Database management System, Data warehousing and Data Mining

A Survey on Association Rule Mining in Market Basket Analysis

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support

Data Mining System, Functionalities and Applications: A Radical Review

COURSE RECOMMENDER SYSTEM IN E-LEARNING

Implementation of Data Mining Techniques to Perform Market Analysis

Use of Data Mining in the field of Library and Information Science : An Overview

Concept and Applications of Data Mining. Week 1

Chapter ML:XI. XI. Cluster Analysis

2.1. Data Mining for Biomedical and DNA data analysis

Improving Apriori Algorithm to get better performance with Cloud Computing

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: X DATA MINING TECHNIQUES AND STOCK MARKET

Overview Applications of Data Mining In Health Care: The Case Study of Arusha Region

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing

Enhancing e-business Through Web Data Mining

How To Write An Association Rules Mining For Business Intelligence

An Overview of Knowledge Discovery Database and Data mining Techniques

Data Mining Introduction

Viral Marketing in Social Network Using Data Mining

Three Perspectives of Data Mining

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

Project Report. 1. Application Scenario

Introduction to Data Mining

Mining changes in customer behavior in retail marketing

Data Mining Approach in Security Information and Event Management

Data Mining Algorithms And Medical Sciences

Introduction to Data Mining

Computer Science in Education

Data mining in the e-learning domain

An Empirical Study of Application of Data Mining Techniques in Library System

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Discovery of students academic patterns using data mining techniques

DATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011

Data Mining Algorithms and Techniques Research in CRM Systems

MINING CLICKSTREAM-BASED DATA CUBES

TIBCO Spotfire Guided Analytics. Transferring Best Practice Analytics from Experts to Everyone

Data Mining Governance for Service Oriented Architecture

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 1

Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining

Discovery of Maximal Frequent Item Sets using Subset Creation

A Framework for Dynamic Faculty Support System to Analyze Student Course Data

New Matrix Approach to Improve Apriori Algorithm

Bisecting K-Means for Clustering Web Log data

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Statistics for BIG data

Session 10 : E-business models, Big Data, Data Mining, Cloud Computing

Overview of Data Mining

Customer Classification And Prediction Based On Data Mining Technique

Neural Networks in Data Mining

Knowledge Discovery Process and Data Mining - Final remarks

Mining On-line Newspaper Web Access Logs

Information Management course

Nuggets and Data Mining

Data Warehousing and Data Mining

Apply On-Line Analytical Processing (OLAP)With Data Mining For Clinical Decision Support

Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA

Introduction to Data Mining Techniques

(b) How data mining is different from knowledge discovery in databases (KDD)? Explain.

Data Mining and KDD: A Shifting Mosaic. Joseph M. Firestone, Ph.D. White Paper No. Two. March 12, 1997

Analyzing Customer Behavior using Data Mining Techniques: Optimizing Relationships with Customer

Analyzing the footsteps of your customers

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

The Theory of Concept Analysis and Customer Relationship Mining

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Introduction to Data Mining

Mining an Online Auctions Data Warehouse

Sensitivity Analysis for Data Mining

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Chapter 2 Literature Review

Transcription:

Suggestions to Extract Association Rules with Multidimensional Database Prof. Dr. Hillal Hadi Salih Dr. Soukaena Hassan Hashem Shaimaa Akram Abstract: Data mining derives its name from the similarities between searching for valuable information in a large database and mining rocks for a vein of valuable ore. Both imply either sifting through a large amount of material or ingeniously probing the material to exactly pinpoint where the values reside. It is, however, a misnomer, since mining for gold in rocks is usually called gold mining and not rock mining, thus by analogy, data mining should have been called knowledge mining instead. The more general terms such as Knowledge Discovery in Databases (KDD) describes a more complete process. Other similar terms referring to data mining are: data dredging, knowledge extraction and pattern discovery. In our research we concentrate on one particular aspects is to extract the association rules with multidimensional database by the following: We propose to deal with multidimensional database to extract the association rules that by divide this database to two databases the first one for TIDs with the items and the second one for TIDs with dimensions. Then extract the frequent itemsets for each one separately, the frequent itemsets for the first extracted by the traditional apriori algorithm but for the second we propose algorithm for that. Then combine the two obtained sets of frequent itemsets to one set. Finally apply the generation association rule algorithm to get the final rules of the multidimensional database. Keywords: multidimensional database, association rules algorithm. 1

1- Introduction: Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships that can be hidden among vast amount of data. From these patterns and relationships, businesses and organizations can make valid predictions about future trends in all areas of business. Data mining tools can answer business questions that had traditionally been very time consuming to resolve. These tools find hidden patterns and predictive information that experts might have missed because they laid outside of their expectations. Data mining can be divided in two sub groups: discovery and exploration. Discovery in the sense that meaningful patterns are uncovered in data and distinguished properly and exploration means that these meaningful patterns are used to create useful applications for data modeling purposes. The process of discovering knowledge in data is referred to as knowledge discovery. It is when data is being worked on from a conventional data store not a data warehouse or an online transaction system. Knowledge discovery can be performed in two ways either manually or through an automated process. This process of knowledge discovery is finding a connection between data and facts. It can, in fact, be said to be a relationship between prior facts and subsequent facts. The prior fact is called the antecedent and the subsequent fact is known as the consequent. There should be no assumption of an underlying cause and effect connection between the antecedent and consequent. Through the process of data mining, it should always be kept in mind that data is not facts, in order to have knowledge and facts one needs data. Knowledge is the main connection between antecedent facts and consequent facts. This knowledge plays the main role between antecedent and consequent facts and takes the form of several shapes in data mining. Knowledge can be in the form if-then-else rules that are known as Knowledge Based Expert Systems (KBES) or it can be in form of a complex mathematical transformation. There are four main techniques used in data mining: 2

Auto-clustering, link analysis, visualization, and the rule induction. These four techniques are used in the creation of knowledge information based on data from the database [1, 2, 3]. 2- Apriori Association Rule: The efficient discovery of such rules has been a major focus in the data mining research community. Many algorithms and approaches have been proposed to deal with the discovery of different types of association rules discovered from a variety of databases. However, typically, the databases relied upon are alphanumerical and often transaction-based. The problem of discovering association rules is to find relationships between the existence of an object (or characteristic) and the existence of other objects (or characteristics) in a large repetitive collection. Such a repetitive collection can be a set of transactions for example, also known as the market basket. Typically, association rules are found from sets of transactions, each transaction being a different assortment of items, like in a shopping store ({milk, bread, etc}). Association rules would give the probability that some items appear with others based on the processed transactions, for example milk bread [50%], meaning that there is a probability 0.5 that bread is bought when milk is bought. Essentially, the problem consists of finding items that frequently appear together, known as frequent or large item-sets. The problem is stated as follows, see table 1, Let I = {i1, i2,...im} be a set of literals, called items. Let D be a set of transactions, where each transaction T is a set of items such that T I. A unique identifier TID is given to each transaction. A transaction T is said to contain X, a set of items in I, if X T. An association rule is an implication of the form X Y, where X I, Y I, and X Y =. The rule X Y has a support s in the transaction set D is s% of the transactions in D contain X Y. In other words, the support of the rule is the probability that X and Y hold 3

together among all the possible presented cases. It is said that the rule X Y holds in the transaction set D with confidence c if c% of transactions in D that contain X also contain Y. In other words, the confidence of the rule is the conditional probability that the consequent Y is true under the condition of the antecedent X. The problem of discovering all association rules from a set of transactions D consists of generating the rules that have a support and confidence greater that given thresholds. These rules are called strong rules [4, 5]. 4

3- The Proposed System: points: To explain our research in details we will organized it in the following First step: get the multidimensional database, see figure (1). TID D1 D2 items 1 A 1 ABC 2 A 1 AB 3 B 1 AB 4 B 3 ABC Figure (1) multidimensional database has two dimensions with items. Second step: divide the database in figure (1) in to two databases, the first one contain the TID and the two dimensions where the second will contain the TID and items, see figure (2). 5

TID D1 D2 1 A 1 2 A 1 3 B 1 4 B 3 TID items 1 ABC 2 AB 3 AB 4 ABC Figure (2): the first and second databases. Third step: apply the traditional apriori association rule on the second database to extract the frequent itemset. These frequent itemsets are {A, B, C, AB, AC, BC, ABC} since we propose the minimum support is equal to 2 TIDs. Fourth step: apply the proposed algorithm to extract the frequent itemsets from the first database. 1. Take the first dimension D1 and find the frequent items in it, for example the threshold of minimum support is equal to 2 TIDs, so we found two frequent items they are (a, *) and (b, *) because the value a and b occurs two times; value * for the other dimension shows that they are not relevant. Repeat the operation for the second dimension we found the frequent item in it is (*, 1) only because the value 1 occurs three times; value * for the other dimension shows that they are not relevant. 2. Now take the first dimension with concern of the second dimension we will get the following frequents items they are: (a, 1) and (b, 1). Since there are no more dimensions in our example, the search will end. 3. if there are more than two dimensions then we will apply the same thing take each dimension alone then take each dimension with others. 6

Fifth step: the general frequent itemsets for the multidimensional database will be as in the follow: {A, B, C, AB, AC, BC, ABC, (a, *), (b, *), (*, 1), (a, 1), (b, 1)} Sixth step: generate the association rules according apriori algorithms using the frequent itemsets displayed in the previous step. 4- The Implementation: To explain our proposed system and prove it is quality we implement the proposed system using visual basic programming language as in follow: Figure (1): the main window for the proposed system. In figure (1) the multidimensional database will be divided to two database and each one will stored in notepad file we will give the name of them in the two first textboxes. In the other two textboxes we will give the value of the minimum 7

support and minimum confidence. The first command when it clicked will display the generated frequent itemsets of the database contain the TIDs and items, see figure (2). The second command when it clicked will display the generated frequent itemsets of the database contain the TIDs and dimensions, see figure (3). The third command when it clicked will display the generated association rules of the two types of frequent itemsets, see figure (4). Figure (2): the main window forgenerated frequent itemsets for TIDs and items. Figure (2): the main window forgenerated frequent itemsets for TIDs and dimensions. Figure (3): the main window for the generated association rules 8

5- Conclusions: In our proposed research we conclude the following: 1. Mining the multidimensional database directly is not efficient because the last has the values of items and the values of dimensionand the both of these values are inconsistent. 2. for more efficient we suggest to divide the multidimensional database to two databases the first one contain the TIDs and items and the other contain the TIDs and dimensions. 3. for each database got after division we will find the frequent itemset. The TIDs and items base we will find the frequents for it by applying the traditional apriori algorithms. The other we propose an algorithm to treatment that part. 4. combine the both resulted frequent will strength the proposal system of mining since we will generate the association rule for complete database. This make the multidimensional as one block. 6- References: 1- M. S. Chen, J. Han, and P. S. Yu. Data mining: An overview from a database perspective. IEEE Trans. Knowledge and Data Engineering, 8:866-883, 1996. 2- U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996. 3- J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000. 9

4- Mitra S., and Ahharya T., "Data Mining Multimedia, Soft Computing, and Bioinformatics", John Wiley and Sons, Inc., 2003. 5- Mohammadian M., Intelligent Agent for DM and Information Retrieval, Idea Group Publisher, 2004. 10