CLUSTBIGFIM-FREQUENT ITEMSET MINING OF BIG DATA USING PRE-PROCESSING BASED ON MAPREDUCE FRAMEWORK




Sheela Gole 1 and Bharat Tidke 2
1 Department of Computer Engineering, Flora Institute of Technology, Pune, India

ABSTRACT

Nowadays an enormous amount of data is being generated through the Internet of Things (IoT) as technologies advance and people use these technologies in day-to-day activities. This data is termed Big Data, with its own characteristics and challenges. Frequent Itemset Mining algorithms aim to disclose frequent itemsets from transactional databases, but as the dataset size increases, traditional frequent itemset mining can no longer handle it. The MapReduce programming model solves the problem of large datasets, but it has a large communication cost which reduces execution efficiency. This paper proposes a new pre-processing technique, k-means clustering, applied to the BigFIM algorithm. ClustBigFIM uses a hybrid approach: clustering with the k-means algorithm to generate clusters from huge datasets, and Apriori and Eclat to mine frequent itemsets from the generated clusters using the MapReduce programming model. Results show that the execution efficiency of the ClustBigFIM algorithm is increased by applying the k-means clustering algorithm before the BigFIM algorithm as a pre-processing technique.

KEYWORDS

Association Rule Mining, Big Data, Clustering, Frequent Itemset Mining, MapReduce.

1. INTRODUCTION

Data mining and KDD (Knowledge Discovery in Databases) are essential techniques to discover hidden information from large datasets with various characteristics. Nowadays Big Data has bloomed in various areas such as social networking, retail, web blogs, forums and online groups [1]. Frequent Itemset Mining (FIM) is one of the important techniques of Association Rule Mining (ARM). The goal of FIM techniques is to reveal frequent itemsets from transactional databases. Agrawal et al. [2] put forward the Apriori algorithm, which generates frequent itemsets having frequency greater than a given minimum support. It is not efficient on a single computer when the dataset size increases. An enormous amount of work has been put forward to uncover frequent items.
There exist various parallel and distributed algorithms which work on large datasets, but they have memory and I/O cost limitations and cannot handle Big Data [3] [4]. MapReduce, developed by Google [5], along with the Hadoop Distributed File System (HDFS), is exploited to find frequent itemsets from Big Data on large clusters. MapReduce uses a parallel computing approach and HDFS is a fault-tolerant system. MapReduce has Map and Reduce functions; the data flow in MapReduce is shown in Figure 1.

DOI:10.5121/ijfct.2015.5307

Figure 1. MapReduce data flow.

In this paper, based on the BigFIM algorithm, a new algorithm that optimizes the speed of BigFIM is proposed. First, clusters are generated from Big Datasets using parallel k-means clustering. Then the clusters are mined using the ClustBigFIM algorithm, effectively increasing the execution efficiency. This paper is organized as follows. Section 2 gives an overview of related work on frequent itemset mining. Section 3 gives an overview of the background theory for ClustBigFIM. Section 4 explains the pseudo code of ClustBigFIM. The experimental results with comparative analysis are given in Section 5. Section 6 concludes the paper.

2. RELATED WORK

Various sequential and parallel frequent itemset mining algorithms are available [5] [6] [7] [8] [9] [10], but there is a need for FIM algorithms which can handle Big Data. This section gives an insight into frequent itemset mining which exploits the MapReduce framework. The existing algorithms face challenges while dealing with Big Data.

A parallel implementation of the traditional Apriori algorithm based on the MapReduce framework was put forward by Lin et al. [11], and Li et al. [12] also proposed a parallel implementation of the Apriori algorithm. Hammoud [13] put forward the MRApriori algorithm, which is based on the MapReduce programming model and the classic Apriori algorithm. It does not require repetitive scans of the database, because it uses iterative horizontal and vertical switching. A parallel implementation of the FP-Growth algorithm was put forward in [14].

Liu et al. [15] put forward the IOMRA algorithm, a modified FAMR algorithm that optimizes execution efficiency by pre-processing with AprioriTID, which removes all low-frequency 1-itemsets from the given database. The longest possible candidate itemset size is then determined using the length of each transaction and the minimum support.

Moen et al. [16] put forward two algorithms, Dist-Eclat and BigFIM. Dist-Eclat is a distributed version of the Eclat algorithm which mines prefix trees and extracts frequent itemsets faster, but it is not scalable enough. BigFIM applies the Apriori algorithm before Dist-Eclat to handle frequent itemsets up to size k, and the subsequent (k+1)-itemsets are extracted using the Eclat algorithm, but BigFIM has limitations on speed. Both algorithms are based on the MapReduce framework. Moen has also proposed an implementation of the Dist-Eclat and BigFIM algorithms using Mahout.

Approximate frequent itemsets are mined using the PARMA algorithm, which was put forward by Riondato et al. [17]. The k-means clustering algorithm is used for finding clusters, which are called sample lists. Frequent itemsets are extracted very fast, reducing execution time.

Malek and Kadima [18] put forward parallel k-means clustering, which uses the MapReduce programming model to generate clusters in parallel, increasing the performance of the traditional k-means algorithm. It has Map, Combine and Reduce functions which use (key, value) pairs. The distance between sample points and random centres is calculated for all points by the map function. Intermediate output values from the map function are combined by the combiner function. All samples are assigned to the closest cluster by the reduce function.

3. BACKGROUND

3.1. Problem Statement

Let $I$ be a set of items, $I = \{i_1, i_2, i_3, \ldots, i_n\}$. $X$ is a set of items, $X = \{i_1, i_2, i_3, \ldots, i_k\} \subseteq I$, called a k-itemset. A transaction $T = \{t_1, t_2, t_3, \ldots, t_m\}$ is denoted as $T = (tid, I)$, where $tid$ is the transaction ID. $T \in D$, where $D$ is a transactional database. The cover of itemset $X$ in $D$ is the set of transaction IDs of the transactions containing the items of $X$:

$$\mathrm{Cover}(X, D) = \{\, tid \mid (tid, I) \in D,\ X \subseteq I \,\}$$

The support of an itemset $X$ in $D$ is the count of transactions containing the items of $X$:

$$\mathrm{Support}(X, D) = |\mathrm{Cover}(X, D)|$$

An itemset is called frequent when its support exceeds the absolute minimum support threshold $\sigma_{abs}$, with $0 \le \sigma_{abs} \le |D|$.

Partitioning of transactions into a set of groups is called clustering. Let $S$ be the number of clusters; then $\{C_1, C_2, C_3, \ldots, C_S\}$ is a set of clusters from $\{t_1, t_2, t_3, \ldots, t_m\}$, where $m$ is the number of transactions. Each transaction is assigned to only one cluster, i.e. $C_p \ne \emptyset$ and $C_p \cap C_q = \emptyset$ for $1 \le p, q \le S$, $p \ne q$; each $C_p$ is called a cluster. Let $\mu_z$ be the mean of cluster $C_z$. The squared error between the mean of a cluster and the transactions in that cluster is given by

$$J(C_z) = \sum_{t_i \in C_z} \lVert t_i - \mu_z \rVert^2$$

k-means is used for minimizing the sum of squared errors over all $S$ clusters, which is given by

$$J(C) = \sum_{z=1}^{S} \sum_{t_i \in C_z} \lVert t_i - \mu_z \rVert^2$$

The k-means algorithm starts from an initial cluster and assigns each transaction to the cluster with the minimum squared error.
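The definitions in Section 3.1 can be illustrated with a small sketch. The toy database `D`, the function names and the encoding of transactions as numeric vectors for the squared-error computation are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, FrozenSet, Iterable, Sequence, Set

# Toy transactional database D: tid -> items (illustrative data, not from the paper).
D: Dict[int, FrozenSet[str]] = {
    1: frozenset({"a", "b", "c"}),
    2: frozenset({"a", "c"}),
    3: frozenset({"b", "d"}),
    4: frozenset({"a", "b", "c", "d"}),
}


def cover(X: Iterable[str], db: Dict[int, FrozenSet[str]]) -> Set[int]:
    """Cover(X, D): tids of all transactions that contain every item of X."""
    itemset = frozenset(X)
    return {tid for tid, items in db.items() if itemset <= items}


def support(X: Iterable[str], db: Dict[int, FrozenSet[str]]) -> int:
    """Support(X, D) = |Cover(X, D)|."""
    return len(cover(X, db))


def is_frequent(X: Iterable[str], db: Dict[int, FrozenSet[str]], sigma_abs: int) -> bool:
    """An itemset is frequent when its support exceeds sigma_abs."""
    return support(X, db) > sigma_abs


def sse(clusters: Sequence[Sequence[Sequence[float]]]) -> float:
    """J(C): sum over clusters of squared distances to the cluster mean.

    Assumes every cluster is non-empty and every transaction is already
    encoded as a numeric vector of equal length (e.g. a 0/1 item vector).
    """
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum((p[d] - mean[d]) ** 2 for p in points for d in range(dim))
    return total
```

With this toy data, `cover({"a", "c"}, D)` contains transactions 1, 2 and 4, so the itemset {a, c} has support 3.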

3.2. Apriori Algorithm

Apriori is the first frequent itemset mining algorithm, put forward by Agrawal et al. [19]. A transactional database has a transaction identifier and a set of items representing each transaction. The Apriori algorithm scans the horizontal database and finds frequent items of size 1 using the minimum support condition. From the frequent items discovered in iteration 1, candidate itemsets are formed, and frequent itemsets of size two are extracted using the minimum support condition. This process is repeated until either the list of candidate itemsets or the list of frequent itemsets is empty. It requires repetitive scans of the database. The monotonicity property is used for pruning candidate itemsets that have an infrequent subset.

3.3. Eclat Algorithm

The Eclat algorithm was proposed by Zaki et al. [20] and works on a vertical database. The TID list of each item is calculated, and the intersection of the TID lists of items is used for extracting frequent itemsets of size k+1. There is no need for iterative scans of the database, but it is expensive to manipulate large TID lists.

3.4. k-means Algorithm

The k-means algorithm [21] is a well-known clustering technique which takes the number of clusters as input. Random points are chosen as centres of gravity, and a distance measure is used to calculate the distance of each point from each centre of gravity. Each point is assigned to only one cluster, based on high intra-cluster similarity and low inter-cluster similarity.

4. CLUSTBIGFIM ALGORITHM

This section gives the high level architecture of the ClustBigFIM algorithm and the pseudo code of the phases used in it.

4.1. High Level Architecture

Figure 2. High Level Architecture of ClustBigFIM Algorithm

Clustering is applied on the large dataset as a pre-processing technique, and then frequent itemsets are mined from the clustered data using the frequent itemset mining algorithms Apriori and Eclat.
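The level-wise procedure of Section 3.2 can be sketched on a single machine as follows. This is a minimal illustration, not the distributed implementation used in the paper; the function name `apriori` and the toy transactions are assumptions, and the strict comparison `> min_support` follows the "greater than minimum support" wording of the text.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def apriori(transactions: Iterable[Iterable[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return {itemset: support} for every itemset with support > min_support."""
    db: List[FrozenSet[str]] = [frozenset(t) for t in transactions]

    # Iteration 1: one scan of the horizontal database counts single items.
    counts: Dict[FrozenSet[str], int] = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c > min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (monotonicity): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # One more database scan counts the surviving candidates.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c > min_support}
        result.update(frequent)
        k += 1
    return result


toy_transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}, {"a", "b", "c", "d"}]
frequent_itemsets = apriori(toy_transactions, min_support=1)
```

The loop stops when either the candidate list or the frequent list of a level is empty, and each level costs one scan of the database, which is the repetitive scanning the section refers to.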

4.2. ClustBigFIM on MapReduce

The ClustBigFIM algorithm has the following phases:
a. Find Clusters
b. Finding k-FIs
c. Generate single global TID list
d. Mining of subtrees

4.2.1. Find Clusters

The k-means clustering algorithm is used for finding clusters in the given large dataset. Clusters of transactions are formed based on the formula below, which calculates the squared error to be minimized,

$$J(C_z) = \sum_{t_i \in C_z} \lVert t_i - \mu_z \rVert^2$$

and each transaction is assigned to a cluster. The input to this phase is the transaction dataset and the number of clusters; clusters of transactions are generated, for example $C = \{t_1, t_{10}, \ldots, t_{40000}\}$.

Input: Cluster Size and Dataset
Output: Clusters with size z
Steps:
1. Find the distance between the centres and the transactions in the map phase.
2. Use the combiner function to combine the results of the above step.
3. Compute the MSE using the formula below and assign all points to clusters in the reduce phase,
$$J(C) = \sum_{z=1}^{S} \sum_{t_i \in C_z} \lVert t_i - \mu_z \rVert^2$$
4. Repeat steps 1-3 with the changed centres and stop when the convergence criterion is reached.

4.2.2. Finding k-FIs

The transaction ID lists of a large dataset cannot be handled by the Eclat algorithm, so frequent itemsets of size k are mined from the clusters generated in the above phase using the Apriori algorithm with the minimum support condition, which handles the problem of large datasets. A prefix tree is generated from the frequent itemsets.
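The map, combine and reduce steps of the Find Clusters phase (Section 4.2.1) can be emulated in plain Python as below. This is a sketch under stated assumptions: transactions are already encoded as numeric vectors, input splits are ordinary lists standing in for HDFS blocks, and the function names are hypothetical. It does not use Hadoop and is not the paper's implementation.

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]
Partial = Tuple[List[float], int]  # (vector sum, point count)


def sq_dist(p: Vector, q: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def map_phase(split: Iterable[Vector], centres: Sequence[Vector]) -> Iterator[Tuple[int, Vector]]:
    """Step 1: emit (index of nearest centre, point) for each point of one split."""
    for p in split:
        idx = min(range(len(centres)), key=lambda z: sq_dist(p, centres[z]))
        yield idx, p


def combine_phase(pairs: Iterable[Tuple[int, Vector]]) -> List[Tuple[int, Partial]]:
    """Step 2: locally aggregate one mapper's output into partial sums and counts."""
    acc: Dict[int, Partial] = {}
    for idx, p in pairs:
        if idx in acc:
            s, n = acc[idx]
            acc[idx] = ([a + b for a, b in zip(s, p)], n + 1)
        else:
            acc[idx] = (list(p), 1)
    return list(acc.items())


def reduce_phase(partials: Iterable[Tuple[int, Partial]], centres: Sequence[Vector]) -> List[List[float]]:
    """Step 3: merge the partial sums of all combiners and recompute each centre."""
    total: Dict[int, Partial] = {}
    for idx, (s, n) in partials:
        if idx in total:
            ts, tn = total[idx]
            total[idx] = ([a + b for a, b in zip(ts, s)], tn + n)
        else:
            total[idx] = (list(s), n)
    new_centres = []
    for z, old in enumerate(centres):
        if z in total:
            s, n = total[z]
            new_centres.append([v / n for v in s])
        else:
            new_centres.append(list(old))  # an empty cluster keeps its old centre
    return new_centres


def kmeans_mapreduce(splits, centres, max_iter=20, tol=1e-9):
    """Step 4: repeat map/combine/reduce until the centres stop moving."""
    centres = [list(c) for c in centres]
    for _ in range(max_iter):
        partials = []
        for split in splits:  # each split would run on its own mapper
            partials.extend(combine_phase(map_phase(split, centres)))
        new_centres = reduce_phase(partials, centres)
        shift = max(sq_dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift <= tol:
            break
    return centres
```

The combiner sends only one (sum, count) pair per centre from each mapper to the reducer instead of every point, which is where the reduction in communication cost comes from in this style of k-means.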

Input: Cluster Size, Minimum threshold σ, prefix length (l)
Output: Prefixes with length l and k-FIs
Steps:
5. Find the support of all items in a cluster using the Apriori algorithm.
6. Apply $\mathrm{Support}(x_i) > \sigma$ and calculate the FIs using the monotonic property.
7. Repeat steps 5-6 until all k-FIs are calculated using mappers and reducers.
8. Repeat steps 5-7 for clusters (1 to S) and find the final k-FIs.
9. Keep the created prefixes in lexicographic order using a lexicographic prefix tree.

4.2.3. Generate single global TID list

The Eclat algorithm uses a vertical database: items, each with the list of transactions in which the item is present. The global TID list is generated by combining the local TID lists using mappers and reducers. The generated TID list is used in the next phase.

Input: Prefix tree, Minimum support σ
Output: Single TID list of all items
Steps:
10. Calculate the TID lists using the prefix tree in the map phase.
11. Create a single TID list from the TID lists generated in the above step. Perform pruning with $\mathrm{support}(i_a) \le \mathrm{support}(i_b) \iff a < b$.
12. Generate prefix groups, $P_k = (P_k^1, P_k^2, \ldots, P_k^n)$.

4.2.4. Mining of Subtrees

The next (k+1)-FIs are mined using the Eclat algorithm. The prefix tree generated in phase 2 is mined independently by the mappers, and frequent itemsets are generated.

Input: Prefix tree, Minimum support σ
Output: k-FIs
Steps:
13. Apply the Eclat algorithm and find FIs till size k.
14. Repeat step 13 for each subtree in the map phase.
15. Find all frequent itemsets of size k and store them in compressed trie format.
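The TID-list merging of Section 4.2.3 and the subtree mining of Section 4.2.4 can be sketched together on one machine. The function names, the toy splits and the choice to start from the empty prefix (rather than from the length-l prefixes produced by the Apriori phase) are simplifying assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Split = List[Tuple[int, Set[str]]]  # one input split: (tid, items) pairs


def map_local_tidlists(split: Split) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (item, tid) for every item occurrence in one split."""
    for tid, items in split:
        for item in items:
            yield item, tid


def reduce_global_tidlists(emitted: Iterable[Tuple[str, int]], min_support: int) -> Dict[str, Set[int]]:
    """Reducer: merge local TID lists into one global list and drop infrequent items."""
    tidlists: Dict[str, Set[int]] = defaultdict(set)
    for item, tid in emitted:
        tidlists[item].add(tid)
    return {i: t for i, t in tidlists.items() if len(t) > min_support}


def eclat(prefix: Tuple[str, ...], prefix_tids: Optional[Set[int]],
          candidates: List[Tuple[str, Set[int]]], min_support: int,
          out: Dict[Tuple[str, ...], int]) -> Dict[Tuple[str, ...], int]:
    """Depth-first Eclat: extend the prefix by intersecting TID lists."""
    for pos, (item, tids) in enumerate(candidates):
        new_tids = prefix_tids & tids if prefix else tids
        if len(new_tids) > min_support:
            new_prefix = prefix + (item,)
            out[new_prefix] = len(new_tids)
            # Only later items may extend this prefix, so each itemset is visited once.
            eclat(new_prefix, new_tids, candidates[pos + 1:], min_support, out)
    return out


toy_splits: List[Split] = [
    [(1, {"a", "b", "c"}), (2, {"a", "c"})],
    [(3, {"b", "d"}), (4, {"a", "b", "c", "d"})],
]
emitted = [pair for split in toy_splits for pair in map_local_tidlists(split)]
global_tids = reduce_global_tidlists(emitted, min_support=1)
# Order items by ascending support (ties broken by name), as in step 11.
ordered = sorted(global_tids.items(), key=lambda kv: (len(kv[1]), kv[0]))
frequent = eclat((), None, ordered, 1, {})
```

In a distributed run each prefix group would be handed to a separate mapper, which is possible because the subtree below one prefix only needs the TID lists of that prefix and of the items that follow it in the ordering.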

5. EXPERIMENTS

This section gives an overview of the datasets used and the experimental results with comparative analysis. Two machines are used for the experiments. Each machine contains an Intel Core i5-3230M CPU @ 2.60GHz processing unit and 6.00GB RAM, with Ubuntu 12.04 and Hadoop 1.1.2. Currently the algorithm runs on a single pseudo-distributed Hadoop cluster. Datasets from the standard UCI repository and the FIMI repository are used in order to compare results with existing systems such as Dist-Eclat and BigFIM.

5.1. Dataset Information

Experiments are performed on the following datasets:
Mushroom - Provided by the FIMI repository [22]; has 119 items and 8,124 transactions.
T10I4D100K - Provided by the UCI repository [23]; has 870 items and 100,000 transactions.
Retail - Provided by the UCI repository [23].
Pumsb - Provided by the FIMI repository [22]; has 49,046 transactions.

5.2. Results Analysis

Experiments are performed on the T10I4D100K, Retail, Mushroom and Pumsb datasets, and the execution time required for generating k-FIs is compared based on the number of mappers and the Minimum Support. Results show that Dist-Eclat is faster than the BigFIM and ClustBigFIM algorithms on T10I4D100K, but the Dist-Eclat algorithm does not work on large datasets such as Pumsb. Dist-Eclat is not scalable enough and faces memory problems as the dataset size increases.

Experiments are performed on the T10I4D100K dataset in order to compare the execution time of Dist-Eclat, BigFIM and ClustBigFIM with different Minimum Support values and numbers of mappers. Table 1 shows the execution time (sec) for the T10I4D100K dataset with different values of Minimum Support and 6 mappers. Figure 3 shows the timing comparison for the various methods on the T10I4D100K dataset, which shows that Dist-Eclat is faster than the BigFIM and ClustBigFIM algorithms. Execution time decreases as the Minimum Support value increases, which shows the effect of Minimum Support on execution time. Table 2 shows the execution time (sec) for the T10I4D100K dataset with different numbers of mappers and Minimum Support 100. Figure 4 shows the timing comparison for the various methods on the T10I4D100K dataset, which again shows that Dist-Eclat is faster than the BigFIM and ClustBigFIM algorithms. Execution time increases as the number of mappers increases, because the communication cost between mappers and reducers increases.

Table 1. Execution Time (Sec) for T10I4D100K with different Minimum Support (No. of Mappers: 6).

| Algorithm   | Min. Support 100 | 150 | 200 | 250 | 300 |
| Dist-Eclat  | 12 | 10 | 9  | 9  | 10 |
| BigFIM      | 33 | 22 | 19 | 16 | 15 |
| ClustBigFIM | 30 | 21 | 18 | 15 | 15 |

Table 2. Execution Time (Sec) for T10I4D100K with different No. of Mappers (Minimum Support: 100).

| Algorithm   | No. of Mappers 3 | 4 | 5 | 6 | 7 |
| Dist-Eclat  | 6  | 7  | 7  | 9  | 9  |
| BigFIM      | 21 | 25 | 29 | 32 | 37 |
| ClustBigFIM | 19 | 23 | 25 | 30 | 36 |

Figure 3. Timing comparison for various methods and Minimum Support on T10I4D100K

Figure 4. Timing comparison for different methods and No. of Mappers on T10I4D100K

Results show that the ClustBigFIM algorithm works on Big Data. Experiments are performed on the Pumsb dataset. The Dist-Eclat algorithm faced memory problems with the Pumsb dataset, so the results of ClustBigFIM are compared with the BigFIM algorithm, which is scalable. Table 3 and Table 4 show the execution time taken by the BigFIM and ClustBigFIM algorithms on the Pumsb dataset with variable Minimum Support and number of mappers. The number of mappers is fixed at 20 when the Minimum Support is varied (Table 3), and the Minimum Support is fixed at 40000 when the number of mappers is varied (Table 4). Figure 3, Figure 5 and Figure 6 show that the ClustBigFIM algorithm performs better than the BigFIM algorithm due to pre-processing.

Table 3. Execution Time (Sec) for Pumsb with different Minimum Support (No. of Mappers: 20).

| Algorithm   | Min. Support 25000 | 30000 | 35000 | 40000 | 45000 |
| BigFIM      | 19462 | 6464 | 1256 | 453 | 36 |
| ClustBigFIM | 18500 | 5049 | 1100 | 440 | 30 |

Table 4. Execution Time (Sec) for Pumsb with different No. of Mappers (Minimum Support: 40000).

| Algorithm   | No. of Mappers 10 | 15 | 20 | 25 | 30 |
| BigFIM      | 390 | 422 | 439 | 441 | 442 |
| ClustBigFIM | 385 | 419 | 435 | 438 | 438 |

Figure 5. Timing comparison for different methods and Minimum Support on Pumsb
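To put the gap between the two algorithms in Table 3 into relative terms, the reduction in execution time can be computed directly from the reported values. The sketch below only re-expresses the numbers of Table 3; the variable and function names are hypothetical.

```python
# Table 3: Min. Support -> (BigFIM seconds, ClustBigFIM seconds), 20 mappers.
table3 = {
    25000: (19462, 18500),
    30000: (6464, 5049),
    35000: (1256, 1100),
    40000: (453, 440),
    45000: (36, 30),
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


reductions = {
    support: relative_reduction(bigfim, clustbigfim)
    for support, (bigfim, clustbigfim) in table3.items()
}

for support, pct in sorted(reductions.items()):
    print(f"Min. Support {support}: {pct:.1f}% less time than BigFIM")
```

For these values the reduction ranges from about 3% (Minimum Support 40000) to about 22% (Minimum Support 30000), so the benefit of the pre-processing step varies considerably with the support threshold.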

Figure 6. Timing comparison for different methods and No. of Mappers on Pumsb

6. CONCLUSIONS

In this paper we implemented an FIM algorithm based on the MapReduce programming model. The k-means clustering algorithm is used for pre-processing, frequent itemsets of size k are mined using the Apriori algorithm, and the discovered frequent itemsets are mined further using the Eclat algorithm. ClustBigFIM works on large datasets with increased execution efficiency due to pre-processing. Experiments were done on transactional datasets, and the results show that ClustBigFIM works on Big Data and is faster than BigFIM. We are planning to run the ClustBigFIM algorithm on different datasets for further comparative analysis.

REFERENCES

[1] Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. 1996. The KDD process for extracting useful knowledge from volumes of data. Commun. ACM 39, 11 (November 1996), 27-34. DOI=10.1145/240455.240464
[2] Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. 1993. Mining association rules between sets of items in large databases. SIGMOD Rec. 22, 2 (June 1993), 207-216. DOI=10.1145/170036.170072
[3] M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithms for discovery of association rules. Data Min. and Knowl. Disc., pages 343-373, 1997.
[4] G. A. Andrews. Foundations of Multithreaded, Parallel, and Distributed Programming. Addison-Wesley, 2000.
[5] J. Li, Y. Liu, W.-k. Liao, and A. Choudhary. Parallel data mining algorithms for association rules and clustering. In Intl. Conf. on Management of Data, 2008.
[6] E. Ozkural, B. Ucar, and C. Aykanat. Parallel frequent item set mining with selective item replication. IEEE Trans. Parallel Distrib. Syst., pages 1632-1640, 2011.
[7] M. J. Zaki. Parallel and distributed association mining: A survey. IEEE Concurrency, pages 14-25, 1999.
[8] L. Zeng, L. Li, L. Duan, K. Lu, Z. Shi, M. Wang, W. Wu, and P. Luo. Distributed data mining: a survey. Information Technology and Management, pages 403-409, 2012.
[9] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. SIGMOD Rec., pages 1-12, 2000.

[10] L. Liu, E. Li, Y. Zhang, and Z. Tang. Optimization of frequent itemset mining on multiple-core processor. In Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB '07, pages 1275-1285. VLDB Endowment, 2007.
[11] M.-Y. Lin, P.-Y. Lee, and S.-C. Hsueh. Apriori-based frequent itemset mining algorithms on MapReduce. In Proc. ICUIMC, pages 26-30. ACM, 2012.
[12] N. Li, L. Zeng, Q. He, and Z. Shi. Parallel implementation of Apriori algorithm based on MapReduce. In Proc. SNPD, pages 236-241, 2012.
[13] S. Hammoud. MapReduce Network Enabled Algorithms for Classification Based on Association Rules. Thesis, 2011.
[14] L. Zhou, Z. Zhong, J. Chang, J. Li, J. Huang, and S. Feng. Balanced parallel FP-Growth with MapReduce. In Proc. YC-ICT, pages 243-246, 2010.
[15] Sheng-Hui Liu, Shi-Jia Liu, Shi-Xuan Chen, and Kun-Ming Yu. IOMRA - A High Efficiency Frequent Itemset Mining Algorithm Based on the MapReduce Computation Model. In Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on, pages 1290-1295, 19-21 Dec. 2014. doi: 10.1109/CSE.2014.247
[16] S. Moens, E. Aksehirli, and B. Goethals. Frequent Itemset Mining for Big Data. In Big Data, 2013 IEEE International Conference on, pages 111-118, 6-9 Oct. 2013. doi: 10.1109/BigData.2013.6691742
[17] M. Riondato, J. A. DeBrabant, R. Fonseca, and E. Upfal. PARMA: a parallel randomized algorithm for approximate association rules mining in MapReduce. In Proc. CIKM, pages 85-94. ACM, 2012.
[18] M. Malek and H. Kadima. Searching frequent itemsets by clustering data: towards a parallel approach using MapReduce. In Proc. WISE 2011 and 2012 Workshops, pages 251-258. Springer Berlin Heidelberg, 2013.
[19] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proc. VLDB, pages 487-499, 1994.
[20] M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithms for discovery of association rules. Data Min. and Knowl. Disc., pages 343-373, 1997.
[21] A. K. Jain, M. N. Murty, and P. J. Flynn. Data Clustering: A Review. ACM Computing Surveys, 1999.
[22] Frequent itemset mining dataset repository. http://fimi.ua.ac.be/data, 2004.
[23] T. De Bie. An information theoretic framework for data mining. In Proc. ACM SIGKDD, pages 564-572, 2011.