Directed Graph based Distributed Sequential Pattern Mining Using Hadoop Map Reduce
|
|
|
- Gilbert French
- 9 years ago
- Views:
Transcription
1 Directed Graph based Distributed Sequential Pattern Mining Using Hadoop Map Reduce Sushila S. Shelke, Suhasini A. Itkar, PES s Modern College of Engineering, Shivajinagar, Pune Abstract - Usual sequential pattern mining algorithms experiences the scalability problem when trade with very big data sets. In existing systems like PrefixSpan, UDDAG major time is needed to generate projected databases like prefix and suffix projected database from given sequential database. In DSPM (Distributed Sequential Pattern Mining) Directed Graph is introduced to generate prefix and suffix projected database which reduces the execution time for scanning large database. In UDDAG, for each unique id UDDAG is created to find next level sequential patterns. So it requires maximum storage for each UDDAG. In DSPM single directed graph is used to generate projected database and finding patterns. To improve the scanning time and scalability problem we introduce a distributed sequential pattern mining algorithm on Hadoop platform using MapReduce programming model. We use transformed database to reduce scanning time and directed graph to optimize the memory storage. Mapper is used to construct prefix and suffix projected databases for each length-1 frequent item parallel. The Reducer combines all intermediary outcomes to get final sequential patterns. Experiment results are compared against UDDAG, different values of minimum support, different massive data sets and with and without Hadoop platform which improves the scaling and speed performances. Experimental results show that DSPM using Hadoop MapReduce solves the scaling problem as well as storage problem of UDDAG. Index Terms Distributed Environment, Directed Graph, Sequential Pattern Mining, Transformed Database. ***** I. INTRODUCTION Sequential pattern mining is an vital data mining practice widely used in many applications, like Bioinformatics, customer behavior guesses in transaction history, web mining, intrusion detection in network attack. Usually, the main intention of sequential pattern mining is to find frequent sequences within a sequential database. The problem was first proposed in [1]. There is good progress towards mining sequential patterns like GSP [2], SPAM [3] Apriori-based algorithm, FreeSpan [4], PSPM [5] projection-based, SPADE [6] vertical data format based algorithm and pattern growth based approach in PrefixSpan [7], UDDAG [8], have been proposed. Pattern growth approach is significantly used in many algorithms owing to its advantages like Divide and conquer strategy, compact database and without candidate generation. Mainly execution of all above algorithms is on standalone environment which has some weaknesses like scalability problem, less efficient for huge dataset, large scanning time for database. To increase the performance and resolve the scalability issues of sequential pattern mining many researchers have provide different methods to work on distributed environment like parallel mining, distribute the mining computation over different nodes, apply different distributed environments like grid computing, cluster, cloud, Hadoop, etc. In this paper we have proposed Directed Graph as a data structure to store the database efficiently which optimizes the memory and reduce the number of scans. Our algorithm is extended from UDDAG algorithm which is implemented using MapReduce programming model on hadoop distributed environment. Due to its execution on hadoop distributed environment, it solves the scalability problem. The rest of the paper is structured as section 2 defines the problem statement and related works. Section 3 explains preliminary review of UDDAG and Hadoop MapReduce model. Section 4 gives algorithm for Directed Graph based distributed Sequential Pattern Mining on Hadoop MapReduce. Experiment results and performance evaluation are mentioned in section 5. Section 6 conclude the paper and discuss about issues that need to be consider in future work. II. PROBLEM DEFINITION AND RELATED WORK 2.1 Problem Definition: Let D S be a sequence database, and I = x 1,, x m be a set of m different items, S = s 1,, s i is a sequence which contains an ordered list of itemsets. An itemset s i is a subset of items I. A sequence S a = a 1,, a n is a subsequence of sequence S b = b 1,, b m where 1 i 1 <... < i n m such that a 1 b i1, a 2 b i2,, a n b in. Sequential pattern mining is to find out the complete set of sequential patterns whose frequency count min_sup * D, where min_sup is a minimum support threshold. Table 1 represents the example of sequence database. Finding sequential patterns from massive dataset is not efficient on standalone system due to large scanning time and more memory storage. So, to discover the complete set of sequential patterns from massive datasets this paper proposed a system which uses MapReduce programming model on distributed environment Hadoop to increase the speed and scalability performance. Proposed system uses the directed graph for memory optimization and less scanning time. 431
2 TABLE 1 the algorithm creates a root vertex for x detects all the Example of Sequence Database patterns from prefix and suffix projected database of x and then it creates directed acyclic graph to represent the Seq_Id Sequence enclose relationship of frequent items. UDDAG generates less projected database compare to PrefixSpan as UDDAG 1 1 1, 2, 3 1, 3 4 3, 6 consider prefix and suffix projected database at the same 2 1, 4 3 2, 3 1, 5 time. UDDAG eliminates unnecessary candidate generation. 3 5, 6 1, 2 4, Data structure in UDDAG requires more memory storage than PrefixSpan but it involves major cost of storing , projected database. UDDAG has some challenging issues like independent frequent item detection, large memory usage for data structure UDDAG. The new system uses directed graph to store transformed database and it allows bidirectional pattern growth without creating projected database as like UDDAG. 2.2 Related Work: Large research has been done and going on in the area of parallel and distributed sequential pattern mining on different environments like Grid, cluster, cloud, Hadoop, Distributed processors or nodes, etc. A frequent sequence mining algorithm based on variant of the parallel tree projection is designed by Guralnik and Karypis [11]. The Par-CSP algorithm (parallel closed sequential pattern mining) which runs on distributed memory system was introduced in [12]. Their parallel algorithm is based on the modern frequent closed sequence mining algorithm BIDE [13]. In [9] [] by using Apriori method, GSP algorithm is executed on grid computing environment which is straightforward for loosely coupled methods. Grid computing platform requires a sophisticated method of data partitioning, stratagem to determine and allot the job to grid node dynamically and a smaller amount controlling grid can decrease the performance of entire structure. Cluster based algorithm is developed in [14], in which matching definition algorithm collects the sequence data into several clusters and then on parallel computers which are distributed memory modes clusters are distributed. SPAMC is nothing but Sequential pattern mining on the cloud algorithm which is extended from SPAM [15] is implemented on MapReduce framework in [16]. Pattern growth based PrefixSpan algorithm which performs pattern mining on huge data set on Hadoop environment by using MapReduce programming model has been projected in [17]. Multidimensional sequential pattern mining is performed on distributed sites in [18][21]. This paper is extended from UDDAG algorithm and concept of MapReduce programming model on Hadoop environment. III. PRELIMINARIES 3.1 Up Down Directed Acyclic Graph: In UDDAG, a new data structure, UpDown Directed Acyclic Graph for efficient mining of sequential patterns is used. DAG (Directed Acyclic Graph) represents patterns as vertexes with ids of transaction containing the pattern and directed edge as relationship of patterns. Up DAG represents DAG for patterns found in Prefix projected database while Down DAG represents DAG for patterns found in Suffix projected database of Frequent item x and by merging both DAG represents UDDAG with root vertex x. UDDAG allows bidirectional pattern growth from both the ends for detected sequential pattern. UDDAG algorithm uses transformed database where for each frequent item x, 3.2 Hadoop MapReduce: Hadoop MapReduce is a software framework which allows distributed processing of huge amounts of data (multi-terabyte data-sets) on thousands of nodes of service hardware in a consistent, fault-tolerant manner [19]. A MapReduce job usually divides the input data-set into independent structures which are executed by the map tasks in a fully parallel manner. The outputs of the map functions are sorted by framework, which are then given as input to the reduce tasks. Typically inputs and the outputs of the job are stored in terms of file system. The framework takes care of tasks like scheduling, monitoring and re-execution of failure tasks. Map Reduce is trigger by the map and reduce operations in LISP like functional languages. This model solves the computation problem through two functions: map and reduce. Essentially, the MapReduce model permits users to write map and reduce components with functional-style code. Finally, these components are scheduled by the MapReduce system to scattered assets for execution while handling many tough problems such as network communication, parallelization and fault tolerance. FIGURE 1 MAP REDUCE ARCHITECTURE Input for the map function is a key-value pair which produces a list of key-value couples as output which is characterized as: 432
3 map key 1, value 1 list key 2, value 2 Transformed Database A reduce function acquires all the output with similar key values and produce a single list as output which is characterized as: reduce key 2, list value 2 list value 3 The architecture of MapReduce model is shown in figure1. A MapReduce program mainly executes in two phases for parallel execution. In first phase, the input reader from master node splits the data into small parts and submits them to selected mapper programs randomly. In this process mappers perform their task in parallel processing stage and produces a group of [key, value] pairs. Each generated item is sorted and forwarded to the reducer. In second phase, reducer program combines all the items with given key and finds a single entity as a result. All reduce operations are performed independently similar to map operations. Input, final outputs are stored on a distributed file system whereas intermediate results are stored on local file system of map and reducer worker. Seq_Id Sequence 1 1 1, 2, 3, 4, 5 1, 5 6 5, 8 2 1, 6 5 3, 4, 5 1, 7 3 7, 8 1, 2, 3 6, , Construction of Directed Graph: Construct Directed graph for DT which contains vertex and directed edges. Vertex represents unique id assigned for frequent item. Each vertex also stores transaction ids where it appears in database and element id where frequent item appears. Figure 2 shows directed graph for table 2. IV. DIRECTED GRAPH BASED SEQUENTIAL PATTERN MINING ON HADOOP MAPREDUCE This section explains the directed graph based sequential pattern mining on Hadoop MapReduce, which transforms the original sequence database and then constructs the directed graph for storing transformed database. The construction of directed graph is done centrally and distributed at each node. Instead of projecting the database from transformed database and then constructing prefix and suffix projected database for finding sequential patterns as like UDDAG, directed graph based system make use of directed graph for construction of prefix and suffix database without constructing projected database. Construction of prefix and suffix database is done parallel by mapper and intermediate results are given as input to reducer to find the sequential patterns. 4.1 Database Transformation: Definition 1 (Frequent item set): The absolute support for an item set in a sequence database is the frequency count of transactions whose sequences involve the item set. Frequent item (FI) set is an item set with a support count larger than minimum support frequency.s First we assign unique id to all frequent items in FI set in sequence database D. For example, Frequent items of length 1 for Table 1database are: [1], [1, 2], [2], [2, 3], [3], [4], [5], [6]. By assigning a unique id to each FI, we get [1] - 1, [1, 2] - 2, [2] - 3, [2, 3] - 4, [3] - 5, [4] - 6, [5] - 7, [6] - 8. Definition 2 (Item pattern): An item pattern is a sequential pattern with accurately one item in each item set it contains. Definition 3 (Transformed Database): Let D S be a sequence database, by replacing item sets in each sequence with ids of FIs contained in the item set we get transformed database DT. Transformed database is shown in Table 2. TABLE 2 FIGURE 2 DIRECTED GRAPH Let, set of unique ids for each FI contains n ids are I 1, I 2,, I n. Vertex I 1 stores transaction ids and each transaction id stores element id which is represented as: I 1 T 1, T 2,, T m 1 m number of transactions in the database T 1 T 11,T 12,T 13, 2 T 2 T 21,T 24 3 T m T m1,,t mn, For example, for vertex 1 as per transformed database 1 1,2,3, , 1 2, , Edges in the graph are represented by item sequence in element in the sequence database which is used to traverse the graph while detecting the in between pattern. 433
4 Give unique ids to each frequent item in P and S then intersection of each element in P and a (the item pattern) with each element in S and find common transaction ids as P i a S j where i and j are ids of frequent item in P and S. For example, a = 8P = 1,2,3,7 S = 3,5 4.3 Prefix and Suffix Projected Database: Prefix and suffix database is created for each vertex by using directed graph. Suppose for vertex I1 transaction ids are as shown in equation 1 and element ids for each transaction are shown in equation 2,3,etc. While constructing Prefix database it ignore higher element id means ignore T13 and T24 from equation 2 and 3. However while constructing suffix database it ignore lower element id means ignore T11 and T21 from equation 2 and 3. For example let I1= 1 and set of transaction ids and element ids are as shown in equation 4then prefix and suffix database for 1 is as shown below: Pre (1 D ) = 1. < 1, 1 > 2. < 1 > Suf (1 D ) = 1. < 1, 1 > 2. < 1 > 4.4 Sequential Pattern Mining: Sequential pattern mining is performed by considering following three cases 1. Each vertex in directed graph is itself a sequential pattern. 2. Sequential pattern from Prefix and Suffix Projected Database Find the frequent items from prefix and suffix projected database by checking the support count value then detect the frequent patterns as below. Take intersection as P 1 8 S 1 that is = 3,4 Now check the sequence of element ids for 3 rd and 4 th transaction for vertex 1, 8 and 3 in directed graph. If the sequence for all above vertex is not in ascending order then there is support count for this pattern else count the frequency count as 1. For vertex 1, 8 and 3 the sequence is , , 3 5 no frequency count. For Vertex 7, 8, 3, the common transaction ids are = {3, 4} and the sequence for 3 rd is , , 3 5. Consider as ascending order so frequency count for pattern 783 for 3 rd transaction is Architecture of for Directed Graph based Distributed Sequential Pattern Mining on Hadoop MapReduce To find the sequential patterns from prefix Projected database; if FI in Pre (a D ) for pattern a then append a after each FI. For example, a = 8, FI set for Pre 8 D = 1, 2, 3, 7 Then sequential patterns are 1,8, 2,8, 3,8, 7,8 To find the sequential patterns from suffix Projected database; if FI in Suf (a D ) for pattern a then append each FI after a. For example, a = 8, FI set for Suf 8 D = 3, 5 Then sequential patterns are 8,3, 8,5 3. Discover Sequential patterns where length-1 pattern exist in between. Sequential patterns in this case are like length-1 pattern exist in between the beginning and ending of sequential patterns by considering its prefix and suffix database. First find the frequent pattern as in case 2. Let, Pre a D = b, c, d, e P Suf a D = d, f S FIGURE 3 DSPM ARCHITECTURE Architecture diagram for DSPM on hadoop Map reduce is shown in figure 3. MapReduce model is applied in two 434
5 phases. Phase 1 finds length-1 frequent items from sequence SP 1 = 1, 1,1 database and minimum support threshold value using map reduce model. Transformed database is generated from SP 2 = 2 length-1 frequent items. Directed graph is constructed to SP 3 = 3, 1,3, 3,1, 1,3,1 store transformed database which is used as input for next map reduce model in phase 2. In this prefix and suffix SP 4 = 4, 1,4, 4,1, 1,4,1 databases are generated and frequent item which support 5, 1,5, 2,5, 3,5, 5,5, 1,3,5, 1,5,5, SP 5 = minimum support count value are generated by mapper and 5,1, 5,3, 1,5,1, 1,5,3 reducer. Combined output of all reducer is nothing but final sequential patterns. 4.6 Algorithm for Directed Graph based Distributed Sequential Pattern Mining: Input: Sequence Database D, Minimum Support value as min_sup Output: P the complete set of patterns in D Algorithm: 1. Input is given to mapreduce framework where it scans the database D and find length-1 FI as FISet=D.getFI(min_sup). 2. Transform the sequence database by calling D.transform(D,FISet). 3. Construct the directed graph by calling D.constructDG(Transformed Database, FISet) 4. Assign task of generating prefix and suffix database to mapper function as genpresufdb(dgraph,min_sup) parallelely on multiple nodes. 5. Intermediate result of step 5 is passed as input to reducer function to find the sequential patterns by following 3 ways a. Copy all length-1 frequent items to list of sequential pattern P. b. Find FI from prefix and suffix database by using case 2 and append it to pattern P. c. Find FI for in between pattern using case 3. The algorithm first scans the database and generates length-1 FI set which is used to create transformed database using mapreduce. Next step in is to construct directed graph by using transformed database and FI set. Mapper function on different node generate prefix and suffix projected database parallel. Intermediate result of mapper function is passed as input to reducer function to find the sequential patterns as mentioned in section 4.4. Task of finding length- 1 frequent item and finding patterns from prefix and suffix database is done on hadoop mapreduce platform to perform the task on distributed environment where it achieves maximum scalability. Final sequential patterns for database in Table 1 by using above algorithm are discovered for each vertex in directed graph are listed below as Sequential Pattern (SP i ) where i indicates vertex in graph. SP 6 = SP 7 = SP 8 = 6, 1,6, 2,6, 3,6, 6,3, 6,5, 6,5,3, 3,6,5, 2,6,5, 1,6,5 7, 7,1, 7,3, 7,1,3, 7,5, 7,1,5, 7,3,5, 7,5,3 8, 1,8, 2,8, 3,8, 7,8, 8,3, 8,5, 8,3,5, 8,5,3, 7,8,3, 7,8,5, 7,8,5,3 In Proposed system task of constructing prefix and suffix database is done on multiple nodes parallel so, it reduces the execution time as compare to old algorithms. Also, it uses directed graph to store database which gives better memory optimization and less scanning time. V. EXPERIMENTAL RESULTS The proposed system is tested on synthetic data sets which are generated by IBM Quest Synthetic Data Generator [21]. Experimental result shows the scalability performance between Directed Graph based system verses existing sequential pattern algorithm UDDAG, comparison of Directed Graph with and without Hadoop platform. Performance of the system will be calculated against different sizes of datasets, different values of minimum support and number of nodes in distributed environment. The data sets were generated for parameters shown in table 3. Symbol C N S T I TABLE 3 Parameters for generating Datasets Name Number of Sequences Number of Different Items Average number of items per sequence Average number of items in transactions Average number of items in transaction in pattern Experiment Result1 shows the performance for dataset C5NS8T3I3 having 5k sequences, distinct items and 5% min_support in figure 4. It highlights the scalability performance for UDDAG, Directed Graph 435
6 Execution Time (S) Execution Time (Min) Execution time (S) Execution Time (Min) International Journal on Recent and Innovation Trends in Computing and Communication ISSN: and Directed Graph on Hadoop C5NS8T3I3 with Minimum Support 5 % UDDAG Directed GraphDirected Graph on Hadoop Performance on Different Nodes Minimum Support % 2 Nodes 4 Nodes FIGURE 4 EXECUTION TIME COMPARISONS WITH MIN_SUP 5% Experiment 2 display the performance for same dataset but with different min_sup values as 5%, % and 2% for all three algorithms. It is given table 4 as well as in figure 5. TABLE 4 EXECUTION TIME COMPARISONS FOR UDDAG, DG AND DG ON Minimum Support % UDDAG (Seconds) HADOOP 5 2 Minimum Support % FIGURE 5 EXECUTION TIME COMPARISONS FOR UDDAG, DG AND DG ON HADOOP Directed Graph (Seconds) Directed Graph on Hadoop (Seconds) UDDAG DG DGH Experiment 3 shows the performance for C8S8I8 dataset having 8k sequences, distinct items with min_sup %, 2% and 5% and the result is taken on 2 and 4 nodes. It is shown in figure 6. FIGURE 6 COMPARISONS ON DIFFERENT NODES Experiment 4 shows the performance on different nodes like 2 and 4 for two different datasets as C8S8I8and CS8T5I8 with minimum support 5%. It is displayed in figure 7. By increasing the different number of nodes system gives better speed and scalability performance. FIGURE 7 COMPARISONS ON DIFFERENT NODES WITH DIFFERENT DATASETS VI. CONCLUSION AND FUTURE WORK This paper studies many existing sequential pattern mining algorithm which are implemented on distributed environment like Grid, Cloud, Cluster and Hadoop MapReduce. New system is implemented in this paper which uses Hadoop MapReduce environment to find sequential patterns. DSPM first converts the sequence database into transformed database which is represented by directed graph as data structure to reduce memory storage and scanning time of database frequently. Because of distributed environment like Hadoop the scaling problem is improved in the new system. DSPM is compared with UDDAG on single system and it gives better performance using directed graph data structure Number of Nodes C8S8T3I8 CS8T5I8 436
7 DSPM with directed graph is tested on 2 as well as 4 Int l Conf. knowledge Discovery and Data Mining, pp. nodes and it shows good performance. There are some , 22. challenging issues that need to be solved in future like [16] Chun-Chieh Chen, Chi-Yao Tseng,Ming-Syan Chen, finding constraint based, closed sequences sequential "Highly Scalable Sequential Pattern Mining Based on MapReduce Model on the Cloud", IEEE International patterns on distributed environment. Congress on Big Data, 213. REFERENCES [1] R. Agrawal and R. Srikant, Mining Sequential Patterns, Proc. of the 11th Int l Conf. on Data Engineering (ICDE 95), [2] Srikant R. and Agrawal R., Mining sequential patterns: Generalizations and performance improvements, In Proceedings of the 5th International Conference Extending Database Technology, 1996, pp 57, [3] J. Ayres, J. Gehrke, T. Yu, and J. Flannick, Sequential Pattern Mining Using a Bitmap Representation, Proc. Int l Conf. knowledge Discovery and Data Mining 22, pp , 22. [4] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.C. Hsu, FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining, Proc. of the 6th ACM SIGKDD Int l Conf. on Knowledge Discovery and Data Mining (KDD ), 2. [5] Taoshen Li, Weina Wang, Qingfeng Chen, On the Sequential Pattern Mining Algorithm Based on Projection position, The 8th International Conference on Computer Science & Education (ICCSE 213) April 26-28, 213. Colombo, Sri Lanka. [6] ZakiM, SPADE: an efficient algorithm for mining frequent sequences, Mach Learn, 21. [7] Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Jianyong Wang, Helen Pinto, Qiming Chen, Umeshwar Dayal, Mei-Chun Mining Sequential Patterns by Pattern Growth: The PrefixSpan Approach, IEEE Transaction on Knowledge and Data Engineering, Vol. 16, No., Oct 24. [8] Jinlin Chen, An UpDown Directed Acyclic Graph Approach for Sequential Pattern Mining, IEEE Transaction on Knowledge and Data Engineering, Vol. 22, No. 7, Oct 2. [9] Chih-Hung Wu, Yu-Chieh Lo, Mining Sequential Patterns on a Grid-Computing Environment, IEEE International Conference on Systems, Man, and Cybernetics October 8-11, 26, Taipei, Taiwan. [] Chih-Hung Wu, Chih-Chin Lai, Yu-Chieh Lo, An empirical study on mining sequential patterns in a grid computing environment, Expert Systems with Applications, [11] Guralnik V, Karypis G., Parallel tree-projection-based sequence mining algorithms, Parallel Computing, 24. [12] Cong S, Han J, Padua D., Parallel mining of closed sequential patterns, In Proceeding of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Chicago, USA, 25: [13] Wang J, Han J., BIDE: Efficient mining of frequent closed sequences, In: Proceedings of the 2th International Conference on Data Engineering. Boston, USA, 24: [14] Shaochun Wu, Genfeng Wu, Shenjie Jin, Pre-Clustering based Sequential Pattern mining, IEEE, 24. [15] J. Ayres, J. Gehrke, T. Yu, and J. Flannick, Sequential Pattern Mining Using a Bitmap Representation, Proc. [17] Wei Yong-qing, Liu Dong, Duan Lin-shan," Distributed PrefixSpan Algorithm Based on MapReduce", International Symposium on Information Technology in Medicine and Education, 212. [18] Kong-Fa-Hu, Chang-Hai Zhang, Ling Chen, "A Scalable Method of Mining Approximate Multidimensional Sequential Patterns on Distributed Systems", Proceedings of the Sixth International Conference on Machine Learning and Cybernetics, Hong Kong, August 27. [19] Apache Hadoop [2] IBM Quest Synthetic Data Generator [21] S. Itkar, S. Shelke, Sequential Pattern Mining and Distributed sequential pattern mining- A Survey and Future Scope,ICACCE-214, PP
A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains
A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains Dr. Kanak Saxena Professor & Head, Computer Application SATI, Vidisha, [email protected] D.S. Rajpoot Registrar,
SPMF: a Java Open-Source Pattern Mining Library
Journal of Machine Learning Research 1 (2014) 1-5 Submitted 4/12; Published 10/14 SPMF: a Java Open-Source Pattern Mining Library Philippe Fournier-Viger [email protected] Department
Mining Interesting Medical Knowledge from Big Data
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 06-10 www.iosrjournals.org Mining Interesting Medical Knowledge from
Binary Coded Web Access Pattern Tree in Education Domain
Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: [email protected] M. Moorthi
Comparison of Data Mining Techniques for Money Laundering Detection System
Comparison of Data Mining Techniques for Money Laundering Detection System Rafał Dreżewski, Grzegorz Dziuban, Łukasz Hernik, Michał Pączek AGH University of Science and Technology, Department of Computer
Performance Analysis of Sequential Pattern Mining Algorithms on Large Dense Datasets
Performance Analysis of Sequential Pattern Mining Algorithms on Large Dense Datasets Dr. Sunita Mahajan 1, Prajakta Pawar 2 and Alpa Reshamwala 3 1 Principal, 2 M.Tech Student, 3 Assistant Professor, 1
Log Mining Based on Hadoop s Map and Reduce Technique
Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, [email protected] Amruta Deshpande Department of Computer Science, [email protected]
A Comparative study of Techniques in Data Mining
A Comparative study of Techniques in Data Mining Manika Verma 1, Dr. Devarshi Mehta 2 1 Asst. Professor, Department of Computer Science, Kadi Sarva Vishwavidyalaya, Gandhinagar, India 2 Associate Professor
Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm
R. Sridevi et al Int. Journal of Engineering Research and Applications RESEARCH ARTICLE OPEN ACCESS Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi,*
Searching frequent itemsets by clustering data
Towards a parallel approach using MapReduce Maria Malek Hubert Kadima LARIS-EISTI Ave du Parc, 95011 Cergy-Pontoise, FRANCE [email protected], [email protected] 1 Introduction and Related Work
Static Data Mining Algorithm with Progressive Approach for Mining Knowledge
Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive
Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
Task Scheduling in Hadoop
Task Scheduling in Hadoop Sagar Mamdapure Munira Ginwala Neha Papat SAE,Kondhwa SAE,Kondhwa SAE,Kondhwa Abstract Hadoop is widely used for storing large datasets and processing them efficiently under distributed
Knowledge Based Context Awareness Network Security For Wireless Networks
Knowledge Based Context Awareness Network Security For Wireless Networks Deepa U. Mishra Department of Computer Engineering Smt. Kashibai Navale College of Engineering Pune, India Abstract - Context awareness
Improving Apriori Algorithm to get better performance with Cloud Computing
Improving Apriori Algorithm to get better performance with Cloud Computing Zeba Qureshi 1 ; Sanjay Bansal 2 Affiliation: A.I.T.R, RGPV, India 1, A.I.T.R, RGPV, India 2 ABSTRACT Cloud computing has become
Classification On The Clouds Using MapReduce
Classification On The Clouds Using MapReduce Simão Martins Instituto Superior Técnico Lisbon, Portugal [email protected] Cláudia Antunes Instituto Superior Técnico Lisbon, Portugal [email protected]
International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6
International Journal of Engineering Research ISSN: 2348-4039 & Management Technology Email: [email protected] November-2015 Volume 2, Issue-6 www.ijermt.org Modeling Big Data Characteristics for Discovering
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, [email protected]
A Fraud Detection Approach in Telecommunication using Cluster GA
A Fraud Detection Approach in Telecommunication using Cluster GA V.Umayaparvathi Dept of Computer Science, DDE, MKU Dr.K.Iyakutti CSIR Emeritus Scientist, School of Physics, MKU Abstract: In trend mobile
Top 10 Algorithms in Data Mining
Top 10 Algorithms in Data Mining Xindong Wu ( 吴 信 东 ) Department of Computer Science University of Vermont, USA; 合 肥 工 业 大 学 计 算 机 与 信 息 学 院 1 Top 10 Algorithms in Data Mining by the IEEE ICDM Conference
Top Top 10 Algorithms in Data Mining
ICDM 06 Panel on Top Top 10 Algorithms in Data Mining 1. The 3-step identification process 2. The 18 identified candidates 3. Algorithm presentations 4. Top 10 algorithms: summary 5. Open discussions ICDM
Distributed Apriori in Hadoop MapReduce Framework
Distributed Apriori in Hadoop MapReduce Framework By Shulei Zhao (sz2352) and Rongxin Du (rd2537) Individual Contribution: Shulei Zhao: Implements centralized Apriori algorithm and input preprocessing
Advances in Natural and Applied Sciences
AENSI Journals Advances in Natural and Applied Sciences ISSN:1995-0772 EISSN: 1998-1090 Journal home page: www.aensiweb.com/anas Clustering Algorithm Based On Hadoop for Big Data 1 Jayalatchumy D. and
ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL
International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India [email protected]
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington
WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS
WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS Biswajit Biswal Oracle Corporation [email protected] ABSTRACT With the World Wide Web (www) s ubiquity increase and the rapid development
Association Rule Mining using Apriori Algorithm for Distributed System: a Survey
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. VIII (Mar-Apr. 2014), PP 112-118 Association Rule Mining using Apriori Algorithm for Distributed
Detection of Distributed Denial of Service Attack with Hadoop on Live Network
Detection of Distributed Denial of Service Attack with Hadoop on Live Network Suchita Korad 1, Shubhada Kadam 2, Prajakta Deore 3, Madhuri Jadhav 4, Prof.Rahul Patil 5 Students, Dept. of Computer, PCCOE,
International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET
DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand
What is Analytic Infrastructure and Why Should You Care?
What is Analytic Infrastructure and Why Should You Care? Robert L Grossman University of Illinois at Chicago and Open Data Group [email protected] ABSTRACT We define analytic infrastructure to be the services,
Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA
Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data
Novel Framework for Distributed Data Stream Mining in Big data Analytics Using Time Sensitive Sliding Window
ISSN(Print): 2377-0430 ISSN(Online): 2377-0449 JOURNAL OF COMPUTER SCIENCE AND SOFTWARE APPLICATION In Press Novel Framework for Distributed Data Stream Mining in Big data Analytics Using Time Sensitive
Implementing Graph Pattern Mining for Big Data in the Cloud
Implementing Graph Pattern Mining for Big Data in the Cloud Chandana Ojah M.Tech in Computer Science & Engineering Department of Computer Science & Engineering, PES College of Engineering, Mandya [email protected]
An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns
An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns Zhigang Zheng 1, Yanchang Zhao 1,2,ZiyeZuo 1, and Longbing Cao 1 1 Data Sciences & Knowledge Discovery Research Lab Centre for Quantum
Introduction to Parallel Programming and MapReduce
Introduction to Parallel Programming and MapReduce Audience and Pre-Requisites This tutorial covers the basics of parallel programming and the MapReduce programming model. The pre-requisites are significant
Big Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
Fast Discovery of Sequential Patterns through Memory Indexing and Database Partitioning
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 21, 109-128 (2005) Fast Discovery of Sequential Patterns through Memory Indexing and Database Partitioning MING-YEN LIN AND SUH-YIN LEE * * Department of
MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu
1 MapReduce on GPUs Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 2 MapReduce MAP Shuffle Reduce 3 Hadoop Open-source MapReduce framework from Apache, written in Java Used by Yahoo!, Facebook, Ebay,
16.1 MAPREDUCE. For personal use only, not for distribution. 333
For personal use only, not for distribution. 333 16.1 MAPREDUCE Initially designed by the Google labs and used internally by Google, the MAPREDUCE distributed programming model is now promoted by several
Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster
Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster Sudhakar Singh Dept. of Computer Science Faculty of Science Banaras Hindu University Rakhi Garg Dept. of Computer
TOWARDS EFFICIENT AND SCALABLE DATA MINING USING SPARK
TOWARDS EFFICIENT AND SCALABLE DATA MINING USING SPARK Jie Deng 1, Zhiguo Qu 2, Yongxu Zhu 1, Gabriel-Miro Muntean 2 and Xiaojun Wang 2 The Rince Institute, Dublin City University, Ireland E-mail: 1 {jie.deng3,zhu.zhuyonz2}@mail.dcu.ie
A Small-time Scale Netflow-based Anomaly Traffic Detecting Method Using MapReduce
, pp.231-242 http://dx.doi.org/10.14257/ijsia.2014.8.2.24 A Small-time Scale Netflow-based Anomaly Traffic Detecting Method Using MapReduce Wang Jin-Song, Zhang Long, Shi Kai and Zhang Hong-hao School
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan
A Locality Enhanced Scheduling Method for Multiple MapReduce Jobs In a Workflow Application
2012 International Conference on Information and Computer Applications (ICICA 2012) IPCSIT vol. 24 (2012) (2012) IACSIT Press, Singapore A Locality Enhanced Scheduling Method for Multiple MapReduce Jobs
Mining various patterns in sequential data in an SQL-like manner *
Mining various patterns in sequential data in an SQL-like manner * Marek Wojciechowski Poznan University of Technology, Institute of Computing Science, ul. Piotrowo 3a, 60-965 Poznan, Poland [email protected]
A hybrid algorithm combining weighted and hasht apriori algorithms in Map Reduce model using Eucalyptus cloud platform
A hybrid algorithm combining weighted and hasht apriori algorithms in Map Reduce model using Eucalyptus cloud platform 1 R. SUMITHRA, 2 SUJNI PAUL AND 3 D. PONMARY PUSHPA LATHA 1 School of Computer Science,
Analysis of Large Web Sequences using AprioriAll_Set Algorithm
Analysis of Large Web Sequences using AprioriAll_Set Algorithm Dr. Sunita Mahajan 1, Prajakta Pawar 2 and Alpa Reshamwala 3 1 Principal,Institute of Computer Science, MET, Mumbai University, Bandra, Mumbai,
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
A Time Efficient Algorithm for Web Log Analysis
A Time Efficient Algorithm for Web Log Analysis Santosh Shakya Anju Singh Divakar Singh Student [M.Tech.6 th sem (CSE)] Asst.Proff, Dept. of CSE BU HOD (CSE), BUIT, BUIT,BU Bhopal Barkatullah University,
Mining Large Datasets: Case of Mining Graph Data in the Cloud
Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large
Resource Scalability for Efficient Parallel Processing in Cloud
Resource Scalability for Efficient Parallel Processing in Cloud ABSTRACT Govinda.K #1, Abirami.M #2, Divya Mercy Silva.J #3 #1 SCSE, VIT University #2 SITE, VIT University #3 SITE, VIT University In the
Part 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
Survey on Load Rebalancing for Distributed File System in Cloud
Survey on Load Rebalancing for Distributed File System in Cloud Prof. Pranalini S. Ketkar Ankita Bhimrao Patkure IT Department, DCOER, PG Scholar, Computer Department DCOER, Pune University Pune university
UPS battery remote monitoring system in cloud computing
, pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,
Distributed Dynamic Load Balancing for Iterative-Stencil Applications
Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,
Optimization and analysis of large scale data sorting algorithm based on Hadoop
Optimization and analysis of large scale sorting algorithm based on Hadoop Zhuo Wang, Longlong Tian, Dianjie Guo, Xiaoming Jiang Institute of Information Engineering, Chinese Academy of Sciences {wangzhuo,
Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop
Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social
A computational model for MapReduce job flow
A computational model for MapReduce job flow Tommaso Di Noia, Marina Mongiello, Eugenio Di Sciascio Dipartimento di Ingegneria Elettrica e Dell informazione Politecnico di Bari Via E. Orabona, 4 70125
A Survey on Association Rule Mining in Market Basket Analysis
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 4 (2014), pp. 409-414 International Research Publications House http://www. irphouse.com /ijict.htm A Survey
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
A Survey of Classification Techniques in the Area of Big Data.
A Survey of Classification Techniques in the Area of Big Data. 1PrafulKoturwar, 2 SheetalGirase, 3 Debajyoti Mukhopadhyay 1Reseach Scholar, Department of Information Technology 2Assistance Professor,Department
Survey on Scheduling Algorithm in MapReduce Framework
Survey on Scheduling Algorithm in MapReduce Framework Pravin P. Nimbalkar 1, Devendra P.Gadekar 2 1,2 Department of Computer Engineering, JSPM s Imperial College of Engineering and Research, Pune, India
Optimizing Online Spatial Data Analysis with Sequential Query Patterns
Optimizing Online Spatial Data Analysis with Sequential Query Patterns Chunqiu Zeng, Hongtai Li, Huibo Wang, Yudong Guang, Chang Liu, Tao Li, Mingjin Zhang, Shu-Ching Chen, Naphtali Rishe School of Computing
Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input
ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS
CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT Samira Daneshyar 1 and Majid Razmjoo 2 1,2 School of Computer Science, Centre of Software Technology and Management (SOFTEM),
Building A Smart Academic Advising System Using Association Rule Mining
Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 [email protected] Qutaibah Althebyan +962796536277 [email protected] Baraq Ghalib & Mohammed
Efficient Analysis of Big Data Using Map Reduce Framework
Efficient Analysis of Big Data Using Map Reduce Framework Dr. Siddaraju 1, Sowmya C L 2, Rashmi K 3, Rahul M 4 1 Professor & Head of Department of Computer Science & Engineering, 2,3,4 Assistant Professor,
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. [email protected] Mrs.
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University
Distributed Framework for Data Mining As a Service on Private Cloud
RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
A Study on Workload Imbalance Issues in Data Intensive Distributed Computing
A Study on Workload Imbalance Issues in Data Intensive Distributed Computing Sven Groot 1, Kazuo Goda 1, and Masaru Kitsuregawa 1 University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan Abstract.
BSPCloud: A Hybrid Programming Library for Cloud Computing *
BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China [email protected],
VILNIUS UNIVERSITY FREQUENT PATTERN ANALYSIS FOR DECISION MAKING IN BIG DATA
VILNIUS UNIVERSITY JULIJA PRAGARAUSKAITĖ FREQUENT PATTERN ANALYSIS FOR DECISION MAKING IN BIG DATA Doctoral Dissertation Physical Sciences, Informatics (09 P) Vilnius, 2013 Doctoral dissertation was prepared
MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT
MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT 1 SARIKA K B, 2 S SUBASREE 1 Department of Computer Science, Nehru College of Engineering and Research Centre, Thrissur, Kerala 2 Professor and Head,
Comparison of Different Implementation of Inverted Indexes in Hadoop
Comparison of Different Implementation of Inverted Indexes in Hadoop Hediyeh Baban, S. Kami Makki, and Stefan Andrei Department of Computer Science Lamar University Beaumont, Texas (hbaban, kami.makki,
Association Rule Mining
Association Rule Mining Association Rules and Frequent Patterns Frequent Pattern Mining Algorithms Apriori FP-growth Correlation Analysis Constraint-based Mining Using Frequent Patterns for Classification
HadoopSPARQL : A Hadoop-based Engine for Multiple SPARQL Query Answering
HadoopSPARQL : A Hadoop-based Engine for Multiple SPARQL Query Answering Chang Liu 1 Jun Qu 1 Guilin Qi 2 Haofen Wang 1 Yong Yu 1 1 Shanghai Jiaotong University, China {liuchang,qujun51319, whfcarter,yyu}@apex.sjtu.edu.cn
Keywords: Big Data, HDFS, Map Reduce, Hadoop
Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Configuration Tuning
An Efficient Frequent Item Mining using Various Hybrid Data Mining Techniques in Super Market Dataset
An Efficient Frequent Item Mining using Various Hybrid Data Mining Techniques in Super Market Dataset P.Abinaya 1, Dr. (Mrs) D.Suganyadevi 2 M.Phil. Scholar 1, Department of Computer Science,STC,Pollachi
