An OpenCL Candidate Slicing Frequent Pattern Mining Algorithm on Graphic Processing Units*
Che-Yu Lin, Science and Information Engineering, Chung Hua University
Kun-Ming Yu, Science and Information Engineering, Chung Hua University
Wen Ouyang, Science and Information Engineering, Chung Hua University
Jiayi Zhou, National Tsing Hua University

Abstract
Frequent pattern mining (FPM) is an important problem in the data mining field, and the Apriori algorithm is one of the most commonly used approaches to solve it. However, the Apriori algorithm's computation time increases dramatically as the data size grows and as the threshold shrinks. Many parallel algorithms have been proposed to speed up the computation using computer clusters or grid systems. GPUs have also been applied to FPM, but few of these efforts adopt OpenCL, even though OpenCL has the advantage of being platform independent. The aim of this research is therefore to develop an efficient parallel Apriori strategy using a GPU and OpenCL. Our novel method, the Candidate Slicing Frequent Pattern Mining (CSFPM) algorithm, improves over the previous method by slicing candidate information to better balance the load between processing units. Our experiments show this strategy to be more efficient: CSFPM is at most 2.6 times faster than the previous method. CSFPM is thus an efficient parallel Apriori algorithm that reduces computation time and improves overall performance.

Keywords: frequent pattern mining, parallel processing, graphic processing unit (GPU), OpenCL

I. INTRODUCTION

Frequent pattern mining (FPM) extracts implicit patterns that show correlations between items in enormous transactional data sets. FPM has many applications, such as exploring the correlations between various diseases, medicine, and death. Another popular usage is mining the sales records of a store to discover correlations between customers and sale items, and among the sale items themselves.
FPM can be used to solve many problems, and different methods have been proposed for it. The Apriori algorithm [2] and the Frequent-Pattern Tree (FP-tree) algorithm [3-4] are two commonly used methods. The Apriori algorithm, commonly applied in market basket analysis [1], is simple, easy to understand, and easy to implement, although it requires repeated scanning of the database. The Apriori algorithm is also more suitable for parallel processing than the FP-tree method, since balancing the load while parallelizing tree structures has been a known issue for FP-tree [6]. However, the computation time of the Apriori algorithm grows rapidly as the data set size increases and as the threshold decreases. The maturity of data storage technology has tremendously increased the size of searched databases, and the concern that important information may be lost under a high threshold value encourages low-threshold pattern mining, which costs much computing time. Hardware technology has also advanced quickly: the CPU has moved from single-core to multi-core structures that support parallel processing, and another approach to enhance performance is to use multiple hosts or clusters. Many parallel algorithms have been proposed to speed up the Apriori algorithm, but most of them are designed for computer clusters or grid systems. Replacing OpenMP-style parallelism with general-purpose computing on graphics processing units (GPGPU) [7-8] can achieve even better performance [9-10]. Compute Unified Device Architecture (CUDA) [11] is a GPGPU language that has been used to solve FPM [12]. On the other side, OpenCL (Open Computing Language) [13-14] is a cross-platform and cross-operating-system language which is more convenient than CUDA.

* This work is partially supported by the National Science Council.
Nevertheless, OpenCL has not been as widely adopted by developers as CUDA. In this study, the Candidate Slicing Frequent Pattern Mining (CSFPM) algorithm is proposed, using a GPU and OpenCL to build an efficient parallel Apriori algorithm for the FPM problem. A GPU is a computing device with many features that differ from a CPU, such as a slower clock speed and limited memory size, so the algorithm must be tuned to obtain good performance. Since a GPU clock is slower than a CPU clock, assigning one thread to check one candidate is a good starting point for parallel efficiency, and it is interesting to explore whether further speedup can be reached. CSFPM therefore assigns each thread to check only one transaction of a candidate item. This strategy reduces processor waiting time because the load between processing units is more balanced. The results show that, on the same platform, CSFPM is faster than the previous method [16], the Parallel Frequent Patterns Mining Algorithm on GPU (GPU-FPM), which parallelizes the algorithm with a coarser-grained approach.
The paper is organized as follows. Section 2 describes related work, including the Apriori algorithm, graphic processing units, OpenCL, and the GPU-FPM method. Section 3 explains our CSFPM algorithm. Section 4 presents the experimental results, and Section 5 concludes this paper.

II. RELATED WORK

A. Apriori Algorithm

The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 [2]. Its goal is to find implicit and nontrivial frequent patterns of interest in a data set or database. Whether a pattern is frequent is determined by a predefined threshold: a frequent pattern must occur in the database at least as often as the threshold defines. The Apriori method starts by finding frequent patterns with only one item. Two frequent patterns can be merged into a higher-level candidate pattern with more items. Each newly generated candidate pattern is then verified by checking whether its occurrence count reaches the specified threshold, a step that requires database scanning. The candidate generation and verification process repeats until no new patterns can be generated. For example, let the transaction set in database D be {T1, T2, T3, T4}. The item information for each transaction is listed in Table 1, and the number of occurrences of each item is displayed in Table 2. If the threshold is 50%, a frequent pattern needs to appear in at least two transactions. Among all the one-item candidate patterns, only {A}, {B}, and {D} are frequent. The candidate set for 2-item frequent patterns is then {{A, B}, {A, D}, {B, D}}, and only {A, B} satisfies the frequent pattern condition, as shown in Table 3. The process stops here since no new candidate can be generated.
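As a concrete illustration, the pass just described can be reproduced with a short sketch. This is plain Python for readability only (the paper's actual implementation uses C++ and OpenCL); the transaction data follows Table 1, and the 50% threshold translates to a minimum count of 2.

```python
# One Apriori pass over the running example: count support by scanning
# the database, keep the single items that meet the threshold, join them
# into 2-item candidates, and verify the candidates with another scan.
from itertools import combinations

transactions = {
    "T1": {"A", "B", "C"},
    "T2": {"A", "B"},
    "T3": {"A", "D"},
    "T4": {"D", "E"},
}
min_count = 2  # 50% of 4 transactions

def support(pattern):
    # one database scan: count transactions containing every pattern item
    return sum(1 for items in transactions.values() if pattern <= items)

items = sorted(set().union(*transactions.values()))
frequent_1 = [i for i in items if support({i}) >= min_count]       # A, B, D
candidates_2 = [set(c) for c in combinations(frequent_1, 2)]       # {A,B},{A,D},{B,D}
frequent_2 = [c for c in candidates_2 if support(c) >= min_count]  # {A,B} only
```

Running this reproduces Tables 2 and 3: items A, B, and D are frequent at level one, and only {A, B} survives at level two.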
Table 1: Transaction detail information
  Transaction id   Items
  T1               {A, B, C}
  T2               {A, B}
  T3               {A, D}
  T4               {D, E}

Table 2: Occurrence of items from Table 1
  Item   Times
  A      3
  B      2
  C      1
  D      2
  E      1

Table 3: Only {A, B} appears at least two times
  Items   Times
  A, B    2
  A, D    1
  B, D    0

The creation of the Transaction Identification (Tid) set was proposed [2] to save the time of scanning the database after the first pass. For example, suppose the transaction list for Item 1 is {1, 2, 3, 4, 5}, meaning that Item 1 appears in transactions 1, 2, 3, 4, and 5; the transaction list for Item 2 is {2, 4, 6}; and the transaction list for Item 3 is {1, 3, 5}. The Tid tables TidValue and TidIndex can then be built as in Figure 1, so that candidate patterns can be verified by reading them instead of the database. These two tables become smaller and smaller as higher-level candidate pattern sets are generated.

Figure 1: The Transaction Identification Set

B. Graphic Processing Unit (GPU)

A GPU is a microprocessor for image processing that assists the CPU with graphics work. There are hundreds or even thousands of computing units in a GPU, each of which is like a simplified CPU core. General-purpose computing on graphics processing units (GPGPU) was proposed to use a GPU like a CPU, providing non-graphics computing capability. Current GPGPU technologies include OpenCL and CUDA (Compute Unified Device Architecture). Since a GPU core is slower than a CPU core, it is normally assigned simple computing functions; but because a GPU has many more cores than a CPU, it is well suited for parallel computing.

C. Open Computing Language (OpenCL)

OpenCL was initially developed by Apple Inc., with the preliminary work completed together with AMD, IBM, Intel, and NVIDIA.
The draft was submitted to the Khronos Group by Apple Inc., and the GPGPU Khronos working group was established on June 16. OpenCL 1.1 was then released on June 14. OpenCL is a framework that allows C program development on heterogeneous platforms; that is, it can be applied in any system composed of different CPUs, GPUs, and other computing devices, and it works under different operating systems as long as the OpenCL library is installed. The CPU and GPU communicate and work together through an appropriate C++ host program for the CPU and a kernel file for OpenCL on the GPU, which together perform the parallel computation.

D. Parallel Frequent Patterns Mining Algorithm on GPU

In 2010, Zhou et al. [16] proposed GPU-FPM, an OpenCL GPGPU frequent pattern mining algorithm. It is a coarse-grained parallel Apriori algorithm using Tid tables. In this method, the Tid table entries of the items of each candidate pattern are assigned to one computing thread for comparison. For example, when the number of threads is 1024, it processes
1024 candidate patterns at the same time, with each thread responsible for verifying one candidate pattern. This method is suitable for fast computation units. However, since each candidate pattern may have a different number of entries to process, some threads may need to wait for others to finish their tasks; they stay idle until all threads are done with the current batch.

III. CANDIDATE SLICING FREQUENT PATTERN MINING ALGORITHM (CSFPM)

The parallel Apriori algorithm has been the focus of many researchers. Its issues are the massive amount of computation involved with large data sets and the difficulty of load balancing between processing units. This work tries to parallelize the Apriori algorithm efficiently using the GPU's highly parallel structure in a personal computer. Because each GPU core is slow, although there are many more of them than CPU cores, the Apriori algorithm can be made more efficient on a GPU through further task decomposition. The GPU's highly parallel structure makes it efficient at manipulating large data sets in parallel, and inspecting candidate patterns takes the most time in the Apriori algorithm, so this is where the GPU's parallel computation capability helps most. The basic steps of the Apriori algorithm are:
1. Scan the database and calculate the Tidset.
2. Find the first-level candidate set.
3. If the candidate set is not empty, determine whether each candidate pattern in the set is frequent. Otherwise, go to step 5.
4. Compose the next-level candidate set and go to step 3.
5. All frequent patterns are found.
The Tid set (Tidset) reduces data access time compared with rescanning the database for each level of candidate set processing. However, if the Tidset in step 1 is generated by the GPU, it takes more time than using only the CPU, so this data pre-treatment step is not suitable for the GPU.
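The level-wise loop in steps 1-5 can be sketched sequentially as follows. This is illustrative Python only: CSFPM offloads the candidate verification of step 3 to GPU threads, and the simple join step here omits Apriori's subset-pruning refinement.

```python
# Level-wise Apriori over Tid sets: a candidate's support is the size of
# the intersection of its items' transaction-id sets, so the database is
# scanned only once, in step 1.
def apriori_tidset(transactions, min_count):
    # step 1: scan the database once and build the Tidset
    tidset = {}
    for tid, items in transactions.items():
        for item in items:
            tidset.setdefault(item, set()).add(tid)
    # step 2: first-level candidates -> frequent single items
    frequent = {frozenset([i]) for i, t in tidset.items() if len(t) >= min_count}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # step 4: join frequent (k-1)-patterns into k-item candidates
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # step 3: verify each candidate by intersecting its items' Tid sets
        frequent = set()
        for cand in candidates:
            tids = set.intersection(*(tidset[i] for i in cand))
            if len(tids) >= min_count:
                frequent.add(cand)
        all_frequent |= frequent
        k += 1
    return all_frequent  # step 5: every frequent pattern has been found

patterns = apriori_tidset(
    {1: {"A", "B", "C"}, 2: {"A", "B"}, 3: {"A", "D"}, 4: {"D", "E"}}, 2)
```

On the running example this returns {A}, {B}, {D}, and {A, B}, matching Tables 2 and 3.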
It is better to use the CPU to compute the Tidset and then store it in GPU memory. It is natural to use one thread to verify whether a candidate pattern is frequent, but there is a drawback in this type of task assignment: the number of comparisons required to verify each pattern is not the same, and the variation can be very large. For example, with a straightforward method, the number of matches needed to check whether items 1 and 2 form a frequent pattern is 5*3 = 15, while that for checking items 2 and 3 is 3*3 = 9, as shown in Figure 1. Since control returns to the CPU only after all threads finish their processing, many computation units sit idle, which decreases parallelization efficiency and effectiveness. How to parallelize the frequent pattern matching more efficiently thus becomes an essential issue. Consider three level-one frequent patterns: item 1, item 2, and item 3. The next-level candidate patterns are item 1-2, item 1-3, and item 2-3. When verifying a candidate pattern such as item 1-2, we call the first item (item 1) the compared item and the rest of the items in the same candidate pattern the comparing item(s), i.e., item 2 in this example. If a candidate pattern has three or more items, there are two or more comparing items. The idea of the slicing algorithm is to slice the compared item and dedicate one thread to each of its transactions; this transaction is called the sliced transaction for that thread. For item 1-2 in the example, five threads are assigned to the five transactions of item 1. To facilitate the processing, two arrays of information are passed to the GPU. One array stores the candidate pattern pair (CPP) information, such as {1, 2, 1, 3, 2, 3}, meaning item 1-2, item 1-3, and item 2-3. The other array is the sliced thread assignment (STA) array {4, 9, 12}, which records the thread number of the last transaction of each compared item in the CPP array.
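The CPP/STA bookkeeping can be sketched as follows, with each per-thread check simulated sequentially. This is plain Python for illustration; in CSFPM each call to the check runs as its own OpenCL thread, and the array names mirror the description above.

```python
# CSFPM's slicing: one logical thread per transaction of each compared
# item. STA gives the thread id of the last transaction of every compared
# item in CPP, so a thread can locate its candidate pair and its slice.
tidvalue = {1: [1, 2, 3, 4, 5], 2: [2, 4, 6], 3: [1, 3, 5]}  # Figure 1
cpp = [1, 2, 1, 3, 2, 3]  # candidate pattern pairs: 1-2, 1-3, 2-3

def build_sta(cpp, tidvalue):
    # STA[k] = thread id of the last transaction of the k-th compared item
    sta, last = [], -1
    for k in range(0, len(cpp), 2):
        last += len(tidvalue[cpp[k]])
        sta.append(last)
    return sta

def thread_check(t, cpp, sta, tidvalue):
    # locate the candidate pair this thread belongs to
    pair = next(k for k, s in enumerate(sta) if t <= s)
    first = 0 if pair == 0 else sta[pair - 1] + 1
    compared, comparing = cpp[2 * pair], cpp[2 * pair + 1]
    trans = tidvalue[compared][t - first]  # the sliced transaction
    # return the sliced transaction number on a match, 0 otherwise
    return trans if trans in tidvalue[comparing] else 0

sta = build_sta(cpp, tidvalue)  # [4, 9, 12]
results = [thread_check(t, cpp, sta, tidvalue) for t in range(sta[-1] + 1)]
# host side: count nonzero results per candidate pair
counts = [sum(1 for t in range(0 if p == 0 else sta[p - 1] + 1, sta[p] + 1)
              if results[t]) for p in range(len(sta))]
```

The counts come out as 2 for item 1-2, 3 for item 1-3, and 0 for item 2-3, matching Figure 3; thread 10, as described below, checks transaction 2 against item 3's TidValue and returns 0.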
As Figure 2 shows, there are three compared items and therefore three entries in STA. Transactions 1, 2, 3, 4, and 5 belong to item 1, so the first entry of STA is 4 (threads 0 to 4 for item 1). The second compared item is also item 1, so its thread numbers run from 5 to 9. The third compared item is item 2 with transactions 2, 4, and 6, so the third entry in STA is 12. Thread 10, for example, is only responsible for verifying whether transaction 2 is a member of item 3's TidValue, {1, 3, 5}.

Figure 2: The thread assignment for CPP {1, 2, 1, 3, 2, 3}.

A thread returns the sliced transaction number if there is a match; it returns 0 otherwise. Threads 0 to 4 thus only check whether item 1 and item 2 share any transaction, and each of these five threads that returns its compared transaction number contributes one occurrence of the pattern item 1-2. The results of threads 0 to 12 are shown in Figure 3.

Figure 3: The result from the threads.

From the results, item 1-2 has two occurrences, item 1-3 has three, and item 2-3 has none. If the threshold is 40%, both item 1-2 and item 1-3 are frequent patterns. The CSFPM algorithm is described below.

Algorithm CSFPM
Input: a transaction database D and a given minimum threshold.
Output: a complete set of frequent patterns.
CPU:
1. Start the CL program to be executed by the GPU.
2. Load D from disk.
3. Generate the Tidset by scanning D and store it in CPU memory.
4. Generate and verify first-level frequent patterns.
5. Transform the 2D Tidset table for first-level frequent patterns into the 1D arrays TidValue and TidIndex.
6. Allocate memory space in the GPU for TidValue and TidIndex.
7. Store the arrays TidValue and TidIndex in GPU memory.
8. Generate enough candidate pattern (itemset) arrays CPP and STA for the GPU threads to process.
9. Allocate memory space in the GPU for the candidate patterns.
10. Store the candidate patterns in the GPU.
11. Allocate memory space in the GPU to save the results.
12. Wait until the GPU finishes its program execution.
13. Result manipulation:
  a. Retrieve the results from the GPU and save them in CPU memory.
  b. Calculate the number of nonzero entries for each candidate itemset comparison.
  c. If this number is larger than or equal to what the threshold indicates, the pattern is frequent.
14. Repeat steps 8 to 13 until all candidate patterns of the same level are done.
15. Move to the next-level candidate set generation and perform steps 8-14 until all candidates are generated and verified.

GPU kernel thread function:
1. Retrieve the sliced transaction associated with this task.
2. Retrieve the comparing item transaction set(s) associated with this task.
3. Verify whether the sliced transaction exists in all comparing item transaction set(s).
4. Skip the rest of the verification if no match can be found in any intermediate step.
5. Write down the result of this thread function.

IV. EXPERIMENTAL RESULTS

Experiments are conducted to verify the performance of CSFPM on a GPU. The languages used are OpenCL and C++ on Visual Studio, and the operating system is Microsoft Windows 7. Input data come from the IBM data generator [15]. Table 4 depicts the hardware and software configurations of the experiments, and Table 5 gives the statistical characteristics of the data sets. Each reported time is averaged over 10 runs.

Table 4: Hardware and software configurations
  Item       Description
  CPU        AMD Phenom II X GHz
  Memory     4G DDR3 memory
  GPU        ATI Radeon HD 5850 with 1440 stream processing units and 1G DDR5 memory
  OS         Microsoft Windows 7
  Compiler   Microsoft Visual C
  SDK        ATI Stream SDK 2.3 / OpenCL 1.1

Table 5: Statistical characteristics of the data sets
  Dataset      Avg Trans Len   No of Trans   Avg Len of Max Pattern
  T10I4D100K
  T10I4D1M

A. Smaller data set and different number of threads

Figure 3 shows the computation time when using the same data and threshold but different numbers of threads; the time is lowest at a particular number of threads. When the number of threads increases beyond that point, a large amount of data is fed to the GPU at all times, causing threads to be busy swapping, which wastes much time; the amount of information internal to the GPU also increases, which consumes more time.

Figure 3: Computation time for the smaller data set and different numbers of threads

B. Larger data set and different number of threads

When the amount of data increases, CSFPM still effectively reduces the computing time, as shown in Figure 4. The computing time is again lowest at a particular number of threads, and more time is needed when the thread count is raised further; with 2048 or 4096 threads, the additional speedup is limited.
Figure 4: Computation time for the larger data set and different numbers of threads

C. CSFPM compared with CPU only

This experiment compares the computation time of CSFPM with that of using only the CPU, with data set T10I4D100K and threshold 200. As shown in Figure 5, CSFPM (CPU+GPU) takes much less time than the CPU alone, so using both the GPU and CPU performs better than using only the CPU.

Figure 5: CSFPM compared with CPU only

D. CSFPM compared with GPU-FPM on the same platform

This experiment is conducted with an ATI Radeon HD 5850, data set T10I4D100K, and threshold 200, on the same CPU and GPU but with different methods. As shown in Figure 6, CSFPM outperforms the previous method GPU-FPM on the same platform and data size, and this holds for different numbers of threads. For example, when the number of threads is 1024, the speedup is 2.6.

Figure 6: CSFPM compared with the previous method

V. CONCLUSIONS

Frequent pattern mining (FPM) is an important problem, and the Apriori algorithm is a commonly used approach to it. However, its computation time suffers when the data size grows or the threshold is very small. GPUs are highly parallel devices with limited memory and slow input and output. GPGPU is a clear trend, yet most existing FPM methods cannot be converted directly and efficiently to GPGPU use, leaving substantial room for improvement, as this study has shown. We proposed a novel approach, CSFPM, which parallelizes the candidate pattern matching by slicing each compared item into transaction granules, one per GPU thread. CSFPM performs much better than the previous method GPU-FPM: the speedup is 2.6 when using an ATI Radeon HD 5850 with data set T10I4D100K and threshold 200. Another experiment shows that CSFPM, with GPU and CPU working together, also achieves a speedup over using the CPU alone. CSFPM is thus an efficient way to parallelize FPM tasks on a GPU.

REFERENCES
[1] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, "Dynamic itemset counting and implication rules for market basket data," in ACM SIGMOD International Conference on Management of Data, vol. 26, no. 2.
[2] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules," in International Conference on Very Large Data Bases, 1994.
[3] A. Javed and A. Khokhar, "Frequent pattern mining on message passing multiprocessor systems," Distributed and Parallel Databases, vol. 16, no. 3.
[4] J. Han, J. Pei, Y. Yin, et al., "Mining frequent patterns without candidate generation: a frequent-pattern tree approach," Data Mining and Knowledge Discovery, vol. 8, no. 1.
[5] J. Park, M. Chen, and P. Yu, "An effective hash-based algorithm for mining association rules," ACM SIGMOD Record, vol. 24, no. 2.
[6] K.-M. Yu, J. Zhou, T.-P. Hong, et al., "A load-balanced distributed parallel mining algorithm," Expert Systems with Applications, vol. 37, no. 3.
[7] D. Luebke, M. Harris, N. Govindaraju, et al., "GPGPU: general-purpose computation on graphics hardware," in SIGGRAPH '04: ACM SIGGRAPH 2004 Course Notes, 2005.
[8] S. Lee, S.-J. Min, and R. Eigenmann, "OpenMP to GPGPU: a compiler framework for automatic translation and optimization," ACM SIGPLAN Notices (PPoPP '09), vol. 44, no. 4.
[9] W. Fang, K. K. Lau, M. Lu, et al., "Parallel data mining on graphics processors," Technical Report, Hong Kong University of Science and Technology.
[10] C. Böhm, R. Noll, C. Plant, et al., "Data mining using graphics processing units," Transactions on Large-Scale Data- and Knowledge-Centered Systems I, 2009.
[11] NVIDIA, Compute Unified Device Architecture (CUDA).
[12] S. H. Adil and S. Qamar, "Implementation of association rule mining using CUDA," in International Conference on Emerging Technologies (ICET).
[13] ATI, "Stream SDK."
[14] OpenCL.
[15] R. Agrawal and R. Srikant, Quest Synthetic Data Generator, IBM Almaden Research Center, San Jose, California.
[16] J. Zhou, K.-M. Yu, and B.-C. Wu, "Parallel frequent patterns mining algorithm on GPU," in IEEE International Conference on Systems, Man and Cybernetics, 2010.
Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute [email protected].
Medical Image Processing on the GPU Past, Present and Future Anders Eklund, PhD Virginia Tech Carilion Research Institute [email protected] Outline Motivation why do we need GPUs? Past - how was GPU programming
LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR
LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:
ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL
International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR
Course Development of Programming for General-Purpose Multicore Processors
Course Development of Programming for General-Purpose Multicore Processors Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University Richmond, VA 23284 [email protected]
An examination of the dual-core capability of the new HP xw4300 Workstation
An examination of the dual-core capability of the new HP xw4300 Workstation By employing single- and dual-core Intel Pentium processor technology, users have a choice of processing power options in a compact,
Get an Easy Performance Boost Even with Unthreaded Apps. with Intel Parallel Studio XE for Windows*
Get an Easy Performance Boost Even with Unthreaded Apps for Windows* Can recompiling just one file make a difference? Yes, in many cases it can! Often, you can achieve a major performance boost by recompiling
Data Mining for Data Cloud and Compute Cloud
Data Mining for Data Cloud and Compute Cloud Prof. Uzma Ali 1, Prof. Punam Khandar 2 Assistant Professor, Dept. Of Computer Application, SRCOEM, Nagpur, India 1 Assistant Professor, Dept. Of Computer Application,
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India [email protected]
Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data
Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:
Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration
Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration Jinglin Zhang, Jean François Nezan, Jean-Gabriel Cousin, Erwan Raffin To cite this version: Jinglin Zhang,
A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing
A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing Liang-Teh Lee, Kang-Yuan Liu, Hui-Yang Huang and Chia-Ying Tseng Department of Computer Science and Engineering,
New Matrix Approach to Improve Apriori Algorithm
New Matrix Approach to Improve Apriori Algorithm A. Rehab H. Alwa, B. Anasuya V Patil Associate Prof., IT Faculty, Majan College-University College Muscat, Oman, [email protected] Associate
Comparison of Data Mining Techniques for Money Laundering Detection System
Comparison of Data Mining Techniques for Money Laundering Detection System Rafał Dreżewski, Grzegorz Dziuban, Łukasz Hernik, Michał Pączek AGH University of Science and Technology, Department of Computer
FPGA-based Multithreading for In-Memory Hash Joins
FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded
ST810 Advanced Computing
ST810 Advanced Computing Lecture 17: Parallel computing part I Eric B. Laber Hua Zhou Department of Statistics North Carolina State University Mar 13, 2013 Outline computing Hardware computing overview
The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud.
White Paper 021313-3 Page 1 : A Software Framework for Parallel Programming* The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud. ABSTRACT Programming for Multicore,
~ Greetings from WSU CAPPLab ~
~ Greetings from WSU CAPPLab ~ Multicore with SMT/GPGPU provides the ultimate performance; at WSU CAPPLab, we can help! Dr. Abu Asaduzzaman, Assistant Professor and Director Wichita State University (WSU)
GPU File System Encryption Kartik Kulkarni and Eugene Linkov
GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through
ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer
ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop Emily Apsey Performance Engineer Presentation Overview What it takes to successfully virtualize ArcGIS Pro in Citrix XenApp and XenDesktop - Shareable
GPGPU Parallel Merge Sort Algorithm
GPGPU Parallel Merge Sort Algorithm Jim Kukunas and James Devine May 4, 2009 Abstract The increasingly high data throughput and computational power of today s Graphics Processing Units (GPUs), has led
Recent Advances in Periscope for Performance Analysis and Tuning
Recent Advances in Periscope for Performance Analysis and Tuning Isaias Compres, Michael Firbach, Michael Gerndt Robert Mijakovic, Yury Oleynik, Ventsislav Petkov Technische Universität München Yury Oleynik,
E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices
E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on ios devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist,
NVIDIA Tools For Profiling And Monitoring. David Goodwin
NVIDIA Tools For Profiling And Monitoring David Goodwin Outline CUDA Profiling and Monitoring Libraries Tools Technologies Directions CScADS Summer 2012 Workshop on Performance Tools for Extreme Scale
Keywords: Big Data, HDFS, Map Reduce, Hadoop
Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Configuration Tuning
NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X
NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X DU-05348-001_v5.5 July 2013 Installation and Verification on Mac OS X TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements... 1 1.2. About
GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics
GPU Architectures A CPU Perspective Derek Hower AMD Research 5/21/2013 Goals Data Parallelism: What is it, and how to exploit it? Workload characteristics Execution Models / GPU Architectures MIMD (SPMD),
Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server
Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server Technology brief Introduction... 2 GPU-based computing... 2 ProLiant SL390s GPU-enabled architecture... 2 Optimizing
Using In-Memory Computing to Simplify Big Data Analytics
SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed
The Design and Implement of Ultra-scale Data Parallel. In-situ Visualization System
The Design and Implement of Ultra-scale Data Parallel In-situ Visualization System Liu Ning [email protected] Gao Guoxian [email protected] Zhang Yingping [email protected] Zhu Dengming [email protected]
Chapter 1 Computer System Overview
Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides
GPU Computing - CUDA
GPU Computing - CUDA A short overview of hardware and programing model Pierre Kestener 1 1 CEA Saclay, DSM, Maison de la Simulation Saclay, June 12, 2012 Atelier AO and GPU 1 / 37 Content Historical perspective
Multi-core Programming System Overview
Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,
Introduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
Clustering Billions of Data Points Using GPUs
Clustering Billions of Data Points Using GPUs Ren Wu [email protected] Bin Zhang [email protected] Meichun Hsu [email protected] ABSTRACT In this paper, we report our research on using GPUs to accelerate
Local Alignment Tool Based on Hadoop Framework and GPU Architecture
Local Alignment Tool Based on Hadoop Framework and GPU Architecture Che-Lun Hung * Department of Computer Science and Communication Engineering Providence University Taichung, Taiwan [email protected] *
Intelligent Heuristic Construction with Active Learning
Intelligent Heuristic Construction with Active Learning William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather E H U N I V E R S I T Y T O H F G R E D I N B U Space is BIG! Hubble Ultra-Deep Field
Distributed Dynamic Load Balancing for Iterative-Stencil Applications
Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,
SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs
SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs Fabian Hueske, TU Berlin June 26, 21 1 Review This document is a review report on the paper Towards Proximity Pattern Mining in Large
A Pattern-Based Approach to. Automated Application Performance Analysis
A Pattern-Based Approach to Automated Application Performance Analysis Nikhil Bhatia, Shirley Moore, Felix Wolf, and Jack Dongarra Innovative Computing Laboratory University of Tennessee (bhatia, shirley,
Applying Parallel and Distributed Computing for Image Reconstruction in 3D Electrical Capacitance Tomography
AUTOMATYKA 2010 Tom 14 Zeszyt 3/2 Pawe³ Kapusta*, Micha³ Majchrowicz*, Robert Banasiak* Applying Parallel and Distributed Computing for Image Reconstruction in 3D Electrical Capacitance Tomography 1. Introduction
Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
USING COMPLEX EVENT PROCESSING TO MANAGE PATTERNS IN DISTRIBUTION NETWORKS
USING COMPLEX EVENT PROCESSING TO MANAGE PATTERNS IN DISTRIBUTION NETWORKS Foued BAROUNI Eaton Canada [email protected] Bernard MOULIN Laval University Canada [email protected] ABSTRACT
A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains
A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains Dr. Kanak Saxena Professor & Head, Computer Application SATI, Vidisha, [email protected] D.S. Rajpoot Registrar,
OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA
OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization
Task Scheduling in Hadoop
Task Scheduling in Hadoop Sagar Mamdapure Munira Ginwala Neha Papat SAE,Kondhwa SAE,Kondhwa SAE,Kondhwa Abstract Hadoop is widely used for storing large datasets and processing them efficiently under distributed
Automatic CUDA Code Synthesis Framework for Multicore CPU and GPU architectures
Automatic CUDA Code Synthesis Framework for Multicore CPU and GPU architectures 1 Hanwoong Jung, and 2 Youngmin Yi, 1 Soonhoi Ha 1 School of EECS, Seoul National University, Seoul, Korea {jhw7884, sha}@iris.snu.ac.kr
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan
Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures
Chapter 18: Database System Architectures Centralized Systems! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types! Run on a single computer system and do
2020 Design Update 11.3. Release Notes November 10, 2015
2020 Design Update 11.3 Release Notes November 10, 2015 Contents Introduction... 1 System Requirements... 2 Actively Supported Operating Systems... 2 Hardware Requirements (Minimum)... 2 Hardware Requirements
Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries
Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Shin Morishima 1 and Hiroki Matsutani 1,2,3 1Keio University, 3 14 1 Hiyoshi, Kohoku ku, Yokohama, Japan 2National Institute
