A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment
Panagiotis D. Michailidis and Konstantinos G. Margaritis

Parallel and Distributed Processing Laboratory, Department of Applied Informatics, University of Macedonia, 16 Egnatia str., P.O. Box 191, 4006 Thessaloniki, Greece

D. Kranzlmüller et al. (Eds.): Euro PVM/MPI 2002, LNCS 2474, © Springer-Verlag Berlin Heidelberg 2002

Abstract. In this paper, we present three parallel approximate string matching methods on a parallel architecture of heterogeneous workstations, with the aim of gaining supercomputer power at low cost. The first method is the static master-worker with a uniform distribution strategy, the second is the dynamic master-worker with allocation of subtexts, and the third is the dynamic master-worker with allocation of text pointers. Further, we propose a hybrid parallel method that combines the advantages of the static and dynamic parallel methods in order to reduce load imbalance and communication overhead. This hybrid method is based on the following optimal distribution strategy: the text collection is distributed in proportion to each workstation's speed. We evaluated the performance of the four methods on clusters of 1, 2, 4, 6 and 8 heterogeneous workstations. The experimental results demonstrate that the dynamic allocation of text pointers and the hybrid method achieve better performance than the two original ones.

1 Introduction

Approximate string matching is one of the main problems in classical string algorithms, with applications to information and multimedia retrieval, computational biology, pattern recognition, Web search engines and text mining. It is defined as follows: given a large text collection $t = t_1 t_2 \ldots t_n$ of length $n$, a short pattern $p = p_1 p_2 \ldots p_m$ of length $m$, and a maximal number of allowed errors $k$, we want to find all text positions where the pattern matches the text with up to $k$ errors. An error is the substitution, deletion, or insertion of a character.

In the on-line version of the problem, it is possible to preprocess the pattern but not the text collection. The classical solution involves dynamic programming and needs O(mn) time [14]. More recently, a number of sequential algorithms have improved on this classical, time-consuming one; see for instance the surveys [7,11]. Some of them are sublinear in the sense that they do not inspect all the characters of the text collection.
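The classical O(mn) dynamic programming solution can be sketched as follows. This is a minimal illustration in C in the spirit of the algorithm of [14], not the authors' code: the function name `sel_count_matches` is hypothetical, and counting every text position at which a match with at most k errors ends is an assumed convention for "number of occurrences".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Count the text positions j at which some substring ending at j matches
 * the pattern with at most k errors (substitutions, insertions, deletions).
 * Only one column of the (m+1) x (n+1) table is kept in memory.
 * Returns -1 if the column cannot be allocated. */
long sel_count_matches(const char *text, size_t n,
                       const char *pattern, size_t m, size_t k)
{
    assert(m > 0);
    size_t *col = malloc((m + 1) * sizeof *col);
    if (col == NULL)
        return -1;

    /* Column for the empty text prefix: i pattern characters cost i errors. */
    for (size_t i = 0; i <= m; i++)
        col[i] = i;

    long count = 0;
    for (size_t j = 0; j < n; j++) {
        size_t diag = col[0];   /* value of cell (i-1, j-1) */
        col[0] = 0;             /* a match may start at any text position */
        for (size_t i = 1; i <= m; i++) {
            size_t up_left = diag;
            diag = col[i];      /* keep cell (i, j-1) before overwriting it */
            size_t best = up_left + (pattern[i - 1] != text[j] ? 1 : 0);
            if (diag + 1 < best)
                best = diag + 1;        /* extra character in the text */
            if (col[i - 1] + 1 < best)
                best = col[i - 1] + 1;  /* pattern character left unmatched */
            col[i] = best;
        }
        if (col[m] <= k)
            count++;            /* a match with at most k errors ends here */
    }
    free(col);
    return count;
}
```

The inner loop touches every pattern position for every text character, which is where the O(mn) bound comes from; the parallel schemes discussed in this paper leave this loop unchanged and only decide which part of the text each workstation feeds into it.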
We are particularly interested in information retrieval, where current free-text collections are normally so large that even the fastest on-line sequential algorithms are not practical, so parallel and distributed processing becomes necessary. There are two basic methods to improve the performance of approximate string matching on large text collections: one is based on fine-grain parallelization of the approximate string matching algorithm [2,12,13,6,4,5], and the other is based on distributing the character comparisons across supercomputers or networks of workstations. As far as the second method is concerned, distributed implementations of approximate string matching algorithms are not available in the literature. However, we are aware of a few attempts to implement similar problems on a cluster of workstations. In [3], an exact string matching implementation was proposed and results were reported on a transputer-based architecture. In [9,10], an exact string matching algorithm was parallelized and modeled on a homogeneous platform, giving positive experimental results. Finally, [5,16] presented parallelizations of a biological sequence analysis algorithm on a homogeneous cluster of workstations and on an Intel iPSC/860 parallel computer, respectively. General efficient algorithms for the master-worker paradigm on heterogeneous clusters have been developed in [1].

The main contribution of this work is three low-cost parallel approximate string matching approaches that can search very large free textbases on an inexpensive cluster of heterogeneous PCs or workstations running the Linux operating system. These approaches are based on the master-worker model, using static and dynamic allocation of the text collection.
Further, we propose a hybrid parallel approach that combines the advantages of the three previous parallel approaches in order to reduce load imbalance and communication overhead. This hybrid approach is based on the following optimal distribution strategy: the text collection is distributed in proportion to each workstation's speed. The four approaches are implemented using the MPI library [15] over a cluster of heterogeneous workstations. To the best of our knowledge, this is the first attempt to implement an approximate string matching application using static and dynamic load balancing strategies on a network of heterogeneous workstations.

2 MPI Master-Worker Implementations of Approximate String Matching

We follow the master-worker programming model to develop our parallel and distributed approximate string matching implementations under the MPI library [15].

2.1 Static Master-Worker Implementation

To present the static master-worker implementation we make the following assumptions: first, the workstations are numbered from 0 to $p-1$; second, the documents of our text collection are distributed among the workstations and stored on their local disks; and finally, the pattern and the number of errors $k$ are stored in the main memory of all workstations. The partitioning strategy of this approach is to split the entire text collection into as many subtext collections as there are allocated workstations. The size of each subtext collection equals the size of the text collection divided by the number of allocated workstations.

The static master-worker implementation, called P1, is composed of four phases. In the first phase, the master broadcasts the pattern string and the number of errors $k$ to all workers. In the second phase, each worker reads its subtext collection from the local disk into main memory. In the third phase, each worker performs character comparisons using a local sequential approximate string matching algorithm to generate the number of occurrences. In the fourth phase, the master collects the number of occurrences from each worker.

The advantage of this simple approach is low communication overhead. This is achieved by fixing the assignment a priori: each worker searches its own subtext independently, without having to communicate with the other workers or the master. However, the main disadvantage is the possible load imbalance caused by the poor partitioning technique. In other words, in a heterogeneous environment the faster or more lightly loaded workstations spend a significant amount of time idle.

2.2 Dynamic Master-Worker Implementations

In this subsection, we implement two versions of the dynamic master-worker model. The first version is based on the dynamic allocation of subtexts and the second one is based on the dynamic allocation of text pointers.

Dynamic Allocation of Subtexts

The dynamic master-worker strategy that we adopted is a well-known parallelization strategy, also known as a workstation farm.
Before presenting the dynamic implementation, we make the following assumption: the entire text collection is stored on the local disk of the master workstation. The dynamic master-worker implementation with allocation of subtexts is composed of six phases. In the first phase, the master broadcasts the pattern string and the number of errors $k$ to all workers. In the second phase, the master reads several chunks of the text collection from its local disk. The size of each chunk (sb) is an important parameter that can affect the overall performance; more specifically, it is directly related to the I/O and communication factors. We tried several chunk sizes in order to find the best performance, as presented in our experiments [8]. In the third phase, the master sends the first chunks of the text collection to the corresponding worker workstations. In the fourth phase, each worker workstation runs a sequential approximate string matching algorithm on its chunk of text and the pattern in order to generate the number of occurrences. In the fifth phase, each worker sends the number of occurrences back to the master workstation. In the sixth phase, if any chunks of the text collection are left, the master reads and distributes the next chunks to the workers and loops back to the fourth phase.
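The effect of handing out chunks on demand can be illustrated without any message passing. The following is a minimal sketch in C that only models the schedule produced by the six phases above: the names `simulate_dynamic` and `static_uniform_makespan` are hypothetical, worker speeds are given in characters per second, and I/O and communication costs are deliberately ignored (an assumption of the sketch, not of the paper).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Model of the workstation farm: whenever a worker becomes idle it receives
 * the next chunk of at most sb characters. Fills chars_done[w] with the
 * number of characters searched by worker w and returns the time at which
 * the last worker finishes (-1.0 on allocation failure). */
double simulate_dynamic(const double *speed, size_t workers,
                        size_t n, size_t sb, size_t *chars_done)
{
    assert(workers > 0 && sb > 0);
    double *free_at = malloc(workers * sizeof *free_at);
    if (free_at == NULL)
        return -1.0;
    for (size_t w = 0; w < workers; w++) {
        free_at[w] = 0.0;       /* every worker is idle at time 0 */
        chars_done[w] = 0;
    }

    size_t pointer = 0;         /* master's current position in the text */
    while (pointer < n) {
        size_t len = (n - pointer < sb) ? n - pointer : sb;
        size_t w = 0;           /* worker that becomes idle first */
        for (size_t i = 1; i < workers; i++)
            if (free_at[i] < free_at[w])
                w = i;
        free_at[w] += (double)len / speed[w];   /* search time of the chunk */
        chars_done[w] += len;
        pointer += len;
    }

    double makespan = 0.0;
    for (size_t w = 0; w < workers; w++)
        if (free_at[w] > makespan)
            makespan = free_at[w];
    free(free_at);
    return makespan;
}

/* Static uniform split for comparison: every worker gets n / workers
 * characters, with the remainder spread over the first workers. */
double static_uniform_makespan(const double *speed, size_t workers, size_t n)
{
    assert(workers > 0);
    double makespan = 0.0;
    for (size_t w = 0; w < workers; w++) {
        size_t share = n / workers + (w < n % workers ? 1 : 0);
        double t = (double)share / speed[w];
        if (t > makespan)
            makespan = t;
    }
    return makespan;
}
```

With two workers of speeds 2 and 1, a text of 6 characters and sb = 1, the farm gives 4 characters to the fast worker and 2 to the slow one and finishes at time 2, while the uniform split finishes at time 3 because the slow worker holds half of the text. In a real run each chunk also costs a message exchange, which is why very small values of sb are not free.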
The advantage of this dynamic approach is low load imbalance, while the disadvantage is higher inter-workstation communication overhead.

Dynamic Allocation of Text Pointers

Before presenting the dynamic implementation with text pointers, we make the following assumptions: first, the complete text collection is stored on the local disks of all workstations; second, the master workstation has a text pointer that marks the current position in the text collection. The dynamic allocation of text pointers is composed of six phases. In the first phase, the master broadcasts the pattern string and the number of errors $k$ to all workers. In the second phase, the master sends the first text pointers to the corresponding workers. In the third phase, each worker reads from its local disk sb characters of text starting at the pointer it received. In the fourth phase, each worker runs a sequential approximate string matching procedure on its chunk of text and the pattern in order to generate the number of occurrences. In the fifth phase, each worker sends the result back to the master. In the sixth phase, if the text pointer has not reached the end of the text, the master advances the text pointers to the positions of the next chunks of text, sends them to the workers, and loops back to the third phase.

The advantage of this simple implementation is that it reduces the inter-workstation communication overhead, since each workstation has an entire copy of the text collection on its local disk. This scheme requires more local disk space, but the local disks in parallel and distributed architectures are large enough.

2.3 Hybrid Master-Worker Implementation

Here, we develop a hybrid master-worker implementation that combines the advantages of the static and dynamic approaches in order to reduce load imbalance and communication overhead.
This implementation is based on an optimal distribution strategy of the text collection that is performed statically. In the following subsection, we describe the optimal text distribution strategy and its implementation.

Text Distribution and Load Balancing

To prevent the slowest workstations from determining the parallel string matching time, the load should be distributed in proportion to the capacity of each workstation. The goal is to assign the same amount of time to each workstation, which may not correspond to the same amount of the text collection. A balanced distribution is achieved by a static load distribution made prior to the execution of the parallel operation. To achieve a well-balanced distribution among heterogeneous workstations, the amount of text distributed to each workstation should be proportional to its processing capacity relative to the entire network:

$$ l_i = \frac{S_i}{\sum_{j=0}^{p-1} S_j} \qquad (1) $$

where $S_j$ is the speed of workstation $j$. Therefore, the amount of the text collection that is distributed to each workstation $M_i$ ($1 \le i \le p$) is $l_i n$, where $n$ is the length of the complete text collection. The hybrid implementation is the same as the P1 implementation, except that it uses the optimal distribution method instead of the uniform one.

All four parallel implementations are constructed so that alternative sequential approximate string matching algorithms can be substituted quite easily [7,11]. In this paper, we use the classical SEL dynamic programming algorithm [14].

3 Experimental Results

In this section, we discuss the experimental results for the performance of the four parallel and distributed algorithms. These algorithms are implemented in the C programming language using the MPI library [15].

3.1 Experimental Environment

The target platform for our experimental study is a cluster of heterogeneous workstations connected by a 100 Mb/s Fast Ethernet network. More specifically, the cluster consists of 4 Pentium MMX 166 MHz with 32 MB RAM and 6 Pentium 100 MHz with 64 MB RAM. A Pentium MMX is used as the master workstation. The average speeds of the two types of workstations, Pentium MMX and Pentium, for the four implementations are listed in Table 1. The MPI implementation used on the network is MPICH version 1.2. During all experiments, the cluster of workstations was dedicated. Finally, to get reliable performance results, each experiment was executed 10 times and the reported values are the averages. The text collection we used was composed of documents that were portions of various web pages.

3.2 Experimental Results

In this subsection, we present the experimental results from two sets of experiments. For the first experimental setup, we study the performance of the four master-worker implementations.
For the second experimental setup, we examine the scalability issue of our implementations by doubling the text collection.

Table 1. Average speeds (in chars per sec) of the two types of workstations
Application    Pentium MMX    Pentium
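Equation (1) turns average speeds such as those of Table 1 into the text share of each workstation. A minimal sketch in C, under stated assumptions: the function names `proportional_shares` and `estimated_time` are hypothetical, the speeds used in the example are made up rather than taken from Table 1, and giving the rounding remainder to the fastest workstation is a choice of this sketch that the paper does not specify.

```c
#include <assert.h>
#include <stddef.h>

/* share[i] = floor(n * S_i / sum_j S_j), following Equation (1).
 * The characters left over by rounding go to the fastest workstation. */
void proportional_shares(const double *speed, size_t p, size_t n, size_t *share)
{
    assert(p > 0);
    double total = 0.0;
    for (size_t i = 0; i < p; i++)
        total += speed[i];
    assert(total > 0.0);

    size_t assigned = 0, fastest = 0;
    for (size_t i = 0; i < p; i++) {
        share[i] = (size_t)((double)n * (speed[i] / total));
        assigned += share[i];
        if (speed[i] > speed[fastest])
            fastest = i;
    }
    assert(assigned <= n);
    share[fastest] += n - assigned;
}

/* Time at which the slowest-finishing workstation completes its share,
 * ignoring I/O and communication cost. */
double estimated_time(const double *speed, const size_t *share, size_t p)
{
    double worst = 0.0;
    for (size_t i = 0; i < p; i++) {
        double t = (double)share[i] / speed[i];
        if (t > worst)
            worst = t;
    }
    return worst;
}
```

For two workstations with speeds 3 and 1 and a text of 100 characters, the shares are 75 and 25 and both workstations finish after 25 time units. A uniform 50/50 split of the same text would take 50 time units, because the slower workstation determines the finishing time.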
Comparing the Four Types of Approximate String Matching Implementations

Before presenting the results for the four methods, we note that the extensive experimental study [8] showed that a block size of nearly sb=100,000 characters produces optimal performance for the two dynamic master-worker methods; all later experiments with these two methods use this value. Further, from [8] we observed that the worst performance is obtained for very small and very large values of the block size. This is because small block sizes increase the inter-workstation communication, while large block sizes produce a poorly balanced load.

Figures 1 and 2 show the execution times and the speedup factors, respectively, with respect to the number of workstations. It is important to note that the execution times and the speedups plotted in Figures 1 and 2 are averages over five pattern lengths (m= , 10, 20, 30 and 60) and four values of the number of errors (k=1, 3, 6 and 9). The speedup of a heterogeneous computation is defined as the ratio of the sequential execution time on the fastest workstation to the parallel execution time across the heterogeneous cluster. To have a fair comparison in terms of speedup, one defines the system computing power, which considers the power available instead of the number of workstations. The system computing power is defined as $\sum_{i=0}^{p-1} S_i / S_0$ for the $p$ workstations used, where $S_0$ is the speed of the master workstation.

As expected, the performance results show that the P1 implementation, which uses the static load balancing strategy, is less effective than the other three implementations on a heterogeneous network. This is due to the waiting time associated with communications: the slowest workstation is always the last one to finish its string matching computation.
Further, the implementation using dynamic allocation of subtexts produces better results than the P1 one on a heterogeneous cluster. Finally, the experimental results show that the dynamic allocation of text pointers and hybrid implementations seem to have the best performance compared with the others on a heterogeneous cluster. These implementations give smaller execution times and higher speedups than the P1 and dynamic allocation of subtexts implementations when the network becomes heterogeneous, i.e. after the 3rd workstation.

Fig. 1. Experimental execution times (in seconds) for text size of 13MB and k=3 using several pattern lengths (left) and m=10 using several values of k (right)

Fig. 2. Speedup of parallel approximate string matching with respect to the number of workstations for text size of 13MB and k=3 using several pattern lengths (left) and m=10 using several values of k (right)

We now examine the performance of the two dynamic implementations and the hybrid one. The results show a clear reduction in the computation time of the algorithm when we use these three parallel implementations. For instance, with k=3 and several pattern lengths, we reduce the average computation time from 9.08 seconds in the sequential version to , and seconds in the three distributed implementations, respectively, using 8 workstations. In other words, Figure 1 shows that for a constant total text size there is, as expected, an inverse relation between the parallel execution times and the number of workstations. Further, the three master-worker implementations achieve reasonable speedups for all numbers of workstations. For example, with k=3 and several pattern lengths, the speedup curves increase up to about .2, .86 and .91 for the three distributed methods, respectively, on the 8 workstations, which had a computing power of .92, .93 and .97.

Scalability Issue

To study the scalability of the three parallel implementations examined above, we set up the experiments in the following way. We simply double the text size; the new text collection is around 27MB. The results of these experiments are depicted in Figures 3 and 4. The results show that the three parallel implementations still scale well even though the problem size has been doubled.
The average execution times for k=3 and several pattern lengths similarly decrease to , and seconds for the three implementations, respectively, when the number of workstations is increased to 8. Moreover, the speedup factors of the three methods also increase linearly as workstations are added. Finally, the best performance results are obtained with the dynamic allocation of text pointers and hybrid load balancing methods.
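The two metrics used in these comparisons, heterogeneous speedup and system computing power, follow directly from their definitions in this section. A minimal sketch in C, with hypothetical function names and with `speed[0]` assumed to hold the speed of the master workstation:

```c
#include <assert.h>
#include <stddef.h>

/* System computing power: total speed of the p workstations used,
 * expressed in units of the master's speed S_0 = speed[0]. */
double system_computing_power(const double *speed, size_t p)
{
    assert(p > 0 && speed[0] > 0.0);
    double total = 0.0;
    for (size_t i = 0; i < p; i++)
        total += speed[i];
    return total / speed[0];
}

/* Heterogeneous speedup: sequential time on the fastest workstation
 * divided by the parallel time on the whole cluster. */
double heterogeneous_speedup(double t_seq_fastest, double t_parallel)
{
    assert(t_parallel > 0.0);
    return t_seq_fastest / t_parallel;
}
```

For example, a master of speed 2 with two workers of speed 1 has a computing power of 2, so a measured speedup is better read against 2 than against the workstation count of 3.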
Fig. 3. Experimental execution times (in seconds) for text size of 27MB and k=3 using several pattern lengths (left) and m=10 using several values of k (right)

Fig. 4. Speedup of parallel approximate string matching with respect to the number of workstations for text size of 27MB and k=3 using several pattern lengths (left) and m=10 using several values of k (right)

4 Conclusions

In this paper, we have presented four parallel and distributed approximate string matching implementations and their performance results on a low-cost cluster of heterogeneous workstations. We have observed from this extensive study that the dynamic allocation of text pointers and hybrid implementations produce better performance results, in terms of execution times and speedups, than the others. Higher gains in performance are expected for a larger number of varying-speed workstations in the network. Variants of the approximate string matching algorithm can be implemented directly on a cluster of heterogeneous workstations using the four text distribution methods reported here. We plan to develop a theoretical performance model in order to confirm the experimental behaviour of the four implementations on a heterogeneous cluster. Further, this model can be used to predict the execution time and similar performance metrics for the four approximate string matching implementations on larger clusters and problem sizes.
References

1. O. Beaumont, A. Legrand and Y. Robert, The master-slave paradigm with heterogeneous processors, Report LIP RR.
2. H. D. Cheng and K. S. Fu, VLSI architectures for string matching and pattern matching, Pattern Recognition, vol. 20, no. 1.
3. J. Cringean, R. England, G. Manson and P. Willett, Network design for the implementation of text searching using a multicomputer, Information Processing and Management, vol. 27, no. 4.
4. D. Lavenier, Speeding up genome computations with a systolic accelerator, SIAM News, vol. 31, no. 8, pp. 6-7.
5. D. Lavenier and J. L. Pacherie, Parallel processing for scanning genomic databases, in Proc. PARCO 97.
6. K. G. Margaritis and D. J. Evans, A VLSI processor array for flexible string matching, Parallel Algorithms and Applications, vol. 11, no. 1-2, pp. 4-60.
7. P. D. Michailidis and K. G. Margaritis, On-line approximate string searching algorithms: Survey and experimental results, International Journal of Computer Mathematics, vol. 79, no. 8.
8. P. D. Michailidis and K. G. Margaritis, Performance evaluation of load balancing strategies for approximate string matching on a cluster of heterogeneous workstations, Tech. Report, Dept. of Applied Informatics, University of Macedonia.
9. P. D. Michailidis and K. G. Margaritis, String matching problem on a cluster of personal computers: Experimental results, in Proc. of the 1th International Conference Systems for Automation of Engineering and Research, pp. 71-7.
10. P. D. Michailidis and K. G. Margaritis, String matching problem on a cluster of personal computers: Performance modeling, in Proc. of the 1th International Conference Systems for Automation of Engineering and Research.
11. G. Navarro, A guided tour to approximate string matching, ACM Computing Surveys, vol. 33, no. 1.
12. N. Ranganathan and R. Sastry, VLSI architectures for pattern matching, International Journal of Pattern Recognition and Artificial Intelligence, vol. 8, no. 4.
13. R. Sastry, N. Ranganathan and K. Remedios, CASM: a VLSI chip for approximate string matching, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8.
14. P. H. Sellers, The theory and computation of evolutionary distances: pattern recognition, Journal of Algorithms, vol. 1.
15. M. Snir, S. Otto, S. Huss-Lederman, D. W. Walker and J. Dongarra, MPI: The complete reference, The MIT Press, Cambridge, Massachusetts.
16. T. K. Yap, O. Frieder and R. L. Martino, Parallel computation in biological sequence analysis, IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 3.
Dynamic load balancing of parallel cellular automata
Dynamic load balancing of parallel cellular automata Marc Mazzariol, Benoit A. Gennart, Roger D. Hersch Ecole Polytechnique Fédérale de Lausanne, EPFL * ABSTRACT We are interested in running in parallel
Real Time Network Server Monitoring using Smartphone with Dynamic Load Balancing
www.ijcsi.org 227 Real Time Network Server Monitoring using Smartphone with Dynamic Load Balancing Dhuha Basheer Abdullah 1, Zeena Abdulgafar Thanoon 2, 1 Computer Science Department, Mosul University,
Introduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
Advances in Smart Systems Research : ISSN 2050-8662 : http://nimbusvault.net/publications/koala/assr/ Vol. 3. No. 3 : pp.
Advances in Smart Systems Research : ISSN 2050-8662 : http://nimbusvault.net/publications/koala/assr/ Vol. 3. No. 3 : pp.49-54 : isrp13-005 Optimized Communications on Cloud Computer Processor by Using
Load Balancing on a Grid Using Data Characteristics
Load Balancing on a Grid Using Data Characteristics Jonathan White and Dale R. Thompson Computer Science and Computer Engineering Department University of Arkansas Fayetteville, AR 72701, USA {jlw09, drt}@uark.edu
Energy Efficient MapReduce
Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing
Benchmarking Hadoop & HBase on Violin
Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages
How To Compare Load Sharing And Job Scheduling In A Network Of Workstations
A COMPARISON OF LOAD SHARING AND JOB SCHEDULING IN A NETWORK OF WORKSTATIONS HELEN D. KARATZA Department of Informatics Aristotle University of Thessaloniki 546 Thessaloniki, GREECE Email: [email protected]
Scalable Parallel Clustering for Data Mining on Multicomputers
Scalable Parallel Clustering for Data Mining on Multicomputers D. Foti, D. Lipari, C. Pizzuti and D. Talia ISI-CNR c/o DEIS, UNICAL 87036 Rende (CS), Italy {pizzuti,talia}@si.deis.unical.it Abstract. This
Keywords: Dynamic Load Balancing, Process Migration, Load Indices, Threshold Level, Response Time, Process Age.
Volume 3, Issue 10, October 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Load Measurement
Control 2004, University of Bath, UK, September 2004
Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of
A Performance Comparison of Five Algorithms for Graph Isomorphism
A Performance Comparison of Five Algorithms for Graph Isomorphism P. Foggia, C.Sansone, M. Vento Dipartimento di Informatica e Sistemistica Via Claudio, 21 - I 80125 - Napoli, Italy {foggiapa, carlosan,
A Study on Workload Imbalance Issues in Data Intensive Distributed Computing
A Study on Workload Imbalance Issues in Data Intensive Distributed Computing Sven Groot 1, Kazuo Goda 1, and Masaru Kitsuregawa 1 University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan Abstract.
Resource Allocation Schemes for Gang Scheduling
Resource Allocation Schemes for Gang Scheduling B. B. Zhou School of Computing and Mathematics Deakin University Geelong, VIC 327, Australia D. Walsh R. P. Brent Department of Computer Science Australian
Survey on Load Rebalancing for Distributed File System in Cloud
Survey on Load Rebalancing for Distributed File System in Cloud Prof. Pranalini S. Ketkar Ankita Bhimrao Patkure IT Department, DCOER, PG Scholar, Computer Department DCOER, Pune University Pune university
A Statistically Customisable Web Benchmarking Tool
Electronic Notes in Theoretical Computer Science 232 (29) 89 99 www.elsevier.com/locate/entcs A Statistically Customisable Web Benchmarking Tool Katja Gilly a,, Carlos Quesada-Granja a,2, Salvador Alcaraz
Measuring MPI Send and Receive Overhead and Application Availability in High Performance Network Interfaces
Measuring MPI Send and Receive Overhead and Application Availability in High Performance Network Interfaces Douglas Doerfler and Ron Brightwell Center for Computation, Computers, Information and Math Sandia
David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems
David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems About me David Rioja Redondo Telecommunication Engineer - Universidad de Alcalá >2 years building and managing clusters UPM
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Evolutionary Prefetching and Caching in an Independent Storage Units Model
Evolutionary Prefetching and Caching in an Independent Units Model Athena Vakali Department of Informatics Aristotle University of Thessaloniki, Greece E-mail: avakali@csdauthgr Abstract Modern applications
Parallel Computing. Benson Muite. [email protected] http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage
Parallel Computing Benson Muite [email protected] http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework
A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique
A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique Jyoti Malhotra 1,Priya Ghyare 2 Associate Professor, Dept. of Information Technology, MIT College of
Understanding Data Locality in VMware Virtual SAN
Understanding Data Locality in VMware Virtual SAN July 2014 Edition T E C H N I C A L M A R K E T I N G D O C U M E N T A T I O N Table of Contents Introduction... 2 Virtual SAN Design Goals... 3 Data
PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN
1 PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster Construction
Email Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
V:Drive - Costs and Benefits of an Out-of-Band Storage Virtualization System
V:Drive - Costs and Benefits of an Out-of-Band Storage Virtualization System André Brinkmann, Michael Heidebuer, Friedhelm Meyer auf der Heide, Ulrich Rückert, Kay Salzwedel, and Mario Vodisek Paderborn
Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
/35 Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing Zuhair Khayyat 1 Karim Awara 1 Amani Alonazi 1 Hani Jamjoom 2 Dan Williams 2 Panos Kalnis 1 1 King Abdullah University of
A Fast Pattern Matching Algorithm with Two Sliding Windows (TSW)
Journal of Computer Science 4 (5): 393-401, 2008 ISSN 1549-3636 2008 Science Publications A Fast Pattern Matching Algorithm with Two Sliding Windows (TSW) Amjad Hudaib, Rola Al-Khalid, Dima Suleiman, Mariam
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input
Mining Association Rules on Grid Platforms
UNIVERSITY OF TUNIS EL MANAR FACULTY OF SCIENCES OF TUNISIA Mining Association Rules on Grid Platforms Raja Tlili [email protected] Yahya Slimani [email protected] CoreGrid 11 Plan Introduction
Mining Interesting Medical Knowledge from Big Data
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 06-10 www.iosrjournals.org Mining Interesting Medical Knowledge from
A Comparative Performance Analysis of Load Balancing Algorithms in Distributed System using Qualitative Parameters
A Comparative Performance Analysis of Load Balancing Algorithms in Distributed System using Qualitative Parameters Abhijit A. Rajguru, S.S. Apte Abstract - A distributed system can be viewed as a collection
Hadoop Scheduler w i t h Deadline Constraint
Hadoop Scheduler w i t h Deadline Constraint Geetha J 1, N UdayBhaskar 2, P ChennaReddy 3,Neha Sniha 4 1,4 Department of Computer Science and Engineering, M S Ramaiah Institute of Technology, Bangalore,
Chapter 12: Multiprocessor Architectures. Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup
Chapter 12: Multiprocessor Architectures Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup Objective Be familiar with basic multiprocessor architectures and be able to
HPC Wales Skills Academy Course Catalogue 2015
HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses
Towards a Load Balancing in a Three-level Cloud Computing Network
Towards a Load Balancing in a Three-level Cloud Computing Network Shu-Ching Wang, Kuo-Qin Yan * (Corresponding author), Wen-Pin Liao and Shun-Sheng Wang Chaoyang University of Technology Taiwan, R.O.C.
22S:295 Seminar in Applied Statistics High Performance Computing in Statistics
22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC
Network Attached Storage. Jinfeng Yang Oct/19/2015
Network Attached Storage Jinfeng Yang Oct/19/2015 Outline Part A 1. What is the Network Attached Storage (NAS)? 2. What are the applications of NAS? 3. The benefits of NAS. 4. NAS s performance (Reliability
Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices
Proc. of Int. Conf. on Advances in Computer Science, AETACS Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Ms.Archana G.Narawade a, Mrs.Vaishali Kolhe b a PG student, D.Y.Patil
