DYNAMIC LOAD BALANCING APPLICATIONS ON A HETEROGENEOUS UNIX/NT CLUSTER


European Congress on Computational Methods in Applied Sciences and Engineering
ECCOMAS 2000
Barcelona, September 2000

H.U. Akay, A. Ecer, E. Yilmaz, and L.P. Loo
Computational Fluid Dynamics Laboratory
Purdue School of Engineering and Technology, IUPUI
Indianapolis, Indiana, USA

Key words: Parallel Computing, Dynamic Load Balancing, Heterogeneous Cluster, CFD.

Abstract. This study involves parallel computing and dynamic load-balancing applications on heterogeneous computer systems using our previously developed tools. A large-scale turbomachinery CFD code is chosen as the application code and is implemented on a cluster of Unix/IBM RS6000 and Windows NT/Pentium II workstations. The objective is to be able to run large-scale codes routinely for solving problems on heterogeneous networks of computers. In order to evaluate the performance of each workstation in heterogeneous environments, the measured computation and communication times are analyzed. The speedup and efficiency of each workstation during the parallel computations are also compared. For dynamic load balancing, two test cases were run on the combined Unix and NT cluster. The first case involves a curved duct flow with three different initial load distributions: i) random loading, ii) heavy loading on Unix machines, and iii) equal loading on all machines. The second test case is a rotor-stator problem. In this case, it is assumed that the system is initially loaded heavily on the Unix machines. The effects of load balancing on the total elapsed time for achieving balanced loads in a heterogeneous environment are demonstrated.

1. INTRODUCTION

Parallel computing on clusters of workstations has become popular during the last decade, since it provides a cost-effective alternative to using expensive supercomputers for scientific computations. As the performance of the networks between computers improves, computing on multiple clusters is becoming feasible for large-scale computations. In such applications, computing resources may have different operating systems as well as different hardware configurations. Since these are typically multi-user environments and each cluster may have different performance, the conditions for load balancing become very complex and difficult to determine a priori. Hence, measurements of computer loads, communication speeds, and computation performance are essential for deciding the load on each compute node. For such applications, the use of dynamic load balancing techniques gains more importance. Moving parallel tasks periodically from one computer to another under changing system conditions may result in substantial savings in total elapsed time.

Domain decomposition-based parallel computing and dynamic load balancing algorithms developed by our group at IUPUI were previously limited to applications on Unix workstations [1,2]. More recently, we have extended such applications to PCs with NT operating systems [3]. The tools we have developed are also applicable to parallel computing of CFD problems in heterogeneous environments comprised of computers and operating systems of different types, speeds, and memory. In our approach, a given computational grid is subdivided into several solution blocks, typically a multiple of the number of available compute nodes [1]. The dynamic load balancer program (DLB) continuously measures the computation and communication costs of each compute node in a given system [2]. Using an optimization algorithm, it then decides, at predetermined periods, whether to redistribute the loads based on the performance measured during execution.

In this paper, we present recent applications of the tools we have developed to a turbomachinery flow code called ADPAC [4]. This code was originally developed by NASA and Rolls-Royce Allison. A parallel version on Unix was developed by our group in 1994 [5]. An NT version of the same parallel code was implemented recently [6]. Here, applications of this parallel code on a heterogeneous cluster consisting of Unix and NT operating systems are highlighted. The use of dynamic load balancing for cost-effective computing in heterogeneous environments is demonstrated. Dynamic load balancing examples involving different initial load distributions on heterogeneous compute nodes are considered for curved duct and rotor-stator test cases.

2. GOVERNING FLOW EQUATIONS

The numerical solution for ADPAC uses the conservation law form of the Navier-Stokes equations. For a rotating finite control volume, the inviscid form of the equations, i.e., the Euler equations, is expressed as:

\[ \frac{\partial}{\partial t} \int Q \, dV + L_{inv}(Q) = \int K \, dV \tag{1} \]

where:

\[ L_{inv}(Q) = \oint_{dA} \left[ F_{inv} \, dA_z + G_{inv} \, dA_r + (H_{inv} - r \omega Q) \, dA_\theta \right] \tag{2} \]

Q is the vector of conservation variables in the form:

\[ Q = \left[ \rho, \; \rho v_z, \; \rho v_r, \; \rho v_\theta, \; \rho e_t \right]^T, \qquad K = \left[ 0, \; 0, \; \frac{\rho v_\theta^2 + p}{r}, \; 0, \; 0 \right]^T \tag{3} \]

where the total internal energy, \rho e_t, for a perfect gas is defined as:

\[ \rho e_t = \frac{p}{\gamma - 1} + \frac{1}{2} \rho \left( v_z^2 + v_r^2 + v_\theta^2 \right) \tag{4} \]

Here \gamma is the specific heat ratio, and \rho, p, v_z, v_r, and v_\theta denote the density, pressure, and axial, radial, and circumferential velocity components, respectively, relative to the coordinate system used in ADPAC. F_inv, G_inv, and H_inv are the inviscid flux vectors.

3. PARALLEL COMPUTING ENVIRONMENT

The test bed used for our applications is a cluster consisting of six Unix-based workstations and 16 Windows NT-based Pentium PCs. The Unix-based processors are IBM RS/6000 Model 43P-260 (RS6K) workstations with 512 MB of memory each. The PCs have Intel 400 MHz Pentium II processors, each with 256 MB of memory. Both systems are connected with a 100 Mb switch as the hub of the communication network. The PVM library, version 3.4, is used for inter-processor communication as needed. Double precision floating-point arithmetic is used for all real operations. The layout of the computing environment used throughout this study is shown in Figure 1.

4. PARALLEL PERFORMANCE EVALUATION

The details of our parallel computing and dynamic load balancing tools may be found in our earlier works [1-4]. In this paper, we report on the applications of these tools in a heterogeneous environment. For timing evaluations, two test cases were considered. These test cases correspond to different grid sizes for similar flow conditions. The timing results obtained for these cases are based on the steady-state solution of the Euler/inviscid version of the ADPAC code. For both cases, the total elapsed time for one processor was estimated from the average elapsed time per iteration per node obtained with two processors. In obtaining the timing results, each case was run in both the NT and the NT/Unix heterogeneous environments, varying the number of machines or processors from 2 to 12 (half from each system in the heterogeneous case). The timing results for RS6K were obtained for up to six machines because only six RS6K machines are available in the cluster. Hence, in some cases, the performance of each computing environment is evaluated based on two, four, and six machines only. The timing records include total elapsed time and CPU time. A series of elapsed times for the various computing environments was obtained statistically. Based on these timing results, the evaluation covers three aspects: relative computational speed, parallel speedup, and parallel efficiency. Fully optimized Fortran compiler options were used on both the Unix and NT machines.

Figure 1: Layout of the Computing Environment at the IUPUI's CFD Laboratory.

4.1 Test Case 1: interface size / block size = 1.5%

This test case is a simple duct problem. The grid structure and geometry of this case are given in Figure 2. It has a mesh size of 765x25x25 grid points in the x, y, and z directions, respectively. Figure 3 shows the total elapsed times of the two-, four-, and six-block solutions of the problem on the Unix, NT, and heterogeneous systems. Also shown in the same figure are the interface solver times, which are due to information exchange between the blocks. It is observed that the computational speed of RS6K is higher than that of NT by a factor of 1.85 with 2 machines, decreasing to 1.45 with 6 machines. For the same number of machines, the elapsed time for the heterogeneous environment falls between those of NT and RS6K.

The corresponding speedup and efficiency curves versus the number of machines are plotted in Figure 4. It is apparent that the parallel efficiency for RS6K drops faster as the number of machines increases, whereas the efficiencies for the NT and heterogeneous systems are higher. This is because the ratio of communication to computation is higher on the RS6Ks, which are the faster machines in this case. The efficiency for the heterogeneous system drops below 80% when 12 processors are used, whereas for RS6K this already happens in the 6-processor case. This is because the computational grid on each machine is relatively small, and the time spent on message passing becomes more dominant on the faster RS6K machines. These observations imply that as the computing power of the computers is increased, the benefit of using more than a certain number of processors for a given number of data blocks diminishes.

Figure 2: Curved duct mesh.

Figure 3: Elapsed Time Comparison for Network of Workstations (HTR = heterogeneous system).
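The speedup and efficiency figures discussed above follow from simple ratios of elapsed times. The sketch below is a minimal illustration, assuming the one-processor time is estimated from the average time per iteration per node of a two-processor run, as described in Section 4. The function names and all timing values are hypothetical and are not measurements from this study.

```python
def estimate_serial_time(time_per_iter_per_node, n_nodes, n_iters):
    """Estimate the one-processor elapsed time from a small parallel run.

    Assumption: a serial run costs roughly the sum of the per-node work,
    i.e. (average time per iteration per node) x nodes x iterations.
    """
    return time_per_iter_per_node * n_nodes * n_iters


def speedup(t_serial, t_parallel):
    """Parallel speedup: serial time divided by parallel elapsed time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical elapsed times in seconds, NOT values from the paper.
t1 = estimate_serial_time(time_per_iter_per_node=0.5, n_nodes=2, n_iters=1000)
elapsed = {2: 520.0, 4: 280.0, 6: 210.0}

table = {p: (speedup(t1, t), efficiency(t1, t, p)) for p, t in elapsed.items()}
for p, (s, e) in table.items():
    print(f"{p} processors: speedup {s:.2f}, efficiency {e:.1%}")
```

With timings of this shape, efficiency falls as processors are added because the elapsed time shrinks more slowly than the ideal 1/p rate, which is the behavior attributed to communication cost in the text.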

As mentioned in the foregoing, the poor speedup and efficiency can be attributed to the increase in communication time. Although the increase in communication time is not significant for these cases, as is normally true for steady flow, any decrease in computation time was quickly offset by the increase in communication time. Generally, as the number of machines is increased, communication among processors increases. In this case, the proximity of the machines could be the major contributor to the increase in communication time. For example, the relatively low speedup and efficiency of the heterogeneous system are consequences of higher communication costs due to machine locations.

Figure 4: Efficiency and Speedup Comparison for Network of Workstations (HTR = heterogeneous system).

4.2 Test Case 2: interface size / block size = 10%

In this case, the interface size was increased to about 10% of the block size by using a different mesh size for the same geometry given in Figure 2. The mesh has 67x67x105 grid points in the x, y, and z directions, respectively. This was done to investigate the effect of interface size on the performance of the computing environments. Figure 5 shows the elapsed times. The corresponding speedup and efficiency curves for each case are depicted in Figure 6. It is observed that the slower machines substantially affect the total elapsed time for the heterogeneous computing environment. Again, the RS6K machines were found to have the fastest computational speed. An interesting feature observed here is that the parallel speedup for NT exceeds that of the RS6K and heterogeneous environments. For instance, in the 6-processor case, the parallel speedup for RS6K is approximately 30% lower than the ideal speedup, while the NT and heterogeneous speedups are only 8% to 14% lower. In general, the relation between speedup and the number of processors/machines follows a linear trend. In most cases, utilizing the heterogeneous computing environment has improved on the performance of NT and RS6K in terms of computational speed and parallel speedup.

Figure 5: Elapsed Time Comparison for Network of Workstations (HTR = heterogeneous system).

Figure 6: Efficiency and Speedup Comparison for Network of Workstations (HTR = heterogeneous system).
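The interface-to-block ratio named in the headings of Sections 4.1 and 4.2 can be explored with a small calculation. The sketch below is one possible reading and not the paper's definition: it assumes the mesh is cut along a single axis, that every block has two interfaces, and that each interface exchanges one grid plane. The function names, and the choice of cutting the Test Case 2 mesh along its 105-point direction, are illustrative assumptions.

```python
def split_planes(n_planes, n_blocks):
    """Divide n_planes grid planes along one axis into n_blocks contiguous
    blocks whose sizes differ by at most one plane."""
    base, extra = divmod(n_planes, n_blocks)
    return [base + 1 if i < extra else base for i in range(n_blocks)]


def interface_ratio(block_planes, n_interfaces=2, planes_per_interface=1):
    """Interface points divided by block points for a block cut along one axis.

    The cross-section area cancels out, so only plane counts matter.
    Assumption: each interface exchanges `planes_per_interface` planes.
    """
    return n_interfaces * planes_per_interface / block_planes


# Test Case 1 mesh (765x25x25) cut into six blocks along x.
case1 = split_planes(765, 6)
# Test Case 2 mesh (67x67x105) cut into six blocks along the 105-point
# direction; the cutting direction is an assumption, not stated in the paper.
case2 = split_planes(105, 6)

for name, blocks in (("case 1", case1), ("case 2", case2)):
    ratios = [interface_ratio(n) for n in blocks]
    print(f"{name}: block planes {blocks}, "
          f"ratio {min(ratios):.1%} to {max(ratios):.1%}")
```

Under these assumptions, six blocks along x for the 765x25x25 mesh give a ratio of about 1.6%, close to the 1.5% quoted for Test Case 1. The paper does not state how its own figure was computed, so this agreement should be read as suggestive only.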

As readily observed, the RS6K processors are faster, while the parallel efficiency for NT is higher than for RS6K. The efficiency for RS6K is low when six processors are used for parallel computing. The low efficiency on RS6K can be attributed to the relatively high communication time compared to the block solver time. Nevertheless, a higher efficiency is achieved when RS6K machines are combined with NT machines for heterogeneous computations.

5. DYNAMIC LOAD BALANCING CASE STUDIES

5.1 Curved Duct Flow

The mesh for this problem is the same as that given in Figure 2. It has 765x25x25 grid points. A total of 60 blocks were generated by dividing the mesh in the x direction only. Three different numerical experiments were performed to see the effect of the initial load distribution on the same computers with Unix (IBM RS/6K) and NT (Pentium PC) operating systems. In the first run, the loads are distributed randomly. In the second run, the Unix side has five times more load than the NT side, and all compute nodes within each cluster have equal numbers of blocks. In the third run, the loads are distributed equally on all computers.

For the first run, the initial random loading is shown in Figure 7. Figure 8 gives the dynamic load balancing results obtained by using the Greedy algorithm [4,5]. An extraneous load exists on one of the Unix machines during the ADPAC simulation for this run. Four cycles are required to reduce the total elapsed time by 30%. After that, further load balancing cycles do not provide any time reduction. This suggests that a local minimum, which is the criterion used to determine the minimum time, has been reached. Although the final distribution is still somewhat unbalanced, the total time reduction is considered impressive.

For the second run, the initial loading is given in Figure 9. Loads on the Unix side were intentionally chosen to be five times greater than on the NT side. After four cycles of load balancing with the Greedy algorithm, a distribution that is almost 28% more time-efficient has been obtained. The new load distribution is given in Figure 10. Note that at the beginning there were two extraneous tasks that belong to other users; at the final cycle there is only one.

For the third run, equal loading was chosen initially for all computers. The initial loads are shown in Figure 11. There is only one extraneous load, on the Unix side. The distribution of blocks after four cycles of load balancing is given in Figure 12. The gain in elapsed time in this case is just 1%. Only three blocks have moved when the first and last cycles are compared.
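The idea of moving blocks until no further reduction in elapsed time is possible can be illustrated with a generic greedy heuristic. The paper does not spell out the Greedy algorithm used by the DLB tool, so the sketch below is not the authors' implementation. It assumes that a node's elapsed time is its total block size divided by its relative speed and scaled by the number of jobs sharing the CPU, and it ignores communication cost. All names, speeds, and loads are hypothetical.

```python
def node_time(load, speed, extraneous):
    """Estimated elapsed time per iteration on one node.

    Assumption: `extraneous` other jobs share the CPU equally, slowing
    our job by a factor of (1 + extraneous).
    """
    return load * (1 + extraneous) / speed


def makespan(assign, sizes, nodes):
    """Elapsed time of the slowest node for a block-to-node assignment."""
    loads = {n: 0.0 for n in nodes}
    for block, n in assign.items():
        loads[n] += sizes[block]
    return max(node_time(loads[n], *nodes[n]) for n in nodes)


def greedy_balance(assign, sizes, nodes):
    """Repeatedly apply the single block move that most reduces the makespan."""
    assign = dict(assign)
    best = makespan(assign, sizes, nodes)
    while True:
        move = None
        for block, src in assign.items():
            for dst in nodes:
                if dst == src:
                    continue
                trial = dict(assign)
                trial[block] = dst
                t = makespan(trial, sizes, nodes)
                if t < best - 1e-12:
                    best, move = t, (block, dst)
        if move is None:  # local minimum: no single move helps
            return assign, best
        assign[move[0]] = move[1]


# Hypothetical cluster: name -> (relative speed, extraneous jobs).
nodes = {f"unix{i}": (1.6, 0) for i in range(4)}
nodes.update({f"nt{i}": (1.0, 0) for i in range(4)})
nodes["unix0"] = (1.6, 1)  # one extraneous job on a Unix machine
nodes["nt0"] = (1.0, 1)    # one extraneous job on an NT machine

# 24 equal blocks: five on each Unix node, one on each NT node.
sizes = {b: 1.0 for b in range(24)}
initial = {}
for i in range(4):
    for k in range(5):
        initial[i * 5 + k] = f"unix{i}"
    initial[20 + i] = f"nt{i}"

t_initial = makespan(initial, sizes, nodes)
balanced, t_final = greedy_balance(initial, sizes, nodes)
print(f"estimated time per iteration: {t_initial:.2f} -> {t_final:.2f}")
```

The loop stops at the first assignment that no single move can improve, which is a local rather than a global minimum. This mirrors the observation above that the final distribution may still look somewhat unbalanced even though the elapsed time has stopped decreasing.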

Figure 7: Initial random-loading for the duct problem.

Figure 8: Dynamic load balancing result after four cycles for initial random-loading for the duct problem.

Figure 9: Initial heavy-loading on Unix side for the duct problem.

Figure 10: Dynamic load balancing result after four cycles for initial heavy-loading on Unix side for the duct problem.

Figure 11: Initial equal-loading for all computers for the duct problem.

Figure 12: Dynamic load balancing result after four cycles for initial equal-loading for all computers for the duct problem.

5.2 Single-stage rotor-stator combination

In this case, a rotor-stator stage known as Stage 37 is solved with ADPAC. Stage 37 is a single rotor and stator combination with a total mesh size of 146x25x25 grid points. The mesh is shown in Figure 13. For parallel computing and load balancing, 24 divisions were used in the x direction. A total of eight single-CPU computers, four IBM RS/6K and four PC/NT machines, were chosen.

Figure 13: Mesh of Stage 37, a single stage rotor-stator combination.

Initially, five blocks were placed on each RS/6K machine and one on each PC/NT machine. Figure 14 shows the initial loading. There are two extraneous loads: one on the Unix side and the other on the NT side.

Figure 14: Initial heavy-loading on Unix side for the Stage 37 problem.

The new distribution obtained after four cycles of load balancing is shown in Figure 15. The two extraneous loads are still on the same processors as in the initial distribution. A gain of around 2% in elapsed time is observed. This small gain may be due to the size of the problem, the communication traffic in the network, and the instantaneous loading of the computers used in the present application. However, the load distribution is much better than the initial one.

Figure 15: Dynamic load balancing result after four cycles for the Stage 37 problem.

6. CONCLUSIONS

Algorithms are presented for running large-scale CFD codes on heterogeneous systems consisting of Unix and NT operating systems. The results indicate the feasibility of using available resources for parallel computing, in spite of the imbalances which may occur due to differences in operating systems and networks, as well as in processor speed and memory size. The issues associated with achieving optimum efficiency present interesting possibilities for load balancing. Prior knowledge about the performance of the available computer resources can help a user choose a more reasonable initial load distribution. However, in multi-user and multi-cluster computing environments this may not always be obvious. Moreover, users generally cannot monitor extraneous loads when submitting their jobs. Therefore, it is necessary to employ load balancing to obtain higher performance on such clusters.

ACKNOWLEDGEMENTS

This research was supported by the NASA Glenn Research Center. The authors would like to express their gratitude to NASA and Rolls-Royce Allison for providing the inviscid version of the ADPAC code used in this research. The support and advice of the following individuals are acknowledged: Dr. J.D. Chen, I. Tarkan, and R. Payli from the IUPUI CFD Laboratory, and Dr. E.J. Hall from Rolls-Royce Allison, Indianapolis.

REFERENCES

[1] H.U. Akay, R.A. Blech, A. Ecer, D. Ercoskun, B. Kemle, A. Quealy, and A.A. Williams, A Database Management System for Parallel Processing of CFD Algorithms, Proceedings of Parallel CFD '92, Edited by Pelz, A.B., et al., Elsevier, Amsterdam, pp. 9-23.

[2] Y.P. Chien, A. Ecer, H.U. Akay, F. Carpenter, and R.A. Blech, Dynamic Load Balancing on a Network of Workstations for Solving Computational Fluid Dynamics Problems, Computer Methods in Applied Mechanics and Engineering, Vol. 119.

[3] Y.P. Chien, J.D. Chen, A. Ecer, and H.U. Akay, Dynamic Load Balancing for Parallel CFD on NT Networks, Proceedings of Parallel CFD '99, Edited by Keyes, et al., Elsevier, Amsterdam, 2000 (in print).

[4] E.J. Hall, R.A. Delaney, and J.L. Bettner, Investigation of Advanced Counterrotation Blade Configuration Concepts for High Speed Turboprop Systems, NASA Contractor Report CR, May.

[5] A. Ecer, H.U. Akay, W.B. Kemle, H. Wang, D. Ercoskun, and E.J. Hall, Parallel Computation of Fluid Dynamics Problems, Computer Methods in Applied Mechanics and Engineering, Vol. 112, 1994.

[6] L.P. Loo, Parallel Computing and Dynamic Load Balancing of ADPAC in a Heterogeneous Cluster of Unix and Windows/NT Computers, Master's Thesis, IUPUI, May 2000 (in progress).

LOAD BALANCING FOR MULTIPLE PARALLEL JOBS

LOAD BALANCING FOR MULTIPLE PARALLEL JOBS European Congress on Computational Methods in Applied Sciences and Engineering ECCOMAS 2000 Barcelona, 11-14 September 2000 ECCOMAS LOAD BALANCING FOR MULTIPLE PARALLEL JOBS A. Ecer, Y. P. Chien, H.U Akay

More information

Dynamic Load Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems

Dynamic Load Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems NASA / CR--2000-209939 -0 Dynamic Load Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems A. Ecer, Y.P. Chien, J.D. Chen, T. Boenisch, and H.U. Akay Purdue School of Engineering

More information

Module 6 Case Studies

Module 6 Case Studies Module 6 Case Studies 1 Lecture 6.1 A CFD Code for Turbomachinery Flows 2 Development of a CFD Code The lecture material in the previous Modules help the student to understand the domain knowledge required

More information

Model of a flow in intersecting microchannels. Denis Semyonov

Model of a flow in intersecting microchannels. Denis Semyonov Model of a flow in intersecting microchannels Denis Semyonov LUT 2012 Content Objectives Motivation Model implementation Simulation Results Conclusion Objectives A flow and a reaction model is required

More information

Computational Modeling of Wind Turbines in OpenFOAM

Computational Modeling of Wind Turbines in OpenFOAM Computational Modeling of Wind Turbines in OpenFOAM Hamid Rahimi hamid.rahimi@uni-oldenburg.de ForWind - Center for Wind Energy Research Institute of Physics, University of Oldenburg, Germany Outline Computational

More information

Turbomachinery CFD on many-core platforms experiences and strategies

Turbomachinery CFD on many-core platforms experiences and strategies Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29

More information

Multicore Parallel Computing with OpenMP

Multicore Parallel Computing with OpenMP Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large

More information

PUTTING THE SPIN IN CFD

PUTTING THE SPIN IN CFD W H I T E PA P E R PUTTING THE SPIN IN CFD Overview Engineers who design equipment with rotating components need to analyze and understand the behavior of those components if they want to improve performance.

More information

A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster

A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster , pp.11-20 http://dx.doi.org/10.14257/ ijgdc.2014.7.2.02 A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster Kehe Wu 1, Long Chen 2, Shichao Ye 2 and Yi Li 2 1 Beijing

More information

Pushing the limits. Turbine simulation for next-generation turbochargers

Pushing the limits. Turbine simulation for next-generation turbochargers Pushing the limits Turbine simulation for next-generation turbochargers KWOK-KAI SO, BENT PHILLIPSEN, MAGNUS FISCHER Computational fluid dynamics (CFD) has matured and is now an indispensable tool for

More information

Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach

Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach S. M. Ashraful Kadir 1 and Tazrian Khan 2 1 Scientific Computing, Royal Institute of Technology (KTH), Stockholm, Sweden smakadir@csc.kth.se,

More information

PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms

PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms P. E. Vincent! Department of Aeronautics Imperial College London! 25 th March 2014 Overview Motivation Flux Reconstruction Many-Core

More information

Compatibility and Accuracy of Mesh Generation in HyperMesh and CFD Simulation with Acusolve for Torque Converter

Compatibility and Accuracy of Mesh Generation in HyperMesh and CFD Simulation with Acusolve for Torque Converter Compatibility and Accuracy of Mesh Genen in HyperMesh and CFD Simulation with Acusolve for Converter Kathiresan M CFD Engineer Valeo India Private Limited Block - A, 4th Floor, TECCI Park, No. 176 Rajiv

More information

CFD Application on Food Industry; Energy Saving on the Bread Oven

CFD Application on Food Industry; Energy Saving on the Bread Oven Middle-East Journal of Scientific Research 13 (8): 1095-1100, 2013 ISSN 1990-9233 IDOSI Publications, 2013 DOI: 10.5829/idosi.mejsr.2013.13.8.548 CFD Application on Food Industry; Energy Saving on the

More information

Hari Reddy High Performance Computing Solutions Development Systems and Technology Group IBM 6609 Carriage Drive Colleyville, TX 76034

Hari Reddy High Performance Computing Solutions Development Systems and Technology Group IBM 6609 Carriage Drive Colleyville, TX 76034 PERFORMANCE EVALUATION OF STATIC AND DYNAMIC LOAD-BALANCING SCHEMES FOR A PARALLEL COMPUTATIONAL FLUID DYNAMICS SOFTWARE (CFD) APPLICATION (FLUENT) DISTRIBUTED ACROSS CLUSTERS OF HETEROGENEOUS SYMMETRIC

More information

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,

More information

reduction critical_section

reduction critical_section A comparison of OpenMP and MPI for the parallel CFD test case Michael Resch, Bjíorn Sander and Isabel Loebich High Performance Computing Center Stuttgart èhlrsè Allmandring 3, D-755 Stuttgart Germany resch@hlrs.de

More information

Dell High-Performance Computing Clusters and Reservoir Simulation Research at UT Austin. http://www.dell.com/clustering

Dell High-Performance Computing Clusters and Reservoir Simulation Research at UT Austin. http://www.dell.com/clustering Dell High-Performance Computing Clusters and Reservoir Simulation Research at UT Austin Reza Rooholamini, Ph.D. Director Enterprise Solutions Dell Computer Corp. Reza_Rooholamini@dell.com http://www.dell.com/clustering

More information

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment Panagiotis D. Michailidis and Konstantinos G. Margaritis Parallel and Distributed

More information

Scientific Computing Programming with Parallel Objects

Scientific Computing Programming with Parallel Objects Scientific Computing Programming with Parallel Objects Esteban Meneses, PhD School of Computing, Costa Rica Institute of Technology Parallel Architectures Galore Personal Computing Embedded Computing Moore

More information

Introduction. 1.1 Motivation. Chapter 1

Introduction. 1.1 Motivation. Chapter 1 Chapter 1 Introduction The automotive, aerospace and building sectors have traditionally used simulation programs to improve their products or services, focusing their computations in a few major physical

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

A Load Balancing Tool for Structured Multi-Block Grid CFD Applications

A Load Balancing Tool for Structured Multi-Block Grid CFD Applications A Load Balancing Tool for Structured Multi-Block Grid CFD Applications K. P. Apponsah and D. W. Zingg University of Toronto Institute for Aerospace Studies (UTIAS), Toronto, ON, M3H 5T6, Canada Email:

More information

COMPUTATIONAL FLUID DYNAMICS (CFD) ANALYSIS OF INTERMEDIATE PRESSURE STEAM TURBINE

COMPUTATIONAL FLUID DYNAMICS (CFD) ANALYSIS OF INTERMEDIATE PRESSURE STEAM TURBINE Research Paper ISSN 2278 0149 www.ijmerr.com Vol. 3, No. 4, October, 2014 2014 IJMERR. All Rights Reserved COMPUTATIONAL FLUID DYNAMICS (CFD) ANALYSIS OF INTERMEDIATE PRESSURE STEAM TURBINE Shivakumar

More information

Modelling and CFD Analysis of Single Stage IP Steam Turbine

Modelling and CFD Analysis of Single Stage IP Steam Turbine International Journal of Mechanical Engineering, ISSN:2051-3232, Vol.42, Issue.1 1215 Modelling and CFD Analysis of Single Stage IP Steam Turbine C RAJESH BABU Mechanical Engineering Department, Gitam

More information

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Björn Rocker Hamburg, June 17th 2010 Engineering Mathematics and Computing Lab (EMCL) KIT University of the State

More information

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching

More information

Performance prediction of a centrifugal pump working in direct and reverse mode using Computational Fluid Dynamics

Performance prediction of a centrifugal pump working in direct and reverse mode using Computational Fluid Dynamics European Association for the Development of Renewable Energies, Environment and Power Quality (EA4EPQ) International Conference on Renewable Energies and Power Quality (ICREPQ 10) Granada (Spain), 23rd

More information

Building an Inexpensive Parallel Computer

Building an Inexpensive Parallel Computer Res. Lett. Inf. Math. Sci., (2000) 1, 113-118 Available online at http://www.massey.ac.nz/~wwiims/rlims/ Building an Inexpensive Parallel Computer Lutz Grosz and Andre Barczak I.I.M.S., Massey University

More information

A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster
