Performance metrics for parallelism


 Dwight Lindsey
 2 years ago
 Views:
Transcription
1 Performance metrics for parallelism 8th of November, 2013
2 Sources Rob H. Bisseling; Parallel Scientific Computing, Oxford Press. Grama, Gupta, Karypis, Kumar; Parallel Computing, Addison Wesley.
3 Definition (Parallel overhead) let T seq be the time taken by a sequential algorithm; let T p be the time taken by a parallelisation of that algorithm, using p processes. Then, the parallel overhead T o is given by T o = pt p T s. (Effort is proportional to the number of workers multiplied with the duration of their work, that is, equal to pt p.) Best case: T o = 0, such that T p = T seq /p.
4 Definition (Speedup) Let T seq, p, and T p be as before. Then, the speedup S is given by S(p) = T seq /T p.
5 Definition (Speedup) Let T seq, p, and T p be as before. Then, the speedup S is given by S(p) = T seq /T p. Target: S = p (no overhead; T o = 0). Best case: S > p (superlinear speedup). Worst case: S < 1 (slowdown).
6 What is T seq? Many sequential algorithms solving the same problem.
7 What is T seq? Many sequential algorithms solving the same problem. When determining the speedup S, compare against the best sequential algorithm (that is available on your architecture).
8 What is T seq? Many sequential algorithms solving the same problem. When determining the speedup S, compare against the best sequential algorithm (that is available on your architecture). When determining the overhead T o, compare against the most similar algorithm (maybe even take T seq = T 1 ).
9 Definition (strong scaling) S(p) = T seq /T p = Ω(p) (i.e., lim sup p S(p)/p > 0) 6 5 MulticoreBSP for Java: FFT on a Sun UltraSparc T2 Perfect scaling 512klength vector log 2 speedup log 2 p Question: is it reasonable to expect strong scalability for (good) parallel algorithms?
10 Definition (strong scaling) S(p) = T seq /T p = Ω(p) (i.e., lim sup p S(p)/p > 0) 6 5 MulticoreBSP for Java: FFT on a Sun UltraSparc T2 Perfect scaling 512klength vector 32Mlength vector log 2 speedup log 2 p Answer: not as p. You cannot efficiently clean a table with 50 people, or paint a wall with 500 painters.
11 Definition (weak scaling) S(n) = T seq (n)/t p (n) = Ω(1), with p fixed BSPlib: FFT on two different parallel machines Perfect speedup Lynx (distributedmemory) HP DL980 (sharedmemory) speedup log 2 n For large enough problems, we do expect to make maximum use of our parallel computer.
12 : example MulticoreBSP for Java: FFT on a Sun UltraSparc T speedup Perfect scaling 64 processes log 2 n This processor advertises 64 processors,
13 : example MulticoreBSP for Java: FFT on a Sun UltraSparc T speedup Perfect scaling 64 processes log 2 n This processor advertises 64 processors, but only has 32 FPUs. We oversubscribed!
14 MulticoreBSP for Java: FFT on a Sun UltraSparc T speedup Perfect scaling 64 processes 32 processes log 2 n This processor advertises 64 processors, but only has 32 FPUs. Be careful with oversubscription! (Including hyperthreading!)
15 MulticoreBSP for Java: FFT on a Sun UltraSparc T speedup Perfect scaling 64 processes 32 processes log 2 n This processor advertises 64 processors, but only has 32 FPUs. Q: would you say this algorithm scales on the Sun Ultrasparc T2?
16 A: if the speedup stabilises around 16x, then yes, since the relative efficiency is stable. Definition (Parallel efficiency) Let T seq, p, T p, and S as before. The parallel efficiency E equals E = T seq p /T p = T seq /pt p = S/p.
17 If there is no overhead (T o = pt p T seq = 0), the efficiency E = 1; decreasing the overhead increases the efficiency. Weak scalability: what happens if the problem size increases?
18 If there is no overhead (T o = pt p T seq = 0), the efficiency E = 1; decreasing the overhead increases the efficiency. Weak scalability: what happens if the problem size increases? But what is a sensible definition of the problem size? Consider the following applications: innerproduct calculation; binary search; sorting (quicksort).
19 Problem Size Runtime Innerproduct Θ(n) bytes Θ(n) flops Binary search Θ(n) bytes Θ(log 2 n) comparisons Sorting Θ(n) bytes Θ(n log 2 n) swaps FFT Θ(n) bytes Θ(n log 2 n) flops Hence the problem size is best identified by T seq. Question: How should the ratio T o /T seq behave as T seq, for the algorithm to scale in a weak sense?
20 If T o /T seq = c, with c R 0 constant, then pt p T seq T seq = ps 1 1 = c, so S = Note that here, E = S/p = 1 c+1 p, which is constant when p is fixed. c + 1 Question: How should the ratio T o /T seq behave as p, for the algorithm to scale in a strong sense?
21 If T o /T seq = c, with c R 0 constant, then pt p T seq T seq = ps 1 1 = c, so S = Note that here, E = S/p = 1 c+1 p c + 1. which is still constant! Answer: Exactly the same! Both strong and weak scalability are isoefficiency constraints (E remains constant).
22 Definition (isoefficiency) Let E be as before. Suppose T o = f (T seq, p) is a known function. Then the isoefficiency relation is given by T seq = 1 1 E 1f (T seq, p). This follows from the definition of E: E 1 = pt p /T seq + 1 T seq T seq = 1 + pt p T seq T seq = 1 + T o T seq, so T seq (E 1 1) = T o.
23 Questions Does this scale? Yes! (It s again an FFT) Time Problem size
24 Questions Does this scale? Yes! (It s again an FFT) 10 Measured running time Theoretical optimum (asymptotic) Execution time (logarithmic) log 2 n
25 Questions Better use speedups when investigating scalability. 70 HP DL speedup log 2 n
26 In summary... A BSP algorithm is scalable when T = O(T seq /p + p). This considers scalability of the speedup and includes parallel overhead.
27 In summary... A BSP algorithm is scalable when T = O(T seq /p + p). This considers scalability of the speedup and includes parallel overhead. It does not include memory scalability: M = O(M seq /p + p), where M is the memory taken by one BSP process and M seq the memory requirement of the best sequential algorithm. Question: does this definition of a scalable algorithm make sense?
28 In summary... It does indeed make sense: If T o = Θ(p), then E 1 = pt p T s E = = 1 + pt p T s T s = 1 + T o T s p/t s = T s T s + p. Note lim p E = 0 (strong scalability, Amdahl s Law) and lim Ts = 1 (weak scalability). For isoefficiency, the ratio T o and p has to change appropriately.
29 Practical session Subjects: Coming Thursday, see ~albertjan.yzelman/education/parco13/ 1 Amdahl s Law 2 Parallel Sorting 3 Parallel Grid computations
30 Insights from the practical session This intuition leads to a definition of how parallel certain algorithms are: Definition (Parallelism) Consider a parallel algorithm that runs in T p time. Let T seq the time taken by the best sequential algorithm that solves the same problem. Then the parallelism is given by T seq T T seq = lim. p T p This kind of analysis is fundamental for finegrained parallelisation schemes. Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou Cilk: an efficient multithreaded runtime system. SIGPLAN Not. 30, 8 (August 1995), pp
Performance metrics for parallel systems
Performance metrics for parallel systems S.S. Kadam CDAC, Pune sskadam@cdac.in CDAC/SECG/2006 1 Purpose To determine best parallel algorithm Evaluate hardware platforms Examine the benefits from parallelism
More informationParallel Scalable Algorithms Performance Parameters
www.bsc.es Parallel Scalable Algorithms Performance Parameters Vassil Alexandrov, ICREA  Barcelona Supercomputing Center, Spain Overview Sources of Overhead in Parallel Programs Performance Metrics for
More informationFormal Metrics for LargeScale Parallel Performance
Formal Metrics for LargeScale Parallel Performance Kenneth Moreland and Ron Oldfield Sandia National Laboratories, Albuquerque, NM 87185, USA {kmorel,raoldfi}@sandia.gov Abstract. Performance measurement
More informationPerformance Metrics and Scalability Analysis. Performance Metrics and Scalability Analysis
Performance Metrics and Scalability Analysis 1 Performance Metrics and Scalability Analysis Lecture Outline Following Topics will be discussed Requirements in performance and cost Performance metrics Work
More informationAdvanced Computer Architecture
Advanced Computer Architecture Institute for Multimedia and Software Engineering Conduction of Exercises: Institute for Multimedia eda and Software Engineering g BB 315c, Tel: 3791174 Email: marius.rosu@unidue.de
More informationDistributed Load Balancing for Machines Fully Heterogeneous
Internship Report 2 nd of June  22 th of August 2014 Distributed Load Balancing for Machines Fully Heterogeneous Nathanaël Cheriere nathanael.cheriere@ensrennes.fr ENS Rennes Academic Year 20132014
More informationCost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:
CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm
More informationPerformance Metrics for Parallel Programs. 8 March 2010
Performance Metrics for Parallel Programs 8 March 2010 Content measuring time towards analytic modeling, execution time, overhead, speedup, efficiency, cost, granularity, scalability, roadblocks, asymptotic
More informationCS 575 Parallel Processing
CS 575 Parallel Processing Lecture one: Introduction Wim Bohm Colorado State University Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5
More information{emery,browne}@cs.utexas.edu ABSTRACT. Keywords scalable, load distribution, load balancing, work stealing
Scalable Load Distribution and Load Balancing for Dynamic Parallel Programs E. Berger and J. C. Browne Department of Computer Science University of Texas at Austin Austin, Texas 78701 USA 01512471{9734,9579}
More informationSWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri
SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri Our Contributions SWARM: SoftWare and Algorithms for Running on Multicore, a portable
More informationA Review of Customized Dynamic Load Balancing for a Network of Workstations
A Review of Customized Dynamic Load Balancing for a Network of Workstations Taken from work done by: Mohammed Javeed Zaki, Wei Li, Srinivasan Parthasarathy Computer Science Department, University of Rochester
More informationPerformance Evaluation for Parallel Systems: A Survey
Performance Evaluation for Parallel Systems: A Survey Lei Hu and Ian Gorton Department of Computer Systems School of Computer Science and Engineering University of NSW, Sydney 2052, Australia Email: {lei,
More informationUsing Mining@Home for Distributed Ensemble Learning
Using Mining@Home for Distributed Ensemble Learning Eugenio Cesario 1, Carlo Mastroianni 1, and Domenico Talia 1,2 1 ICARCNR, Italy {cesario,mastroianni}@icar.cnr.it 2 University of Calabria, Italy talia@deis.unical.it
More informationDYNAMIC CLOUD PROVISIONING FOR SCIENTIFIC GRID WORKFLOWS
DYNAMIC CLOUD PROVISIONING FOR SCIENTIFIC GRID WORKFLOWS Simon Ostermann, Radu Prodan and Thomas Fahringer Institute of Computer Science, University of Innsbruck Technikerstrasse 21a, Innsbruck, Austria
More informationVarious Schemes of Load Balancing in Distributed Systems A Review
741 Various Schemes of Load Balancing in Distributed Systems A Review Monika Kushwaha Pranveer Singh Institute of Technology Kanpur, U.P. (208020) U.P.T.U., Lucknow Saurabh Gupta Pranveer Singh Institute
More informationA Load Balancing Technique for Some CoarseGrained Multicomputer Algorithms
A Load Balancing Technique for Some CoarseGrained Multicomputer Algorithms Thierry Garcia and David Semé LaRIA Université de Picardie Jules Verne, CURI, 5, rue du Moulin Neuf 80000 Amiens, France, Email:
More informationDynamic Load Balancing using Graphics Processors
I.J. Intelligent Systems and Applications, 2014, 05, 7075 Published Online April 2014 in MECS (http://www.mecspress.org/) DOI: 10.5815/ijisa.2014.05.07 Dynamic Load Balancing using Graphics Processors
More informationEfficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.
Algorithms Efficiency of algorithms Computational resources: time and space Best, worst and average case performance How to compare algorithms: machineindependent measure of efficiency Growth rate Complexity
More informationAnalysis of Binary Search algorithm and Selection Sort algorithm
Analysis of Binary Search algorithm and Selection Sort algorithm In this section we shall take up two representative problems in computer science, work out the algorithms based on the best strategy to
More informationScalable Computing in the Multicore Era
Scalable Computing in the Multicore Era XianHe Sun, Yong Chen and Surendra Byna Illinois Institute of Technology, Chicago IL 60616, USA Abstract. Multicore architecture has become the trend of high performance
More informationIMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS
Volume 2, No. 3, March 2011 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at www.jgrcs.info IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE
More informationChapter 2 Parallel Architecture, Software And Performance
Chapter 2 Parallel Architecture, Software And Performance UCSB CS140, T. Yang, 2014 Modified from texbook slides Roadmap Parallel hardware Parallel software Input and output Performance Parallel program
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationQuiz for Chapter 1 Computer Abstractions and Technology 3.10
Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,
More informationDynamic Load Balancing. Using WorkStealing 35.1 INTRODUCTION CHAPTER. Daniel Cederman and Philippas Tsigas
CHAPTER Dynamic Load Balancing 35 Using WorkStealing Daniel Cederman and Philippas Tsigas In this chapter, we present a methodology for efficient load balancing of computational problems that can be easily
More informationPipelining and loadbalancing in parallel joins on distributed machines
NPPAR 05 p. / Pipelining and loadbalancing in parallel joins on distributed machines M. Bamha bamha@lifo.univorleans.fr Laboratoire d Informatique Fondamentale d Orléans (France) NPPAR 05 p. / Plan
More informationIntroduction to Cluster Computing
Introduction to Cluster Computing Brian Vinter vinter@diku.dk Overview Introduction Goal/Idea Phases Mandatory Assignments Tools Timeline/Exam General info Introduction Supercomputers are expensive Workstations
More informationBinary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT Inmemory tree structured index search is a fundamental database operation. Modern processors provide tremendous
More informationOptimizing Shared Resource Contention in HPC Clusters
Optimizing Shared Resource Contention in HPC Clusters Sergey Blagodurov Simon Fraser University Alexandra Fedorova Simon Fraser University Abstract Contention for shared resources in HPC clusters occurs
More information2: Computer Performance
2: Computer Performance http://people.sc.fsu.edu/ jburkardt/presentations/ fdi 2008 lecture2.pdf... John Information Technology Department Virginia Tech... FDI Summer Track V: Parallel Programming 1012
More informationHistorically, Huge Performance Gains came from Huge Clock Frequency Increases Unfortunately.
Historically, Huge Performance Gains came from Huge Clock Frequency Increases Unfortunately. Hardware Solution Evolution of Computer Architectures MicroScopic View Clock Rate Limits Have Been Reached
More informationRelating Empirical Performance Data to Achievable Parallel Application Performance
Published in Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'99), Vol. III, Las Vegas, Nev., USA, June 28July 1, 1999, pp. 16271633.
More informationContributions to Gang Scheduling
CHAPTER 7 Contributions to Gang Scheduling In this Chapter, we present two techniques to improve Gang Scheduling policies by adopting the ideas of this Thesis. The first one, Performance Driven Gang Scheduling,
More informationLocalityPreserving Dynamic Load Balancing for DataParallel Applications on DistributedMemory Multiprocessors
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 18, 10371048 (2002) Short Paper LocalityPreserving Dynamic Load Balancing for DataParallel Applications on DistributedMemory Multiprocessors PANGFENG
More informationGarbage Collection in the Java HotSpot Virtual Machine
http://www.devx.com Printed from http://www.devx.com/java/article/21977/1954 Garbage Collection in the Java HotSpot Virtual Machine Gain a better understanding of how garbage collection in the Java HotSpot
More informationResource Allocation and the Law of Diminishing Returns
esource Allocation and the Law of Diminishing eturns Julia and Igor Korsunsky The law of diminishing returns is a wellknown topic in economics. The law states that the output attributed to an individual
More informationCellular Computing on a Linux Cluster
Cellular Computing on a Linux Cluster Alexei Agueev, Bernd Däne, Wolfgang Fengler TU Ilmenau, Department of Computer Architecture Topics 1. Cellular Computing 2. The Experiment 3. Experimental Results
More informationInterconnect Efficiency of Tyan PSC T630 with Microsoft Compute Cluster Server 2003
Interconnect Efficiency of Tyan PSC T630 with Microsoft Compute Cluster Server 2003 Josef Pelikán Charles University in Prague, KSVI Department, Josef.Pelikan@mff.cuni.cz Abstract 1 Interconnect quality
More informationMulticore Parallel Computing with OpenMP
Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large
More information18742 Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two
age 1 18742 Lecture 4 arallel rogramming II Spring 2005 rof. Babak Falsafi http://www.ece.cmu.edu/~ece742 write X Memory send X Memory read X Memory Slides developed in part by rofs. Adve, Falsafi, Hill,
More informationJunghyun Ahn Changho Sung Tag Gon Kim. Korea Advanced Institute of Science and Technology (KAIST) 3731 Kuseongdong, Yuseonggu Daejoen, Korea
Proceedings of the 211 Winter Simulation Conference S. Jain, R. R. Creasey, J. Himmelspach, K. P. White, and M. Fu, eds. A BINARY PARTITIONBASED MATCHING ALGORITHM FOR DATA DISTRIBUTION MANAGEMENT Junghyun
More informationParallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel
Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:
More informationLecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle?
Lecture 3: Evaluating Computer Architectures Announcements  Reminder: Homework 1 due Thursday 2/2 Last Time technology back ground Computer elements Circuits and timing Virtuous cycle of the past and
More informationA Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment
A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment Panagiotis D. Michailidis and Konstantinos G. Margaritis Parallel and Distributed
More informationCSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Linda Shapiro Winter 2015
CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis Linda Shapiro Today Registration should be done. Homework 1 due 11:59 pm next Wednesday, January 14 Review math essential
More informationKeys to nodelevel performance analysis and threading in HPC applications
Keys to nodelevel performance analysis and threading in HPC applications Thomas GUILLET (Intel; Exascale Computing Research) IFERC seminar, 18 March 2015 Legal Disclaimer & Optimization Notice INFORMATION
More informationDesigning and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp
Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Welcome! Who am I? William (Bill) Gropp Professor of Computer Science One of the Creators of
More informationChapter 12: Multiprocessor Architectures. Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup
Chapter 12: Multiprocessor Architectures Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup Objective Be familiar with basic multiprocessor architectures and be able to
More informationResource Models: Batch Scheduling
Resource Models: Batch Scheduling Last Time» Cycle Stealing Resource Model Large Reach, Mass Heterogeneity, complex resource behavior Asynchronous Revocation, independent, idempotent tasks» Resource Sharing
More informationImproving Scalability of OpenMP Applications on Multicore Systems Using Large Page Support
Improving Scalability of OpenMP Applications on Multicore Systems Using Large Page Support Ranjit Noronha and Dhabaleswar K. Panda Network Based Computing Laboratory (NBCL) The Ohio State University Outline
More informationSorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)
Sorting revisited How did we use a binary search tree to sort an array of elements? Tree Sort Algorithm Given: An array of elements to sort 1. Build a binary search tree out of the elements 2. Traverse
More informationMulticore Programming System Overview
Multicore Programming System Overview Based on slides from Intel Software College and MultiCore Programming increasing performance through software multithreading by Shameem Akhter and Jason Roberts,
More informationNAMD2 Greater Scalability for Parallel Molecular Dynamics. Presented by Abel Licon
NAMD2 Greater Scalability for Parallel Molecular Dynamics Laxmikant Kale, Robert Steel, Milind Bhandarkar,, Robert Bunner, Attila Gursoy,, Neal Krawetz,, James Phillips, Aritomo Shinozaki, Krishnan Varadarajan,,
More information32 K. Stride 8 32 128 512 2 K 8 K 32 K 128 K 512 K 2 M
Evaluating a Real Machine Performance Isolation using Microbenchmarks Choosing Workloads Evaluating a Fixedsize Machine Varying Machine Size Metrics All these issues, plus more, relevant to evaluating
More informationA STUDY OF TASK SCHEDULING IN MULTIPROCESSOR ENVIROMENT Ranjit Rajak 1, C.P.Katti 2, Nidhi Rajak 3
A STUDY OF TASK SCHEDULING IN MULTIPROCESSOR ENVIROMENT Ranjit Rajak 1, C.P.Katti, Nidhi Rajak 1 Department of Computer Science & Applications, Dr.H.S.Gour Central University, Sagar, India, ranjit.jnu@gmail.com
More informationBENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
More informationA new binary floatingpoint division algorithm and its software implementation on the ST231 processor
19th IEEE Symposium on Computer Arithmetic (ARITH 19) Portland, Oregon, USA, June 810, 2009 A new binary floatingpoint division algorithm and its software implementation on the ST231 processor ClaudePierre
More informationCS550. Distributed Operating Systems (Advanced Operating Systems) Instructor: XianHe Sun
CS550 Distributed Operating Systems (Advanced Operating Systems) Instructor: XianHe Sun Email: sun@iit.edu, Phone: (312) 5675260 Office hours: 2:10pm3:10pm Tuesday, 3:30pm4:30pm Thursday at SB229C,
More informationMAQAO Performance Analysis and Optimization Tool
MAQAO Performance Analysis and Optimization Tool Andres S. CHARIFRUBIAL andres.charif@uvsq.fr Performance Evaluation Team, University of Versailles SQY http://www.maqao.org VIHPS 18 th Grenoble 18/22
More informationIntroduction to Parallel Computing. George Karypis Parallel Programming Platforms
Introduction to Parallel Computing George Karypis Parallel Programming Platforms Elements of a Parallel Computer Hardware Multiple Processors Multiple Memories Interconnection Network System Software Parallel
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationMuse Server Sizing. 18 June 2012. Document Version 0.0.1.9 Muse 2.7.0.0
Muse Server Sizing 18 June 2012 Document Version 0.0.1.9 Muse 2.7.0.0 Notice No part of this publication may be reproduced stored in a retrieval system, or transmitted, in any form or by any means, without
More informationA graphoriented task manager for small multiprocessor systems.
A graphoriented task manager for small multiprocessor systems. Xavier Verians 1, JeanDidier Legat 1, JeanJacques Quisquater 1 and Benoit Macq 2 1 Microelectronics Laboratory, Université Catholique de
More informationNotation:  active node  inactive node
Dynamic Load Balancing Algorithms for Sequence Mining Valerie Guralnik, George Karypis fguralnik, karypisg@cs.umn.edu Department of Computer Science and Engineering/Army HPC Research Center University
More informationOffline sorting buffers on Line
Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com
More informationCS/COE 1501 http://cs.pitt.edu/~bill/1501/
CS/COE 1501 http://cs.pitt.edu/~bill/1501/ Lecture 01 Course Introduction Metanotes These notes are intended for use by students in CS1501 at the University of Pittsburgh. They are provided free of charge
More informationEnergy Efficient MapReduce
Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing
More informationAlgorithms and Data Structures
Algorithm Analysis Page 1 BFHTI: Softwareschule Schweiz Algorithm Analysis Dr. CAS SD01 Algorithm Analysis Page 2 Outline Course and Textbook Overview Analysis of Algorithm PseudoCode and Primitive Operations
More informationPaul Brebner, Senior Researcher, NICTA, Paul.Brebner@nicta.com.au
Is your Cloud Elastic Enough? Part 2 Paul Brebner, Senior Researcher, NICTA, Paul.Brebner@nicta.com.au Paul Brebner is a senior researcher in the egovernment project at National ICT Australia (NICTA,
More informationMesh Generation and Load Balancing
Mesh Generation and Load Balancing Stan Tomov Innovative Computing Laboratory Computer Science Department The University of Tennessee April 04, 2012 CS 594 04/04/2012 Slide 1 / 19 Outline Motivation Reliable
More informationInformatica Ultra Messaging SMX SharedMemory Transport
White Paper Informatica Ultra Messaging SMX SharedMemory Transport Breaking the 100Nanosecond Latency Barrier with BenchmarkProven Performance This document contains Confidential, Proprietary and Trade
More informationAn HPC Application Deployment Model on Azure Cloud for SMEs
An HPC Application Deployment Model on Azure Cloud for SMEs Fan Ding CLOSER 2013, Aachen, Germany, May 9th,2013 Rechen und Kommunikationszentrum (RZ) Agenda Motivation Windows Azure Relevant Technology
More informationChapter 18: Database System Architectures. Centralized Systems
Chapter 18: Database System Architectures! Centralized Systems! ClientServer Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
More informationSMock A Test Platform for the Evaluation of Monitoring Tools
SMock A Test Platform for the Evaluation of Monitoring Tools User Manual Ruth Mizzi Faculty of ICT University of Malta June 20, 2013 Contents 1 Introduction 3 1.1 The Architecture and Design of SMock................
More informationProgram Optimization for Multicore Architectures
Program Optimization for Multicore Architectures Sanjeev K Aggarwal (ska@iitk.ac.in) M Chaudhuri (mainak@iitk.ac.in) R Moona (moona@iitk.ac.in) Department of Computer Science and Engineering, IIT Kanpur
More informationHigh Performance Computing. Course Notes 20072008. HPC Fundamentals
High Performance Computing Course Notes 20072008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define  it s a moving target. Later 1980s, a supercomputer performs
More informationInteractive comment on A parallelization scheme to simulate reactive transport in the subsurface environment with OGS#IPhreeqc by W. He et al.
Geosci. Model Dev. Discuss., 8, C1166 C1176, 2015 www.geoscimodeldevdiscuss.net/8/c1166/2015/ Author(s) 2015. This work is distributed under the Creative Commons Attribute 3.0 License. Geoscientific
More informationCellSWat: Modeling and Scheduling Wavefront Computations on the Cell Broadband Engine
CellSWat: Modeling and Scheduling Wavefront Computations on the Cell Broadband Engine Ashwin Aji, Wu Feng, Filip Blagojevic and Dimitris Nikolopoulos Forecast Efficient mapping of wavefront algorithms
More informationSystem Models for Distributed and Cloud Computing
System Models for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 201014 FIS Distinguished Professor of Computer Science School of Computing, UNF Classification of Distributed Computing Systems
More information22S:295 Seminar in Applied Statistics High Performance Computing in Statistics
22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC
More informationSuccessful Implementation of L3B: Low Level Load Balancer
Successful Implementation of L3B: Low Level Load Balancer Kiril Cvetkov, Sasko Ristov, Marjan Gusev University Ss Cyril and Methodius, FCSE, Rugjer Boshkovic 16, Skopje, Macedonia Email: kiril cvetkov@yahoo.com,
More informationAn Introduction to Parallel Computing/ Programming
An Introduction to Parallel Computing/ Programming Vicky Papadopoulou Lesta Astrophysics and High Performance Computing Research Group (http://ahpc.euc.ac.cy) Dep. of Computer Science and Engineering European
More informationHardware Assisted Virtualization
Hardware Assisted Virtualization G. Lettieri 21 Oct. 2015 1 Introduction In the hardwareassisted virtualization technique we try to execute the instructions of the target machine directly on the host
More informationRackspace Cloud Databases and Containerbased Virtualization
Rackspace Cloud Databases and Containerbased Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many
More informationLarge Scale Simulation on Clusters using COMSOL 4.2
Large Scale Simulation on Clusters using COMSOL 4.2 Darrell W. Pepper 1 Xiuling Wang 2 Steven Senator 3 Joseph Lombardo 4 David Carrington 5 with David Kan and Ed Fontes 6 1 DVPUSAFAUNLV, 2 PurdueCalumet,
More informationP013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE
1 P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE JEANMARC GRATIEN, JEANFRANÇOIS MAGRAS, PHILIPPE QUANDALLE, OLIVIER RICOIS 1&4, av. BoisPréau. 92852 Rueil Malmaison Cedex. France
More informationBSPCloud: A Hybrid Programming Library for Cloud Computing *
BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China liuxiaodongxht@qq.com,
More informationCentralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures
Chapter 18: Database System Architectures Centralized Systems! Centralized Systems! ClientServer Systems! Parallel Systems! Distributed Systems! Network Types! Run on a single computer system and do
More informationComputer Architecture TDTS10
why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers
More informationLoad balancing. Prof. Richard Vuduc Georgia Institute of Technology CSE/CS 8803 PNA: Parallel Numerical Algorithms [L.26] Thursday, April 17, 2008
Load balancing Prof. Richard Vuduc Georgia Institute of Technology CSE/CS 8803 PNA: Parallel Numerical Algorithms [L.26] Thursday, April 17, 2008 1 Today s sources CS 194/267 at UCB (Yelick/Demmel) Intro
More informationThe Piranha computer algebra system. introduction and implementation details
: introduction and implementation details Advanced Concepts Team European Space Agency (ESTEC) Course on Differential Equations and Computer Algebra Estella, Spain October 2930, 2010 Outline A Brief Overview
More information3 Extending the Refinement Calculus
Building BSP Programs Using the Refinement Calculus D.B. Skillicorn? Department of Computing and Information Science Queen s University, Kingston, Canada skill@qucis.queensu.ca Abstract. We extend the
More informationNotes from Week 1: Algorithms for sequential prediction
CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 1: Algorithms for sequential prediction Instructor: Robert Kleinberg 2226 Jan 2007 1 Introduction In this course we will be looking
More informationHETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK
HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100300 PFLOPS Peak Performance
More informationInstruction scheduling
Instruction ordering Instruction scheduling Advanced Compiler Construction Michel Schinz 2015 05 21 When a compiler emits the instructions corresponding to a program, it imposes a total order on them.
More informationPerformance Evaluation and Analysis of Parallel Computers Workload
, pp.127134 http://dx.doi.org/10.14257/ijgdc.2016.9.1.13 Performance Evaluation and Analysis of Parallel Computers Workload M.Narayana Moorthi 1 and R.Manjula 2 1 Assistant Professor, (SG), SCOPEVIT
More informationMapParallel Scheduling (mps) using Hadoop environment for job scheduler and time span for Multicore Processors
MapParallel Scheduling (mps) using Hadoop environment for job scheduler and time span for Sudarsanam P Abstract G. Singaravel Parallel computing is an base mechanism for data process with scheduling task,
More informationA Comparison of Distributed Systems: ChorusOS and Amoeba
A Comparison of Distributed Systems: ChorusOS and Amoeba Angelo Bertolli Prepared for MSIT 610 on October 27, 2004 University of Maryland University College Adelphi, Maryland United States of America Abstract.
More informationLoad Balancing on a Nondedicated Heterogeneous Network of Workstations
Load Balancing on a Nondedicated Heterogeneous Network of Workstations Dr. Maurice Eggen Nathan Franklin Department of Computer Science Trinity University San Antonio, Texas 78212 Dr. Roger Eggen Department
More information