Partitioning and Divide and Conquer Strategies


Partitioning and Divide and Conquer Strategies, Lecture 4

Partitioning Strategies
- Data partitioning (also known as domain decomposition)
- Functional decomposition

Quiz 4.1: For nuclear reactor simulation, what type of partitioning would be most effective?
Answer: Data decomposition and functional decomposition are both effective for different parts of the simulation.

Partitioning Strategies Example: Adding Numbers
- divide the sequence of n numbers among m processors
- each process adds up n/m numbers
- the m partial sums are added for a total
Operation (master-slave):
- distribute the numbers using MPI_Scatter
- compute local sums
- compute the sum on the master using MPI_Reduce
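The partitioned addition can be sketched without MPI. The following is a minimal single-process illustration in plain Python: the function names `scatter` and `partitioned_sum` are hypothetical, the list slicing only stands in for what MPI_Scatter would do, and the final `sum` stands in for MPI_Reduce with a sum operation.

```python
def scatter(values, m):
    """Split values into m nearly equal contiguous chunks (stand-in for MPI_Scatter)."""
    base, extra = divmod(len(values), m)
    chunks, start = [], 0
    for rank in range(m):
        size = base + (1 if rank < extra else 0)  # first `extra` ranks get one more
        chunks.append(values[start:start + size])
        start += size
    return chunks


def partitioned_sum(values, m):
    """Add n numbers as m partial sums that are then combined."""
    chunks = scatter(values, m)
    partial = [sum(chunk) for chunk in chunks]  # each "process" adds about n/m numbers
    return sum(partial)                         # master combines the m partial sums


numbers = list(range(1, 101))
total = partitioned_sum(numbers, 8)
```

With 100 numbers and 8 simulated processes, four processes receive 13 numbers and four receive 12, so the local work is nearly even.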

Divide and Conquer
- partitioning continued on smaller and smaller problems
- recursive definitions
- M-ary trees, e.g., binary trees

Divide and Conquer Example: Adding Numbers
How would this compare to our earlier addition example?
Problem division:
- divide the sequence of n numbers in two to create two processes, each with half of the numbers
- recurse until there are enough processes for the processors
Addition:
- add up the numbers in each process
Problem combination:
- odd processes pass values to even processes
- even processes add the communicated value to their local sum
- logically renumber the processes and repeat the combining step until one process is left
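The combining step can be illustrated with a small simulation. This is a sketch under stated assumptions: the list `local_sums` holds one already computed sum per process, the function name `tree_reduce` is hypothetical, and no real message passing takes place.

```python
def tree_reduce(local_sums):
    """Combine per-process sums pairwise until one process is left.

    Returns the total and the number of combining rounds.
    Assumes at least one process.
    """
    sums = list(local_sums)
    rounds = 0
    while len(sums) > 1:
        combined = []
        for even in range(0, len(sums), 2):
            odd = even + 1
            if odd < len(sums):
                # the odd process passes its value to its even neighbor
                combined.append(sums[even] + sums[odd])
            else:
                # no partner in this round; the value is carried forward
                combined.append(sums[even])
        sums = combined  # logical renumbering: survivors become 0, 1, 2, ...
        rounds += 1
    return sums[0], rounds


total, rounds = tree_reduce([1, 2, 3, 4, 5, 6, 7, 8])
```

Eight processes need three combining rounds, compared with the single master adding m partial sums one after another in the earlier example.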

M-ary Divide and Conquer
Same as divide and conquer, except that we divide into more pieces at each step.
Example: quadtrees

Quiz 4.2: How does the divide and conquer approach used in the previous example address load balancing, given that the regions are of such widely varying sizes?
Answer: By dividing space so that the number of points in each region is about the same.
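The slide's figure is not part of this transcript, so the following is only a sketch of the idea in the answer: a quadtree that keeps subdividing any square holding more than `capacity` points. The function name, the `capacity` and `max_depth` parameters, and the sample points are all assumptions made for illustration.

```python
def quadtree_leaves(points, x0, y0, size, capacity=4, max_depth=16):
    """Return leaf squares as (x0, y0, size, points), each with few points."""
    if len(points) <= capacity or max_depth == 0:
        return [(x0, y0, size, points)]
    half = size / 2.0
    quadrants = {(0, 0): [], (1, 0): [], (0, 1): [], (1, 1): []}
    for x, y in points:
        quadrants[(int(x >= x0 + half), int(y >= y0 + half))].append((x, y))
    leaves = []
    for (ix, iy), inside in quadrants.items():
        leaves.extend(quadtree_leaves(inside, x0 + ix * half, y0 + iy * half,
                                      half, capacity, max_depth - 1))
    return leaves


# A dense cluster near the origin plus three isolated points.
cluster = [(0.01 * i, 0.01 * j) for i in range(1, 5) for j in range(1, 5)]
points = cluster + [(0.9, 0.9), (0.6, 0.2), (0.2, 0.7)]
leaves = quadtree_leaves(points, 0.0, 0.0, 1.0)
sizes = [leaf[2] for leaf in leaves]
```

The dense corner ends up in very small squares while the sparse quadrants stay large, so the regions differ widely in area but each holds at most four points.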

Bucket Sort: sequential
[Figure stages: buckets, sort, merge lists]

Quiz 4.3: Will each bucket have the same number of elements? Why or why not?
Answer: No. The number of elements will only be approximately the same if the values are uniformly distributed in the interval.
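A minimal sketch of sequential bucket sort, assuming values lie in a known interval [lo, hi) and that each bucket covers an equal-width sub-interval. The function name and sample data are hypothetical. It also returns the bucket sizes so the unevenness discussed in Quiz 4.3 is visible.

```python
def bucket_sort(values, num_buckets, lo, hi):
    """Sort values in [lo, hi) using equal-width buckets.

    Returns the sorted list and the number of elements per bucket.
    """
    width = (hi - lo) / num_buckets
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        index = min(int((v - lo) / width), num_buckets - 1)  # clamp the top edge
        buckets[index].append(v)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # sort each bucket, then merge in order
    return result, [len(bucket) for bucket in buckets]


data = [0.42, 0.07, 0.93, 0.15, 0.66, 0.31, 0.08, 0.99]
ordered, bucket_sizes = bucket_sort(data, 4, 0.0, 1.0)
```

Even for this small sample the four buckets hold 3, 2, 1, and 2 elements, because the values are not spread perfectly evenly over the interval.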

Bucket Sort: parallel, version 1
[Figure stages: unsorted numbers, processors, buckets, sort, merge lists, sorted numbers]

Quiz 4.4: What is the major problem with the parallel bucket sort just presented?
Answer: All processes examine every data element and then process only the ones in their sub-interval.

Bucket Sort: parallel, version 2
[Figure stages: processors, mini buckets, buckets, sort, merge lists]
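Version 2 can be simulated in a single process to show the data movement. This is a sketch, not an MPI implementation: the function name is hypothetical, the nested lists stand in for per-process memory, and the regrouping of mini buckets stands in for an all-to-all exchange.

```python
def parallel_bucket_sort_v2(values, m, lo, hi):
    """Simulate version 2 with m processes on values in [lo, hi).

    Returns the sorted list and the number of elements that had to
    move to a different process.
    """
    width = (hi - lo) / m
    share = -(-len(values) // m)  # ceiling of n/m
    slices = [values[p * share:(p + 1) * share] for p in range(m)]

    # Each process sorts only its own slice into m mini buckets.
    mini = [[[] for _ in range(m)] for _ in range(m)]
    for p, part in enumerate(slices):
        for v in part:
            dest = min(int((v - lo) / width), m - 1)
            mini[p][dest].append(v)

    # Exchange: process q collects mini bucket q from every process.
    moved = sum(len(mini[p][q]) for p in range(m) for q in range(m) if p != q)
    buckets = [[v for p in range(m) for v in mini[p][q]] for q in range(m)]

    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # each process sorts its own bucket
    return result, moved


data = [0.42, 0.07, 0.93, 0.15, 0.66, 0.31, 0.08, 0.99]
ordered, moved = parallel_bucket_sort_v2(data, 4, 0.0, 1.0)
```

Each simulated process examines only its own n/m elements, and at most n elements change process during the exchange.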

Quiz 4.5: Which of the two parallel bucket sorts requires more communication to set up the buckets for sorting?
Answer: Version 1 requires each process to get a copy of all the data: n*m elements in total. Version 2 requires each process to receive n/m elements and, in the worst case, send n/m elements: 2*n in total.

Quiz 4.6: Which of the two parallel bucket sorts will have faster communication to set up the buckets for sorting?
Answer: It depends on the machine.

Quiz 4.7: Which of the two parallel bucket sorts will have faster computation to set up the buckets for sorting?
Answer: Version 2 will be faster in setting up buckets (for large problems) because it uses parallelism to put elements in buckets.

Numerical Integration
- integrate f(x) from a to b, i.e., compute the area under the curve f(x)
- divide the area so that each process computes the area for one region
- the area under the curve is the sum of the areas computed by all of the processes

Numerical Integration: midpoint of rectangular regions
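As an illustration of the midpoint approach combined with partitioning, the sketch below gives each simulated process one region of [a, b] and sums the partial areas. The function names and the choice of sin(x) as the integrand are assumptions for the example.

```python
import math


def midpoint_rule(f, a, b, rectangles):
    """Approximate the integral of f over [a, b] with midpoint rectangles."""
    width = (b - a) / rectangles
    return width * sum(f(a + (i + 0.5) * width) for i in range(rectangles))


def partitioned_integral(f, a, b, processes, rectangles_per_process):
    """Each simulated process integrates one region; the areas are summed."""
    region = (b - a) / processes
    partial = [
        midpoint_rule(f, a + p * region, a + (p + 1) * region,
                      rectangles_per_process)
        for p in range(processes)
    ]
    return sum(partial)


area = partitioned_integral(math.sin, 0.0, math.pi, 4, 250)
```

The exact area under sin(x) from 0 to pi is 2, which gives a simple check on the result.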

Quiz 4.8: How could we test whether we are using enough rectangles for the integration?
Answer: Do the evaluation for r rectangles and for 2*r rectangles. If the difference is small enough, then there are enough rectangles.
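The doubling test from the answer can be sketched as a loop. The tolerance, the starting count, and the upper limit on rectangles are hypothetical parameters chosen for the example, and the midpoint rule is used as the underlying method.

```python
def midpoint(f, a, b, rectangles):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / rectangles
    return width * sum(f(a + (i + 0.5) * width) for i in range(rectangles))


def integrate_until_stable(f, a, b, tolerance=1e-6, start=4,
                           max_rectangles=1 << 20):
    """Double the rectangle count until two successive results agree."""
    r = start
    previous = midpoint(f, a, b, r)
    while r < max_rectangles:
        r *= 2
        current = midpoint(f, a, b, r)
        if abs(current - previous) < tolerance:
            return current, r
        previous = current
    raise RuntimeError("no agreement within the rectangle limit")


value, rectangles_used = integrate_until_stable(lambda x: x * x, 0.0, 3.0)
```

Agreement between r and 2*r rectangles is a practical stopping test rather than a proof of accuracy, which is the point the later convergence quizzes return to.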

Numerical Integration: trapezoids for regions
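For comparison with the rectangle version, here is a minimal sketch of the composite trapezoid rule. The function name and test integrands are assumptions for illustration.

```python
def trapezoid_rule(f, a, b, regions):
    """Approximate the integral of f over [a, b] with equal-width trapezoids."""
    width = (b - a) / regions
    interior = sum(f(a + i * width) for i in range(1, regions))
    # End points are shared by one trapezoid, interior points by two.
    return width * (0.5 * f(a) + interior + 0.5 * f(b))


linear_area = trapezoid_rule(lambda x: 2 * x + 1, 0.0, 4.0, 1)
parabola_area = trapezoid_rule(lambda x: x * x, 0.0, 3.0, 300)
```

A single trapezoid is exact for a straight line, while a curved integrand such as x squared still needs many regions.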

Numerical Integration: adaptive quadrature
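The slide names adaptive quadrature without giving its details, so the following is one common formulation offered as a sketch, not necessarily the variant used in the lecture. It compares one trapezoid with two half-width trapezoids and subdivides only where they disagree. The function name, tolerance handling, and depth limit are assumptions.

```python
import math


def adaptive_quadrature(f, a, b, tolerance=1e-6, depth=0, max_depth=50):
    """Return (integral estimate, number of accepted regions)."""
    mid = 0.5 * (a + b)
    whole = 0.5 * (b - a) * (f(a) + f(b))
    left = 0.5 * (mid - a) * (f(a) + f(mid))
    right = 0.5 * (b - mid) * (f(mid) + f(b))
    if abs(left + right - whole) < tolerance or depth >= max_depth:
        return left + right, 1
    # Disagreement is too large: refine each half with half the tolerance.
    left_value, left_count = adaptive_quadrature(
        f, a, mid, tolerance / 2, depth + 1, max_depth)
    right_value, right_count = adaptive_quadrature(
        f, mid, b, tolerance / 2, depth + 1, max_depth)
    return left_value + right_value, left_count + right_count


value, regions = adaptive_quadrature(math.sqrt, 0.0, 1.0)
_, left_regions = adaptive_quadrature(math.sqrt, 0.0, 0.5)
_, right_regions = adaptive_quadrature(math.sqrt, 0.5, 1.0)
```

For the square root function, the half of the interval near zero, where the curve is steepest, needs far more regions than the other half. A process assigned that half would have much more work, which is the load imbalance the following quizzes discuss.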

Quiz 4.9: How can you address the load imbalance issue in adaptive quadrature?
Answer:
- Create a work list of regions to be computed. (work load)
- Create an initial subdivision with many more pieces than processors and assign multiple pieces to each processor from different areas. (randomized)

Quiz 4.10: In addressing the load imbalance issue in adaptive quadrature with a work list, what issues arise?
Answer: We now have a shared work list that will cause contention.

Quiz 4.11: In addressing the load imbalance issue in adaptive quadrature with many subdivisions, what issues arise?
Answer: We can still end up with a processor that has to do many more subdivisions than other processors and therefore has much more work to do.

Quiz 4.12: Can you see any convergence issues that might be possible with adaptive quadrature?

Quiz 4.13: Are the convergence issues with adaptive quadrature any different from those of the other approximation methods we discussed?
Answer: No, something similar can happen with all of them.

N-body Problem
- typically determine the effects of forces between bodies
Gravitational N-body problem:
- find the positions and movements of bodies in space subject to gravitational forces from other bodies, using Newton's laws of physics
- the force between each pair of bodies is proportional to 1/r^2, where r is the distance between the bodies
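A direct computation of the pairwise forces makes the cost of the problem concrete. This sketch assumes point masses in 3D, SI units, and Newton's law of gravitation F = G*m1*m2/r^2. The function name and the two-body sample are hypothetical, and the code assumes no two bodies share a position.

```python
import math

G = 6.674e-11  # gravitational constant in N m^2 / kg^2


def pairwise_forces(masses, positions):
    """Return the net gravitational force vector on every body."""
    n = len(masses)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            delta = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(d * d for d in delta)
            r = math.sqrt(r2)
            magnitude = G * masses[i] * masses[j] / r2  # proportional to 1/r^2
            for k in range(3):
                component = magnitude * delta[k] / r
                forces[i][k] += component  # body i is pulled toward body j
                forces[j][k] -= component  # equal and opposite force on body j
    return forces


forces = pairwise_forces([1.0, 1.0], [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

The double loop visits every pair of bodies, so the work grows with the square of the number of bodies on every time step.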

Quiz 4.14: In an N-body simulation, what communication problem arises for parallelization and why?
Answer: Since every body's position is a function of every other body's position on every time step or iteration, a straightforward implementation requires all-to-all communication on every time step.

Gravitational N-body parallelization
- partition the bodies in 3D space and assign a process to each region of space
- pass messages for each pair of bodies that capture the force between the bodies

Quiz 4.15: Name two problems that arise with the spatial partitioning of bodies and direct communication of individual forces.
Answer: Spatial partitioning may cause a large imbalance in work. Individual force communication will cause a large communication overhead.

Gravitational N-body parallelization
- partition the bodies in 3D space and assign a process to each region of space
- pass messages for each distant cluster of bodies that capture the force between the cluster and a single body

Barnes-Hut (N-body) parallelization
- Start with the 3D space.
- Partition using an octree.
- For any region that has too many particles, recursively partition using an octree.
- Compute the total mass and center of mass of each cubic region.
- The force on each body can be obtained by traversing the tree, starting at the root and stopping when the clustering approximation is valid.
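The steps above can be sketched in a sequential form. The slide does not say when the clustering approximation counts as valid, so this sketch assumes the commonly used opening test: a cell is treated as a single mass when its width divided by its distance from the target is below a parameter `theta`. The class and function names, `theta`, `max_bodies`, and the choice G = 1 are all assumptions, and the tree must be built from at least one body.

```python
import math


class Cell:
    """A cubic region with its total mass and center of mass."""

    def __init__(self, center, half, bodies, max_bodies=1):
        self.center, self.half, self.bodies = center, half, bodies
        self.mass = sum(m for m, _ in bodies)
        self.com = tuple(sum(m * p[k] for m, p in bodies) / self.mass
                         for k in range(3))
        self.children = []
        # Recursively partition any region that has too many bodies.
        if len(bodies) > max_bodies and half > 1e-9:
            octants = {}
            for m, p in bodies:
                key = tuple(int(p[k] >= center[k]) for k in range(3))
                octants.setdefault(key, []).append((m, p))
            for key, group in octants.items():
                child = tuple(center[k] + (key[k] - 0.5) * half
                              for k in range(3))
                self.children.append(Cell(child, half / 2, group, max_bodies))


def point_pull(mass, source, target):
    """Pull per unit target mass of a point mass at source, with G = 1."""
    delta = [source[k] - target[k] for k in range(3)]
    r = math.sqrt(sum(d * d for d in delta))
    if r == 0.0:
        return [0.0, 0.0, 0.0]  # a body exerts no force on itself
    return [mass * d / r ** 3 for d in delta]


def acceleration(cell, target, theta=0.5):
    """Traverse from the root, stopping where the approximation is accepted."""
    if not cell.children:
        pulls = [point_pull(m, p, target) for m, p in cell.bodies]
    else:
        r = math.sqrt(sum((cell.com[k] - target[k]) ** 2 for k in range(3)))
        if r > 0.0 and (2 * cell.half) / r < theta:
            return point_pull(cell.mass, cell.com, target)  # distant cluster
        pulls = [acceleration(child, target, theta) for child in cell.children]
    return [sum(p[k] for p in pulls) for k in range(3)]


# One body at the origin and a tight cluster of eight bodies far away.
bodies = [(1.0, (0.0, 0.0, 0.0))] + [
    (1.0, (10 + dx, 10 + dy, 10 + dz))
    for dx in (0.0, 0.1) for dy in (0.0, 0.1) for dz in (0.0, 0.1)]
root = Cell((5.0, 5.0, 5.0), 6.0, bodies)
approx = acceleration(root, (0.0, 0.0, 0.0), theta=0.5)
exact = acceleration(root, (0.0, 0.0, 0.0), theta=0.0)
```

With `theta=0.0` no cell is ever accepted as a cluster, so the traversal reduces to the direct sum. With `theta=0.5` the eight distant bodies are replaced by one mass at their center of mass, and the result differs only slightly.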

Orthogonal recursive bisection: more general than an octree
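The slide gives no details, so the following is a sketch of the usual formulation of orthogonal recursive bisection: split the bodies at the median along one coordinate axis, then recurse on each half along the next axis. The function name, the requirement that `parts` be a power of two, and the sample points are assumptions for illustration.

```python
def orthogonal_recursive_bisection(points, parts, dims=2, axis=0):
    """Split points into `parts` groups of nearly equal size.

    `parts` is assumed to be a power of two.
    """
    if parts == 1:
        return [points]
    ordered = sorted(points, key=lambda p: p[axis])
    middle = len(ordered) // 2  # median split balances the two halves
    next_axis = (axis + 1) % dims
    return (
        orthogonal_recursive_bisection(ordered[:middle], parts // 2,
                                       dims, next_axis)
        + orthogonal_recursive_bisection(ordered[middle:], parts // 2,
                                         dims, next_axis)
    )


# Sixteen points crowded toward small x values.
points = [(i * i * 0.01, float((i * 7) % 16)) for i in range(16)]
groups = orthogonal_recursive_bisection(points, 4)
group_sizes = [len(group) for group in groups]
```

Because each cut is placed at the median rather than at the geometric center, every group receives the same number of points even though the points are unevenly distributed.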

Quiz 4.16: How has the Barnes-Hut approach addressed a parallelization problem for N-body simulations?
Answer: It subdivides the bodies so that each processor will have approximately the same amount of work.

Divide and Conquer
- Tree constructions
- Bucket sort
- Numerical integration
- N-body problem