and Divide and Conquer Strategies Lecture 4 and Strategies
Strategies
- Data partitioning (also known as domain decomposition)
- Functional decomposition
Quiz 4.1
For nuclear reactor simulation, what type of partitioning would be most effective?
Answer: Data decomposition and functional decomposition are each effective for different parts of the simulation.
Strategies Example: Adding Numbers
- divide the sequence of n numbers among m processors
- each process adds up n/m numbers
- the m partial sums are added for a total
- operation (Master-Slave):
  - distribute the numbers using MPI_Scatter
  - compute local sums
  - compute the sum on the master using MPI_Reduce
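The scatter, local sum, and reduce steps can be sketched sequentially in Python. This is a simulation under stated assumptions, not an MPI program: `partition` and `partitioned_sum` are hypothetical names, the list slicing stands in for MPI_Scatter, and the final `sum` stands in for MPI_Reduce.

```python
def partition(values, m):
    """Split values into m contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(values), m)
    chunks, start = [], 0
    for rank in range(m):
        size = base + (1 if rank < extra else 0)
        chunks.append(values[start:start + size])
        start += size
    return chunks


def partitioned_sum(values, m):
    """Add n numbers by summing m chunks independently and combining the results."""
    chunks = partition(values, m)             # stands in for MPI_Scatter
    partial = [sum(c) for c in chunks]        # local sums, one per simulated process
    return sum(partial)                       # stands in for MPI_Reduce on the master
```

When n is not a multiple of m, this sketch gives the first few chunks one extra element, so no simulated process holds more than one element above n/m.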
Divide and Conquer
- partitioning continued on smaller and smaller problems
- recursive definitions
- M-ary trees, e.g., binary trees
Divide and Conquer Example: Adding Numbers
- problem division
  - divide the sequence of n numbers in two to create two processes with half of the numbers each
  - recurse until there are enough processes for the processors
- addition
  - add up the numbers in each process
- problem combination
  - odd processes pass values to even processes
  - even processes add the communicated value to their local sum
  - logically renumber the processes and repeat the combining step until one process is left
- How would this compare to our earlier addition example?
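The combining step can be illustrated with a small sequential sketch. It assumes the local sums have already been computed, one per process, and models each round of odd-to-even message passing as a list operation; `tree_sum` is a hypothetical name, and the list must contain at least one value.

```python
def tree_sum(local_sums):
    """Combine per-process sums pairwise until one value remains.

    Returns the total and the number of combining rounds that were needed.
    """
    sums = list(local_sums)
    rounds = 0
    while len(sums) > 1:
        combined = []
        for even in range(0, len(sums), 2):
            odd = even + 1
            # The odd-numbered process sends its value to its even neighbour.
            # A process without a partner simply keeps its own value.
            received = sums[odd] if odd < len(sums) else 0
            combined.append(sums[even] + received)
        sums = combined   # logically renumber the surviving processes 0, 1, 2, ...
        rounds += 1
    return sums[0], rounds
```

With eight local sums the sketch finishes in three rounds, whereas a master that adds the partial sums one at a time performs seven additions in sequence.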
M-ary Divide and Conquer
- Same as divide and conquer, except that we divide into more pieces at each step.
- Example: quadtrees
Quiz 4.2
How does the divide and conquer approach used in the previous example address load balancing, given that the regions are of such widely varying sizes?
Answer: By dividing space so that the number of points in each region is about the same.
Bucket Sort: sequential
[Figure labels: buckets; sort; merge lists]
Quiz 4.3
Will each bucket have the same number of elements? Why or why not?
Answer: No. The number of elements will be approximately the same only if the values are uniformly distributed in the interval.
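A minimal sequential sketch of bucket sort, assuming the values lie in a known interval [lo, hi) that is divided into m equal-width buckets. The function name and its parameters are hypothetical. It also returns the bucket sizes, which makes the point of Quiz 4.3 visible for skewed data.

```python
def bucket_sort(values, m, lo, hi):
    """Sort values from [lo, hi) using m equal-width buckets.

    Returns the sorted list and the number of elements that fell in each bucket.
    """
    width = (hi - lo) / m
    buckets = [[] for _ in range(m)]
    for v in values:
        # Clamp the index so rounding at the top edge cannot overflow the list.
        index = min(int((v - lo) / width), m - 1)
        buckets[index].append(v)
    sizes = [len(b) for b in buckets]
    result = []
    for b in buckets:
        result.extend(sorted(b))   # sort each bucket, then concatenate in bucket order
    return result, sizes
```

For the input `[0.1, 0.9, 0.15, 0.2, 0.05, 0.6]` with four buckets on [0, 1), four of the six values land in the first bucket, so the buckets are far from equal in size.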
Bucket Sort: parallel, Version 1
[Figure labels: unsorted numbers; processors; buckets; sort; merge lists; sorted numbers]
Quiz 4.4
What is the major problem with the parallel bucket sort just presented?
Answer: Every process examines every data element but then processes only the ones in its own sub-interval.
Bucket Sort: parallel, Version 2
[Figure labels: processors; mini buckets; buckets; sort; merge lists]
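The data flow of Version 2 can be simulated sequentially. The sketch below assumes m simulated processes, values in [lo, hi), and equal-width buckets; all names are hypothetical. In an MPI program the exchange in step 3 would be an all-to-all communication (for example MPI_Alltoallv), which is modelled here by a list comprehension.

```python
def parallel_bucket_sort_v2(values, m, lo, hi):
    """Simulate parallel bucket sort with mini buckets on m processes."""
    width = (hi - lo) / m

    def bucket_of(v):
        return min(int((v - lo) / width), m - 1)

    # Step 1: each process gets a contiguous slice of about n/m elements.
    chunk = -(-len(values) // m)   # ceiling division
    slices = [values[p * chunk:(p + 1) * chunk] for p in range(m)]

    # Step 2: each process separates its slice into m mini buckets.
    mini = [[[] for _ in range(m)] for _ in range(m)]
    for p, part in enumerate(slices):
        for v in part:
            mini[p][bucket_of(v)].append(v)

    # Step 3: mini bucket b of every process is sent to process b.
    buckets = [[v for p in range(m) for v in mini[p][b]] for b in range(m)]

    # Step 4: each process sorts its bucket; concatenation gives the result.
    return [v for b in buckets for v in sorted(b)]
```

Unlike Version 1, each simulated process inspects only its own slice in step 2, so the work of placing elements into buckets is shared.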
Quiz 4.5
Which of the two parallel bucket sorts requires more communication to set up the buckets for sorting?
Answer: Version 1 (for m > 2). Version 1 requires that each of the m processes get a copy of all n elements: n*m in total. Version 2 requires that each process receive n/m elements and send at most n/m elements: 2*n in total.
Quiz 4.6
Which of the two parallel bucket sorts will have faster communication to set up the buckets for sorting?
Answer: It depends on the machine.
Quiz 4.7
Which of the two parallel bucket sorts will have faster computation to set up the buckets for sorting?
Answer: Version 2 will be faster in setting up the buckets (for large problems) because it uses parallelism to put elements in buckets.
Numerical Integration
- integrate f(x) from a to b, i.e., compute the area under the curve f(x)
- divide the area so each process computes the area for one region
- the area under the curve is the sum of the areas computed by all of the processes
Numerical Integration: midpoint of rectangular regions
Quiz 4.8
How could we test whether we are using enough rectangles for the integration?
Answer: Do the evaluation for r rectangles and for 2*r rectangles. If the difference is small enough, then there are enough rectangles.
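The midpoint rule and the doubling test from Quiz 4.8 can be sketched together. The function names, the starting count of 4 rectangles, the tolerance, and the cap on the rectangle count are all assumptions chosen for illustration.

```python
def midpoint_rule(f, a, b, r):
    """Approximate the integral of f over [a, b] with r midpoint rectangles."""
    width = (b - a) / r
    return width * sum(f(a + (i + 0.5) * width) for i in range(r))


def integrate_until_stable(f, a, b, r=4, tol=1e-6, max_r=1 << 20):
    """Double the rectangle count until two successive estimates agree within tol."""
    previous = midpoint_rule(f, a, b, r)
    while r < max_r:
        r *= 2
        current = midpoint_rule(f, a, b, r)
        if abs(current - previous) < tol:
            return current, r
        previous = current
    raise RuntimeError("estimates did not settle within max_r rectangles")
```

A small difference between the two estimates is evidence, not proof, that the rectangle count is sufficient; the cap `max_r` only keeps the loop from running indefinitely.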
Numerical Integration: trapezoids for regions
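A sketch of the trapezoid rule, together with the partitioning described earlier in which each process integrates one region and the partial areas are summed. The names and the equal-width split among m simulated processes are assumptions.

```python
def trapezoid_rule(f, a, b, r):
    """Approximate the integral of f over [a, b] with r trapezoids."""
    width = (b - a) / r
    interior = sum(f(a + i * width) for i in range(1, r))
    return width * (0.5 * (f(a) + f(b)) + interior)


def partitioned_trapezoid(f, a, b, m, r_per_process):
    """Give each of m simulated processes one equal-width region and add the areas."""
    span = (b - a) / m
    partial = [
        trapezoid_rule(f, a + p * span, a + (p + 1) * span, r_per_process)
        for p in range(m)
    ]
    return sum(partial)   # combining the partial areas is a reduction
```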
Numerical Integration: adaptive quadrature
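One possible form of adaptive quadrature is sketched below, assuming a trapezoid estimate on each region and a stopping test that compares one trapezoid with two half-width trapezoids. The function name, the tolerance handling, and the depth limit are assumptions, not necessarily the variant shown in the lecture. The function also counts the regions it ends up with, which exposes how unevenly the work can be spread.

```python
import math


def adaptive_quadrature(f, a, b, tol=1e-8, depth=0, max_depth=40):
    """Integrate f over [a, b], subdividing only where the estimate is unstable.

    Returns the estimate and the number of final regions that were used.
    """
    mid = 0.5 * (a + b)
    whole = 0.5 * (b - a) * (f(a) + f(b))
    left = 0.5 * (mid - a) * (f(a) + f(mid))
    right = 0.5 * (b - mid) * (f(mid) + f(b))
    if abs(left + right - whole) < tol or depth >= max_depth:
        return left + right, 1
    # Split the region in two and give each half a share of the tolerance.
    left_value, left_count = adaptive_quadrature(f, a, mid, tol / 2, depth + 1, max_depth)
    right_value, right_count = adaptive_quadrature(f, mid, b, tol / 2, depth + 1, max_depth)
    return left_value + right_value, left_count + right_count
```

For `math.sqrt` on [0, 1], the half of the interval next to 0 needs many more regions than the other half, so a process that is assigned that half has more work to do. The stopping test also accepts a region at once whenever the three sampled points happen to lie on a straight line, whether or not the function is straight in between.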
Quiz 4.9
How can you address the load imbalance issue in adaptive quadrature?
Answer:
- Create a work list of regions to be computed. (work load)
- Create an initial subdivision with many more pieces than processors and assign multiple pieces to each processor from different areas. (randomized)
Quiz 4.10
In addressing the load imbalance issue in adaptive quadrature with a work list, what issues arise?
Answer: We now have a shared work list that will cause contention.
Quiz 4.11
In addressing the load imbalance issue in adaptive quadrature with many subdivisions, what issues arise?
Answer: We can still end up with a processor that has to do many more subdivisions than the other processors and therefore has much more work to do.
Quiz 4.12
Can you see any convergence issues that might arise with adaptive quadrature?
Quiz 4.13
Are the convergence issues with adaptive quadrature any different from those of the other approximation methods we discussed?
Answer: No, something similar can happen with all of them.
N-body Problem
- typically: determine the effects of forces between bodies
- Gravitational N-body problem
  - find the positions and movements of bodies in space subject to gravitational forces from other bodies, using Newton's laws of physics
  - the force between each pair of bodies is proportional to 1/r^2, where r is the distance between the bodies
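The direct pairwise force computation can be sketched as follows, assuming point masses in SI units with 3D positions. The function name is hypothetical, and the sketch does not handle two bodies at the same position, where r would be zero.

```python
import math

G = 6.674e-11  # gravitational constant in N m^2 / kg^2


def pairwise_forces(masses, positions):
    """Return the net gravitational force vector on every body.

    Each pair is visited once, so the cost grows as n * (n - 1) / 2.
    """
    n = len(masses)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            delta = [positions[j][k] - positions[i][k] for k in range(3)]
            r = math.sqrt(sum(d * d for d in delta))
            magnitude = G * masses[i] * masses[j] / (r * r)   # proportional to 1/r^2
            for k in range(3):
                component = magnitude * delta[k] / r
                forces[i][k] += component   # body i is pulled toward body j
                forces[j][k] -= component   # equal and opposite force on body j
    return forces
```

Because every body interacts with every other body, the cost of one force evaluation grows quadratically with the number of bodies.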
Quiz 4.14
In an N-body simulation, what communication problem arises for parallelization and why?
Answer: Since every body's position is a function of every other body's position on every time step or iteration, a straightforward implementation requires all-to-all communication on every time step.
Gravitational N-body parallelization
- partition the bodies in 3D space and assign a process to each region of space
- pass a message for each pair of bodies that captures the force between the two bodies
Quiz 4.15
Name two problems that arise with the spatial partitioning of bodies and direct communication of individual forces.
Answer:
- Spatial partitioning may cause a large imbalance in work.
- Individual force communication will cause a large communication overhead.
Gravitational N-body parallelization
- partition the bodies in 3D space and assign a process to each region of space
- pass a message for each distant cluster of bodies that captures the force between the cluster and a single body
Barnes-Hut (N-body) parallelization
- Start with the 3D space.
- Partition using an octree.
- For any region that has too many particles, recursively partition using an octree.
- Compute the total mass and center of mass of each cubic region.
- The force on each body can be obtained by traversing the tree, starting at the root and stopping when the clustering approximation is valid.
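The tree construction and traversal can be sketched sequentially. The sketch makes several assumptions that are not in the slides: a region is split as soon as it holds more than one body, all bodies have distinct positions inside the root cube and positive mass, G is set to 1, and the clustering approximation counts as valid when the cube width divided by the distance to its center of mass is below a threshold `theta`. `Cell` and `acceleration` are hypothetical names.

```python
import math


class Cell:
    """One cubic region of the octree."""

    def __init__(self, center, half):
        self.center = center          # centre of the cube
        self.half = half              # half of the side length
        self.mass = 0.0               # total mass of the bodies inside
        self.com = [0.0, 0.0, 0.0]    # centre of mass of the bodies inside
        self.body = None              # (mass, position) while the cell holds one body
        self.children = None          # eight sub-cubes once the cell is split

    def _child_for(self, pos):
        index = sum((pos[k] >= self.center[k]) << k for k in range(3))
        return self.children[index]

    def _split(self):
        q = self.half / 2
        self.children = [
            Cell([self.center[k] + (q if (i >> k) & 1 else -q) for k in range(3)], q)
            for i in range(8)
        ]

    def insert(self, m, pos):
        if self.mass == 0.0:
            self.body = (m, pos)              # empty cell: keep the body here
        else:
            if self.children is None:         # cell already holds one body: split it
                self._split()
                old_m, old_pos = self.body
                self.body = None
                self._child_for(old_pos).insert(old_m, old_pos)
            self._child_for(pos).insert(m, pos)
        total = self.mass + m                 # update total mass and centre of mass
        self.com = [(self.com[k] * self.mass + pos[k] * m) / total for k in range(3)]
        self.mass = total


def acceleration(cell, pos, theta=0.5, G=1.0):
    """Acceleration at pos due to the bodies in cell, using clusters where allowed."""
    if cell.mass == 0.0:
        return [0.0, 0.0, 0.0]
    source = cell.body[1] if cell.children is None else cell.com
    delta = [source[k] - pos[k] for k in range(3)]
    r = math.sqrt(sum(d * d for d in delta))
    if cell.children is None or (r > 0.0 and 2 * cell.half / r < theta):
        if r == 0.0:
            return [0.0, 0.0, 0.0]            # a body exerts no force on itself
        return [G * cell.mass * d / r ** 3 for d in delta]
    total = [0.0, 0.0, 0.0]
    for child in cell.children:               # cluster too close: open the cell
        part = acceleration(child, pos, theta, G)
        total = [total[k] + part[k] for k in range(3)]
    return total
```

With `theta=0.0` no cluster is ever accepted, so the traversal reduces to the direct sum over all bodies; larger values of `theta` accept more clusters and trade accuracy for less work.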
Orthogonal recursive bisection
- more general than an octree
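A minimal sketch of orthogonal recursive bisection, assuming the cut is placed so that the two sides hold point counts in proportion to the number of regions still to be created on each side, with the cutting axis cycling through the dimensions. The function name and the cycling order are assumptions.

```python
def orthogonal_recursive_bisection(points, parts, axis=0):
    """Split points into `parts` regions of nearly equal size.

    Each step sorts along one axis and cuts perpendicular to it; the next
    level of recursion uses the next axis.
    """
    if parts == 1:
        return [points]
    if not points:
        return [[] for _ in range(parts)]
    ordered = sorted(points, key=lambda p: p[axis])
    left_parts = parts // 2
    cut = len(ordered) * left_parts // parts   # proportional share for the left side
    next_axis = (axis + 1) % len(ordered[0])
    return (
        orthogonal_recursive_bisection(ordered[:cut], left_parts, next_axis)
        + orthogonal_recursive_bisection(ordered[cut:], parts - left_parts, next_axis)
    )
```

Because the cut follows the data rather than the midpoint of the space, the regions differ in volume but hold nearly the same number of points, and the number of regions does not have to be a power of eight.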
Quiz 4.16
How has the Barnes-Hut approach addressed a parallelization problem for N-body simulations?
Answer: It subdivides the bodies so that each processor will have approximately the same amount of work.
Divide and Conquer
- Tree constructions
- Bucket sort
- Numerical integration
- N-body problem