Load Balancing Strategies for Multi-Block Overset Grid Applications
M. Jahed Djomehri, Computer Sciences Corporation, NASA Ames Research Center, Moffett Field, CA
Rupak Biswas, NAS Division, NASA Ames Research Center, Moffett Field, CA
Noe Lopez-Benitez, Department of Computer Science, Texas Tech University, Lubbock, TX

Abstract

The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high performance computation of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of the load balancing. In this paper, we present three different load balancing strategies for overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.

1 Introduction

The multi-block overset grid technique [2] is a powerful method for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through Chimera interpolation [12]. However, to reduce time-to-solution, high performance computations of large-scale realistic applications using this overset grid methodology must be performed efficiently on
state-of-the-art parallel supercomputers. Fortunately, a message passing paradigm can be readily employed to exploit coarse-grained parallelism at the grid level as well as to communicate boundary data between distributed overlapping grids. The parallel efficiency of the overset approach, however, depends upon the proper distribution of the computational workload and the communication overhead among the processors. Applications with tens of millions of grid points may consist of many overlapping grids. A smart partitioning of the individual grids (also known as blocks or zones) among the processors should therefore consider not only the total number of grid points, but also the size and connectivity of the inter-grid data. In fact, to be more accurate, the computational and communication times ought to be used to perform the load balancing. Major challenges during the clustering process may arise due to the wide variation in block sizes and the disparity in the number of inter-grid boundary points.

In this paper, we present three different load balancing strategies for overset grid applications. The first uses a simple bin-packing algorithm to maintain uniform computational loads across processors while retaining some degree of connectivity among the grids assigned to each processor. The other two load balancing techniques are more sophisticated and based on graph partitioning. One uses the spectral bisection algorithm available within the Chaco package [5] to balance processor workloads and minimize total communication. The other uses a task assignment heuristic from EVAH [8] to minimize the total execution time. We analyze the effects of all three methods on the parallel efficiency of a Navier-Stokes CFD application called OVERFLOW-D [9]. Our experiments are conducted on an SGI Origin2000 machine at NASA Ames Research Center using a test case that simulates complex rotorcraft vortex dynamics. The grid system consists of 857 blocks and approximately 69 million grid points. Results indicate that graph partitioning-based strategies perform better than a naive bin-packing algorithm and that the EVAH task assignment heuristic is generally superior to the Chaco spectral technique.

The remainder of this paper is organized as follows. Section 2 provides a brief description of the OVERFLOW-D overset grid application. The three load balancing techniques are described in Section 3. Parallel performance results are presented and critically analyzed in Section 4. Finally, Section 5 concludes the paper with a summary and some key observations.
Figure 1: (a) Example of an overset grid system; (b) schematic of intra-group and inter-group communication.

2 Overset Grid Application

In this section, we provide an overview of the overset grid CFD application called OVERFLOW-D, including the basics of its solution process, the Chimera interpolation of inter-grid boundary data, and a message-passing parallelization model.

2.1 Solution Process

The high-fidelity overset grid application called OVERFLOW-D [9] is a special version of OVERFLOW [2] that owes its popularity within the aerodynamics community to its ability to handle complex configurations. These designs typically consist of multiple geometric components, where individual body-fitted grids can be constructed easily about each component. The grids are either attached to the aerodynamic configuration (near-body) or detached (off-body). The union of all near- and off-body grids covers the entire computational domain. An overset system consisting of four grids (z0, z1, z2, and z3) is shown in Fig. 1(a). Both OVERFLOW and OVERFLOW-D use a Reynolds-averaged Navier-Stokes solver, augmented with a number of turbulence models. However, unlike OVERFLOW, which is primarily meant for static grid systems, OVERFLOW-D is explicitly designed to simplify the modeling of components in relative motion (dynamic grid systems). For example, in typical rotary-wing problems, the near-field is modeled with one or more grids around the moving rotor blades. The code
then automatically generates Cartesian background (wake) grids that encompass these curvilinear near-body grids. At each time step, the flow equations are solved independently on each zone in a sequential manner. Overlapping inter-grid boundary data is updated from the previous solution prior to the start of the current time step using a Chimera interpolation procedure [12]. The code uses finite differences in space, with a variety of spatial differencing schemes and implicit/explicit time-stepping options.

2.2 Chimera Interpolation

The Chimera interpolation procedure [12] determines the proper connectivity of the individual grids. Adjacent grids are expected to have at least a one-cell (single fringe) overlap to ensure the continuity of the solutions; for higher-order accuracy and to retain certain physical features in the solution, a double fringe overlap is sometimes used [13]. A program named Domain Connectivity Function (DCF) [10] computes the inter-grid donor points that have to be supplied to other grids (see Fig. 1). The DCF procedure is incorporated into the OVERFLOW-D code and fully coupled with the flow solver. All boundary exchanges are conducted at the beginning of every time step based on the interpolatory updates from the previous time step. In addition, for dynamic grid systems, DCF has to be invoked at every time step to create new holes and inter-grid boundary data.
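To make the donor-to-receiver update concrete, the following is a minimal sketch of the kind of interpolation involved: a generic trilinear stencil applied to the solution values at the corners of a donor cell. The function and data layout are purely illustrative; the actual interpolation offsets and weights in OVERFLOW-D are computed by DCF.

def trilinear(corner_vals, frac):
    """corner_vals: 2x2x2 nested list of solution values at the donor cell's
    eight corners; frac: (fi, fj, fk) fractional position of the receiver
    point within the donor cell, each component in [0, 1]."""
    fi, fj, fk = frac
    c = corner_vals
    # Interpolate along i, then j, then k.
    c00 = c[0][0][0] * (1 - fi) + c[1][0][0] * fi
    c10 = c[0][1][0] * (1 - fi) + c[1][1][0] * fi
    c01 = c[0][0][1] * (1 - fi) + c[1][0][1] * fi
    c11 = c[0][1][1] * (1 - fi) + c[1][1][1] * fi
    c0 = c00 * (1 - fj) + c10 * fj
    c1 = c01 * (1 - fj) + c11 * fj
    return c0 * (1 - fk) + c1 * fk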
2.3 Message Passing Parallelization

A parallel message passing version (based on MPI) of the OVERFLOW-D application has been developed around its multi-block feature, which offers a natural coarse-grained parallelism [16]. The top-level computational logic of the sequential code consists of a time-loop and a nested grid-loop. Within the grid-loop, solutions are obtained on the individual grids with imposed boundary conditions, where the Chimera interpolation procedure successively updates inter-grid boundaries after computing the numerical solution on each grid. Upon completion of the grid-loop, the solution is advanced to the next time step by the time-loop. The overall procedure may be thought of as a Gauss-Seidel iteration. To facilitate parallel execution, a grouping strategy is required to assign each grid to an MPI process. The total number of groups, G, is equal to the total number of MPI processes, P. Since a grid can only belong to one group, the total number of grids, Z, must be at least equal to P. If Z is larger than P, a group will consist of more than one grid. However, the parallel efficiency of the overset approach depends critically on how the grouping is performed. In other words, the individual grids must be partitioned among the processes so that the computational workload is balanced and the communication overhead is minimized. Three different techniques for clustering grids into groups in a load-balanced fashion are discussed in Section 3.

In the MPI version of OVERFLOW-D, the grid-loop is therefore split into two nested loops: one over the groups (called the group-loop) and the other over the grids within each group. Since each MPI process is assigned to only one group, the group-loop is performed in parallel, with each process performing its own sequential grid-loop. The inter-grid/intra-group boundary updates among the grids within each group are conducted as in the serial case. However, inter-group exchanges between the corresponding processes are performed via MPI calls using send/receive buffers, where donor points from grids in one group contribute to the solution at receiver points in another group (see Fig. 1(b)). The communication can be synchronous or asynchronous, but the choice significantly affects the MPI programming model. The current version of OVERFLOW-D uses asynchronous message passing that relaxes the communication schedule in order to hide latency. Unlike the original synchronous model [16], these non-blocking invocations place no constraints on each other in terms of completion. A receive completes immediately, even if no message is available, and hence allows maximal concurrency. In general, however, control flow and debugging can become a serious problem if, for instance, the order of messages needs to be preserved. Fortunately, in the overset grid application, the Chimera boundary updates take place at the completion of each time step, and the computations are independent of the order in which messages are sent or received. Exploiting this fact allows us to easily use asynchronous communication within OVERFLOW-D.
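The exchange pattern just described can be sketched with non-blocking MPI calls. The example below uses mpi4py for brevity; the buffer names and surrounding driver are hypothetical, not OVERFLOW-D's actual (Fortran/MPI) implementation.

from mpi4py import MPI

comm = MPI.COMM_WORLD

def exchange_boundaries(donor_bufs, recv_bufs):
    """donor_bufs: {dest_rank: array} interpolated donor-point data to send;
    recv_bufs: {src_rank: array} preallocated buffers for receiver points."""
    reqs = []
    # Post all non-blocking receives first so matching sends complete eagerly.
    for src, buf in recv_bufs.items():
        reqs.append(comm.Irecv(buf, source=src))
    for dest, buf in donor_bufs.items():
        reqs.append(comm.Isend(buf, dest=dest))
    # The Chimera updates are independent of message order, so no ordering
    # constraints are imposed; a single Waitall at the end suffices.
    MPI.Request.Waitall(reqs)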
3 Load Balancing Strategies

Proper load balancing is critically important for efficient parallel computing. The objective is to distribute equal computational workloads among the processors while minimizing the interprocessor communication cost. On a given platform, the primary procedure that affects the load balancing of an overset grid application is the grid grouping strategy. To facilitate parallel execution, each grid must be assigned to an MPI process. In other words, the Z grids need to be clustered into G groups, where G is equal to the total number of processes, P. Unfortunately, the sizes of the Z grids may vary substantially and there may be wide disparity in the number of inter-grid boundary points. Both these factors complicate the grouping procedure and affect the quality of the overall load balance. In this section, we present three different grouping strategies for overset grid applications. The first uses a simple bin-packing algorithm while the other two are more sophisticated techniques based on graph partitioning. All three methods depend only on the characteristics of the grids and their connectivity; they do not take into account the topology of the physical processors. The assignment of groups to processors is somewhat random, and is handled by the operating system at run time, usually based on a first-touch strategy.

3.1 Bin-Packing Algorithm

The original parallel version of OVERFLOW-D uses a grid grouping strategy based on a bin-packing algorithm [16]. It is one of the simplest clustering techniques, and strives to maintain a uniform number of weighted grid points per group while retaining some degree of connectivity among the grids within each group. Prior to the grouping procedure, each grid is weighted depending on the physics of the solution sought. The goal is to ensure that each weighted grid point requires the same amount of computational work. For instance, the execution time per point belonging to near-body grids requiring viscous solutions is higher than that for the inviscid solutions of off-body grids. The weight can also be deduced from the presence or absence of a turbulence model. The bin-packing algorithm then sorts the grids by size in descending order and assigns a grid to every empty group. Therefore, at this point, the G largest grids are each in a group by themselves. The remaining Z - G grids are then handled one at a time: each is assigned to the smallest group that satisfies the connectivity test with other grids in that group. The connectivity test only inspects for an overlap between a pair of grids, regardless of the size of the boundary data or their connectivity to other neighboring grids. The process terminates when all grids are assigned to groups. A short sketch of this procedure is given below.
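The following is a minimal Python sketch of the bin-packing strategy described above, assuming an illustrative Grid type that exposes a weighted point count and a pairwise overlaps() test; details such as tie-breaking and the fallback for unconnected grids are simplifying assumptions.

def bin_pack(grids, G):
    """Cluster grids into G groups of roughly equal weighted size."""
    groups = [[] for _ in range(G)]
    loads = [0] * G
    # Sort by weighted size, largest first; seed each group with one grid.
    grids = sorted(grids, key=lambda g: g.weighted_points, reverse=True)
    for i, g in enumerate(grids[:G]):
        groups[i].append(g)
        loads[i] = g.weighted_points
    # Assign each remaining grid to the smallest group it overlaps with.
    for g in grids[G:]:
        candidates = [i for i in range(G)
                      if any(g.overlaps(h) for h in groups[i])]
        # Fall back to all groups if no connected group exists.
        best = min(candidates or range(G), key=lambda i: loads[i])
        groups[best].append(g)
        loads[best] += g.weighted_points
    return groups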
3.2 Chaco Graph Partitioning Heuristic

Graph partitioning has been used in many areas of scientific computing for the purpose of load balancing. For example, it is frequently utilized to order sparse matrices for direct solutions, decompose large parallel computing applications, and optimize VLSI circuit designs. It is particularly popular for load balancing parallel CFD calculations on unstructured grids [15] and handling dynamic solution-adaptive irregular meshes [1]. As an analogy to this unstructured-grid approach, we can construct a graph representation of the OVERFLOW-D overset grid system. The nodes of the graph correspond to the near- and off-body grids, while an edge exists between two nodes if the corresponding grids overlap. The nodes are weighted by the weighted number of grid points and the edges by the communication volume. Given such a graph with Z nodes and E edges, the problem is to divide the nodes into G sets such that the sum of the nodal weights in each set is almost equal while the sum of the cut-edge weights is minimized. Such a grouping strategy is, in theory, better than bin-packing in that the grids in each group enjoy higher intra-group dependencies with fewer inter-group exchanges.

Two popular graph partitioning software packages have been interfaced with OVERFLOW-D. The first is METIS [7], a widely used multilevel partitioner that first reduces the size of the graph by collapsing edges, then partitions the coarse graph, and finally projects it back to the original. Unfortunately, METIS is primarily suitable for large graphs and does not perform well with overset grid systems since the graph representations usually contain at most a few hundred nodes. The second package, called Chaco [5], includes a variety of heuristic partitioning algorithms based on inertial, spectral, and Kernighan-Lin (KL) multilevel principles, as well as a few other simpler strategies. All algorithms employ global partitioning methods, but KL is a local refinement technique that can be coupled to global methods to improve partition quality. We have experimented with several Chaco partitioners, but the results reported in this paper were all obtained using the spectral algorithm. Spectral methods use the eigenvectors of the Laplacian matrix of the graph to partition it; the sketch below illustrates the basic idea. The spectral algorithm in Chaco is a weighted version of recursive spectral bisection (RSB) [11] and uses a Lanczos-based eigensolver. A detailed description of the algorithm can be found in [6].
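For illustration, the core of spectral bisection fits in a few lines: split the nodes by the Fiedler vector of the graph Laplacian. This toy version ignores node weights and the multilevel and Lanczos machinery that Chaco actually uses.

import numpy as np

def spectral_bisect(W):
    """W: symmetric Z x Z matrix of edge weights (communication volumes);
    returns a boolean array marking membership in one of the two halves."""
    L = np.diag(W.sum(axis=1)) - W        # weighted graph Laplacian
    vals, vecs = np.linalg.eigh(L)        # eigenpairs in ascending order
    fiedler = vecs[:, 1]                  # eigenvector of 2nd-smallest eigenvalue
    return fiedler <= np.median(fiedler)  # split the nodes into two halves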
3.3 EVAH Task Assignment Heuristic

The other graph partitioning-based algorithm that we have implemented in OVERFLOW-D to address the grid grouping problem uses the EVAH package [8]. EVAH consists of a set of allocation heuristics that considers the constraints inherent in multi-block CFD problems. It was initially developed to predict the performance scalability of overset grid applications on large numbers of processors, particularly within the context of distributed grid computing across multiple resources [3]. In this work, we have modified EVAH to cluster overset grids into groups while taking into account their relative overlaps. The overall goal is to minimize the total execution time. Among the several heuristics that are available within EVAH, we have used the one called Largest Task First with Minimum Finish Time and Available Communication Costs (LTF_MFT_ACC). In the context of the current work, a task is synonymous with a block in the overset grid system. The size of a task is defined as the computation time for the corresponding block.

begin procedure LTF_MFT_ACC
  1: sort tasks z_i, i = 1, 2, ..., Z in descending order by size (LTF policy)
  2: for each processor p_j, j = 1, 2, ..., P do T(p_j) = 0
  3: for each sorted task z_i, i = 1, 2, ..., Z do
     3.1: assign task z_i to processor p_j with minimum T(p_j) (LTF_MFT policy);
          compute T(p_j) = T(p_j) + X_i
     3.2: for each task z_r in R(z_i) assigned to processor p_k != p_j do
          T(p_j) = T(p_j) + C_ir
     3.3: for each task z_d in D(z_i) assigned to processor p_m != p_j do
          T(p_m) = T(p_m) + C_di
  4: end for
end procedure

Figure 2: Outline of the LTF_MFT_ACC task assignment heuristic from EVAH.

LTF_MFT_ACC is constructed from the basic Largest Task First (LTF) policy that sorts the tasks in descending order by size. LTF is then enhanced by the systematic integration of the status of the processors in the form of Minimum Finish Time (LTF_MFT). Finally, because of the overhead involved in data exchanges between neighboring zones, LTF_MFT is further augmented to obtain the LTF_MFT_ACC task assignment heuristic. To model their impact on the overall execution time, LTF_MFT_ACC utilizes communication costs, which are estimated from the inter-grid data volume and the interprocessor communication rate. A procedure has been developed to interface the DCF subroutine of OVERFLOW-D with the EVAH heuristics. An outline of the LTF_MFT_ACC procedure is presented in Fig. 2.
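A direct Python transcription of the pseudocode in Fig. 2 might look as follows; the cost arrays in the usage comment are hypothetical values, not those of Fig. 3.

def ltf_mft_acc(X, C, P):
    """X[i]: computation time of block z_i; C[d][r]: directed communication
    cost from donor z_d to receiver z_r (0 if no overlap); P: processor count."""
    Z = len(X)
    T = [0.0] * P                 # step 2: initialize T(p_j) = 0
    owner = [None] * Z            # processor assigned to each block
    order = sorted(range(Z), key=lambda i: X[i], reverse=True)   # step 1: LTF
    for i in order:               # step 3
        j = min(range(P), key=lambda p: T[p])   # step 3.1: minimum finish time
        owner[i] = j
        T[j] += X[i]
        for r in range(Z):        # step 3.2: already-assigned receivers of z_i's data
            if C[i][r] and owner[r] is not None and owner[r] != j:
                T[j] += C[i][r]
        for d in range(Z):        # step 3.3: already-assigned donors of data to z_i
            if C[d][i] and owner[d] is not None and owner[d] != j:
                T[owner[d]] += C[d][i]
    return owner, T

# Hypothetical example: four blocks on two processors, no communication costs.
# owner, T = ltf_mft_acc([40, 25, 60, 75], [[0]*4 for _ in range(4)], 2)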
It is easiest to explain the LTF_MFT_ACC grouping strategy by using a simple example and then stepping through the procedure in Fig. 2. Figure 3(a) shows a graph representation of the overset grid system in Fig. 1(a) that is being partitioned across two processors. The computational time for block z_i is denoted as X_i and shown for all four blocks in Fig. 3(a). Similarly, the communication overhead from block z_d (donor) to another block z_r (receiver) is denoted as C_dr and shown for all inter-grid data exchanges along the graph edges.

Figure 3: (a) Graph representation of the overset grid system in Fig. 1; (b) stepping through the LTF_MFT_ACC procedure in Fig. 2.

In step 1, the four blocks are sorted in descending order by computational time; hence the order is: z3, z2, z0, z1. In step 2, the total execution times of the two processors are initialized: T(p0) = T(p1) = 0. Step 3 has to be executed four times since we have four grids that must be grouped. Block z3 is assigned to processor p0 and T(p0) = 75 in step 3.1. Since no other blocks have yet been assigned, steps 3.2 and 3.3 are not executed. Now z2 must be mapped to a processor. According to the LTF_MFT policy, z2 is assigned to the processor that currently has the smallest total execution time; thus, z2 goes to p1 and T(p1) = 60 in step 3.1. In step 3.2, we need to look at all grids assigned to processors other than p1 that are also receivers of inter-grid data from z2. This set of grids is denoted as R(z2) in Fig. 2. The graph in Fig. 3(a) shows that z1 and z3 are in R(z2); however, z1 is still unassigned. Since z3 belongs to p0, the communication overhead C_23 is added to T(p1); hence, T(p1) = 64. Similarly, in step 3.3, the set D(z2) consists of grids that are donors of inter-grid data to z2. Because z1 is unassigned and z3 is mapped to p0, T(p0) is updated with C_32; thus, T(p0) = 79. The remainder of the grouping procedure is shown in Fig. 3(b). Notice that the parallel execution time is max_j T(p_j), whereas the serial execution time is the sum of the X_i over all blocks.

One important advantage of EVAH over standard graph partitioners like Chaco is that EVAH is able to handle directed graphs. This is critical for overset grid systems where the volume of inter-grid data is not necessarily symmetric between two overlapping grids. For example, C_01 != C_10 in Fig. 3. For our experiments with Chaco, we generated undirected graphs by collapsing parallel edges and setting the edge weight to be the average of the two weights. Therefore, C_01 and C_10 would be replaced by a single undirected edge with weight (C_01 + C_10)/2.
Figure 4: (a) A cut through the overset grid system surrounding the hub and rotors; (b) computed vorticity magnitude contours on a cutting plane located 45 degrees behind the rotor blade.

4 Experimental Results

The CFD problem used for the experiments in this paper is a Navier-Stokes simulation of vortex dynamics in the complex wake flow region around hovering rotors [14]. The Cartesian off-body wake grids surround the curvilinear near-body grids with uniform resolution, but become gradually coarser upon approaching the outer boundary of the computational domain. The overset grid system consists of 857 blocks and approximately 69 million grid points; Fig. 4(a) shows a cut through it. Figure 4(b) presents computed vorticity magnitude contours on a cutting plane located 45 degrees behind the rotor blade. All experiments were run on the 512-processor 400 MHz SGI Origin2000 shared-memory system at NASA Ames Research Center. Due to the size of the test problem, our runs were conducted for only 100 time steps. The timing results are averaged over the number of iterations and given in seconds.

Table 1 shows a comparison of the three load balancing strategies, bin-packing, Chaco, and EVAH, for a range of processor sets. The execution time T_exec is the average time required to solve every time step of the application, and includes the computation, communication, Chimera interpolation, and processor idle times. The average computation (T_comp^avg) and communication (T_comm^avg) times over P processors are also shown. Finally, the maximum computation (T_comp^max) and communication (T_comm^max) times are reported and used to measure the quality of load balancing for each run. The computation load balance factor (LB_comp) is the ratio of T_comp^max to T_comp^avg, while the communication load balance factor (LB_comm) is the ratio of T_comm^max to T_comm^avg. Obviously, the closer these factors are to unity, the higher is the quality of load balancing.
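In code, each factor reduces to a one-line ratio over the per-process timing arrays (an illustrative helper, not part of OVERFLOW-D):

def lb_factor(times):
    """Ratio of the maximum to the average of per-process times; 1.0 is ideal."""
    return max(times) / (sum(times) / len(times))

# e.g. lb_comp = lb_factor(t_comp)  # t_comp: per-process computation times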
Table 1: Run times (in seconds) and load imbalance factors with the bin-packing, Chaco, and EVAH grid grouping strategies. Each of the three sub-tables reports, for a range of processor counts P, the columns T_exec, T_comp^max, T_comp^avg, T_comm^max, T_comm^avg, LB_comp, and LB_comm.

Notice that the computational workload for all three strategies is identical, as evidenced by the fact that T_comp^avg is essentially the same for a given value of P. However, T_exec is almost always the lowest when using the EVAH-based task assignment heuristic for load balancing. This is expected since the goal of this strategy is to minimize the total execution time. On the other hand, the communication times T_comm^max and T_comm^avg for Chaco are about a factor of two lower than the corresponding timings for the bin-packing algorithm and EVAH. This is because standard graph
partitioners like Chaco strive to minimize the total communication time. The overall quality of the computational workload balance for the three strategies can be observed by comparing LB_comp in Table 1. EVAH returns the best results, demonstrating its overall superiority. Rather surprisingly, Chaco performs worse than bin-packing. We believe that this is because graph partitioners are usually unable to handle small graphs containing only a few hundred nodes. Obviously, LB_comp increases with P since load balancing becomes more challenging with a fixed problem size. Results in Table 1 also show that the quality of communication load balancing, LB_comm, is a big drawback of the bin-packing algorithm. Both Chaco and EVAH improve this factor considerably; in fact, except for P = 384, LB_comm using EVAH remains close to unity. It should be noted here that though we are evaluating LB_comp from the computational run times, the grid grouping strategies are all based on the number of weighted grid points.

The three left plots in Fig. 5 show the distribution of weighted grid points across 64, 128, and 256 processors for all three load balancing strategies. The predicted values of LB_comp are also reported and are typically somewhat better than those computed from actual run times (see Table 1), but demonstrate the same overall trend. The three plots on the right in Fig. 5 present a more detailed report of the execution, computation, and communication times per processor for P = 64, 128, and 256. Due to the synchronization of MPI processes, T_exec is independent of the processor ID and is shown at the top of the plot area for each case. For the sake of clarity, T_comm is shown at a different scale indicated by the right vertical axis. Observe that the EVAH T_comp curves are consistently much smoother than those for the other grouping strategies. For the T_comm curves, both Chaco and EVAH show a much more uniform distribution across processors than bin-packing.

Scalability beyond 256 processors is generally poor as a result of using a fixed problem size that is independent of the number of processors. For example, when P = 448, each group contains, on average, only two grids (since Z = 857). With such a low number of blocks per group, the effectiveness of any strategy is diminished; moreover, the communication overhead relative to computation increases substantially. For instance, with low processor counts, the communication-to-computation ratio is less than 6%, but grows to more than 37% with higher counts.
Figure 5: Distribution of weighted grid points (left) and execution, computation, and communication times (right) per processor for P = 64, 128, and 256 (top to bottom).
5 Summary and Conclusions

The multi-block overset grid technique is a powerful tool for solving complex realistic computational fluid dynamics (CFD) problems at high fidelity; however, parallel performance depends critically on the quality of load balancing. In this paper, we presented and analyzed three different load balancing strategies for overset grid applications. The first used a simple bin-packing algorithm while the other two were based on graph partitioning techniques. All experiments were performed with the OVERFLOW-D Navier-Stokes code on a 512-processor SGI Origin2000 system at NASA Ames Research Center. The CFD application was the simulation of vortex dynamics in the complex flow region around hovering rotors. The grid system consisted of 857 blocks and almost 69 million grid points. Overall results indicated that graph partitioning-based strategies perform better than naive bin-packing due to their more comprehensive consideration of grid connectivity. However, standard graph partitioners did not perform particularly well because of their inability to effectively handle small graphs. Instead, a grid clustering technique based on task assignment heuristics with the goal of reducing the total execution time performed extremely favorably by improving the communication balance.

Further improvements in the scalability of the overset grid methodology could be sought by using a more sophisticated parallel programming paradigm, especially when the number of blocks Z is comparable to the number of processors P, or even when P > Z. One potential strategy that can be exploited on SMP clusters is to use a hybrid MPI+OpenMP multilevel programming style [4]. This approach is currently under investigation.

Acknowledgements

The work of the first author was supported by NASA Ames Research Center under Contract Number DTTS59-99-D-00437/A6181D with AMTI/CSC. The work of the third author was supported by the Texas Tech University Excellence Fund.

References

[1] R. Biswas, S.K. Das, D.J. Harvey, and L. Oliker, Parallel dynamic load balancing strategies for adaptive irregular applications, Applied Mathematical Modelling, 25 (2000).
[2] P.G. Buning, W. Chan, K.J. Renze, D. Sondak, I.T. Chiu, J.P. Slotnick, R. Gomez, and D. Jespersen, Overflow User's Manual Version 1.6au, NASA Ames Research Center, Moffett Field, CA.

[3] M.J. Djomehri, R. Biswas, R.F. Van der Wijngaart, and M. Yarrow, Parallel and distributed computational fluid dynamics: Experimental results and challenges, in: Proc. 7th Intl. High Performance Computing Conf., LNCS 1970 (2000).

[4] M.J. Djomehri and H. Jin, Hybrid MPI+OpenMP programming of an overset CFD solver and performance investigations, NAS Technical Report, NASA Ames Research Center (2002).

[5] B. Hendrickson and R. Leland, The Chaco User's Guide Version 2.0, Tech. Rep. SAND95-2344, Sandia National Laboratories, Albuquerque, NM, 1995.

[6] B. Hendrickson and R. Leland, An improved spectral graph partitioning algorithm for mapping parallel computations, SIAM J. on Scientific Computing, 16 (1995).

[7] G. Karypis and V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM J. on Scientific Computing, 20 (1998).

[8] N. Lopez-Benitez, M.J. Djomehri, and R. Biswas, Task assignment heuristics for distributed CFD applications, in: Proc. 30th Intl. Conf. on Parallel Processing Workshops (2001).

[9] R. Meakin, On adaptive refinement and overset structured grids, in: Proc. 13th AIAA Computational Fluid Dynamics Conf. (1997).

[10] R. Meakin and A.M. Wissink, Unsteady aerodynamic simulation of static and moving bodies using scalable computers, in: Proc. 14th AIAA Computational Fluid Dynamics Conf. (1999).

[11] H.D. Simon, Partitioning of unstructured problems for parallel processing, Computing Systems in Engineering, 2 (1991).

[12] J. Steger, F. Dougherty, and J. Benek, A Chimera grid scheme, ASME FED, 5 (1983).

[13] R.C. Strawn and J.U. Ahmad, Computational modeling of hovering rotors and wakes, in: Proc. 38th AIAA Aerospace Sciences Mtg. (2000).
[14] R.C. Strawn and M.J. Djomehri, Computational modeling of hovering rotor and wake aerodynamics, in: Proc. 57th Annual Forum of the American Helicopter Society (2001).

[15] R. Van Driessche and D. Roose, Load balancing computational fluid dynamics calculations on unstructured grids, in: Parallel Computing in CFD, AGARD-R-807 (1995).

[16] A.M. Wissink and R. Meakin, Computational fluid dynamics with adaptive overset grids on parallel and distributed computer platforms, in: Proc. Intl. Conf. on Parallel and Distributed Processing Techniques and Applications (1998).