Static load balancing for multi-level Monte Carlo finite volume solvers
Eidgenössische Technische Hochschule Zürich
Ecole polytechnique fédérale de Zurich
Politecnico federale di Zurigo
Swiss Federal Institute of Technology Zurich

Static load balancing for multi-level Monte Carlo finite volume solvers

J. Šukys, S. Mishra and Ch. Schwab

Research Report, May 2011

Seminar für Angewandte Mathematik
Eidgenössische Technische Hochschule
CH-8092 Zürich
Switzerland
Static load balancing for multi-level Monte Carlo finite volume solvers

Jonas Šukys, Siddhartha Mishra, and Christoph Schwab
ETH Zürich, Switzerland

Abstract. The multi-level Monte Carlo finite volume (MLMC-FVM) algorithm was shown to be a robust and fast solver for uncertainty quantification in the solutions of multi-dimensional systems of stochastic conservation laws. A novel load balancing procedure is used to ensure scalability of the MLMC algorithm on massively parallel hardware. We describe this procedure, together with other challenges that arise, in detail. Finally, numerical experiments in multiple dimensions showing strong and weak scaling of our implementation are presented.

Keywords: uncertainty quantification, conservation laws, multi-level Monte Carlo, finite volumes, static load balancing, linear scaling.

Acknowledgments. This work is performed under the ETH interdisciplinary research grant CH. CS acknowledges partial support by the European Research Council under grant ERC AdG STAHDPDE. JŠ is grateful to Stefan Pauli and Peter Arbenz for their contributions.

1 Introduction

A number of problems in physics and engineering are modeled in terms of systems of conservation laws:
\[
\begin{cases}
U_t + \operatorname{div}\big(F(U)\big) = S(x, U), & (x,t) \in \mathbb{R}^d \times \mathbb{R}_+,\\
U(x,0) = U_0(x).
\end{cases} \tag{1}
\]
Here, $U : \mathbb{R}^d \to \mathbb{R}^m$ denotes the vector of conserved variables, $F : \mathbb{R}^m \to \mathbb{R}^{m \times d}$ is the collection of directional flux vectors, and $S : \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}^m$ is the source term. The partial differential equation is augmented with initial data $U_0$. Examples of conservation laws include the shallow water equations of oceanography, the Euler equations of gas dynamics, the magnetohydrodynamics (MHD) equations of plasma physics, and the equations of non-linear elasticity. As the equations are non-linear, analytical solution formulas are available only in very special situations. Consequently, numerical schemes such as finite volume methods [6] are required for the study of systems of conservation laws.

Existing numerical methods for approximating (1) require the initial data $U_0$ and the source $S$ as input. However, in most practical situations it is not possible to measure this input precisely. This uncertainty in the inputs of (1) propagates to the solution, leading to the stochastic system of conservation laws:
\[
\begin{cases}
U(x,t,\omega)_t + \operatorname{div}\big(F(U(x,t,\omega))\big) = S(x,\omega), & x \in \mathbb{R}^d,\ t > 0,\ \omega \in \Omega,\\
U(x,0,\omega) = U_0(x,\omega),
\end{cases} \tag{2}
\]
where $(\Omega, \mathcal{F}, \mathbb{P})$ is a complete probability space and the initial data $U_0$ and the source term $S$ are random fields [1,2]. The solution is then also realized as a random field; its statistical moments, such as the expectation and the variance, are the quantities of interest.

An estimate of the expectation can be obtained by the so-called Monte Carlo finite volume method (MC-FVM), consisting of the following three steps:

1. Sample: We draw $M$ independent, identically distributed (i.i.d.) initial data and source samples $\{U^i_0, S^i_0\}$, $i = 1, 2, \dots, M$, from the random fields $\{U_0, S_0\}$ and approximate these by piecewise constant cell averages.
2. Solve: For each realization $\{U^i_0, S^i_0\}$, the underlying conservation law (1) is solved numerically by the finite volume method [6,3,5]. We denote the FVM solution by $U^{i,n}_{\mathcal{T}}$, i.e. by the cell averages $\{U^{i,n}_K : K \in \mathcal{T}\}$ at the time level $t^n$.
3. Estimate statistics: We estimate the expectation of the random solution field by the sample mean (ensemble average) of the approximate solutions:
\[
E_M[U^n_{\mathcal{T}}] := \frac{1}{M}\sum_{i=1}^{M} U^{i,n}_{\mathcal{T}}, \qquad M = \mathcal{O}\big(\Delta x^{-2s}\big). \tag{3}
\]

The error of MC-FVM asymptotically scales [2] as $(\mathrm{Work})^{-s/(d+1+2s)}$, making the MC-FVM method computationally infeasible when high accuracy is needed. The multi-level Monte Carlo finite volume method (MLMC-FVM) was recently proposed in [2,1]. The key idea behind MLMC-FVM is to simultaneously draw MC samples on a hierarchy of nested grids. There are four main steps:

1. Nested meshes: Consider nested triangulations $\{\mathcal{T}_l\}_{l=0}^{\infty}$ of the spatial domain with corresponding mesh widths $\Delta x_l$ that satisfy
\[
\Delta x_l = \Delta x(\mathcal{T}_l) = \sup\{\operatorname{diam}(K) : K \in \mathcal{T}_l\} = \mathcal{O}\big(2^{-l}\Delta x_0\big), \quad l \in \mathbb{N}_0, \tag{4}
\]
where $\Delta x_0$ is the mesh width of the coarsest resolution at the lowest level $l = 0$.
2. Sample: For each level of resolution $l \in \mathbb{N}_0$, we draw $M_l$ independent, identically distributed (i.i.d.) samples $\{U^i_{0,l}, S^i_{0,l}\}$, $i = 1, 2, \dots, M_l$, from the random fields $\{U_0, S_0\}$ and approximate these by cell averages.
3. Solve: For each resolution level $l$ and each realization $\{U^i_{0,l}, S^i_{0,l}\}$, the underlying balance law (1) is solved by the finite volume method [6,3,5] with mesh width $\Delta x_l$; denote the solution by $U^{i,n}_{\mathcal{T}_l}$ at the time $t^n$ and mesh level $l$.
4. Estimate solution statistics: Fix some positive integer $L < \infty$ corresponding to the highest level. We estimate the expectation of the random solution field with the following estimator:
\[
E^L[U(\cdot, t^n)] := \sum_{l=0}^{L} E_{M_l}\big[U^n_{\mathcal{T}_l} - U^n_{\mathcal{T}_{l-1}}\big], \tag{5}
\]
with $E_{M_l}$ being the MC estimator defined in (3) for the level $l$. Higher statistical moments can be approximated analogously (see, e.g., [2]).

The authors of [2] provide an error vs. computational work estimate which leads to the following number of samples on each discretization level $l$, required to equilibrate the statistical and the spatio-temporal discretization errors in (5):
\[
M_l = \mathcal{O}\big(2^{2(L-l)s}\big). \tag{6}
\]
Notice that (6) implies that the largest number of MC samples is required on the coarsest mesh level $l = 0$, whereas only a small, fixed number of MC samples is needed on the finest discretization levels. The corresponding error vs. work estimate for MLMC-FVM is given by [1,2]:
\[
\text{error} \lesssim
\begin{cases}
(\mathrm{Work})^{-s/(d+1)} & \text{if } s < (d+1)/2,\\[2pt]
(\mathrm{Work})^{-s/(d+1)} \log(\mathrm{Work}) & \text{if } s = (d+1)/2.
\end{cases} \tag{7}
\]
The above estimates show that MLMC-FVM is superior to MC-FVM. Furthermore, if the convergence rate $s$ of the FVM solver satisfies $s < (d+1)/2$, then this estimate is of exactly the same order as the estimate for the deterministic finite volume scheme. For the same error, MLMC-FVM was shown to be considerably faster than MC-FVM [1,2]; in particular, at a relative error level of 1%, the speed-up reached approximately two orders of magnitude [1,2].

2 Highly scalable implementation of MLMC-FVM

MLMC-FVM is non-intrusive, as any standard FVM code can be used in step 3. Furthermore, MLMC-FVM is amenable to efficient parallelization, since data from different grid resolutions and samples interact only in step 4. Selecting a nested hierarchy of triangulations in step 1 is straightforward on any parallel architecture. In step 2, we draw samples for $\{U_0, S_0\}$ with a given probability distribution. Here, a robust random number generator (RNG) is needed.

2.1 Robust pseudo random number generation

Random number generation becomes a very sensitive part of Monte Carlo type algorithms on massively parallel architectures. Inconsistent seeding and an insufficient period length of the RNG might cause correlations in presumably i.i.d. draws, which can lead to biased solutions such as in Figure 5 of [1]. Such spurious correlations are due to two factors: firstly, a large number of MC samples requires a longer period and hence a larger buffer of the RNG; secondly, the seeding of the buffer on each core must preserve statistical independence. For the numerical simulations reported below, we used the WELL series [9] of pseudo random number generators, which were designed with particular attention to large periods and good equidistribution. In particular, the RNG WELL512a was used; we found WELL512a to have a sufficiently large period and to be reasonably efficient (33 CPU seconds for $10^9$ draws). We emphasize that there are plenty of alternatives to WELL512a with even longer periods (which, however, use more memory). To name a few: WELL1024a, which takes 34 seconds, and WELLRNG44497, which takes 41 seconds for $10^9$ draws. The strategy to deal with the seeding issues is described in subsection 2.4.

In step 3 of the MLMC-FVM algorithm, we solve the conservation law (2) for each draw of the initial data. This is performed with ALSVID. A massively parallel version of ALSVID has already been developed for deterministic problems; refer to [7] for further details. The parallelization paradigm of ALSVID is based on domain decomposition using the MPI standard [12].

2.2 A priori estimates of computational work

The key issue in the parallel implementation of the solve steps (in step 3 of the MLMC-FVM algorithm) is to distribute the computational work evenly among the cores. The FVM algorithm consists of computing fluxes across all cell interfaces and then updating the cell averages via an explicit, stable time stepping routine [6]. The computational complexity of one such step is given by an explicit algorithm and is of order equal to the number of cells in the mesh $\mathcal{T}$:
\[
\mathrm{Work}^{\mathrm{step}}_{\mathcal{T}} = \mathrm{Work}^{\mathrm{step}}(\Delta x) = \mathcal{O}(N) = \mathcal{O}\big(\Delta x^{-d}\big), \tag{8}
\]
where $N = \#\mathcal{T}$ denotes the number of cells and $\Delta x$ denotes the mesh width of the triangulation $\mathcal{T}$. To ensure stability of the FVM scheme, a CFL condition [6] is imposed on the time step size $\Delta t := t^{n+1} - t^n$, which forces
\[
\Delta t = \mathcal{O}(\Delta x). \tag{9}
\]
Hence, the computational work $\mathrm{Work}^{\mathrm{det}}_{\mathcal{T}}$ for one complete deterministic solve using the FVM on the triangulation $\mathcal{T}$ with mesh width $\Delta x$ is given by multiplying the work for one step (8) by the total number of steps $\mathcal{O}(\Delta t^{-1})$:
\[
\mathrm{Work}^{\mathrm{det}}_{\mathcal{T}} = \mathrm{Work}^{\mathrm{step}}_{\mathcal{T}} \cdot \mathcal{O}\big(\Delta t^{-1}\big) \overset{(9)}{=} \mathcal{O}\big(\Delta x^{-d}\big)\cdot\mathcal{O}\big(\Delta x^{-1}\big) = \mathcal{O}\big(\Delta x^{-(d+1)}\big). \tag{10}
\]
In most explicit FVM schemes (e.g. Rusanov with (W)ENO and SSP-RK2, see [6]), all lower order terms $\mathcal{O}(\Delta x^{-d})$ in (10) are negligible, even when a very coarse mesh is used. Hence, we assume that (10) holds in a stricter sense,
\[
\mathrm{Work}^{\mathrm{det}}_{\mathcal{T}} = K\,\Delta x^{-(d+1)}, \tag{11}
\]
where the constant $K$ depends on the FVM that is used and on the time horizon $t > 0$, but does not depend on the mesh width $\Delta x$. Finally, the MC algorithm (3) combines multiple deterministic solves for a sequence of sample draws $\omega \in \Omega$. By $\mathrm{Work}_M(\Delta x)$ we denote the computational work needed for an MC-FVM algorithm performing $M$ solves, each of complexity (11). Then,
\[
\mathrm{Work}_M(\Delta x) = M \cdot \mathrm{Work}^{\mathrm{det}}(\Delta x) = M\,K\,\Delta x^{-(d+1)}. \tag{12}
\]
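To make this a priori load model concrete, the following minimal sketch (our own illustration, not part of ALSVID-UQ; all variable names are ours) evaluates the per-level sample numbers (6) and the resulting per-level work according to (11) and (12), for assumed values of $L$, $d$, $s$, the work constant $K$ and the finest-level sample number $M_L$:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// A priori work model for MLMC-FVM following (4), (6), (11) and (12).
// K, ML and the remaining parameters are assumed inputs; names are illustrative.
int main() {
    const int    L   = 5;     // finest level
    const int    d   = 2;     // space dimension
    const double s   = 0.5;   // FVM convergence rate
    const double K   = 1.0;   // work constant from (11)
    const double dx0 = 1.0;   // coarsest mesh width
    const int    ML  = 4;     // samples on the finest level

    std::vector<double> work(L + 1);
    for (int l = 0; l <= L; ++l) {
        const double Ml  = ML * std::pow(2.0, 2.0 * (L - l) * s);  // (6)
        const double dxl = dx0 * std::pow(2.0, -l);                // (4)
        const double dxc = dx0 * std::pow(2.0, -(l - 1));          // mesh width of level l-1
        // Each difference sample on level l needs solves on T_l and T_{l-1}, cf. (12);
        // on level 0 only the single grid T_0 is solved.
        work[l] = Ml * K * std::pow(dxl, -(d + 1.0))
                + (l > 0 ? Ml * K * std::pow(dxc, -(d + 1.0)) : 0.0);
        std::printf("level %d: M_l = %.0f, Work_l = %.3e\n", l, Ml, work[l]);
    }
    return 0;
}
```

These per-level totals are the quantities that the static load balancing of the next subsection distributes over the available cores.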
Next, we describe the load balancing strategy needed for step 3.

2.3 Static load balancing

In what follows, we assume a homogeneous computing environment, meaning that all cores are assumed to have identical CPUs and RAM per node, and equal bandwidth and latency to all other cores. There are three levels of parallelization: across mesh resolution levels, across MC samples, and inside the deterministic solver using domain decomposition (see the example in Figure 1). Domain decomposition is used only on the few levels with the finest mesh resolution. On these levels, the number of MC samples is small; however, these levels require most of the computational effort (unless $s = (d+1)/2$ in (7) holds). For the finest level $l = L$ we fix the number of cores:
\[
\underbrace{C_L}_{\#\text{ of cores}} \;=\; \underbrace{D_L}_{\#\text{ of subdomains}} \cdot \underbrace{P_L}_{\#\text{ of sampler groups}}, \tag{13}
\]
where, for every level $0 \le l \le L$, $D_l$ denotes the number of subdomains and $P_l$ denotes the number of samplers, i.e. groups of cores, where every such group computes some portion of the required $M_l$ Monte Carlo samples at level $l$. We assume that each subdomain is computed on exactly one core and denote the total number of cores at level $0 \le l \le L$ by $C_l$, i.e.
\[
C_l = D_l \cdot P_l, \qquad l \in \{0, \dots, L\}. \tag{14}
\]
Since the total computational work for $E_{M_l}[U^n_{\mathcal{T}_l} - U^n_{\mathcal{T}_{l-1}}]$ is given by (12) as
\[
\mathrm{Work}_l := \mathrm{Work}_{M_l}(\Delta x_l) + \mathrm{Work}_{M_l}(\Delta x_{l-1}), \tag{15}
\]
the ratio of the computational work of consecutive levels $l \in \{L-1, \dots, 1\}$ is obtained recursively by inserting (6) into the a priori work estimates (12):
\[
\frac{\mathrm{Work}_l}{\mathrm{Work}_{l-1}}
= \frac{M_L\,2^{2(L-l)s}\,K\big(\Delta x_l^{-(d+1)} + \Delta x_{l-1}^{-(d+1)}\big)}
       {M_L\,2^{2(L-(l-1))s}\,K\big(\Delta x_{l-1}^{-(d+1)} + \Delta x_{l-2}^{-(d+1)}\big)}
= \frac{2^{2ls}\,\Delta x_l^{-(d+1)}\big(1 + 2^{-(d+1)}\big)}
       {2^{2(l+1)s}\,\Delta x_l^{-(d+1)}\big(2^{-(d+1)} + 2^{-2(d+1)}\big)}
= 2^{\,d+1-2s}. \tag{16}
\]
For the level $l = 0$, the term $U^n_{\mathcal{T}_{-1}}$ in $E_{M_0}[U^n_{\mathcal{T}_0} - U^n_{\mathcal{T}_{-1}}]$ is known ($\equiv 0$); hence (16) provides a lower bound rather than an equality, i.e. $\mathrm{Work}_0 \ge \mathrm{Work}_1 / 2^{d+1-2s}$. Consequently, the positive integer parameters $D_L$ and $P_L \le M_L$ recursively determine the number of cores needed for each level $l < L$ via the relation
\[
C_l = \left\lceil \frac{C_{l+1}}{2^{d+1-2s}} \right\rceil, \qquad l < L. \tag{17}
\]
Notice that the denominator $2^{d+1-2s}$ in (17) is a positive integer (a power of 2) provided $s \in \mathbb{N}/2$ and $s \le (d+1)/2$ (which is not an additional constraint, as it is also present in (7)). However, when $s < (d+1)/2$, we have
\[
2^{d+1-2s} \ge 2, \tag{18}
\]
which (when $L$ is large) leads to an inefficient load distribution for levels $l \le \bar{l}$, where each successive level needs less than one core:
\[
\bar{l} := \min\{0 \le l \le L : C_{l+1} < 2^{d+1-2s}\}. \tag{19}
\]
We investigate the amount of total computational work $\mathrm{Work}_{\{0,\dots,\bar{l}\}}$ required for such inefficient levels $l \in \{0, \dots, \bar{l}\}$:
\[
\mathrm{Work}_{\{0,\dots,\bar{l}\}}
:= \sum_{l=0}^{\bar{l}} \mathrm{Work}_l
\overset{(16)}{=} \sum_{l=0}^{\bar{l}} \mathrm{Work}_{\bar{l}}\; 2^{-(d+1-2s)(\bar{l}-l)}
\le \frac{1}{1 - \big(2^{d+1-2s}\big)^{-1}}\,\mathrm{Work}_{\bar{l}}
\overset{(18)}{\le} 2\,\mathrm{Work}_{\bar{l}}. \tag{20}
\]
For the sake of simplicity, assume that $P_L$ and $D_L$ are non-negative integer powers of 2. Under this assumption, the definition (19) of $\bar{l}$ together with the recurrence relation (17) without rounding up ($\lceil\cdot\rceil$) implies that $C_{\bar{l}} \le 1/2$. Hence, the total work estimate (20) for all levels $l \in \{0, \dots, \bar{l}\}$ translates into an estimate for the sufficient number of cores, which, instead of $\bar{l} + 1$, turns out to be only 1:
\[
\mathrm{Work}_{\{0,\dots,\bar{l}\}} \le 2\,\mathrm{Work}_{\bar{l}}
\quad\Longrightarrow\quad
C_{\{0,\dots,\bar{l}\}} \le 2 \cdot \tfrac{1}{2} = 1. \tag{21}
\]
The implementation of (21) (i.e. multiple levels per one core) is essential to obtain an efficient and highly scalable parallelization of MLMC-FVM when $s < \frac{d+1}{2}$. An example of the static load distribution for the MLMC-FVM algorithm using all three parallelization levels is given in Figure 1, where the parameters are set to $L = 5$, $M_L = 4$, $d = 1$, $s = \frac{1}{2}$, $D_L = 2$, $P_L = 4$.

Fig. 1. Example of the static load distribution structure.
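As an illustration of the recursion (17) and of the grouping (21), the following sketch (our own, not ALSVID-UQ code; the function and variable names are hypothetical) computes the number of cores per level from the finest-level parameters $D_L$ and $P_L$:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Distribute cores over MLMC levels according to (13), (17) and (21).
// DL, PL, d, s are assumed inputs; names are illustrative only.
std::vector<int> coresPerLevel(int L, int d, double s, int DL, int PL) {
    const double ratio = std::pow(2.0, d + 1 - 2.0 * s);  // 2^{d+1-2s}, cf. (16)
    std::vector<int> C(L + 1);
    C[L] = DL * PL;                                        // (13)
    for (int l = L - 1; l >= 0; --l) {
        if (C[l + 1] < ratio) {
            // Each remaining level would need less than one core; by (21) one
            // single core is enough for ALL levels 0..l together (they share it).
            for (int k = l; k >= 0; --k) C[k] = 1;
            break;
        }
        C[l] = static_cast<int>(std::ceil(C[l + 1] / ratio));  // (17)
    }
    return C;
}

int main() {
    // Parameters of the example in Figure 1: L = 5, d = 1, s = 1/2, D_L = 2, P_L = 4.
    const auto C = coresPerLevel(5, 1, 0.5, 2, 4);
    for (std::size_t l = 0; l < C.size(); ++l)
        std::printf("level %zu: %d core(s)\n", l, C[l]);
    return 0;
}
```

For these parameters the sketch assigns 8, 4, 2 and 1 cores to levels 5, 4, 3 and 2, with the remaining coarse levels sharing a single core between them.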
Next, we describe our implementation of this load balancing using C++ and MPI.

2.4 Implementation using C++ and MPI

In what follows, we assume that each MPI process runs on its own core. The simulation is divided into three main phases - initialization, simulation and data collection; the key concepts of the implementation of the static load balancing algorithm in each phase are described below.

Phase 1 - Initialization: MPI groups and communicators. By default, message passing in MPI is done via the main communicator MPI_COMM_WORLD, which connects all processes. Each process has a prescribed unique non-negative integer called its rank. The process with rank 0 is called the root. Apart from MPI_COMM_WORLD, we use MPI_Group_range_incl() and MPI_Comm_create() to create sub-groups and corresponding sub-communicators, thereby enabling message passing within particular subgroups of processes. Such communicators ease the use of collective reduction operations within a particular subgroup of processes. We implemented three types of communicators (see Figure 2):

1. Domain communicators connect the processes within each sampler; these processes are used for the domain decomposition of one physical mesh.
2. Sampler communicators connect the processes that work on the MC samples at the same mesh level.
3. Level communicators connect only the processes (across all levels) that are roots of both domain and sampler communicators.

Analogously to MPI_COMM_WORLD, every process has a unique rank in each of the communicators 1-3; processes with rank 0 in domain communicators are called domain roots, in sampler communicators sampler roots, and in level communicators level roots. MPI_COMM_WORLD is used only in MPI_Init(), MPI_Finalize() and MPI_Wtime(). Figure 2 depicts all nontrivial communicators and their roots for the example setup of Figure 1.

Fig. 2. Structure and root processes of the communicators for the setup as in Figure 1.
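The following sketch illustrates one possible way to set up such a communicator structure. It is only an illustration: ALSVID-UQ builds the groups explicitly with MPI_Group_range_incl() and MPI_Comm_create(), whereas the sketch uses MPI_Comm_split() for brevity, and the level, sampler and subdomain indices are assumed to be known from the static load distribution of subsection 2.3.

```cpp
#include <mpi.h>

// Sketch of the three-communicator layout of subsection 2.4 using MPI_Comm_split().
// 'level', 'sampler' and 'subdomain' are assumed precomputed; names are illustrative.
struct Communicators {
    MPI_Comm domain;   // processes of one sampler (domain decomposition of one mesh)
    MPI_Comm sampler;  // processes working on MC samples of the same level
    MPI_Comm level;    // roots of both domain and sampler communicators, across levels
};

Communicators createCommunicators(int level, int sampler, int subdomain) {
    Communicators c;

    // Domain communicator: one per (level, sampler) pair (assumes sampler < 1000).
    MPI_Comm_split(MPI_COMM_WORLD, level * 1000 + sampler, subdomain, &c.domain);

    // Sampler communicator: one per level. For simplicity all processes of the
    // level are included here, not only the domain roots.
    MPI_Comm_split(MPI_COMM_WORLD, level, sampler, &c.sampler);

    // Level communicator: only processes that are roots of both the domain and
    // the sampler communicator; all others pass MPI_UNDEFINED and get MPI_COMM_NULL.
    int domainRank, samplerRank;
    MPI_Comm_rank(c.domain, &domainRank);
    MPI_Comm_rank(c.sampler, &samplerRank);
    const int color = (domainRank == 0 && samplerRank == 0) ? 0 : MPI_UNDEFINED;
    MPI_Comm_split(MPI_COMM_WORLD, color, level, &c.level);

    return c;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    // Purely illustrative mapping of ranks to (level, sampler, subdomain); the real
    // mapping follows the static load distribution of subsection 2.3.
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    Communicators c = createCommunicators(rank % 2, 0, rank / 2);
    (void)c;
    MPI_Finalize();
    return 0;
}
```

Processes that are not roots of both their domain and sampler communicators receive MPI_COMM_NULL for the level communicator and simply skip the level-wise collectives.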
Seeding the RNG. To deal with the seeding issues mentioned in subsection 2.1, we injectively (i.e. one-to-one) map the unique rank (in MPI_COMM_WORLD) of each process that is a root in both domain and sampler communicators to a corresponding element in a hardcoded array of prime numbers (seeds). These processes then generate the random vectors of real numbers needed to compute the MC samples $\{U^i_{0,l}, S^i_{0,l}\}$ for the entire level $l$. Afterwards, the samples are scattered evenly using MPI_Scatter() via the sampler communicator; MPI_IN_PLACE is used to avoid unnecessary memory overhead. Finally, the samples are broadcast using MPI_Bcast() via the domain communicator.

Phase 2 - Simulation: FVM solves on two mesh levels. For each sample, FVM solves are performed on both meshes entering $E_{M_l}[U^n_{\mathcal{T}_l} - U^n_{\mathcal{T}_{l-1}}]$, i.e. on $\mathcal{T}_l$ and on $\mathcal{T}_{l-1}$, and the results are then combined into the difference estimator.

Inter-domain communication. Cell values near the interfaces of adjacent subdomains are exchanged asynchronously with MPI_Isend() and MPI_Recv().

Phase 3 - Data collection and output: MC estimator. For each level, the statistical estimates are collectively reduced with MPI_Reduce() onto the sampler roots using the sampler communicators; then the MC estimators (3) for the mean and the variance (see subsection 2.5) are obtained.

MLMC estimator. The MC estimators from the different levels are finally combined via the level communicators onto the level roots to obtain the MLMC estimator (5).

Parallel data output. Each process that is both a sampler root and a level root writes out the final result. Hence, the number of parallel output files is equal to the number of subdomains on the finest mesh level.

This concludes the discussion of the static load balancing and of step 3 of MLMC-FVM. In step 4, the results are combined to compute the sample mean and variance.

2.5 Variance computation for parallel runs

A numerically stable serial variance computation algorithm (the so-called "online" algorithm) is given as follows [10]: set $\bar{u}_0 = 0$ and $\Phi_0 = 0$; then proceed recursively,
\[
\bar{u}_i = \frac{1}{i}\sum_{j=1}^{i} u_j, \qquad
\Phi_i := \sum_{j=1}^{i} (u_j - \bar{u}_i)^2 = \Phi_{i-1} + (u_i - \bar{u}_i)(u_i - \bar{u}_{i-1}). \tag{22}
\]
Then, the unbiased mean and variance estimates are given by
\[
E_M[u] = \bar{u}_M, \qquad \mathrm{Var}_M[u] = \frac{\Phi_M}{M-1}. \tag{23}
\]
For parallel architectures, assume we have two cores A and B, each computing $M_A$ and $M_B$ ($M = M_A + M_B$) samples, respectively. Then unbiased estimates for the mean and variance can be obtained by [11]
\[
E_M[u] = \frac{M_A E_{M_A}[u] + M_B E_{M_B}[u]}{M}, \qquad \mathrm{Var}_M[u] = \frac{\Phi_M}{M-1}, \tag{24}
\]
where
\[
\Phi_M = \Phi_{M_A} + \Phi_{M_B} + \delta^2\,\frac{M_A M_B}{M}, \qquad \delta = E_{M_B}[u] - E_{M_A}[u]. \tag{25}
\]
For any finite number of cores, formula (24) is then applied recursively.
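A minimal sketch of the update formulas (22), (23) and of the pairwise combination (24), (25); the class and member names are ours and do not reflect the ALSVID-UQ interface:

```cpp
#include <cstdio>

// Online mean/variance accumulator, formulas (22)-(23), and the pairwise
// combination (24)-(25) used to merge per-core results. Names are illustrative.
struct Accumulator {
    long   M    = 0;    // number of samples seen so far
    double mean = 0.0;  // running mean (u-bar in (22))
    double Phi  = 0.0;  // running sum of squared deviations (Phi in (22))

    void add(double u) {                      // one step of (22)
        ++M;
        const double oldMean = mean;
        mean += (u - oldMean) / M;
        Phi  += (u - mean) * (u - oldMean);
    }

    void merge(const Accumulator& other) {    // (24)-(25), A = *this, B = other
        if (other.M == 0) return;
        const long   M_AB  = M + other.M;
        const double delta = other.mean - mean;
        Phi  += other.Phi + delta * delta * double(M) * double(other.M) / M_AB;
        mean  = (M * mean + other.M * other.mean) / M_AB;
        M     = M_AB;
    }

    double variance() const { return M > 1 ? Phi / (M - 1) : 0.0; }  // (23)
};

int main() {
    Accumulator a, b;
    for (int i = 0;   i < 500;  ++i) a.add(0.01 * i);  // samples on "core A"
    for (int i = 500; i < 1000; ++i) b.add(0.01 * i);  // samples on "core B"
    a.merge(b);                                        // pairwise merge (24)-(25)
    std::printf("mean = %g, variance = %g (M = %ld)\n", a.mean, a.variance(), a.M);
    return 0;
}
```

The per-core triples $(M, \bar{u}, \Phi)$ can be combined in exactly this pairwise fashion, which is how formula (24) extends recursively to any number of cores.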
3 Efficiency and linear scaling in numerical simulations

The static load balancing algorithm was tested on a series of standard benchmarks for hyperbolic solvers. For a detailed description of the setups, refer to [1] for the Euler equations of gas dynamics, [1,3] for the magnetohydrodynamics (MHD) equations of plasma physics, and [4] for the shallow water equations with randomly varying bottom topography. The runtime of all aforementioned simulations was measured by the so-called wall-time, accessible via the MPI_Wtime() routine of MPI-2.0. We define parallel efficiency as the fraction of pure simulation time over total time:
\[
\text{efficiency} := \frac{(\text{cumulative wall-time}) - (\text{cumulative wall-time of MPI calls})}{\text{cumulative wall-time}}. \tag{26}
\]
In Figure 3 we verify strong scaling (fixed discretization and sampling parameters while increasing the number of cores) and in Figure 4 we verify weak scaling (problem size proportional to the number of cores) of our implementation in 1d. In 2d simulations, we maintained such scaling up to 1023 cores at 97% efficiency. We believe that our parallelization algorithm will scale linearly for a much larger number of cores.

Fig. 3. Strong scaling. The domain decomposition method (DDM) is enabled from a certain number of cores onwards (for MLMC only); its scalability is inferior to pure (ML)MC parallelization due to the additional networking between sub-domain boundaries.

4 Conclusion

The MLMC-FVM algorithm is superior to standard MC algorithms for uncertainty quantification in hyperbolic conservation laws, and yet, like most sampling algorithms, it still scales linearly w.r.t. the number of uncertainty sources. Due to its non-intrusiveness, MLMC-FVM was efficiently parallelized for multi-core architectures. Strong and weak scaling of our implementation ALSVID-UQ [8] of the proposed static load balancing was verified on the high performance clusters [13,14] in multiple space dimensions. The suite of benchmarks included the Euler equations of gas dynamics, the MHD equations [1], and the shallow water equations [4].
Fig. 4. Weak scaling. Analogously to Figure 3, the slight deterioration in MLMC scaling beyond a certain number of cores could have been caused by the inferior scaling of DDM.

References

1. S. Mishra, Ch. Schwab and J. Šukys. Multi-level Monte Carlo finite volume methods for nonlinear systems of conservation laws in multi-dimensions. Submitted. Available from:
2. S. Mishra and Ch. Schwab. Sparse tensor multi-level Monte Carlo finite volume methods for hyperbolic conservation laws with random initial data. Preprint. Available from:
3. F. Fuchs, A. D. McMurry, S. Mishra, N. H. Risebro and K. Waagan. Approximate Riemann solver based high-order finite volume schemes for the Godunov-Powell form of ideal MHD equations in multi-dimensions. Comm. Comp. Phys., 9.
4. S. Mishra, Ch. Schwab and J. Šukys. Multi-level Monte Carlo finite volume methods for shallow water equations with uncertain topography in multi-dimensions. In progress.
5. U. S. Fjordholm, S. Mishra and E. Tadmor. Well-balanced, energy stable schemes for the shallow water equations with varying topography. Submitted. Available from:
6. R. A. LeVeque. Numerical Solution of Hyperbolic Conservation Laws. Cambridge Univ. Press.
7. ALSVID. Available from:
8. ALSVID-UQ. Available from:
9. P. L'Ecuyer and F. Panneton. Fast Random Number Generators Based on Linear Recurrences Modulo 2. ACM Trans. Math. Software, 32:1-16.
10. B. P. Welford. Note on a Method for Calculating Corrected Sums of Squares and Products. Technometrics, 4.
11. T. F. Chan, G. H. Golub and R. J. LeVeque. Updating Formulae and a Pairwise Algorithm for Computing Sample Variances. Technical Report STAN-CS.
12. MPI: A Message-Passing Interface Standard. Version 2.2. Available from:
13. Brutus, ETH Zürich, de.wikipedia.org/wiki/Brutus_(cluster).
14. Palu, Swiss National Supercomputing Center (CSCS), Manno.