Parallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs


Vincent Heuveline, Dimitar Lukarski, Nico Trost, Jan-Philipp Weiss

No. 2011-09

Preprint Series of the Engineering Mathematics and Computing Lab (EMCL)

KIT - University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

Preprint Series of the Engineering Mathematics and Computing Lab (EMCL), ISSN 2191-0693, No. 2011-09

Impressum

Karlsruhe Institute of Technology (KIT)
Engineering Mathematics and Computing Lab (EMCL)
Fritz-Erler-Str. 23, 76133 Karlsruhe, Germany

KIT - University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association

Published on the Internet under the following Creative Commons License: http://creativecommons.org/licenses/by-nc-nd/3.0/de

Parallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs

Vincent Heuveline 1, Dimitar Lukarski 1,2, Nico Trost 1 and Jan-Philipp Weiss 1,2

1 Engineering Mathematics and Computing Lab (EMCL)
2 SRG New Frontiers in High Performance Computing
Karlsruhe Institute of Technology, Germany
{vincent.heuveline, dimitar.lukarski, jan-philipp.weiss}@kit.edu, nico.trost@student.kit.edu

Abstract. Multigrid methods are efficient and fast solvers for problems typically modeled by partial differential equations of elliptic type. For problems with complex geometries and local singularities, stencil-type discrete operators on equidistant Cartesian grids need to be replaced by more flexible concepts for unstructured meshes in order to properly resolve all problem-inherent specifics and to maintain a moderate number of unknowns. However, flexibility in the meshes goes along with severe drawbacks with respect to parallel execution, especially with respect to the definition of adequate smoothers. This point becomes particularly pronounced in the framework of fine-grained parallelism on GPUs with hundreds of execution units. We use the approach of matrix-based multigrid that has high flexibility and adapts well to the exigencies of modern computing platforms. In this work we investigate multi-colored Gauß-Seidel type smoothers, the power(q)-pattern enhanced multi-colored ILU(p) smoothers with fill-ins, and factorized sparse approximate inverse (FSAI) smoothers. These approaches provide efficient smoothers with a high degree of parallelism. In combination with matrix-based multigrid methods on unstructured meshes, our smoothers provide powerful solvers that are applicable across a wide range of parallel computing platforms and almost arbitrary geometries. We describe the configuration of our smoothers in the context of the portable lmpLAtoolbox and the HiFlow3 parallel finite element package. In our approach, a single source code can be used across diverse platforms including multicore CPUs and GPUs. Highly optimized implementations are hidden behind a unified user interface. Efficiency and scalability of our multigrid solvers are demonstrated by means of a comprehensive performance analysis on multicore CPUs and GPUs.

Keywords: Parallel smoothers, unstructured meshes, matrix-based multigrid, multi-coloring, power(q)-pattern method, FSAI, multi-core, GPUs

Corresponding author: dimitar.lukarski@kit.edu

1 Introduction

The need for high accuracy and short simulation times relies both on efficient numerical schemes and appropriate scalable parallel implementations. The latter point is crucial in the context of fine-grained parallelism in the prospect of the manycore era. In particular, graphics processing units (GPUs) open new potentials in terms of computing power and internal bandwidth. These architectures have a significant impact on the design and implementation of parallel algorithms in numerical simulation. The algorithms need to be laid out such that thousands of fine-grained threads can run in parallel in order to feed hundreds of computing units. Portability of solver concepts, codes and performance across several platforms is a further major issue in the era of highly capable multicore and accelerator platforms.

Multigrid methods rely on the decomposition of the error into low- and high-frequency contributions. The so-called smoothers damp out the high frequency contributions of the error at a given level. Adequate prolongation and restriction operators allow to address the considered problem on a hierarchy of discretizations and by this means cover the full spectral range of error contributions on the basis of these smoothers alone (see [20] and references therein for further details). For full flexibility of the solvers in the context of complex geometries, stencil-based multigrid methods need to be replaced by more flexible concepts. We use the approach of matrix-based multigrid where all operations, i.e. smoothers, grid transfers and residual computation, are represented by sparse matrix-vector multiplications (SpMV). This approach is shown to work well on modern multicore platforms and GPUs [3]. Moreover, the restriction to basic algorithmic building blocks is the key technique for building portable solvers. The major challenge with respect to parallelism is related to the definition and implementation of adequate parallel smoothers.

Flexible multigrid methods on emerging multicore technologies are the subject of recent research. Parallel multigrid smoothers and preconditioners on regularly structured tensor-product meshes (with a fixed number of neighbors but arbitrary displacements) are considered in [7]. The author discusses parallel implementation aspects for several platforms and shows the integration of accelerators into a finite element software package. In [6], geometric multigrid on unstructured meshes for higher order finite element methods is investigated. A parallel Jacobi smoother and grid transfer operators are assembled into sparse matrix representation, leading to efficient solvers on multicore CPUs and GPUs. In [5] performance results for multigrid solvers based on the sparse approximate inverse (SPAI) technique are presented. An alternative approach without the need for mesh hierarchies are algebraic multigrid methods (AMG) [10]. An implementation and performance results of an AMG on GPUs are discussed in [8].

In this work we propose a new parallel smoother based on the power(q)-pattern enhanced multi-colored ILU(p) factorization [14]. We compare it to multi-colored splitting-type smoothers and FSAI smoothers. These approaches provide efficient smoothers with scalable parallelism across multicore CPUs and GPUs. Various hardware platforms can be easily used in the context of these solvers by

means of the portable implementation based on the lmpLAtoolbox in the parallel finite element software package HiFlow3 [2]. The capability of the proposed approach is demonstrated in a comprehensive performance analysis.

This paper is organized as follows. In Section 2 we describe the context of matrix-based multigrid methods. Section 3 describes the parallel concepts for efficient and scalable smoothers. Implementation aspects and the approach taken in the HiFlow3 finite element method (FEM) package are presented in Section 4. The numerical test problem is the Poisson problem on an L-shaped domain. The impact of the choice and the configuration of the smoothers on the MG convergence is investigated in Section 5. A performance analysis with respect to solver times and parallel speedups on multicore CPUs and GPUs is the central theme in Section 6. After an outlook on future work in Section 7 we conclude in Section 8.

2 Matrix-based Multigrid Methods

Multigrid (MG) methods are usually used to solve large sparse linear systems of equations arising from finite element discretizations (or related techniques) of partial differential equations (PDEs), typically of elliptic type. Due to its ability to achieve asymptotically optimal complexity, MG has been proven to be one of the most efficient solvers for this type of problem. In contrast to Krylov subspace solvers, the number of iterations needed by the MG method in order to achieve a prescribed accuracy does not depend on the number of unknowns N or the grid spacing h. Hence, the costs on finer grids only increase linearly in N due to the complexity of the applied operators. In this sense, MG is superior to other iterative methods (see e.g. [20, 9]).

The main idea of MG methods is based on coarse grid corrections and the smoothing properties of classical iterative schemes. High frequency error components can be eliminated efficiently by using elementary iterative solvers such as Jacobi or Gauß-Seidel. On the other hand, smooth error components cannot be reduced efficiently on fine grids. This effect can be mitigated by performing a sequence of coarsening steps and transferring the smooth errors by restricting the defect to coarser mesh levels. On each coarser mesh, pre-smoothing is applied to reduce error components inherited by the grid transfer. Moreover, on this coarser level the smooth components can be damped much faster. When the coarsest level is reached, the so-called coarse grid problem has to be solved exactly or by an iterative method with sufficient accuracy. By applying the prolongation operator to the corrected defect the next finer level can be reached again. Any remaining high frequency error components can be eliminated by post-smoothing. This recursive execution of inter-grid transfer operators and smoothers is called the MG cycle. Details on the basic properties of the MG method as well as error analysis and programming remarks can be found in [20, 9, 18, 4] and in references provided therein. Aspects of parallel MG are e.g. discussed in [16].

The MG cycle is an iterative and recursive method, shown in Algorithm 1. Here, L_h u_h = f_h is the discrete version of the underlying PDE on the refinement

level h with mesh size h. On each level, pre- and post-smoothing is applied in the form u_h^(n) = RELAX^ν(ũ_h^(n), L_h, f_h). The parameter ν denotes the number of smoothing iterations. Grid transfer operators are denoted by I_h^H (restriction) and I_H^h (prolongation), where h represents the fine grid size and H the coarse grid size. The transferred residual r_H is the input to the MG iteration on the coarser level, where the initial value for the error is taken to be zero. The parameter γ specifies the number of two-grid cycle iterations on each level and thus specifies how often the coarsest level is visited. For γ = 1 we obtain so-called V-cycles, whereas γ = 2 results in W-cycles. The obtained solution after each cycle serves as the input for the successive cycle.

Algorithm 1: Multigrid cycle: u_h^(n) = MG(u_h^(n-1), L_h, f_h, ν1, ν2, γ)
(1) ū_h^(n) := RELAX^ν1(u_h^(n-1), L_h, f_h)        pre-smoothing
(2) r_h^(n) := f_h − L_h ū_h^(n)                    compute residual
(3) r_H^(n) := I_h^H r_h^(n)                        restriction
(4) if (H == h_0) then solve L_H e_H^(n) = r_H^(n)  exact solution on the coarse grid
(5) else e_H^(n) = MG(0, L_H, r_H^(n), ν1, ν2, γ)   recursion
    end if
(6) e_h^(n) := I_H^h e_H^(n)                        prolongation
(7) ũ_h^(n) := ū_h^(n) + e_h^(n)                    correction
(8) u_h^(n) = RELAX^ν2(ũ_h^(n), L_h, f_h)           post-smoothing

In this work, we consider matrix-based geometric MG methods [19]. In contrast to stencil-based geometric MG methods, the differential operators and grid transfer operators are not expressed by fixed stencils on equidistant grids but have the full flexibility of sparse matrix representations. On the one hand, this approach gives us flexibility with respect to complex geometries, non-uniform grids resulting from local mesh refinements, and space-dependent coefficients in the underlying PDE. On the other hand, the solvers can be built upon standard building blocks of numerical libraries.

The convergence and robustness of MG methods strongly depend on the applied smoothers. While classical MG performs very well with additive smoothers such as Gauß-Seidel or Jacobi, strong anisotropies require more advanced smoothers in order to achieve full MG efficiency. In the latter case, incomplete LU decompositions have demonstrated convincing results, e.g. for 2D cases in [21].
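To make the control flow of Algorithm 1 concrete, the following minimal C++ sketch implements the recursive cycle on CSR data. It is not the HiFlow3 implementation: the CSRMatrix and Level types are illustrative stand-ins for the lmpLAtoolbox classes, and a damped Jacobi sweep serves as a placeholder for the RELAX step, which in our solvers is one of the parallel smoothers of Section 3.

#include <cstddef>
#include <vector>

// Minimal CSR matrix; stands in for the lmpLAtoolbox matrix classes.
struct CSRMatrix {
    std::size_t n = 0;             // number of rows
    std::vector<std::size_t> ptr;  // row pointers, size n+1
    std::vector<std::size_t> col;  // column indices
    std::vector<double> val;       // nonzero values
};

// y = A x: the SpMV building block all MG operations reduce to.
std::vector<double> spmv(const CSRMatrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.n, 0.0);
    for (std::size_t i = 0; i < A.n; ++i)
        for (std::size_t k = A.ptr[i]; k < A.ptr[i + 1]; ++k)
            y[i] += A.val[k] * x[A.col[k]];
    return y;
}

// Placeholder smoother: nu Jacobi sweeps x <- x + D^{-1}(b - A x).
void relax(const CSRMatrix& A, std::vector<double>& x,
           const std::vector<double>& b, int nu) {
    for (int s = 0; s < nu; ++s) {
        std::vector<double> r = spmv(A, x);
        for (std::size_t i = 0; i < A.n; ++i) {
            double d = 1.0;
            for (std::size_t k = A.ptr[i]; k < A.ptr[i + 1]; ++k)
                if (A.col[k] == i) d = A.val[k];  // diagonal entry
            x[i] += (b[i] - r[i]) / d;
        }
    }
}

// One grid level: operator L_h, restriction I_h^H, prolongation I_H^h.
struct Level {
    CSRMatrix A;  // discrete operator on this level
    CSRMatrix R;  // restriction to the next coarser level
    CSRMatrix P;  // prolongation from the next coarser level
};

// Recursive MG cycle following Algorithm 1; gamma = 1 gives a V-cycle,
// gamma = 2 a W-cycle. levels[0] is the finest grid, levels.back() the
// coarsest, on which we simply smooth until the problem is solved well.
void mg_cycle(const std::vector<Level>& levels, std::size_t l,
              std::vector<double>& u, const std::vector<double>& f,
              int nu1, int nu2, int gamma) {
    const Level& L = levels[l];
    if (l + 1 == levels.size()) { relax(L.A, u, f, 100); return; }  // coarse solve

    relax(L.A, u, f, nu1);                                // (1) pre-smoothing
    std::vector<double> r = spmv(L.A, u);                 // (2) residual
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = f[i] - r[i];
    std::vector<double> rH = spmv(L.R, r);                // (3) restriction
    std::vector<double> eH(rH.size(), 0.0);               // zero initial error
    for (int c = 0; c < gamma; ++c)                       // (4)/(5) recursion
        mg_cycle(levels, l + 1, eH, rH, nu1, nu2, gamma);
    std::vector<double> e = spmv(L.P, eH);                // (6) prolongation
    for (std::size_t i = 0; i < u.size(); ++i) u[i] += e[i];  // (7) correction
    relax(L.A, u, f, nu2);                                // (8) post-smoothing
}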

3 Parallel Smoothers for Matrix-Based Multigrid

Stencil-based geometric MG methods can be efficiently performed in parallel by domain sub-structuring and halo exchange for the stencils [16]. In contrast, the parallel implementation of a matrix-based MG requires more work. While the grid transfer operators are explicit updates with sufficient locality, parallel smoothers rely on implicit steps where typically triangular systems need to be solved. And since we are working on unstructured meshes, straightforward parallelization strategies like wave-front or red-black coloring cannot be applied. Therefore, the focus of our work is directed to fine-grained parallelism for the smoothing step. We consider additive and multiplicative matrix splittings used for designing efficient parallel smoothers under the following two restrictions: first, the proposed smoothers should rely on a high degree of parallelism and should show significant speedups on multicore CPUs and GPUs. Second, the parallel versions of the smoothers should maintain smoothing properties comparable to their sequential versions. The considered smoothers are based on the block-wise decomposition into small sub-matrices. In the additive and multiplicative splittings, typically a large number of forward and backward sweeps in triangular solvers needs to be processed.

For the description of the proposed parallel smoothers we point out the link between smoothers and preconditioning techniques. Iterative solvers can generally be written in the fixed point form x_{k+1} = G x_k + f, where the linear system Ax = b is transformed by the additive splitting A = M + N, taking G = −M^{-1}N = M^{-1}(M − A) = I − M^{-1}A and f = M^{-1}b. This version can be reformulated as a preconditioned defect correction scheme given by

x_{k+1} = x_k + M^{-1}(b − A x_k).   (1)

In this context, we apply preconditioners M as smoothers for the linear system Ax = b. Additive preconditioners are standard splitting schemes, typically based on the block-wise decomposition A = D + L + R, where L is a strictly lower triangular matrix, R is a strictly upper triangular matrix, and D is the matrix containing the diagonal blocks of A. We choose M = D (Jacobi), M = D + L (Gauß-Seidel) or M = (D + L) D^{-1} (D + R) (symmetric Gauß-Seidel). For multiplicative splittings we choose M = LU in (1), where L is a lower triangular and U is an upper triangular matrix. For incomplete LU (ILU) factorizations with or without fill-ins we decompose the system matrix A into the product A = LU + R with a remainder matrix R that absorbs unwanted fill-in elements. Typically, the diagonal entries of L are taken to be one, and both matrices L and U are stored in the same data structure (omitting the ones). The quality of the approximation in this case depends on the number of fill-in elements.

The third class of considered parallel smoothers are the approximate inverse preconditioners. Here, we are focusing on the factorized sparse approximate inverse (FSAI) algorithms [17] that compute a direct approximation of A^{-1}. These schemes are based on the minimization of the Frobenius norm ||I − GA||_F, where one looks for a symmetric preconditioner of the form G := G_L^T G_L. In other words, one directly builds an approximation of the Cholesky decomposition based on a given sparse matrix structure. FSAI(1) uses the sparsity pattern of A; FSAI(q), q ≥ 2, uses the sparsity pattern of A^q, respectively.
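To illustrate the defect correction form (1), the following C++ sketch performs one smoothing step with the Gauß-Seidel choice M = D + L on a CSR matrix, i.e., one residual computation followed by a forward substitution. Names and data layout are our own simplifications, not the lmpLAtoolbox API; the multi-coloring described next is what makes the triangular solve parallel.

#include <cstddef>
#include <vector>

struct CSR {  // simplified CSR storage
    std::size_t n;
    std::vector<std::size_t> ptr, col;
    std::vector<double> val;
};

// One Gauß-Seidel smoothing step x <- x + M^{-1}(b - A x) with M = D + L.
void gs_smooth(const CSR& A, std::vector<double>& x, const std::vector<double>& b) {
    std::vector<double> r(A.n);                      // r = b - A x
    for (std::size_t i = 0; i < A.n; ++i) {
        double s = 0.0;
        for (std::size_t k = A.ptr[i]; k < A.ptr[i + 1]; ++k)
            s += A.val[k] * x[A.col[k]];
        r[i] = b[i] - s;
    }
    std::vector<double> z(A.n, 0.0);                 // solve (D + L) z = r
    for (std::size_t i = 0; i < A.n; ++i) {          // forward substitution
        double s = r[i], d = 1.0;
        for (std::size_t k = A.ptr[i]; k < A.ptr[i + 1]; ++k) {
            if (A.col[k] < i)       s -= A.val[k] * z[A.col[k]];
            else if (A.col[k] == i) d = A.val[k];    // diagonal entry
        }
        z[i] = s / d;
    }
    for (std::size_t i = 0; i < A.n; ++i) x[i] += z[i];  // defect correction
}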

In order to harness parallelism within each block of the decomposition we apply multi-coloring techniques [14]. For splitting-based methods (like Gauß-Seidel and SOR) and ILU(0) decompositions without fill-ins, the original sparsity pattern of the system matrix is preserved in the additive or multiplicative decompositions. Here, only subsets of the original sparsity pattern are populated and no additional matrix elements are inserted. Before applying the smoother, the matrix is re-organized such that the diagonal blocks in the block decomposition are themselves diagonal. Then, inversion of the diagonal blocks is just an easy vector operation [13].

Furthermore, we allow fill-ins (i.e. additional matrix elements) in the ILU(p) factorization for achieving a higher level of coupling with increased efficiency. The power(q)-pattern method is applied to ILU(p) factorizations with fill-ins [14]. This method is based on an incomplete factorization of the system matrix A subject to a predetermined non-zero pattern derived from a multi-coloring analysis of the matrix power A^q and its associated multi-coloring permutation π. It has been proven in [14] that the obtained sparsity pattern is a superset of the modified ILU(p) factorization applied to πAπ^{-1}. As a result, for q = p + 1 this modified ILU(p,q) scheme applied to the multi-colored system matrix has no fill-ins in its diagonal blocks. This leads to an inherently parallel execution of the triangular ILU(p,q) sweeps and hence to a parallel and efficient smoother. The degree of parallelism can be increased by taking q < p + 1 at the expense of some fill-ins in the diagonal blocks. In this scenario (e.g. for the ILU(1,1) smoother in our experiments) we use a drop-off technique that erases fill-ins in the diagonal blocks. These techniques have already been successfully tested in the context of parallel preconditioners [14]. The major advantage is that multi-coloring can be applied before performing the ILU(p) factorization with additional fill-ins, where fill-ins only occur outside the blocks on the diagonal. By precomputing the superset of the data distribution pattern we also eliminate costly insertion of new matrix elements into dynamic data structures and obtain compact loops [14].
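A minimal sketch of the coloring step is given below: a greedy algorithm assigns to each unknown the smallest color not used by any of its matrix neighbors, and a stable sort of the indices by color yields the permutation that groups the unknowns block-wise. For the power(q)-pattern method the same coloring is applied to the sparsity pattern of A^q instead of A; the function names here are hypothetical, not the interface of [14].

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Greedy multi-coloring of the adjacency graph of a sparse matrix:
// rows i and j get different colors whenever a_ij != 0. After grouping
// the unknowns by color, the diagonal blocks of the permuted matrix are
// themselves diagonal, so each color can be processed in parallel.
std::vector<int> multicolor(std::size_t n,
                            const std::vector<std::size_t>& ptr,
                            const std::vector<std::size_t>& col) {
    std::vector<int> color(n, -1);
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<bool> used;  // colors already taken by neighbors of i
        for (std::size_t k = ptr[i]; k < ptr[i + 1]; ++k) {
            const int cj = color[col[k]];
            if (cj >= 0) {
                if (used.size() <= static_cast<std::size_t>(cj))
                    used.resize(cj + 1, false);
                used[cj] = true;
            }
        }
        int c = 0;  // smallest free color
        while (c < static_cast<int>(used.size()) && used[c]) ++c;
        color[i] = c;
    }
    return color;
}

// Permutation that groups unknowns color by color (stable within a color).
std::vector<std::size_t> coloring_permutation(const std::vector<int>& color) {
    std::vector<std::size_t> perm(color.size());
    std::iota(perm.begin(), perm.end(), 0);
    std::stable_sort(perm.begin(), perm.end(),
                     [&](std::size_t a, std::size_t b) { return color[a] < color[b]; });
    return perm;
}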

4 Implementation Aspects

Flexibility of solution techniques and software is a decisive factor in the current computing landscape. We have incorporated the described multigrid solvers and parallel smoothers into the multi-platform and multi-purpose parallel finite element software package HiFlow3 [11, 2]. With the concept of object-oriented programming in C++, HiFlow3 provides a flexible, generic and modular framework for building efficient parallel solvers and preconditioners for PDEs of various types. HiFlow3 tackles productivity and performance issues by its conceptual approach. It is structured in several modules. Its linear algebra operations are based on two communication and computation layers, the LAtoolbox for inter-node operations and the lmpLAtoolbox for intra-node operations in a parallel system. The lmpLAtoolbox has backends for multiple platforms with highly optimized implementations hiding all hardware details from the user while maintaining optimal usage of resources. Numerical solvers in HiFlow3 are built on the basis of unified interfaces to the different hardware backends and parallel programming approaches, where only a single code base is required for all platforms. By this means, HiFlow3 is portable across diverse computing systems including multicore CPUs, CUDA-enabled GPUs and OpenCL-capable platforms. The modular approach also allows an easy extension to emerging systems with heterogeneous configuration. The main modules of HiFlow3 are described in the following: Mesh, DoF/FEM and Linear Algebra. See Figure 1 for an abstraction.

Fig. 1. Modular configuration of the parallel FEM package HiFlow3.

Mesh - This module provides a set of classes and functions for handling general types of meshes. This library deals with complex unstructured distributed meshes with different types of cells via a uniform interface. The data structures provided can be used to represent meshes of arbitrary topological and geometrical dimension, and even with different cell types in one and the same mesh. It is possible to create hierarchies of meshes through refinement and also to coarsen cells given such a hierarchy.

DoF/FEM (Degrees of Freedom / Finite Element Methods) - The treatment of the FEM and the DoF is highly interdependent, and therefore the major idea of this module is to capture both sub-modules and unify them within a single scope. The DoF sub-module deals with the numbering of all DoF resulting from the finite element ansatz on each mesh cell. It can handle discontinuous and continuous finite element ansatz functions. This data is passed to the FEM sub-module. This part of the library handles the task of representing the chosen finite element ansatz on the reference cell and transforming it into the physical space depending on its geometry.

Linear Algebra - This module handles the basic linear algebra operations and offers (non-)linear solvers and preconditioners. It is implemented as a two-level library: the global level is an MPI layer which handles the distribution of data among the nodes and performs cross-node computations. The local level (local multi-platform LAtoolbox) takes care of the on-node routines, offering a unified interface to several platforms by providing different platform backends, e.g. for multicore CPUs (OpenMP, MKL) and GPUs (CUDA, OpenCL). In this work we are only focusing on single-node parallelism. Therefore, we do not consider the global MPI level. Based on the matrix and vector classes, HiFlow3 offers several Krylov subspace solvers and local additive and multiplicative preconditioners. Further information about this library, its solvers and preconditioners can be found in [14, 12, 13, 15].
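The following fragment sketches the idea behind the unified interface; the class and method names are invented for illustration and do not reproduce the actual HiFlow3 API. Solver code is written once against abstract local matrix and vector types, while each backend supplies the platform-specific kernels.

#include <cstddef>

// Hypothetical unified interface in the spirit of the lmpLAtoolbox.
class LocalVector {
public:
    virtual ~LocalVector() = default;
    virtual std::size_t size() const = 0;
    virtual void axpy(double alpha, const LocalVector& x) = 0;  // this += alpha * x
    virtual double dot(const LocalVector& x) const = 0;
};

class LocalMatrix {
public:
    virtual ~LocalMatrix() = default;
    virtual void apply(const LocalVector& x, LocalVector& y) const = 0;  // y = A x
};

// An OpenMP backend derives from these classes and implements the kernels
// with parallel loops; a CUDA backend holds device memory and launches
// kernels instead. Solvers such as the MG cycle only see the base classes,
// which is why a single code base runs on all platforms.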

Implementation of Matrix-based Multigrid in HiFlow3

The proposed multigrid solvers and smoothers have been implemented in the context of HiFlow3. Starting from an initial mesh, the HiFlow3 mesh module refines the grid in order to reach the requested number of cells according to the demands for accuracy. The current version supports quadrilateral elements as well as triangles for two-dimensional problems. In the three-dimensional case tetrahedrons and hexahedrons are used. For our two-dimensional test case, an unstructured mixed mesh containing both types of elements with local refinements has been used. On each refinement level, the corresponding stiffness matrix and the right-hand side are assembled for the given equation. Our implementation performs a bi-linear interpolation to ascend to a finer level and a full weighting restriction to descend to a coarser level, respectively. With the relation I_h^{2h} = 2^{-dim} (I_{2h}^h)^T between the inter-grid transfer operators for the standard coarsening procedure, it is sufficient to assemble the prolongation matrix only.

Due to the modular approach of the lmpLAtoolbox, the matrix-based MG solver is based on a single source code for all different platforms (CPUs, GPUs, etc.). The grid transformations and the defect computation are based on matrix-vector and vector-vector routines which can be efficiently performed in parallel on the considered devices. Moreover, a similar approach is taken for the smoothing steps based on the preconditioned defect correction (1). In this context, our smoothers make use of parallel preconditioners of additive and multiplicative type. See also [14] for a detailed description of the parallel concepts for preconditioners, the power(q)-pattern enhanced multi-colored ILU(p) schemes, and a comprehensive performance analysis with respect to solver acceleration and parallel speedups on multicore CPUs and GPUs.
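Since restriction is, up to scaling, the transpose of the assembled prolongation, it can be applied as a scatter over the prolongation rows without storing a second matrix. A sketch under the relation I_h^{2h} = 2^{-dim} (I_{2h}^h)^T stated above (CSR layout and names are our own, not the HiFlow3 code):

#include <cstddef>
#include <vector>

// Apply the full weighting restriction r_H = 2^{-dim} P^T r_h, where P is
// the assembled prolongation in CSR format (n_fine rows, n_coarse columns).
std::vector<double> restrict_residual(std::size_t n_fine, std::size_t n_coarse,
                                      const std::vector<std::size_t>& ptr,
                                      const std::vector<std::size_t>& col,
                                      const std::vector<double>& val,
                                      const std::vector<double>& r_fine,
                                      int dim) {
    const double scale = 1.0 / static_cast<double>(1 << dim);  // 2^{-dim}
    std::vector<double> r_coarse(n_coarse, 0.0);
    for (std::size_t i = 0; i < n_fine; ++i)            // row i of P
        for (std::size_t k = ptr[i]; k < ptr[i + 1]; ++k)
            r_coarse[col[k]] += scale * val[k] * r_fine[i];  // scatter for P^T
    return r_coarse;
}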

5 Numerical Experiments and Smoother Efficiency

As a numerical test problem we solve the Poisson problem with homogeneous Dirichlet boundary conditions

−Δu = f in Ω,   u = 0 on ∂Ω   (2)

in the two-dimensional L-shaped domain Ω := (0,1)² \ (0,0.5)² with a reentrant corner. For our test case we choose the right-hand side f = 8π² cos(2πx) cos(2πy). We solve (2) by means of bilinear Q1 elements on quadrilaterals and linear P1 elements on triangles [11, 2]. A uniform discretization of the L-shaped domain is depicted in Figure 2 (left). The numerical solution of (2) with boundary layers is detailed in Figure 2 (right). Due to the steep gradients at the reentrant corner we are using a locally refined mesh in order to obtain a proper problem resolution.

Fig. 2. Discretization of the locally refined L-shaped domain (left) and discrete solution of the Poisson problem (2) (right).

Figure 3 (left) shows a zoom-in into a uniformly refined coarse mesh. In the mesh in Figure 3 (right) we are using additional triangular and quadrilateral elements for local refinement, avoiding hanging nodes and deformed elements [1]. Our coarsest mesh in the multigrid hierarchy is a locally refined mesh based on triangular and quadrilateral cells with Q1 and P1 elements summing up to 3,266 degrees of freedom (DoF). By applying six global refinement steps to this coarse mesh we obtain the hierarchy of nested grids where the finest mesh has 3,2,425 DoF. Details of the mesh hierarchy are listed in Table 1. None of our grids is a uniform grid with equidistant mesh spacings. For the inter-grid operations we are using full weighting restrictions and bi-linear interpolations.
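As a side note on the choice of f: the right-hand side is exactly the image under the negative Laplacian of the smooth function u*(x, y) = cos(2πx) cos(2πy), since

−Δu* = −∂²u*/∂x² − ∂²u*/∂y² = (4π² + 4π²) cos(2πx) cos(2πy) = 8π² cos(2πx) cos(2πy) = f.

Note that u* does not satisfy the homogeneous Dirichlet data, so the exact solution of (2) differs from u* and exhibits the reduced regularity at the reentrant corner that motivates the local refinement.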

Fig. 3. Zoom-in to a uniform mesh with 3,2 DoF (left) and a locally refined mesh with 3,266 DoF (right) of the L-shaped domain around the reentrant corner; our coarsest MG mesh with level 1.

Table 1. Characteristics of the six refinement levels of the MG hierarchy: for each level, the number of cells, the number of DoF, and the maximal and minimal mesh width h at the reentrant corner (0.5, 0.5).

For the motivation of our locally refined grids we consider the following example. The gradient of the solution on a fine uniform mesh with 3,49,825 DoF shows a low resolution, as can be seen in Figure 4 (left). The gradient of the solution on the locally refined mesh with 3,2,425 DoF (our finest grid in the multigrid hierarchy) is approximated much more accurately, as depicted in Figure 4 (right). Clearly, with the same number of unknowns we can solve the problem with higher accuracy only on locally refined meshes. There is a significant improvement of approximation quality with less than 2% additional grid points.

Fig. 4. Zoom-in plots of the gradient of the solution of (2) on a uniform mesh with 3,49,825 DoF (left) and on a locally refined mesh with 3,2,425 DoF (right); our finest MG mesh with level 6.

In this work we study the behavior of parallel smoothers based on additive splittings (Jacobi, multi-colored Gauß-Seidel and multi-colored symmetric Gauß-Seidel),

incomplete factorizations (multi-colored ILU(0) and power(q)-pattern enhanced multi-colored ILU(p) with fill-ins) and FSAI smoothers. We investigate their properties with respect to high-frequency error reduction and the convergence behavior of the V- and W-cycle parallel MG. In particular, we consider the multigrid behavior on different test platforms. Our numerical experiments are performed on a hybrid platform, a dual-socket Intel Xeon (E5450) quad-core system (with eight cores in total) that is accelerated by an NVIDIA Tesla S1070 GPU system with four GPUs attached pairwise by PCIe to one socket each. The memory capacity of a single CPU and GPU device is 16 GB and 4 GB, respectively. We are only using double precision for the computations.

We perform several tests with different configurations for the pre- and post-smoothing steps. We determine the number of cycles required to achieve a relative residual less than 10^-6. In Table 2 the iteration counts are shown for the V-cycle based MG. Note that the iteration counts in this table do not reflect the total amount of work. Some smoothing steps, e.g. ILU(p), need more work than others. From a theoretical point of view, a Gauß-Seidel smoothing step is half as costly as a symmetric Gauß-Seidel step; ILU(0) is cheaper than ILU(1,1), which is cheaper than ILU(1,2). For the MG solver, ν1 + ν2 is the number of smoothing steps on each level and should be kept low in order to reduce the total amount of work (and hence the solver time). For an estimation of the corresponding work load, computing times are included for the MG solver on the GPU platform.

For Jacobi smoothers there is no convergence of the MG solver within 99 iterations. Multi-colored symmetric Gauß-Seidel (SGS) is better than multi-colored Gauß-Seidel (GS) in terms of iteration count. Multi-colored ILU smoothers are better than the additive splitting-type smoothers (GS, SGS). The quality of the ILU(p) schemes in terms of iteration counts gets better with increasing p. The drop-off technique for ILU(1,1) provides only little improvement compared to ILU(0). Iteration counts for the FSAI smoothers are in the same range as for the ILU smoothers. The best candidates with respect to reduced iteration counts are ILU(1,2) and FSAI(2). Minima with respect to the iteration count can be found for configurations where ν1 + ν2 = 3 or 4. For larger values there are no more significant improvements. The run time results show that the benefits of the SGS smoother in terms of smoothing properties and reduced iteration count are eaten up in terms of run time due to the additional overhead in each smoothing step and an additional V-cycle. For the ILU smoothers the additional work complexity still yields improvements in run time. The best performance is obtained for the ILU(1,2) smoother. The drop-off technique for ILU(1,1) causes some loss of performance. The FSAI smoothers give no particular improvements in run time compared to the other smoothers.

Table 3 shows iteration counts and run times for the W-cycle based MG solver. The iteration counts are reduced compared to the V-cycle based MG solver. However, the run times show that the W-cycle based MG solver is by a factor of 2 to 3 slower than the V-cycle based MG solver. The W-cycle based MG solver is slower because more smoothing steps are performed on coarser grids.
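Expressed in code, the cycle counts reported in the tables correspond to a driver loop of the following shape, reusing the types from the sketch in Section 2; the 10^-6 threshold is the criterion stated above, while the iteration cap of 99 mirrors the cut-off used for the Jacobi runs:

#include <cmath>
#include <cstddef>
#include <vector>

// Iterate MG cycles until the relative residual ||f - A u|| / ||f||
// drops below 1e-6; returns the number of cycles, or -1 on failure.
int solve_with_mg(const std::vector<Level>& levels, std::vector<double>& u,
                  const std::vector<double>& f, int nu1, int nu2, int gamma,
                  int max_cycles = 99) {
    auto norm2 = [](const std::vector<double>& v) {
        double s = 0.0;
        for (double x : v) s += x * x;
        return std::sqrt(s);
    };
    const double nf = norm2(f);
    for (int it = 1; it <= max_cycles; ++it) {
        mg_cycle(levels, 0, u, f, nu1, nu2, gamma);      // one V- or W-cycle
        std::vector<double> r = spmv(levels[0].A, u);    // r = f - A u
        for (std::size_t i = 0; i < r.size(); ++i) r[i] = f[i] - r[i];
        if (norm2(r) / nf < 1e-6) return it;             // converged
    }
    return -1;  // no convergence within max_cycles
}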

Table 2. Number of MG V-cycles for different smoothers (GS, SGS, ILU(0), ILU(1,1), ILU(1,2), FSAI(1), FSAI(2), FSAI(3)) and total run times of the V-cycle based MG solver on the GPU for different smoother configurations (ν1, ν2), ν1, ν2 ∈ {0, ..., 4}; ν1 is the number of pre-smoothing steps, ν2 is the post-smoothing step count. Jacobi does not converge within 99 cycles for any configuration.

This involves a larger number of calls to SpMV routines for small matrices with significant overhead (in particular kernel call overheads on the GPU). The best results for the W-cycle based MG are obtained for the ILU(0) smoother, with Gauß-Seidel following next. Except for the FSAI smoothers, all other smoothers are on the same performance level.

Table 3. Number of multigrid W-cycles for different smoothers (GS, SGS, ILU(0), ILU(1,1), ILU(1,2), FSAI(1), FSAI(2), FSAI(3)) and total run times of the W-cycle based MG solver on the GPU for different smoother configurations (ν1, ν2), ν1, ν2 ∈ {0, ..., 4}; ν1 is the number of pre-smoothing steps, ν2 is the post-smoothing step count.

In Table 4 the run times and iteration counts for the best V-cycle and W-cycle based smoother configurations for the parallel MG solvers on the GPU are summarized. The first column in this table specifies the optimal values for the parameters ν1 and ν2 that correspond to the number of pre- and post-smoothing steps on each refinement level. The third column gives the number of necessary MG cycle iterations.

Table 4. Minimal run times and corresponding iteration counts for the V-cycle and W-cycle based parallel MG solvers on the GPU for various smoothers (GS, SGS, ILU(0), ILU(1,1), ILU(1,2), FSAI(1), FSAI(2), FSAI(3)) and optimal configurations of the corresponding number of pre- and post-smoothing steps (ν1, ν2).

Compared to the results of the Krylov subspace solvers, MG performance is superior. In Table 5 the run times are listed for the preconditioned conjugate gradient (CG) method on the CPU and GPU test platforms. The considered smoothers are used as preconditioners in this context. The number of CG iterations is given in the first row. We find that all preconditioners significantly reduce the number of iterations. Run times are presented for the sequential CPU version, the eight-core OpenMP parallel version, and the GPU version. We see that the MG solver on the GPU is faster by a factor of up to 6 than the preconditioned CG solver on the GPU. The best preconditioner is ILU(1,2) on the GPU.

Table 5. Run times in seconds and iteration counts for the preconditioned CG solver for various preconditioners (None, Jacobi, SGS, ILU(0), ILU(1,1), ILU(1,2), FSAI(1), FSAI(2), FSAI(3)) on the CPU (sequential and eight-core OpenMP parallel) and on the GPU.

In the following we consider the smoothing properties in more detail. In Figure 5 the initial error with its highly oscillatory parts is shown for a random initial guess. In the following figures we present the reduction of the error (compared to the final MG solution) after one and three smoothing steps with the corresponding smoother. For this experiment we choose the locally refined mesh of level 2 with 12,795 DoF (the second coarsest mesh in our MG hierarchy). We see that the effect of Jacobi smoothing as shown in Figure 6 is worse than that of Gauß-Seidel smoothing shown in Figure 7, which itself is worse than the symmetric Gauß-Seidel smoother presented in Figure 8. Even smoother results are observed for ILU(0), ILU(1,1) and ILU(1,2), as shown in Figures 9, 10

17 5 and. Wit te FSAI smooters, some iger order oscillations are still observed after te initial smooting step as can bee seen in Figures 2, 3 and 4. Error of initial guess.5 - Fig. 5. Initial error for te MG iteration wit random initial guess. Error after Jacobi iteration Error after 3 Jacobi iterations Fig. 6. Damped error after (left) and 3 (rigt) Jacobi smooting steps. Error after GS iteration Error after 3 GS iterations Fig. 7. Damped error after (left) and 3 (rigt) Gauß-Seidel smooting steps.

Fig. 8. Damped error after 1 (left) and 3 (right) symmetric Gauß-Seidel smoothing steps.

Fig. 9. Damped error after 1 (left) and 3 (right) ILU(0) smoothing steps.

Fig. 10. Damped error after 1 (left) and 3 (right) ILU(1,1) smoothing steps.

6 Performance Analysis on Multicore CPUs and GPUs

In this section we conduct a performance analysis with respect to the different test platforms. We assess the corresponding solver times and the parallel speedups. In Figure 15 we compare run times of the V-cycle (left) and W-cycle (right) MG solver with various smoothers for the Poisson problem on the L-shaped domain. It shows the results for the sequential version and the OpenMP parallel version on eight cores of the CPU as well as for the GPU version. For this test problem, there are no significant differences in performance for the tested smoothers on a specific platform.

Fig. 11. Damped error after 1 (left) and 3 (right) ILU(1,2) smoothing steps.

Fig. 12. Damped error after 1 (left) and 3 (right) FSAI(1) smoothing steps.

Fig. 13. Damped error after 1 (left) and 3 (right) FSAI(2) smoothing steps.

Fig. 14. Damped error after 1 (left) and 3 (right) FSAI(3) smoothing steps.

The multiplicative ILU-type smoothers are slightly faster than the additive splitting-type smoothers. The ILU-based smoothers are more efficient in terms of smoothing properties and reduced iteration counts, but they are more expensive to execute. In total, both effects interact such that the total execution time is only slightly better. The best performance results are obtained for the ILU(1,2) smoother based MG solver on the GPU. In this 2D Poisson test problem with unstructured grids based on Q1 and P1 elements, the resulting stiffness matrix has only six colors in the multi-coloring decomposition. By increasing the sparsity pattern with respect to the structure of A², used for building the ILU(1,2) decomposition, we obtain 16 colors. In this case the triangular sweeps can still be performed with a high degree of parallelism. However, on the CPU the FSAI algorithms perform better due to the utilization of cache effects. The FSAI algorithms rely on parallel matrix-vector multiplications only. This is in contrast to the multi-coloring technique, which re-orders the matrix by grouping the unknowns. This distribution performs better on the GPU due to the lack of bank conflicts in the sparse matrix-vector multiplications. On the other hand, the FSAI algorithms perform better on the cache-based architecture due to the large number of elements that can benefit from data reuse.

Fig. 15. Run times of the V-cycle (left) and W-cycle (right) MG solver with various smoothers for the Poisson problem on the L-shaped domain: sequential version and OpenMP parallel version on eight cores of the CPU, and GPU version.

The parallel speedups of the V-cycle and W-cycle based MG solvers are detailed in Figure 16. The OpenMP parallel speedup is slightly above two. In a sequential run on a single CPU core, less than one third of the eight-core peak bandwidth can be utilized (see measurements in [13]). Therefore, the speedup of the eight-core OpenMP parallel version is technically limited by a factor of less than three on this particular test platform. These performance expectations are reflected by the measurements reported here. The GPU version is by a factor of two to three faster than the OpenMP parallel version. These factors are in good conformance with experience for GPU acceleration of bandwidth-bound kernels.
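For completeness, the shape of one parallel forward sweep after multi-coloring: the unknowns of each color form a contiguous block whose rows are mutually independent, so the inner loop parallelizes with OpenMP on the CPU (or as one kernel launch per color on the GPU, which is where the per-call overhead on coarse grids enters). This is a simplified sketch, assuming a unit-diagonal lower triangular factor permuted into color blocks; names are our own.

#include <cstddef>
#include <vector>

// Forward sweep of a multi-colored triangular solve in CSR format.
// offsets[c]..offsets[c+1] is the index range of color c; all entries
// of a row reference unknowns of earlier colors only, so every row of
// a color block can be processed concurrently.
void colored_forward_sweep(const std::vector<std::size_t>& ptr,
                           const std::vector<std::size_t>& col,
                           const std::vector<double>& val,
                           const std::vector<std::size_t>& offsets,
                           std::size_t num_colors,
                           std::vector<double>& z) {  // in: rhs, out: solution
    for (std::size_t c = 0; c < num_colors; ++c) {    // colors in order
        const long long lo = static_cast<long long>(offsets[c]);
        const long long hi = static_cast<long long>(offsets[c + 1]);
        #pragma omp parallel for
        for (long long i = lo; i < hi; ++i) {         // independent rows
            double s = z[i];
            for (std::size_t k = ptr[i]; k < ptr[i + 1]; ++k)
                s -= val[k] * z[col[k]];              // earlier colors only
            z[i] = s;                                 // unit diagonal: no division
        }
    }
}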

For the FSAI smoothers, the speedup on the GPU falls a little bit behind, since FSAI is better suited for CPU architectures. The presented parallel speedup results demonstrate the scalability of our parallel approach for efficient multigrid smoothers. Parallel multigrid solvers are typically affected by bad communication/computation ratios and load imbalance on coarser grids. As we are working on shared memory type devices, these influences do not occur. In contrast, however, our matrix-based multigrid solver depends by construction on the calls to SpMV kernels. On coarser grids, these kernels have a significant call overhead at the expense of parallel efficiency. This has a direct impact on the speedups of the W-cycle based MG solvers on the GPU, as can be seen in Figure 16 (right).

Fig. 16. Parallel speedup of the V-cycle (left) and W-cycle (right) based multigrid solvers for various smoothers.

Similar results are obtained for the performance numbers and speedups of the preconditioned CG solver. In the left part of Figure 17 run time results are listed for various preconditioners. The right part details the corresponding speedups for the OpenMP parallel version and the GPU implementation. Speedups for the CG solver are larger than those for the MG solver. Although both solvers consist of the same building blocks, CG is more efficient since it fully relies on the finest grid of the MG hierarchy with huge sparse matrices. In contrast, the MG solver is doing work on coarser grids where call overheads for SpMV kernels have a significant influence on the results.

7 Outlook on Future Work

In our future work the proposed multigrid solvers will be extended with respect to higher order finite element methods. This is basically a question of inter-grid transfer operators, since HiFlow3 allows finite elements of arbitrary degree. Further performance results will be included for our OpenCL backends. More importantly, our solvers, smoothers and preconditioners will be extended for parallelism on distributed memory systems. The LAtoolbox in HiFlow3 is built on an MPI-communication and computation layer.

Fig. 17. Run times and parallel speedups for the preconditioned CG solvers on the CPU (sequential and eight-core OpenMP parallel version) and on the GPU.

The major work has to be done on the algorithmic side. Intensive research needs to be invested into hybrid parallelization where MPI-based parallelism is coupled with node-level parallelism (OpenMP, CUDA). MPI-parallel versions of efficient preconditioners and smoothers on unstructured meshes are the subject of current research. Moreover, the hierarchical memory sub-systems and hybrid configurations of modern multi- and manycore systems require new hardware-aware algorithmic approaches. Due to the limited efficiency of SpMV kernels on GPUs with respect to coarse grids, U-cycles for the MG solvers should be investigated that stop on finer grids.

8 Conclusion

Matrix-based multigrid solvers on unstructured meshes are efficient numerical schemes for solving complex and highly relevant problems. The paradigm shift towards manycore devices brings up new challenges in two dimensions: first, the algorithms need to express fine-grained parallelism and need to be designed with respect to scalability to hundreds and thousands of cores. Secondly, software solutions and implementations need to be designed to be flexible and portable in order to exploit the potential of various platforms in a unified approach. The proposed techniques for parallel smoothers and preconditioners provide both efficient and scalable parallel schemes. We have demonstrated how sophisticated mathematical techniques can be extended with respect to scalable parallelism and how these techniques can harness the computing power of modern manycore platforms. In particular, with the formulation in terms of SpMV kernels the capabilities of GPUs can be exploited. With the described solvers, solution times for realistic problems can be kept at a moderate level. And with the concept of our generic and portable software package, the numerical solvers can be used on a variety of platforms on a single code base. The users are freed from specific hardware knowledge and platform-oriented optimizations. We have reported speedups and we have demonstrated that even complex algorithms can be successfully ported to GPUs with additional performance gains. The proposed ILU(1,2) smoother and the FSAI(2) smoother show convincing performance and scalability results

for parallel matrix-based MG. The proposed V-cycle based MG solvers with parallel smoothers on the GPU are by a factor of 6 faster than the preconditioned CG solvers on the GPU. But due to the absence of coarse grid operations, the CG solvers have slightly better scalability properties on GPUs.

Acknowledgements

The Shared Research Group 16-1 received financial support by the Concept for the Future of Karlsruhe Institute of Technology (KIT) in the framework of the German Excellence Initiative and its collaboration partner Hewlett-Packard. The Engineering Mathematics and Computing Lab (EMCL) at KIT has been awarded NVIDIA CUDA Center of Research for its research and efforts in GPU computing. The authors acknowledge the support by means of hardware donations from the company NVIDIA in that framework. We would also like to express our gratitude to Felix Rien and Niels Weg, who have contributed to the implementation of our software.

References

1. Bank, R.E., Sherman, A.H., Weiser, A.: Some refinement algorithms and data structures for regular local mesh refinement (1983)
2. Anzt, H., Augustin, W., Baumann, M., Bockelmann, H., Gengenbach, T., Hahn, T., Heuveline, V., Ketelaer, E., Lukarski, D., Otzen, A., Ritterbusch, S., Rocker, B., Ronnas, S., Schick, M., Subramanian, C., Weiss, J.P., Wilhelm, F.: HiFlow3 - a flexible and hardware-aware parallel finite element package. Tech. Rep. 2010-06, EMCL, KIT (2010)
3. Bell, N., Garland, M.: Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: SC '09: Proc. of the Conf. on High Perf. Computing Networking, Storage and Analysis. ACM, New York (2009)
4. Demmel, J.W.: Applied numerical linear algebra. SIAM, Philadelphia (1997)
5. Geveler, M., Ribbrock, D., Göddeke, D., Zajac, P., Turek, S.: Efficient finite element geometric multigrid solvers for unstructured grids on GPUs. In: P. Iványi, B.H. Topping (eds.) Second Int. Conf. on Parallel, Distributed, Grid and Cloud Computing for Engineering (2011)
6. Geveler, M., Ribbrock, D., Göddeke, D., Zajac, P., Turek, S.: Towards a complete FEM-based simulation toolkit on GPUs: Geometric multigrid solvers. In: 23rd Int. Conf. on Parallel Computational Fluid Dynamics (ParCFD 2011) (2011)
7. Göddeke, D.: Fast and accurate finite-element multigrid solvers for PDE simulations on GPU clusters. Ph.D. thesis, Technische Universität Dortmund (2010)
8. Haase, G., Liebmann, M., Douglas, C., Plank, G.: A parallel algebraic multigrid solver on graphics processing units. In: W. Zhang, et al. (eds.) HPCA 2010, LNCS vol. 5938. Springer, Heidelberg (2010)
9. Hackbusch, W.: Multi-grid methods and applications, 2nd printing. Springer Series in Computational Mathematics, vol. 4. Springer, Berlin (2003)
10. Henson, V.E., Yang, U.M.: BoomerAMG: a parallel algebraic multigrid solver and preconditioner. Appl. Numer. Math. 41(1) (2002)

11. Heuveline, V., et al.: HiFlow3 - parallel finite element software. URL http://www.hiflow3.org
12. Heuveline, V., Lukarski, D., Subramanian, C., Weiss, J.P.: Parallel preconditioning and modular finite element solvers on hybrid CPU-GPU systems. In: P. Iványi, B.H. Topping (eds.) Proceedings of the 2nd Int. Conf. on Parallel, Distributed, Grid and Cloud Computing for Engineering, Paper 36. Civil-Comp Press, Stirlingshire, UK (2011)
13. Heuveline, V., Lukarski, D., Weiss, J.P.: Scalable multi-coloring preconditioning for multi-core CPUs and GPUs. In: UCHPC, Euro-Par 2010 Parallel Processing Workshops, LNCS vol. 6586 (2011)
14. Heuveline, V., Lukarski, D., Weiss, J.P.: Enhanced parallel ILU(p)-based preconditioners for multi-core CPUs and GPUs - the power(q)-pattern method. Tech. Rep. 2011-08, EMCL, KIT (2011)
15. Heuveline, V., Subramanian, C., Lukarski, D., Weiss, J.P.: A multi-platform linear algebra toolbox for finite element solvers on heterogeneous clusters. In: PPAAC, IEEE Cluster 2010 Workshops (2010)
16. Hülsemann, F., Kowarschik, M., Mohr, M., Rüde, U.: Parallel geometric multigrid. In: Numerical Solution of Partial Differential Equations on Parallel Computers, LNCSE vol. 51, chapter 5 (2005)
17. Kolotilina, L., Yeremin, A.: Factorized sparse approximate inverse preconditionings, I: Theory. SIAM J. Matrix Anal. Appl. (1993)
18. Saad, Y.: Iterative methods for sparse linear systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA (2003)
19. Shapira, Y.: Matrix-based multigrid. Theory and applications. Numerical Methods and Algorithms. Springer (2003)
20. Trottenberg, U.: Multigrid. Academic Press, Amsterdam (2001)
21. Wittum, G.: On the robustness of ILU smoothing. SIAM Journal on Scientific and Statistical Computing 10:699-717 (1989)

Preprint Series of the Engineering Mathematics and Computing Lab: recent issues

No. 2011-08 Vincent Heuveline, Dimitar Lukarski, Jan-Philipp Weiss: Enhanced Parallel ILU(p)-based Preconditioners for Multi-core CPUs and GPUs - The Power(q)-pattern Method
No. 2011-07 Thomas Gengenbach, Vincent Heuveline, Rolf Mayer, Mathias J. Krause, Simon Zimny: A Preprocessing Approach for Innovative Patient-specific Intranasal Flow Simulations
No. 2011-06 Hartwig Anzt, Maribel Castillo, Juan C. Fernández, Vincent Heuveline, Francisco D. Igual, Rafael Mayo, Enrique S. Quintana-Ortí: Optimization of Power Consumption in the Iterative Solution of Sparse Linear Systems on Graphics Processors
No. 2011-05 Hartwig Anzt, Maribel Castillo, José I. Aliaga, Juan C. Fernández, Vincent Heuveline, Rafael Mayo, Enrique S. Quintana-Ortí: Analysis and Optimization of Power Consumption in the Iterative Solution of Sparse Linear Systems on Multi-core and Many-core Platforms
No. 2011-04 Vincent Heuveline, Michael Schick: A local time dependent Generalized Polynomial Chaos method for Stochastic Dynamical Systems
No. 2011-03 Vincent Heuveline, Michael Schick: Towards a hybrid numerical method using Generalized Polynomial Chaos for Stochastic Differential Equations
No. 2011-02 Panagiotis Adamidis, Vincent Heuveline, Florian Wilhelm: A High-Efficient Scalable Solver for the Global Ocean/Sea-Ice Model MPIOM
No. 2011-01 Hartwig Anzt, Maribel Castillo, Juan C. Fernández, Vincent Heuveline, Rafael Mayo, Enrique S. Quintana-Ortí, Björn Rocker: Power Consumption of Mixed Precision in the Iterative Solution of Sparse Linear Systems
No. 2010-07 Werner Augustin, Vincent Heuveline, Jan-Philipp Weiss: Convey HC-1 Hybrid Core Computer - The Potential of FPGAs in Numerical Simulation
No. 2010-06 Hartwig Anzt, Werner Augustin, Martin Baumann, Hendryk Bockelmann, Thomas Gengenbach, Tobias Hahn, Vincent Heuveline, Eva Ketelaer, Dimitar Lukarski, Andrea Otzen, Sebastian Ritterbusch, Björn Rocker, Staffan Ronnås, Michael Schick, Chandramowli Subramanian, Jan-Philipp Weiss, Florian Wilhelm: HiFlow3 - A Flexible and Hardware-Aware Parallel Finite Element Package
No. 2010-05 Martin Baumann, Vincent Heuveline: Evaluation of Different Strategies for Goal Oriented Adaptivity in CFD - Part I: The Stationary Case
No. 2010-04 Hartwig Anzt, Tobias Hahn, Vincent Heuveline, Björn Rocker: GPU Accelerated Scientific Computing: Evaluation of the NVIDIA Fermi Architecture; Elementary Kernels and Linear Solvers
No. 2010-03 Hartwig Anzt, Vincent Heuveline, Björn Rocker: Energy Efficiency of Mixed Precision Iterative Refinement Methods using Hybrid Hardware Platforms: An Evaluation of different Solver and Hardware Configurations
No. 2010-02 Hartwig Anzt, Vincent Heuveline, Björn Rocker: Mixed Precision Error Correction Methods for Linear Systems: Convergence Analysis based on Krylov Subspace Methods

The responsibility for the contents of the working papers rests with the authors, not the Institute. Since working papers are of a preliminary nature, it may be useful to contact the authors of a particular working paper about results or caveats before referring to, or quoting, a paper. Any comments on working papers should be sent directly to the authors.


More information

Computer Science and Engineering, UCSD October 7, 1999 Goldreic-Levin Teorem Autor: Bellare Te Goldreic-Levin Teorem 1 Te problem We æx a an integer n for te lengt of te strings involved. If a is an n-bit

More information

Improved dynamic programs for some batcing problems involving te maximum lateness criterion A P M Wagelmans Econometric Institute Erasmus University Rotterdam PO Box 1738, 3000 DR Rotterdam Te Neterlands

More information

Schedulability Analysis under Graph Routing in WirelessHART Networks

Schedulability Analysis under Graph Routing in WirelessHART Networks Scedulability Analysis under Grap Routing in WirelessHART Networks Abusayeed Saifulla, Dolvara Gunatilaka, Paras Tiwari, Mo Sa, Cenyang Lu, Bo Li Cengjie Wu, and Yixin Cen Department of Computer Science,

More information

Distances in random graphs with infinite mean degrees

Distances in random graphs with infinite mean degrees Distances in random graps wit infinite mean degrees Henri van den Esker, Remco van der Hofstad, Gerard Hoogiemstra and Dmitri Znamenski April 26, 2005 Abstract We study random graps wit an i.i.d. degree

More information

Training Robust Support Vector Regression via D. C. Program

Training Robust Support Vector Regression via D. C. Program Journal of Information & Computational Science 7: 12 (2010) 2385 2394 Available at ttp://www.joics.com Training Robust Support Vector Regression via D. C. Program Kuaini Wang, Ping Zong, Yaoong Zao College

More information

Turbomachinery CFD on many-core platforms experiences and strategies

Turbomachinery CFD on many-core platforms experiences and strategies Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29

More information

ON LOCAL LIKELIHOOD DENSITY ESTIMATION WHEN THE BANDWIDTH IS LARGE

ON LOCAL LIKELIHOOD DENSITY ESTIMATION WHEN THE BANDWIDTH IS LARGE ON LOCAL LIKELIHOOD DENSITY ESTIMATION WHEN THE BANDWIDTH IS LARGE Byeong U. Park 1 and Young Kyung Lee 2 Department of Statistics, Seoul National University, Seoul, Korea Tae Yoon Kim 3 and Ceolyong Park

More information

M(0) = 1 M(1) = 2 M(h) = M(h 1) + M(h 2) + 1 (h > 1)

M(0) = 1 M(1) = 2 M(h) = M(h 1) + M(h 2) + 1 (h > 1) Insertion and Deletion in VL Trees Submitted in Partial Fulfillment of te Requirements for Dr. Eric Kaltofen s 66621: nalysis of lgoritms by Robert McCloskey December 14, 1984 1 ackground ccording to Knut

More information

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,

More information

To motivate the notion of a variogram for a covariance stationary process, { Ys ( ): s R}

To motivate the notion of a variogram for a covariance stationary process, { Ys ( ): s R} 4. Variograms Te covariogram and its normalized form, te correlogram, are by far te most intuitive metods for summarizing te structure of spatial dependencies in a covariance stationary process. However,

More information

A perturbed two-level preconditioner for the solution of three-dimensional heterogeneous Helmholtz problems with applications to geophysics

A perturbed two-level preconditioner for the solution of three-dimensional heterogeneous Helmholtz problems with applications to geophysics Institut National Polytecnique de Toulouse(INP Toulouse) Matématiques, Informatiques et Télécommunication Xavier Pinel mardi18mai2010 A perturbed two-level preconditioner for te solution of tree-dimensional

More information

An inquiry into the multiplier process in IS-LM model

An inquiry into the multiplier process in IS-LM model An inquiry into te multiplier process in IS-LM model Autor: Li ziran Address: Li ziran, Room 409, Building 38#, Peing University, Beijing 00.87,PRC. Pone: (86) 00-62763074 Internet Address: jefferson@water.pu.edu.cn

More information

Multicore Parallel Computing with OpenMP

Multicore Parallel Computing with OpenMP Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large

More information

AeroFluidX: A Next Generation GPU-Based CFD Solver for Engineering Applications

AeroFluidX: A Next Generation GPU-Based CFD Solver for Engineering Applications AeroFluidX: A Next Generation GPU-Based CFD Solver for Engineering Applications Dr. Bjoern Landmann Dr. Kerstin Wieczorek Stefan Bachschuster 18.03.2015 FluiDyna GmbH, Lichtenbergstr. 8, 85748 Garching

More information

In other words the graph of the polynomial should pass through the points

In other words the graph of the polynomial should pass through the points Capter 3 Interpolation Interpolation is te problem of fitting a smoot curve troug a given set of points, generally as te grap of a function. It is useful at least in data analysis (interpolation is a form

More information

HIGH PERFORMANCE CONSULTING COURSE OFFERINGS

HIGH PERFORMANCE CONSULTING COURSE OFFERINGS Performance 1(6) HIGH PERFORMANCE CONSULTING COURSE OFFERINGS LEARN TO TAKE ADVANTAGE OF POWERFUL GPU BASED ACCELERATOR TECHNOLOGY TODAY 2006 2013 Nvidia GPUs Intel CPUs CONTENTS Acronyms and Terminology...

More information

How To Ensure That An Eac Edge Program Is Successful

How To Ensure That An Eac Edge Program Is Successful Introduction Te Economic Diversification and Growt Enterprises Act became effective on 1 January 1995. Te creation of tis Act was to encourage new businesses to start or expand in Newfoundland and Labrador.

More information

High-fidelity electromagnetic modeling of large multi-scale naval structures

High-fidelity electromagnetic modeling of large multi-scale naval structures High-fidelity electromagnetic modeling of large multi-scale naval structures F. Vipiana, M. A. Francavilla, S. Arianos, and G. Vecchi (LACE), and Politecnico di Torino 1 Outline ISMB and Antenna/EMC Lab

More information

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications

More information

Large-Scale Reservoir Simulation and Big Data Visualization

Large-Scale Reservoir Simulation and Big Data Visualization Large-Scale Reservoir Simulation and Big Data Visualization Dr. Zhangxing John Chen NSERC/Alberta Innovates Energy Environment Solutions/Foundation CMG Chair Alberta Innovates Technology Future (icore)

More information

P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE

P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE 1 P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE JEAN-MARC GRATIEN, JEAN-FRANÇOIS MAGRAS, PHILIPPE QUANDALLE, OLIVIER RICOIS 1&4, av. Bois-Préau. 92852 Rueil Malmaison Cedex. France

More information

Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems

Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course G63.2010.001 / G22.2420-001,

More information

Strategic trading in a dynamic noisy market. Dimitri Vayanos

Strategic trading in a dynamic noisy market. Dimitri Vayanos LSE Researc Online Article (refereed) Strategic trading in a dynamic noisy market Dimitri Vayanos LSE as developed LSE Researc Online so tat users may access researc output of te Scool. Copyrigt and Moral

More information

A PARALLEL GEOMETRIC MULTIGRID METHOD FOR FINITE ELEMENTS ON OCTREE MESHES

A PARALLEL GEOMETRIC MULTIGRID METHOD FOR FINITE ELEMENTS ON OCTREE MESHES A PARALLEL GEOMETRIC MULTIGRID METHOD FOR FINITE ELEMENTS ON OCTREE MESHES RAHUL S. SAMPATH AND GEORGE BIROS Abstract. In this article, we present a parallel geometric multigrid algorithm for solving elliptic

More information

Tangent Lines and Rates of Change

Tangent Lines and Rates of Change Tangent Lines and Rates of Cange 9-2-2005 Given a function y = f(x), ow do you find te slope of te tangent line to te grap at te point P(a, f(a))? (I m tinking of te tangent line as a line tat just skims

More information

SWITCH T F T F SELECT. (b) local schedule of two branches. (a) if-then-else construct A & B MUX. one iteration cycle

SWITCH T F T F SELECT. (b) local schedule of two branches. (a) if-then-else construct A & B MUX. one iteration cycle 768 IEEE RANSACIONS ON COMPUERS, VOL. 46, NO. 7, JULY 997 Compile-ime Sceduling of Dynamic Constructs in Dataæow Program Graps Soonoi Ha, Member, IEEE and Edward A. Lee, Fellow, IEEE Abstract Sceduling

More information

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Innovation Intelligence Devin Jensen August 2012 Altair Knows HPC Altair is the only company that: makes HPC tools

More information

A system to monitor the quality of automated coding of textual answers to open questions

A system to monitor the quality of automated coding of textual answers to open questions Researc in Official Statistics Number 2/2001 A system to monitor te quality of automated coding of textual answers to open questions Stefania Maccia * and Marcello D Orazio ** Italian National Statistical

More information

Optimizing Desktop Virtualization Solutions with the Cisco UCS Storage Accelerator

Optimizing Desktop Virtualization Solutions with the Cisco UCS Storage Accelerator Optimizing Desktop Virtualization Solutions wit te Cisco UCS Accelerator Solution Brief February 2013 Higligts Delivers linear virtual desktop storage scalability wit consistent, predictable performance

More information

Catalogue no. 12-001-XIE. Survey Methodology. December 2004

Catalogue no. 12-001-XIE. Survey Methodology. December 2004 Catalogue no. 1-001-XIE Survey Metodology December 004 How to obtain more information Specific inquiries about tis product and related statistics or services sould be directed to: Business Survey Metods

More information

2 Limits and Derivatives

2 Limits and Derivatives 2 Limits and Derivatives 2.7 Tangent Lines, Velocity, and Derivatives A tangent line to a circle is a line tat intersects te circle at exactly one point. We would like to take tis idea of tangent line

More information

FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG

FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG INSTITUT FÜR INFORMATIK (MATHEMATISCHE MASCHINEN UND DATENVERARBEITUNG) Lehrstuhl für Informatik 10 (Systemsimulation) Massively Parallel Multilevel Finite

More information

A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster

A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster Acta Technica Jaurinensis Vol. 3. No. 1. 010 A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster G. Molnárka, N. Varjasi Széchenyi István University Győr, Hungary, H-906

More information

- 1 - Handout #22 May 23, 2012 Huffman Encoding and Data Compression. CS106B Spring 2012. Handout by Julie Zelenski with minor edits by Keith Schwarz

- 1 - Handout #22 May 23, 2012 Huffman Encoding and Data Compression. CS106B Spring 2012. Handout by Julie Zelenski with minor edits by Keith Schwarz CS106B Spring 01 Handout # May 3, 01 Huffman Encoding and Data Compression Handout by Julie Zelenski wit minor edits by Keit Scwarz In te early 1980s, personal computers ad ard disks tat were no larger

More information

Overview of Component Search System SPARS-J

Overview of Component Search System SPARS-J Overview of omponent Searc System Tetsuo Yamamoto*,Makoto Matsusita**, Katsuro Inoue** *Japan Science and Tecnology gency **Osaka University ac part nalysis part xperiment onclusion and Future work Motivation

More information

Fast Multipole Method for particle interactions: an open source parallel library component

Fast Multipole Method for particle interactions: an open source parallel library component Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1,M.G.Knepley 2,andL.A.Barba 1 1 Department of Mathematics, University of Bristol, University Walk,

More information

Projective Geometry. Projective Geometry

Projective Geometry. Projective Geometry Euclidean versus Euclidean geometry describes sapes as tey are Properties of objects tat are uncanged by rigid motions» Lengts» Angles» Parallelism Projective geometry describes objects as tey appear Lengts,

More information

HPC enabling of OpenFOAM R for CFD applications

HPC enabling of OpenFOAM R for CFD applications HPC enabling of OpenFOAM R for CFD applications Towards the exascale: OpenFOAM perspective Ivan Spisso 25-27 March 2015, Casalecchio di Reno, BOLOGNA. SuperComputing Applications and Innovation Department,

More information

h Understanding the safe operating principles and h Gaining maximum benefit and efficiency from your h Evaluating your testing system's performance

h Understanding the safe operating principles and h Gaining maximum benefit and efficiency from your h Evaluating your testing system's performance EXTRA TM Instron Services Revolve Around You It is everyting you expect from a global organization Te global training centers offer a complete educational service for users of advanced materials testing

More information

Unemployment insurance/severance payments and informality in developing countries

Unemployment insurance/severance payments and informality in developing countries Unemployment insurance/severance payments and informality in developing countries David Bardey y and Fernando Jaramillo z First version: September 2011. Tis version: November 2011. Abstract We analyze

More information

College Planning Using Cash Value Life Insurance

College Planning Using Cash Value Life Insurance College Planning Using Cas Value Life Insurance CAUTION: Te advisor is urged to be extremely cautious of anoter college funding veicle wic provides a guaranteed return of premium immediately if funded

More information

Tis Problem and Retail Inventory Management

Tis Problem and Retail Inventory Management Optimizing Inventory Replenisment of Retail Fasion Products Marsall Fiser Kumar Rajaram Anant Raman Te Warton Scool, University of Pennsylvania, 3620 Locust Walk, 3207 SH-DH, Piladelpia, Pennsylvania 19104-6366

More information

Multigrid preconditioning for nonlinear (degenerate) parabolic equations with application to monument degradation

Multigrid preconditioning for nonlinear (degenerate) parabolic equations with application to monument degradation Multigrid preconditioning for nonlinear (degenerate) parabolic equations with application to monument degradation M. Donatelli 1 M. Semplice S. Serra-Capizzano 1 1 Department of Science and High Technology

More information

The modelling of business rules for dashboard reporting using mutual information

The modelling of business rules for dashboard reporting using mutual information 8 t World IMACS / MODSIM Congress, Cairns, Australia 3-7 July 2009 ttp://mssanz.org.au/modsim09 Te modelling of business rules for dasboard reporting using mutual information Gregory Calbert Command, Control,

More information

Multivariate time series analysis: Some essential notions

Multivariate time series analysis: Some essential notions Capter 2 Multivariate time series analysis: Some essential notions An overview of a modeling and learning framework for multivariate time series was presented in Capter 1. In tis capter, some notions on

More information

Theoretical calculation of the heat capacity

Theoretical calculation of the heat capacity eoretical calculation of te eat capacity Principle of equipartition of energy Heat capacity of ideal and real gases Heat capacity of solids: Dulong-Petit, Einstein, Debye models Heat capacity of metals

More information

Channel Allocation in Non-Cooperative Multi-Radio Multi-Channel Wireless Networks

Channel Allocation in Non-Cooperative Multi-Radio Multi-Channel Wireless Networks Cannel Allocation in Non-Cooperative Multi-Radio Multi-Cannel Wireless Networks Dejun Yang, Xi Fang, Guoliang Xue Arizona State University Abstract Wile tremendous efforts ave been made on cannel allocation

More information

1. Case description. Best practice description

1. Case description. Best practice description 1. Case description Best practice description Tis case sows ow a large multinational went troug a bottom up organisational cange to become a knowledge-based company. A small community on knowledge Management

More information

A unified Energy Footprint for Simulation Software

A unified Energy Footprint for Simulation Software A unified Energy Footprint for Simulation Software H. Anzt, A. Beglarian, S. Chilingaryan, A. Ferrone, V. Heuveline, A. Kopmann No. 212-4 Preprint Series of the Engineering Mathematics and Computing Lab

More information

Wireless Interconnect for Board and Chip Level,

Wireless Interconnect for Board and Chip Level, 213 IEEE. Reprinted, wit permission, from Gerard P. Fettweis, Najeeb ul Hassan, Lukas Landau, and Erik Fiscer, Wireless Interconnect for Board and Cip Level, in Proceedings of te Design Automation and

More information

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

More information

ACCELERATING COMMERCIAL LINEAR DYNAMIC AND NONLINEAR IMPLICIT FEA SOFTWARE THROUGH HIGH- PERFORMANCE COMPUTING

ACCELERATING COMMERCIAL LINEAR DYNAMIC AND NONLINEAR IMPLICIT FEA SOFTWARE THROUGH HIGH- PERFORMANCE COMPUTING ACCELERATING COMMERCIAL LINEAR DYNAMIC AND Vladimir Belsky Director of Solver Development* Luis Crivelli Director of Solver Development* Matt Dunbar Chief Architect* Mikhail Belyi Development Group Manager*

More information

Determine the perimeter of a triangle using algebra Find the area of a triangle using the formula

Determine the perimeter of a triangle using algebra Find the area of a triangle using the formula Student Name: Date: Contact Person Name: Pone Number: Lesson 0 Perimeter, Area, and Similarity of Triangles Objectives Determine te perimeter of a triangle using algebra Find te area of a triangle using

More information

Dynamically Scalable Architectures for E-Commerce

Dynamically Scalable Architectures for E-Commerce MKWI 2010 E-Commerce und E-Business 1289 Dynamically Scalable Arcitectures for E-Commerce A Strategy for Partial Integration of Cloud Resources in an E-Commerce System Georg Lackermair 1,2, Susanne Straringer

More information

ACT Math Facts & Formulas

ACT Math Facts & Formulas Numbers, Sequences, Factors Integers:..., -3, -2, -1, 0, 1, 2, 3,... Rationals: fractions, tat is, anyting expressable as a ratio of integers Reals: integers plus rationals plus special numbers suc as

More information

Staffing and routing in a two-tier call centre. Sameer Hasija*, Edieal J. Pinker and Robert A. Shumsky

Staffing and routing in a two-tier call centre. Sameer Hasija*, Edieal J. Pinker and Robert A. Shumsky 8 Int. J. Operational Researc, Vol. 1, Nos. 1/, 005 Staffing and routing in a two-tier call centre Sameer Hasija*, Edieal J. Pinker and Robert A. Sumsky Simon Scool, University of Rocester, Rocester 1467,

More information

High Performance Matrix Inversion with Several GPUs

High Performance Matrix Inversion with Several GPUs High Performance Matrix Inversion on a Multi-core Platform with Several GPUs Pablo Ezzatti 1, Enrique S. Quintana-Ortí 2 and Alfredo Remón 2 1 Centro de Cálculo-Instituto de Computación, Univ. de la República

More information

Operation go-live! Mastering the people side of operational readiness

Operation go-live! Mastering the people side of operational readiness ! I 2 London 2012 te ultimate Up to 30% of te value of a capital programme can be destroyed due to operational readiness failures. 1 In te complex interplay between tecnology, infrastructure and process,

More information

Haptic Manipulation of Virtual Materials for Medical Application

Haptic Manipulation of Virtual Materials for Medical Application Haptic Manipulation of Virtual Materials for Medical Application HIDETOSHI WAKAMATSU, SATORU HONMA Graduate Scool of Healt Care Sciences Tokyo Medical and Dental University, JAPAN wakamatsu.bse@tmd.ac.jp

More information

Best practices for efficient HPC performance with large models

Best practices for efficient HPC performance with large models Best practices for efficient HPC performance with large models Dr. Hößl Bernhard, CADFEM (Austria) GmbH PRACE Autumn School 2013 - Industry Oriented HPC Simulations, September 21-27, University of Ljubljana,

More information

Free Shipping and Repeat Buying on the Internet: Theory and Evidence

Free Shipping and Repeat Buying on the Internet: Theory and Evidence Free Sipping and Repeat Buying on te Internet: eory and Evidence Yingui Yang, Skander Essegaier and David R. Bell 1 June 13, 2005 1 Graduate Scool of Management, University of California at Davis (yiyang@ucdavis.edu)

More information

OpenFOAM Optimization Tools

OpenFOAM Optimization Tools OpenFOAM Optimization Tools Henrik Rusche and Aleks Jemcov h.rusche@wikki-gmbh.de and a.jemcov@wikki.co.uk Wikki, Germany and United Kingdom OpenFOAM Optimization Tools p. 1 Agenda Objective Review optimisation

More information

Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises

Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises Floodless in SEATTLE: A Scalable Eternet Arcitecture for Large Enterprises Cangoon Kim Mattew Caesar Jennifer Rexford Princeton University Princeton University Princeton University Abstract IP networks

More information

Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters

Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters Tis article as been accepted for publication in a future issue of tis journal, but as not been fully edited Content may cange prior to final publication Citation information: DOI 101109/TCC20152389842,

More information

Code Verification and Numerical Accuracy Assessment for Finite Volume CFD Codes. Subrahmanya P. Veluri

Code Verification and Numerical Accuracy Assessment for Finite Volume CFD Codes. Subrahmanya P. Veluri Code Verification and Numerical Accuracy Assessment for Finite Volume CFD Codes Subramanya P. Veluri Dissertation submitted to te faculty of te Virginia Polytecnic Institute and State University in partial

More information

Artificial Neural Networks for Time Series Prediction - a novel Approach to Inventory Management using Asymmetric Cost Functions

Artificial Neural Networks for Time Series Prediction - a novel Approach to Inventory Management using Asymmetric Cost Functions Artificial Neural Networks for Time Series Prediction - a novel Approac to Inventory Management using Asymmetric Cost Functions Sven F. Crone University of Hamburg, Institute of Information Systems crone@econ.uni-amburg.de

More information

FORCED AND NATURAL CONVECTION HEAT TRANSFER IN A LID-DRIVEN CAVITY

FORCED AND NATURAL CONVECTION HEAT TRANSFER IN A LID-DRIVEN CAVITY FORCED AND NATURAL CONVECTION HEAT TRANSFER IN A LID-DRIVEN CAVITY Guillermo E. Ovando Cacón UDIM, Instituto Tecnológico de Veracruz, Calzada Miguel Angel de Quevedo 2779, CP. 9860, Veracruz, Veracruz,

More information

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching

More information

PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms

PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms P. E. Vincent! Department of Aeronautics Imperial College London! 25 th March 2014 Overview Motivation Flux Reconstruction Many-Core

More information

CUDA for Real Time Multigrid Finite Element Simulation of

CUDA for Real Time Multigrid Finite Element Simulation of CUDA for Real Time Multigrid Finite Element Simulation of SoftTissue Deformations Christian Dick Computer Graphics and Visualization Group Technische Universität München, Germany Motivation Real time physics

More information