UoB Structured CFD Code
UoB Structured CFD Code C.B. Allen April, 2013

Contents
- Background to code formulation
- Aspects of the coding structure, data structure, and parallel message-passing approach
- Performance figures, serial and parallel
- Issues to consider for GPU porting; simple and not so simple!
The code is not really a research area in itself; it is used primarily as a tool to demonstrate other methods. The main research thrust is universal code- and mesh-independent technology: CFD-CSD coupling, volume and surface control and deformation, optimisation, data transfer, etc.

Introductory Information
- Structured, multiblock
- Third-order upwind spatial stencil (convective), 5 points in each direction
- Multigrid acceleration
- Explicit local time-stepping for steady flows
- Implicit pseudo-time for unsteady flows, with explicit local time-stepping within pseudo-time
- Aeroelastic coupling, forced and deforming; meshless CFD-CSD coupling
- Meshless mesh deformation approach
- Non-matching boundaries, with meshless interpolation
- Fortran90 and MPI

Example Applications
- Unsteady rotor simulation: forward flight with cyclic pitch variation; 4M- and 32M-cell, 208-block meshes.
- Static aeroelastic simulation: CFD-CSD coupling and mesh deformation via meshless radial basis function approach (details later). Mode 4 demonstration of CFD-CSD coupling; MDO wing static deflection calculation, C_L = 0.65 0.19.
- Domain-element shape parameterisation and mesh deformation, coupled with parallel gradient-based optimisation: two-bladed rotor in hover, M_Tip = 0.8, 63 parameters, minimise torque; C_T = 29.6%.

Mesh Format
- Domain decomposed by blocks.
- No global storage; mesh never considered in its entirety.
- Grid header file:

    nblocks nsym version number
    block1filename
    block2filename
    etc.

- Each block is a separate file:

    ni nj nk
    x1 y1 z1
    x2 y2 z2
    ..
    iminflag neighbour orientation
    imaxflag neighbour orientation
    jminflag neighbour orientation
    jmaxflag neighbour orientation
    kminflag neighbour orientation
    kmaxflag neighbour orientation
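The per-block file layout above can be sketched as a short reader. This is an illustrative reconstruction only: the exact tokenisation, vertex count, and flag encoding are assumptions, not the code's real format.

```python
# Minimal sketch of a reader for the per-block file layout described above.
# Assumes whitespace-separated tokens, (ni+1)*(nj+1)*(nk+1) vertices, and
# six face records of (flag, neighbour block, orientation).
def read_block(path):
    with open(path) as f:
        tokens = f.read().split()
    it = iter(tokens)
    ni, nj, nk = int(next(it)), int(next(it)), int(next(it))
    npts = (ni + 1) * (nj + 1) * (nk + 1)
    coords = [(float(next(it)), float(next(it)), float(next(it)))
              for _ in range(npts)]
    faces = {}
    for face in ("imin", "imax", "jmin", "jmax", "kmin", "kmax"):
        faces[face] = (int(next(it)), int(next(it)), int(next(it)))
    return ni, nj, nk, coords, faces
```

Reading each block independently is what lets the solver avoid ever holding the global mesh.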

Decomposition
- Preprocessor processes the mesh and control file and produces most of the data required:
  - Moving-mesh case: surface data processed and connectivity produced (later)
  - Aeroelastic case: surface mesh and structural mesh processed and interpolation dependence produced (later)
- Preprocessor also produces a grid.dims file.
- Solver initialisation routine splits blocks over processes using the sizes in grid.dims:
  - Sorts block numbers by size, then a target ncells is used to decide the block split per process
  - Currently work is split at block level only; for example, 16 blocks means nprocs <= 16
  - GPU port to allow work split at cell level: ideal load balance
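The block-splitting step can be sketched as a greedy largest-first assignment: sort blocks by cell count, then repeatedly give the next block to the least-loaded process. This is a plausible reconstruction of the "sort by size, target ncells" idea, not the actual solver routine.

```python
def assign_blocks(block_sizes, nprocs):
    """Greedy largest-first assignment of blocks to processes.

    block_sizes: list of cell counts, indexed by block number.
    Returns owner[block] and the per-process cell load.
    Illustrative sketch only; the real code works from a target ncells.
    """
    order = sorted(range(len(block_sizes)), key=lambda b: -block_sizes[b])
    load = [0] * nprocs
    owner = [None] * len(block_sizes)
    for b in order:
        p = min(range(nprocs), key=lambda q: load[q])  # least-loaded process
        owner[b] = p
        load[p] += block_sizes[b]
    return owner, load
```

Because work is split only at block level, the balance is limited by the block-size distribution; a cell-level split (as targeted for the GPU port) removes that constraint.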

Parallelisation and Data Structure
- Written in Fortran90.
- Domain decomposed by blocks.
- No global storage; mesh never considered in its entirety. Only a list of block sizes and process owners is built (plus tag data):
  - nblocks(nprocs) - number of blocks owned by each process
  - blocknum(nprocs,nblocks(nprocs)) - block number for each block owned by each process
  - numproc(nblock) - process owner for each block
  - i1tag(nblock) ... k2tag(nblock) - block boundary flags for each block
  - i1nb(nblock) ... k2nb(nblock) - neighbour block number for each boundary of each block
  - i1orient(nblock) ... k2orient(nblock) - orientation of the neighbour block for each boundary of each block

Initialisation and Coding Approach
- Code developed to minimise communication and storage.
- Master performs all initialisation:
  - Blocks split over processes and all simulation data processed
  - Global integer arrays and flow data broadcast to all processes
  - Each process then defines its own local workspace length
- Solution loads and monitoring data processed on each process and sent to the master.
- Master collects convergence and load data and outputs; it also sends (not broadcasts) changed data to slaves.
- Every process outputs its own solution data when required, to a unique filename; no global collection or processing of solution data.

Coding and Data Structure
- Each process builds its own storage (no global storage).
- All data stored in 1D arrays: rho(0:nijknb), x(0:nijknb), etc. NIJKNB is local to each process: the sum over multigrid levels, blocks, and cells.
- A (different) 1D pointer to each block and multigrid level for each process: offset(nb,m)

    do nbb=1,nblocks(myid)
      nb=blocknum(myid,nbb)
      do m=1,mglevels
        ioff=offset(nb,m)
        do k=1,nk(nb,m)
          do j=1,nj(nb,m)
            do i=1,ni(nb,m)
              ii=cellid(ioff,i,j,k)
              vol(ii)=vol(ii) + areas(ii)*dxns(ii)*...
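The cellid() offset arithmetic can be illustrated in a few lines. The layout assumed here (i varying fastest, one halo layer on each side in every direction) is a plausible guess for demonstration, not the code's actual indexing.

```python
def make_cellid(ni, nj, nk, halo=1):
    """Build a (i, j, k) -> 1D index map for one block, halo cells included.

    Assumed layout: i varies fastest, one halo layer per side, so interior
    indices run 1..ni and halo indices are 0 and ni+1 (as in the slides).
    Returns the indexing function and the block's total storage length.
    """
    si = ni + 2 * halo   # stride in i includes halo cells
    sj = nj + 2 * halo
    sk = nk + 2 * halo
    def cellid(ioff, i, j, k):
        return ioff + (i + halo - 1) + si * ((j + halo - 1) + sj * (k + halo - 1))
    return cellid, si * sj * sk
```

Summing these per-block lengths over all blocks and multigrid levels owned by a process gives its NIJKNB, and the running sum gives offset(nb,m).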

Coding for Scalar Functions
- Code originally developed to minimise storage and number of operations.
- Many local scalars and vectors defined and discarded; many dependencies.
- Example, cell left and right states (w = west, e = east): F_w(i) = F+_e(i-1) + F-_w(i)

    do k=1,nk(nb,m)
      do j=1,nj(nb,m)
        do i=0,ni(nb,m)+1
          ii=cellid(ioff,i,j,k)
          srho = limiter(rho(ii-1),rho(ii),rho(ii+1))
          rhow = musclpos(srho,rho(ii-1),rho(ii),rho(ii+1),dsew(ii)...)
          rhoe = musclneg(srho,rho(ii-1),rho(ii),rho(ii+1),dsew(ii)...)
          ... uw, ue, etc.
          IF(not solid surface) THEN
            FE = fluxneg(rhoe,ue,ve,we,ee,ce,pe,dxnw(ii+1),dynw(ii+1),dznw(ii+1)...)
            FW = FE + fluxpos(rhow,uw,vw,ww,ew,cw,pw,dxnw(ii),dynw(ii),dznw(ii)...)
          ELSE
            ...
          ENDIF
          RESW(ii) = FW*areaw(ii)
        enddo
      enddo
    enddo

Coding for Scalar Functions
The updated code is more memory intensive:

    ALLOCATE(rhow(0:ni(nb,m)+1,nj(nb,m),nk(nb,m)), ...
    ALLOCATE(uw(0:ni(nb,m)+1,nj(nb,m),nk(nb,m)), ... etc.
    ALLOCATE(rhoe(0:ni(nb,m)+1,nj(nb,m),nk(nb,m)), ...
    ALLOCATE(ue(0:ni(nb,m)+1,nj(nb,m),nk(nb,m)), ... etc.

    CALL EASTWEST(rhow,rhoe,...rho,u,v,w...,ni(nb,m),nj(nb,m),nk(nb,m))
      -> do i=0,ni(nb,m)+1 cells
      -> overwrite surface values WALL_CELL()

    do k=1,nk(nb,m)
      do j=1,nj(nb,m)
        do i=1,ni(nb,m)+1
          ii=cellid(ioff,i,j,k)
          RESW(ii)= totalflux(rhow,rhoe,uw,ue,vw,ve...)*areaw(ii)
        enddo
      enddo
    enddo

Latest version of the code: 800 MBytes per million cells, double precision.

Parallelisation: Structure and Message Passing
- Code written such that there is no message passing in the main subroutines: resid(), update(), restrict(), etc. Each is called once per block, so is independent of process, block, and multigrid level.
- At each block boundary, two layers of halo data are required for the solution vector, for convective term evaluation. A separate subroutine, boundarysolution(), does the main message passing. One layer is needed for prolong(), velocitygrads(), dtsmooth().
- All messages packed into 1D temporary arrays.
- Code written so that the serial and parallel versions use the same logic.
- All messages sent as soon as available, for efficiency. No message ordering; non-blocking MPI calls throughout. mpi_wait() used to ensure message completion; minimum use of mpi_barrier().

Parallelisation: Structure and Message Passing
- At each block face requiring data passing, data is packed into 1D temp arrays boundXX and boundi/j/kXX, where XX = face number (1-6) and variable number (1-?).
- Consider a connected imin face:

    do i=1,2
      do k=1,nk(nb,m)
        do j=1,nj(nb,m)
          bound11(i1,nnn)=rho(ii)
        enddo
      enddo
    enddo
    length=2*nj(nb,m)*nk(nb,m)
    nbn=i1nb(nb)
    nbb=numproc(nbn)
    IF(nbb.ne.myid) THEN
      i_to=nbb-1
      if(i1orient(nb).eq.1) then
        i_tag1=6*nvariables*nbn+1
      elseif(i1orient(nb).eq.2) then
        i_tag1=6*nvariables*nbn+nvariables+1
      etc...
      call mpi_isend(bound11(1,nnn),length,mpi_...,i_to,i_tag1,...)
      i_tag1=6*nvariables*nb+1
      call mpi_irecv(boundi11(1,nnn2),length,mpi_...,i_from,i_tag1,...)

Parallelisation: Structure and Message Passing

    ELSE
      ! neighbour on same process: local copy
      if(i1orient(nb).eq.1) then
        do ij=1,length
          boundi11(ij,blockpointer(nbn))=bound11(ij,nnn)
        enddo
      elseif(i1orient(nb).eq.2) then
        do ij=1,length
          boundi21(ij,blockpointer(nbn))=bound11(ij,nnn)
        enddo
      etc...
    ENDIF

    ! UNPACK
    if(i1orient(nb).eq.1) then
      do i=0,-1,-1
        do k=1,nk(nb,m)
          do j=1,nj(nb,m)
            ! ij counter determined by orientation
            rho(ii)=boundi11(ij,nnn2)
          enddo
        enddo
      enddo
    elseif(i1orient(nb).eq.2) then
      do i=0,-1,-1
        do k=1,nk(nb,m)
          do j=1,nj(nb,m)
            ! ij counter determined by orientation
            rho(ii)=boundi21(ij,nnn2)
          enddo
        enddo
      enddo
    etc.

This adds a small overhead to the scalar code, but gives nicer logic.
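The unique-tag arithmetic above (runs of 6*nvariables tags per block, one run of nvariables per face) can be factored into a small helper. The face numbering and 1-based variable indexing here are assumptions chosen to match the two cases shown on the slide.

```python
def mpi_tag(nblock, face, variable, nvariables):
    """Build a unique MPI message tag from (block, face, variable).

    Mirrors the slide's arithmetic: 6*nvariables tags reserved per block,
    nvariables per face. face is 1..6, variable is 1..nvariables
    (numbering assumed for illustration).
    """
    return 6 * nvariables * nblock + (face - 1) * nvariables + variable
```

With this scheme no two in-flight messages between the same pair of processes can share a tag, which is what lets the code post all sends and receives without any ordering.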

Solution Methods
- van Leer and Roe upwind convective fluxes, plus van Albada-limited MUSCL.
- 5-stage Runge-Kutta time-stepping; halo values set only once per step, not every stage.
- V-cycle multigrid used:
  - Restrict residual, solution, and error vector on all levels, once per cycle (message passing)
  - Standard volume-weighted restriction
  - Single solution iteration on the way down
  - 2^(m-1) or 3^(m-1) iterations on the way up, limited at the coarsest level
  - Trilinear interpolation for prolongation; no smoothing
  - Ramped from min to max to min; 0.7 to 1.0
  - One layer of U's exchanged at boundaries (message passing)
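The volume-weighted restriction step can be sketched for a single coarse cell built from its fine children (standard 2:1 coarsening gives 8 children per coarse cell; this is an illustration, not the solver's actual routine).

```python
def restrict_cell(fine_values, fine_volumes):
    """Volume-weighted restriction of one coarse-grid cell.

    The restricted value is the volume-weighted average of the fine
    children, so the cell integral (value * volume) is conserved.
    For 2:1 coarsening in 3D there are 8 children per coarse cell.
    """
    total_vol = sum(fine_volumes)
    return sum(u * v for u, v in zip(fine_values, fine_volumes)) / total_vol
```

The same weighting applied to residuals is what makes the coarse-grid correction consistent with the fine-grid discretisation.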

Convergence and Monitoring
- outprint() routine computes residuals and loads, plus standard deviations.
- Average flowfield conserved variables and loads stored each cycle; σ(flow, loads) = standard deviation of these over nsamp cycles.
- Example: NACA0012 aerofoil case, 257×129 mesh, inviscid, M = 0.5.
- COST: 32768 cells, 6-level multigrid, serial computation, gfortran -O3. Convergence to 10^-8: 199 cycles, 30 s; to 10^-10: 261 cycles, 40 s.

Parallel Performance
Code scaled at Daresbury, 2004: 996× speed-up on 1024 cores.

Profiling
A typical mesh size for a steady simulation gives the following:

    Routine                 Serial   Parallel
    resid()                   80%      72%
    outprint()                 5%       6%
    timestep()                 4%       5%
    boundsol()                 4%       7%
    update()                   2%       2%
    prolong()                  2%       4%
    geometric+initialise       2%       3%
    restrictalllevels()        1%       1%   (calls resid, boundsol)

Consideration for GPUs: once resid() is 20× faster, outprint() becomes the most expensive function!

Issues for GPU Porting
- Message passing harness: single harness for serial/multicore and multicore/GPU.
- Multicore approach currently limited to decomposition at block level; GPU to offer decomposition at cell/thread level: ideal load balance.
- CPU/GPU load balancing.
- Core numerics: no issues at all.
- CFD-CSD coupling requires linear system solutions.
- Mesh deformation requires linear system solutions.
- Meshless interpolation requires expensive searches, complex message passing, AND numerous linear system solutions.

Meshless Methods
- Much research performed on mesh-independent methods.
- Application areas: CFD-CSD coupling, mesh deformation.
- Based on function approximation methods: radial basis functions.
- Global n-dimensional volume control methods: the dimensions may be (x, y, z) and the function displacement; or (Re, M, α, θ, γ) and the function C_L, C_D, etc.
- Objective: universal code- and mesh-independent methods.

Meshless Methods
(a) Close surfaces. (b) Wing and beam.
First application area: CFD-CSD coupling. A method is sought to interpolate forces and displacements across the fluid-structure interface that satisfies the following requirements:
- Mesh-connectivity free: code and mesh independent, and perfectly parallel
- Conservation of energy, total force, and moment
- Exact recovery of translation and rotation
- Force and displacement association
- Position of aerodynamic nodes a linear function of the position of the structural nodes

Meshless Methods: Radial Basis Function Interpolation
Define a coupling matrix, H, that transforms the displacements of the aerodynamic surface nodes according to the displacements of the structural nodes in a linear fashion. Using energy and force conservation it can be shown that

    u_a = H u_s        (1)
    f_s = H^T f_a      (2)

Let f(x) be the original function to be modelled, with f_i the known values at the N control points x_i, i = 1,...,N, where x_i is the n-dimensional position vector of point i. With φ the chosen basis function and ||.|| the Euclidean norm, an interpolation model s has the form

    s(x) = \sum_{i=1}^{N} \beta_i \, \phi(||x - x_i||) + p(x)        (3)

where β_i, i = 1,...,N are the model coefficients, and p is an optional polynomial.

These coefficients are found by requiring exact recovery of the original data, s|_X = f, for all points in the training data set X. For example, taking the training data as the positions of the structural nodes, exact recovery at the centres gives, using up to linear polynomial terms,

    X_s = C_ss a_x,   Y_s = C_ss a_y,   Z_s = C_ss a_z        (4)

where

    X_s = (0, 0, 0, 0, x_s1, ..., x_sN)^T
    a_x = (γ^x_0, γ^x_x, γ^x_y, γ^x_z, β^x_s1, ..., β^x_sN)^T        (5)

(Analogous definitions hold for Y_s and Z_s and their a vectors.)

    C_ss =
    [ 0     0     0     0     1        1       ...  1
      0     0     0     0     x_s1     x_s2    ...  x_sN
      0     0     0     0     y_s1     y_s2    ...  y_sN
      0     0     0     0     z_s1     z_s2    ...  z_sN
      1   x_s1  y_s1  z_s1  φ_s1s1   φ_s1s2   ...  φ_s1sN
      .     .     .     .      .        .           .
      1   x_sN  y_sN  z_sN  φ_sNs1   φ_sNs2   ...  φ_sNsN ]        (6)

with

    φ_s1s2 = φ(||x_s1 - x_s2||)        (7)

To compute the aerodynamic surface points, equation (3) can be applied point by point: perfectly parallel. Either C_ss^{-1} must be computed, or the system solved for the coefficient vectors.
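Equation (3) and the exact-recovery condition can be demonstrated with a small self-contained sketch. For simplicity it uses a 1D Gaussian basis and omits the polynomial term p(x), so the system reduces to Φβ = f; the basis choice and the hand-rolled solver are illustrative assumptions, not the code's RBF machinery.

```python
import math

def phi(r):
    # Gaussian basis function (choice assumed for illustration)
    return math.exp(-r * r)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= fac * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def rbf_fit(centres, values):
    """Fit s(x) = sum_i beta_i * phi(|x - x_i|), equation (3) without p(x)."""
    A = [[phi(abs(xi - xj)) for xj in centres] for xi in centres]
    return solve(A, values)

def rbf_eval(x, centres, beta):
    return sum(b * phi(abs(x - xi)) for b, xi in zip(beta, centres))
```

Once β is known, rbf_eval can be applied point by point with no communication, which is the "perfectly parallel" property noted above.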

Meshless Methods
- Much work performed to minimise cost.
- CFD-CSD is not an issue, as the system size is Nstruct × Nstruct, and a set of smaller patches is used.
- For mesh deformation the system size is Nsurface × Nsurface: could be 10^6 × 10^6!
- Efficient point reduction and optimisation scheme developed, using greedy point selection; system size < 1000.
- Two stages to mesh deformation: 1) system solution; 2) position vector update.
  1) System solved on every process: no comms.
  2) Mesh points on each process moved independently: no comms.
- Stage 2) is ideal for GPU.
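The greedy point selection idea can be sketched as an error-driven loop: fit on the current subset, find the worst-approximated remaining surface point, add it, and repeat until the error tolerance or the point budget is reached. The starting set and stopping rule here are assumptions for illustration; the real scheme is the authors' point-reduction optimisation.

```python
def greedy_select(points, values, fit, evaluate, tol, max_sel):
    """Greedy point selection for RBF point reduction (illustrative sketch).

    Starts from the two extreme-valued points; repeatedly fits on the
    selected set (caller-supplied `fit`/`evaluate` routines), finds the
    worst-approximated remaining point, and adds it, until the maximum
    error drops below tol or max_sel points are selected.
    """
    selected = [min(range(len(values)), key=values.__getitem__),
                max(range(len(values)), key=values.__getitem__)]
    while len(selected) < max_sel:
        model = fit([points[i] for i in selected], [values[i] for i in selected])
        errs = [(abs(evaluate(points[i], model) - values[i]), i)
                for i in range(len(points)) if i not in selected]
        worst_err, worst_i = max(errs)
        if worst_err < tol:
            break
        selected.append(worst_i)
    return selected
```

Each iteration costs one small system solve plus one evaluation sweep over the surface, which is where the sum over n^3 + n*N_surface terms in the cost estimate comes from.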

Mesh Deformation: Code Stages

Prescribed motion:

    DO NT=1,NREALTIMESTEPS
      Update surface positions (prescribed)
      Solve linear system -> betax(n)
      Solve linear system -> betay(n)
      Solve linear system -> betaz(n)
      Update mesh: X()=X0()+DeltaX()
      Update geometric data, grid speeds, volumes (GCL)
      DO NIT=1,NMGCYCLES
        Update solution
      ENDDO
    ENDDO

Aeroelastic coupling:

    DO NT=1,NREALTIMESTEPS
      DO NC=1,NCOUPLINGCYCLES
        CFD surface pressures -> structural loads
        Compute new structural positions
        CSD displacements -> CFD surface positions
        Solve linear system -> betax(n)
        Solve linear system -> betay(n)
        Solve linear system -> betaz(n)
        Update mesh: X()=X0()+DeltaX()
        Update geometric data, grid speeds (0 for static), volumes (GCL; not for static)
        DO NIT=1,NMGCYCLES
          Update solution
        ENDDO
      ENDDO
    ENDDO

Mesh Deformation
- Greedy point selection scales as N_op ≈ \sum_{n=1}^{N_sel} (n^3 + n N_surface)
- Volume mesh deformation scales as N_op ≈ N_sel^3 + N_sel N_volume
- Typical case: N_surface = 7×10^5, N_volume = 8×10^6
- LAPACK/CG methods for system solution, single process; mesh deformation is 5-20% of solver cost.
- GPU solver: system solution dominates; need faster system solution.

Meshless Methods: Non-Matching Boundaries
- Discontinuous patch boundary between two mesh blocks (A and B).
- Similar RBF interpolation used for high-order data transfer across the boundary.
- Third to fifth order of accuracy proven.

NACA0012, Mach 0.5: drag coefficient convergence tested for non-matching meshes.
- 129x65 vs 129x21 + 81x21 mesh
- 257x129 vs 257x49 + 161x49 mesh

NACA0012 Mach 0.5, drag coefficient convergence.

Non-Matching Boundaries: Nozzle Case
Continuous mesh (upper); discontinuous mesh, spacing ratio 1.5 (centre), every fourth point shown. Mach contours, continuous 512×80 mesh, M=0.8 (lower).

Mach contours and mesh. Continuous and discontinuous meshes. Entropy contours. Continuous and discontinuous meshes.

    Mesh            Spacing Ratio   C_d        Difference (%)
    Continuous      -               0.029971   -
    Discontinuous   1.33            0.029912   0.20
    Discontinuous   1.50            0.029894   0.26
    Discontinuous   2.00            0.029887   0.28
    Discontinuous   3.00            0.029888   0.28

Non-Matching Interfaces: Issues
- Preprocessor currently computes cloud lists for each cell adjacent to a tagged interface.
- For each interface cell, the halo point(s) need a local cloud of control points: ncloud(i,j,k,nb), a list of points in terms of i, j, k, and nb values.
- Need to construct φ = [ncloud × ncloud] for each cloud, then solve φβ_ρ = ρ, φβ_E = E, etc., or construct Aφ^{-1} where A = [2 × ncloud].
- Point clouds include points from multiple blocks/processes, and each is a different size!
- Proof-of-concept stage, file I/O used: 2D +20%; 3D +50%.
- PROBLEM 1: Preprocessor performs searches and builds lists on every multigrid level. For an unsteady case this is needed every timestep: need efficient search algorithms.
- PROBLEM 2: Complex data communication (not system solution and update costs).