MIKE 21 Flow Model FM. Parallelisation using GPU. Benchmarking report




MIKE 21 Flow Model FM. Parallelisation using GPU. Benchmarking report. MIKE by DHI 2014

DHI headquarters, Agern Allé 5, DK-2970 Hørsholm, Denmark. Telephone +45 4516 9200, Support +45 4516 9333, Telefax +45 4516 9292. mikebydhi@dhigroup.com, www.mikebydhi.com. gpu_benchmarking.docx/ors/jca/kle/2014-09-26 © DHI

PLEASE NOTE

COPYRIGHT This document refers to proprietary computer software, which is protected by copyright. All rights are reserved. Copying or other reproduction of this manual or the related programmes is prohibited without prior written consent of DHI. For details please refer to your 'DHI Software Licence Agreement'.

LIMITED LIABILITY The liability of DHI is limited as specified in Section III of your 'DHI Software Licence Agreement': 'IN NO EVENT SHALL DHI OR ITS REPRESENTATIVES (AGENTS AND SUPPLIERS) BE LIABLE FOR ANY DAMAGES WHATSOEVER INCLUDING, WITHOUT LIMITATION, SPECIAL, INDIRECT, INCIDENTAL OR CONSEQUENTIAL DAMAGES OR DAMAGES FOR LOSS OF BUSINESS PROFITS OR SAVINGS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION OR OTHER PECUNIARY LOSS ARISING OUT OF THE USE OF OR THE INABILITY TO USE THIS DHI SOFTWARE PRODUCT, EVEN IF DHI HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THIS LIMITATION SHALL APPLY TO CLAIMS OF PERSONAL INJURY TO THE EXTENT PERMITTED BY LAW. SOME COUNTRIES OR STATES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL DAMAGES AND, ACCORDINGLY, SOME PORTIONS OF THESE LIMITATIONS MAY NOT APPLY TO YOU. BY YOUR OPENING OF THIS SEALED PACKAGE OR INSTALLING OR USING THE SOFTWARE, YOU HAVE ACCEPTED THAT THE ABOVE LIMITATIONS OR THE MAXIMUM LEGALLY APPLICABLE SUBSET OF THESE LIMITATIONS APPLY TO YOUR PURCHASE OF THIS SOFTWARE.'

PRINTING HISTORY
September 2013 ... Release 2014
May 2014 ... Release 2014
July 2014 ... Release 2014

MIKE by DHI 2014


CONTENTS

MIKE 21 Flow Model FM. Parallelisation using GPU. Benchmarking report

1 Vision and Scope
2 Methodology
2.1 Parallelisation
2.2 Hardware
2.3 Software
2.4 Performance of the Parallelisation
3 Description of Test Cases
3.1 Mediterranean Sea
3.2 Ribe Polder
3.3 EA2D Test 8A
3.4 EA2D Test 8B
4 Benchmarking using a GeForce GTX Titan Card
4.1 Mediterranean Sea
4.2 Ribe Polder
4.3 EA2D Test 8A
4.4 EA2D Test 8B
5 Benchmarking using a Quadro K5000 Card
5.1 Mediterranean Sea
5.2 Ribe Polder
5.3 EA2D Test 8B
6 Benchmarking using a Tesla K20c Card
6.1 Mediterranean Sea
6.2 Ribe Polder
6.3 EA2D Test 8B
7 Benchmarking using a NVS 4200M Card
7.1 Mediterranean Sea
8 Benchmarking using a GeForce GTX Titan Black Card
8.1 Mediterranean Sea
8.2 Ribe Polder
8.3 EA2D Test 8B
9 Benchmarking using a 16 Core DELL Workstation
9.1 Mediterranean Sea
9.2 Ribe Polder
9.3 EA2D Test 8B
10 Comparison of Graphics Cards
11 Conclusions
12 References

1 Vision and Scope

A set of well-defined test cases for the GPU version of the MIKE 21 Flow Model FM has been established. These test cases also cover MIKE FLOOD using the GPU version of the MIKE 21 Flow Model FM for the 2D surface flow calculation. The test suite is used to test the performance across platforms with different graphics cards. It is essential that the simulations can be run with different spatial resolutions in order to evaluate the scalability of the parallelisation. The main focus is to benchmark the GPU parallelisation of the flexible mesh modelling system. For comparison, some simulations have also been performed using the MPI parallelisation of MIKE 21 Flow Model FM and the OpenMP parallelisation of MIKE 21 Flow Model Classic.


2 Methodology

2.1 Parallelisation

The GPU computing approach uses the computer's graphics card to perform the computationally intensive calculations. This approach is based on CUDA by NVIDIA and can be executed on NVIDIA graphics cards with Compute Capability 2.0 or higher. Currently, only the computationally intensive hydrodynamic calculations are performed on the GPU. The remaining calculations are performed on the CPU, and these calculations are parallelised using the shared memory approach, OpenMP.

2.2 Hardware

The benchmarks have been performed using the following graphics cards:

Table 2.1 GPU specifications

Graphics card            Compute Capability   CUDA cores   Memory (GB)   Single/double precision floating point performance
GeForce GTX Titan        3.5                  2688         6             4.5 Tflops / 1.27 Tflops
Quadro K5000             3.0                  1536         4             2.15 Tflops / 90 Gflops
Tesla K20c               3.5                  2496         5             3.52 Tflops / 1.17 Tflops
NVS 4200M                2.1                  48           1             155 Gflops
GeForce GTX Titan Black  3.5                  2880         6             5.1 Tflops / 1.3 Tflops

The following hardware platforms have been used:

Table 2.2 Hardware platforms

Platform   Computer model          Processor                                         Memory (GB)   Operating system
1          DELL Precision T7610    2 x Intel Xeon E5-2687W v2 (8 cores, 3.40 GHz)    32            Windows 7 Enterprise SP1, 64-bit
2          Lenovo Thinkpad 520     Intel Core i7-2670QM (4 cores, 2.20 GHz)          8             Windows 7 Professional SP1, 64-bit
3          DELL Precision T7600    1 x Intel Xeon E5-2687W 8C (8 cores, 3.10 GHz)    16            Windows 7 Enterprise SP1, 64-bit
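Since the GPU engine requires Compute Capability 2.0 or higher, it can be useful to verify the installed card before running a benchmark. The following stand-alone sketch uses the standard CUDA runtime API to list the available devices and their compute capability; it is an illustrative check only and is not part of the MIKE installation:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count == 0) {
        std::printf("No CUDA-capable device found: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Compute Capability 2.0 or higher is required by the GPU engine.
        const bool supported = prop.major >= 2;
        std::printf("Device %d: %s, compute capability %d.%d, %zu MB global memory, %s\n",
                    i, prop.name, prop.major, prop.minor,
                    prop.totalGlobalMem / (1024 * 1024),
                    supported ? "supported (>= 2.0)" : "not supported");
    }
    return 0;
}
```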

2.3 Software

All benchmarks have been performed using the Release 2014 version of the MIKE by DHI software. The simulations for EA2D Test 8A were performed using Service Pack 1 for Release 2014.

2.4 Performance of the Parallelisation

The parallel performance is illustrated by measuring the speedup. The speedup is defined as the elapsed time using the existing CPU version of MIKE 21 Flow Model FM (one core/thread) divided by the elapsed time using the new GPU version (one core/thread is used for the CPU part of the calculation). The elapsed time is the total elapsed time, excluding pre- and post-processing. This performance metric therefore depends not only on the GPU hardware but also on the CPU hardware. Using OpenMP or MPI parallelisation, the speedup is defined as the elapsed time using one thread/processor divided by the elapsed time using N threads/processors. For the GPU simulations the number of threads per block is 128.
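Restating the definitions above as formulas (the notation is ours, not DHI's), with T the total elapsed time excluding pre- and post-processing:

\[
S_{\mathrm{GPU}} = \frac{T_{\mathrm{CPU}}^{\,1\ \mathrm{core}}}{T_{\mathrm{GPU}}},
\qquad
S_{N}^{\mathrm{OpenMP/MPI}} = \frac{T_{1}}{T_{N}}.
\]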

3 Description of Test Cases

3.1 Mediterranean Sea

3.1.1 Description

This test case has been established for benchmarking of the MIKE 21 Flow Model FM. In the western parts of the Mediterranean Sea the tides are dominated by the Atlantic tides entering through the Strait of Gibraltar, while the tides in the eastern parts are dominated by astronomical tides, forced directly by the Earth-Moon-Sun interaction. The bathymetry is shown in Figure 3.1.

3.1.2 Setup

Simulations are performed using four meshes with different resolution (see Table 3.1). The meshes are generated by specifying a maximum element area of 0.04, 0.005, 0.00125 and 0.0003125 degree², respectively. The simulation period for the benchmarks covers 2 days starting 1 January 2004 for the simulations using meshes A, B and C. The simulation period is reduced to 6 hours for the simulations using mesh D.

At the Atlantic boundary a time-varying level boundary is applied. The tidal elevation data is based on global tidal analysis (Andersen, 1995). For the bed resistance the Manning formulation is used with a Manning number of 32. For the eddy viscosity the Smagorinsky formulation is used with a Smagorinsky coefficient of 1.5. Tidal potential is applied with 11 components (default values).

The shallow water equations are solved using both the first-order scheme and the higher-order scheme in time and space. The averaged time step for the simulations using meshes A, B, C and D is 17.65 s, 5.61 s, 2.86 s and 1.46 s, respectively, for both the first-order scheme and the higher-order scheme in time and space.

Table 3.1 Computational mesh for the Mediterranean Sea case

Mesh   Element shape   Elements   Nodes    Max. area (degree²)
A      Triangular      11287      6283     0.04
B      Triangular      80968      41825    0.005
C      Triangular      323029     164161   0.00125
D      Triangular      1292116    651375   0.0003125
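The report does not state the stability constraint of the explicit scheme, but as a hedged aside, explicit shallow-water solvers are typically limited by a Courant-type condition of the form

\[
\Delta t \le C_{r}\,\frac{\Delta x}{\sqrt{g\,h} + |u|},
\]

which is consistent with the average time step above scaling roughly with the square root of the maximum element area, i.e. with the characteristic edge length of the mesh.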

Figure 3.1 Bathymetry for the Mediterranean Sea case

3.1.3 Data and specification files

The data and specification files used for the present case are located in the directory: Benchmarking\Mediterranean_Sea

The tests are performed using the following specification files:

2004_tide_0.04_1st.m21fm
2004_tide_0.005_1st.m21fm
2004_tide_0.00125_1st.m21fm
2004_tide_0.0003125_1st.m21fm
2004_tide_0.04_2nd.m21fm
2004_tide_0.005_2nd.m21fm
2004_tide_0.00125_2nd.m21fm
2004_tide_0.0003125_2nd.m21fm

3.2 Ribe Polder

3.2.1 Description

This test case has been established for benchmarking of the MIKE 21 Flow Model FM. The model area is located on the southern part of Jutland, Denmark, around the city of Ribe. The area is protected from storm floods in the Wadden Sea to the west by a dike. The watercourse Ribe Å runs through the area and crosses the dike through a lock. The flood condition where the dike is breached during a storm flood is illustrated by numerical modelling.

The concept applied to model the breach failure in the hydrodynamic model is based on prescribing the breach by a dynamic bathymetry that changes in accordance with the relation applied for the temporal development of the breach. Use of this method requires that the location of the breach is defined and known at an early stage, so that it can be resolved properly and built into the bathymetry. The shape and temporal development of the breach is defined with a time-varying distribution along the dike crest. It is further defined how far, normal to the crest line, the breach can be felt. Within this distance the bathymetry follows the level of the breach; if the local level is

lower than the breach level, no changes are introduced. The area of influence of the breach will therefore increase with time.

The breach and flood modelling has been carried out based on a historical high water event (24 November 1981), shown in Figure 3.2. Characteristic for this event is that high tide occurs at the same time as the extreme water level. Højer sluice is located about 40 km south of the breach, while Esbjerg is located about 20 km to the north. Based on the high water statistics for Ribe, the extreme high water level has been estimated for an event having a return period of 10,000 years. The observed water level at Højer is hereafter adjusted gradually over two tidal cycles to the extreme high water level estimated for the given return period at Ribe, as indicated in Figure 3.2. The water level time series established in this way are shown in Figure 3.2.

Figure 3.2 Water level time series

Runoff from the catchment is included as specified discharges given for the two streams Ribe Å and Kongeåen. The crossing between the dike and Ribe Å is shown in Figure 3.4. The crossing is in the form of a navigational chamber lock. It is represented in the model bathymetry as a culvert that can be closed by a gate. The points defining the dike next to the creek are modified to have increased levels in order to ensure a well-defined bathymetry where flow only occurs through the cells defining the creek proper. The sluice is defined as a check valve allowing only flow towards the sea.

3.2.2 Setup

The bathymetry is shown in Figure 3.3. The computational mesh contains 173101 elements. A satisfactory resolution of the breach is obtained by a fine mesh of structured triangles and rectangles as shown in Figure 3.4. The areas in and offshore of the dike are defined by a relatively fine mesh to avoid instabilities due to humps or holes caused by large elements with centroids just outside the area of influence of the breach. The simulation period is 42 hours.
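The dynamic-bathymetry breach concept described in Section 3.2.1 (within the influence distance the bed is lowered to the time-varying breach level, never raised) can be sketched as follows. This is an illustrative outline only; the data structure and function names are hypothetical and do not reflect DHI's implementation:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical mesh point near the dike crest.
struct Node {
    double distToCrest;  // distance normal to the dike crest line [m]
    double z0;           // original bed/dike level [m]
    double z;            // current (dynamic) bed level [m]
};

// Lower the bed to the prescribed breach level within the influence distance.
// Points already below the breach level are left unchanged.
void applyBreach(std::vector<Node>& nodes, double breachLevel, double influenceDist) {
    for (Node& n : nodes) {
        if (n.distToCrest <= influenceDist) {
            n.z = std::min(n.z0, breachLevel);  // never raise the bed
        }
    }
}
```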

Figure 3.3 Bathymetry for the Ribe Polder case

Figure 3.4 Close-up of the bathymetry

At the offshore boundary a time series of level variations is applied.

A constant discharge of 9.384 m³/s and 14.604 m³/s, respectively, is applied for the two streams Ribe Å and Kongeåen. For the bed resistance the Manning formulation is used with a Manning number of 18. For the eddy viscosity the Smagorinsky formulation is used with a Smagorinsky coefficient of 0.28.

The shallow water equations are solved using both the first-order scheme and the higher-order scheme in space and time. The averaged time step is 0.21 s for both the first-order scheme and the higher-order scheme in space and time.

3.2.3 Data and specification files

The data and specification files used for the present case are located in the directory: Benchmarking\Ribe_Polder

The tests are performed using the following specification files:

Event_10000_1st.m21fm
Event_10000_2nd.m21fm

3.3 EA2D Test 8A

3.3.1 Description

This test is Test 8A in the benchmark tests developed during the Joint Defra/Environment Agency research programme. It tests the package's capability to simulate shallow inundation originating from a point source and from rainfall applied directly to the model grid, at a relatively high resolution. This test case has been established for benchmarking of the MIKE 21 Flow Model FM.

The modelled area is approximately 0.4 km by 0.96 km and covers entirely the DEM provided and shown in Figure 3.5. Ground elevations span a range of ~21 m to ~37 m.

Figure 3.5 Bathymetry for the EA2D Test 8A case

The flood is assumed to arise from two sources:

- a uniformly distributed rainfall event, illustrated by the hyetograph in Figure 3.6. This is applied to the modelled area only (the rest of the catchment is ignored).
- a point source at the location (264896, 664747) (map projection: British National Grid), illustrated by the inflow time series in Figure 3.7. (This may, for example, be assumed to arise from a surcharging culvert.)

Figure 3.6 Rainfall

Figure 3.7 Discharge from source

The DEM is a 0.5 m resolution Digital Terrain Model (no vegetation or buildings) created from LiDAR data collected on 13 August 2009 and provided by the Environment Agency (http://www.geomatics-group.co.uk). The model grid resolution should be 2 m (or ~97000 nodes in the 0.388 km² area modelled).
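As a quick arithmetic check of the quoted node count (worked here for illustration, not stated in the report):

\[
\frac{0.388\ \mathrm{km^2}}{(2\ \mathrm{m})^2} = \frac{388\,000\ \mathrm{m^2}}{4\ \mathrm{m^2}} = 97\,000\ \text{cells},
\]

which is consistent with the 95719 elements of mesh A in Table 3.2 below.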

3.3.2 Setup

All buildings at the real location (Cockenzie Street and surrounding streets in Glasgow, UK) are ignored and the modelling is carried out using the bare-earth DEM provided. A land-cover dependent roughness value is applied, with 2 categories: 1) roads and pavements; 2) any other land cover type. Manning's n = 0.02 is applied for roads and pavements and n = 0.05 everywhere else. All boundaries in the model area are closed (no flow) and the initial condition is a dry bed. The model is run until time T = 5 hours to allow the flood to settle in the lower parts of the modelled domain.

Simulations are performed using four meshes with different resolution (see Table 3.2). The four meshes use regular quadrilateral elements with grid spacing 2 m, 1 m, 0.5 m and 0.25 m, respectively. Mesh A corresponds to the original mesh used in the EA2D test, and the additional meshes are obtained by refining this mesh.

Table 3.2 Computational mesh for the EA2D Test 8A case

Mesh   Element shape   Elements   Nodes     Grid spacing (metres)
A      Quadrilateral   95719      96400     2
B      Quadrilateral   384237     385600    1
C      Quadrilateral   1539673    1542400   0.5
D      Quadrilateral   6164145    6169600   0.25

The shallow water equations are solved using the first-order scheme in time and space. The averaged time step for the simulation using meshes A, B, C and D is 0.22 s, 0.10 s, 0.5 s and 0.027 s, respectively.

3.3.3 Data and specification files

The data and specification files used for the present case are located in the directory: Benchmarking\EA2D_Test_8A

The tests are performed using the following specification files:

Test8A_quadratic_2m.m21fm
Test8A_quadratic_1m.m21fm
Test8A_quadratic_0.5m.m21fm
Test8A_quadratic_0.25m.m21fm

3.4 EA2D Test 8B

3.4.1 Description

This test is Test 8B in the benchmark tests developed during the Joint Defra/Environment Agency research programme. It tests the package's capability to simulate shallow inundation originating from a surcharging underground pipe, at relatively high resolution. This test case has been established for benchmarking of MIKE FLOOD, using the MIKE 21 Flow Model FM for the 2D surface flow calculation and MIKE Urban for the calculation of the pipe flow.

The modelled area is approximately 0.4 km by 0.96 km and covers entirely the DEM provided and shown in Figure 3.8. Ground elevations span a range of ~21 m to ~37 m.

Figure 3.8 Bathymetry for the EA2D Test 8B case

A culverted watercourse of circular section, 1400 mm in diameter, ~1070 m in length, and with invert level uniformly 2 m below ground is assumed to run through the modelled area. An inflow boundary condition is applied at the upstream end of the pipe, illustrated in Figure 3.9. A surcharge is expected to occur at a vertical manhole of 1 m² cross-section located 467 m from the top end of the culvert, at the location (264896, 664747). For the downstream boundary condition a free outfall (critical flow) is assumed. The base flow (uniform initial condition) is 1.6 m³/s. The manhole is connected to the grid in one point and the surface flow is assumed not to affect the manhole outflow.

Figure 3.9 Inflow hydrograph applied for the EA2D Test 8B at the upstream end of the culvert

The DEM is a 0.5 m resolution Digital Terrain Model (no vegetation or buildings) created from LiDAR data collected on 13 August 2009 and provided by the Environment Agency (http://www.geomatics-group.co.uk). The model grid resolution should be 2 m (or ~97000 nodes in the 0.388 km² area modelled). The presence of a large number of buildings in the modelled area is taken into account. Building outlines are provided with the dataset; roof elevations are not provided.

A land-cover dependent roughness value is applied, with 2 categories: 1) roads and pavements; 2) any other land cover type. Manning's n = 0.02 is applied for roads and pavements and n = 0.05 everywhere else. All boundaries in the model area are closed (no flow) and the initial condition is a dry bed. The model is run until time T = 5 hours to allow the flood to settle in the lower parts of the modelled domain.

3.4.2 Setup

Simulations are performed using four meshes with different resolution (see Table 3.3). The four meshes use regular quadrilateral elements with grid spacing 2 m, 1 m, 0.5 m and 0.25 m, respectively. Mesh A corresponds to the original mesh used in the EA2D test, and the additional meshes are obtained by refining this mesh.

Table 3.3 Computational mesh for the EA2D Test 8B case

Mesh   Element shape   Elements   Nodes     Grid spacing (metres)
A      Quadrilateral   95719      96400     2
B      Quadrilateral   384237     385600    1
C      Quadrilateral   1539673    1542400   0.5
D      Quadrilateral   6164145    6169600   0.25

The shallow water equations are solved using the first-order scheme in time and space. The averaged time step for the simulation using meshes A, B, C and D is 0.27 s, 0.15 s, 0.76 s and 0.025 s, respectively.

3.4.3 Data and specification files

The data and specification files used for the present case are located in the directory: Benchmarking\EA2D_Test_8B

The tests are performed using the following specification files:

Test8B_quadratic_2m.couple
Test8B_quadratic_1m.couple
Test8B_quadratic_0.5m.couple
Test8B_quadratic_0.25m.couple

4 Benchmarking using a GeForce GTX Titan Card

These tests have been performed using a GeForce GTX Titan graphics card and hardware platform 1 specified in Table 2.2.

4.1 Mediterranean Sea

Table 4.1 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      64.19             11.89     5.40
B      1454.61           63.39     22.95
C      16814.01          267.33    62.90
D      13235.29          173.98    76.07

Table 4.2 Simulations are carried out using single precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      167.00            22.39     7.46
B      4325.12           160.37    26.97
C      46896.29          834.49    56.19
D      39764.76          644.74    61.67

Table 4.3 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      64.19             12.73     5.04
B      1454.61           86.38     16.84
C      16814.01          339.74    49.49
D      13235.29          240.77    54.97

Table 4.4 Simulations are carried out using double precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      167.00            25.07     6.66
B      4325.12           208.62    20.72
C      46896.29          1036.75   45.23
D      39764.76          835.48    47.59

Figure 4.1 Speedup as a function of the number of elements, using single precision. Black line: first-order scheme; Red line: higher-order scheme

Figure 4.2 Speedup as a function of the number of elements, using double precision. Black line: first-order scheme; Red line: higher-order scheme

4.2 Ribe Polder

Table 4.5 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      26555.23          4383.86   6.06

Table 4.6 Simulations are carried out using single precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      97717.09          7972.54   12.26

Table 4.7 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      26555.23          4872.61   5.45

Table 4.8 Simulations are carried out using double precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      97717.09          8896.15   10.98

4.3 EA2D Test 8A

Table 4.9 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)    Speedup
A      1638.88           238.77     6.86
B      15725.61          1165.82    13.48
C      128194.12         6221.17    20.60
D      -                 47334.01   -

Table 4.10 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)    Speedup
A      1638.88           284.72     5.75
B      15725.61          1374.08    11.44
C      128194.12         7735.15    16.57
D      -                 54159.71   -

4.4 EA2D Test 8B

Table 4.11 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)    Speedup
A      810.08            213.54     3.79
B      6568.61           838.77     7.83
C      52093.89          5734.40    9.08
D      -                 33089.37   -

Table 4.12 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)    Speedup
A      810.08            261.73     3.10
B      6568.61           1103.38    5.95
C      52093.89          6153.81    8.47
D      -                 39265.75   -


5 Benchmarking using a Quadro K5000 Card

These tests have been performed using a Quadro K5000 graphics card and hardware platform 1 specified in Table 2.2.

5.1 Mediterranean Sea

Table 5.1 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      64.19             9.94      6.46
B      1454.61           69.90     20.81
C      16814.01          355.73    47.27
D      13235.29          293.66    45.07

Table 5.2 Simulations are carried out using single precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      167.00            26.71     6.25
B      4325.12           257.16    16.82
C      46896.29          1665.07   28.16
D      39764.76          1522.75   26.11

Table 5.3 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      64.19             14.31     4.49
B      1454.61           146.07    9.96
C      16814.01          878.34    19.14
D      13235.29          789.57    16.76

Table 5.4 Simulations are carried out using double precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      167.00            35.99     4.64
B      4325.12           462.41    9.35
C      46896.29          3107.30   15.09
D      39764.76          2868.52   13.86

Figure 5.1 Speedup as a function of the number of elements, using single precision. Black line: first-order scheme; Red line: higher-order scheme

Figure 5.2 Speedup as a function of the number of elements, using double precision. Black line: first-order scheme; Red line: higher-order scheme

5.2 Ribe Polder

Table 5.5 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      26555.23          4407.03   6.03

Table 5.6 Simulations are carried out using single precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)    Speedup
A      97717.09          10441.86   9.36

Table 5.7 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      26555.23          6169.98   4.30

Table 5.8 Simulations are carried out using double precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)    Speedup
A      97717.09          15650.28   6.24

5.3 EA2D Test 8B

Table 5.9 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      810.08            203.89    3.97
B      6568.61           914.10    7.19
C      52093.89          6112.87   8.52
D      -                 -         -

Table 5.10 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      810.08            298.15    2.72
B      6568.61           1558.46   4.21
C      52093.89          8302.07   6.27
D      -                 -         -

6 Benchmarking using a Tesla K20c Card

These tests have been performed using a Tesla K20c graphics card and hardware platform 1 specified in Table 2.2.

6.1 Mediterranean Sea

Table 6.1 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      64.19             4.86      13.21
B      1454.61           37.26     39.04
C      16814.01          213.53    78.74
D      13235.29          191.15    69.24

Table 6.2 Simulations are carried out using single precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      167.00            12.33     13.54
B      4325.12           126.66    34.15
C      46896.29          838.72    55.91
D      39764.76          779.83    50.99

Table 6.3 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      64.19             5.30      12.11
B      1454.61           48.62     29.92
C      16814.01          300.69    55.92
D      13235.29          275.30    48.08

Table 6.4 Simulations are carried out using double precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      167.00            13.95     11.97
B      4325.12           162.76    26.57
C      46896.29          1116.31   42.01
D      39764.76          1018.26   39.05

Figure 6.1 Speedup as a function of the number of elements, using single precision. Black line: first-order scheme; Red line: higher-order scheme

Figure 6.2 Speedup as a function of the number of elements, using double precision. Black line: first-order scheme; Red line: higher-order scheme

6.2 Ribe Polder

Table 6.5 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      26555.23          2884.68   9.21

Table 6.6 Simulations are carried out using single precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      97717.09          5981.10   16.34

Table 6.7 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      26555.23          3311.27   8.02

Table 6.8 Simulations are carried out using double precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      97717.09          7015.96   13.93

6.3 EA2D Test 8B

Table 6.9 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)    Speedup
A      810.08            128.48     6.31
B      6568.61           675.85     9.72
C      52093.89          5716.72    9.11
D      -                 32281.23   -

Table 6.10 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)    Speedup
A      810.08            158.08     5.12
B      6568.61           875.74     7.50
C      52093.89          6676.75    7.80
D      -                 38633.17   -

7 Benchmarking using a NVS 4200M Card

These tests have been performed using a NVS 4200M graphics card and hardware platform 2 specified in Table 2.2.

7.1 Mediterranean Sea

Table 7.1 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      97.02             35.56     2.73
B      2218.03           431.35    5.14
C      21741.99          2843.36   7.65
D      18961.83          2661.32   7.12

Table 7.2 Simulations are carried out using single precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)    Speedup
A      259.00            107.28     2.41
B      6415.06           1775.67    3.61
C      61794.60          13064.44   4.73
D      54500.34          12613.17   4.32

Table 7.3 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      97.02             56.14     1.73
B      2218.03           892.48    2.49
C      21741.99          6317.88   3.44
D      18961.83          6177.64   3.07

Table 7.4 Simulations are carried out using double precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)    Speedup
A      259.00            168.54     1.54
B      6415.06           3107.54    2.06
C      61794.60          23011.84   2.69
D      54500.34          22619.52   2.41

Figure 7.1 Speedup as a function of the number of elements, using single precision. Black line: first-order scheme; Red line: higher-order scheme

Figure 7.2 Speedup as a function of the number of elements, using double precision. Black line: first-order scheme; Red line: higher-order scheme


8 Benchmarking using a GeForce GTX Titan Black Card

These tests have been performed using a GeForce GTX Titan Black graphics card and hardware platform 3 specified in Table 2.2. The elapsed times used for the CPU simulations are the times obtained using platform 1 (see Chapter 4).

8.1 Mediterranean Sea

Table 8.1 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      64.19             11.02     5.82
B      1454.61           54.02     26.92
C      16814.01          211.31    79.57
D      13235.29          146.21    90.52
E      -                 1099.56   -

Table 8.2 Simulations are carried out using single precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      167.00            20.63     8.09
B      4325.12           129.56    33.38
C      46896.29          686.02    68.35
D      39764.76          568.40    67.58
E      -                 4483.07   -

Table 8.3 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      64.19             11.22     5.72
B      1454.61           68.25     21.31
C      16814.01          272.75    61.64
D      13235.29          205.49    64.40
E      -                 1600.80   -

Table 8.4 Simulations are carried out using double precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      167.00            21.95     7.60
B      4325.12           161.01    26.86
C      46896.29          874.97    53.59
D      39764.76          745.67    53.32
E      -                 6096.90   -

8.2 Ribe Polder

Table 8.5 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      26555.23          3567.78   7.44

Table 8.6 Simulations are carried out using single precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      97717.09          6573.99   14.86

Table 8.7 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      26555.23          3872.58   6.85

Table 8.8 Simulations are carried out using double precision and using the higher-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)   Speedup
A      97717.09          7179.47   13.61

8.3 EA2D Test 8B

Table 8.11 Simulations are carried out using single precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)    Speedup
A      810.08            190.66     4.24
B      6568.61           696.91     9.42
C      52093.89          3664.42    14.21
D      -                 23671.27   -

Table 8.12 Simulations are carried out using double precision and using the first-order scheme in time and space

Mesh   CPU, 1 core (s)   GPU (s)    Speedup
A      810.08            209.17     3.87
B      6568.61           750.53     8.75
C      52093.89          3931.35    13.25
D      -                 24859.79   -


9 Benchmarking using a 16 Core DELL Workstation

For comparison, simulations have also been performed using MIKE 21 Flow Model FM with MPI parallelisation and using MIKE 21 Flow Model Classic with OpenMP parallelisation (the latter only for the EA2D Test 8B case). These tests have been performed using hardware platform 1 specified in Table 2.2.

9.1 Mediterranean Sea

Table 9.1 Simulations are carried out using MIKE 21 Flow Model FM with MPI parallelisation and using the first-order scheme in time and space

Mesh A
No. of processors   CPU time (s)   Speedup
1                   64.19          1
2                   31.11          2.06
4                   16.43          3.90
6                   11.78          5.44
8                   9.25           6.93
10                  7.80           8.22
12                  6.49           9.89
14                  5.72           11.22
16                  5.57           11.52

Mesh B
No. of processors   CPU time (s)   Speedup
1                   1454.61        1
2                   698.05         2.08
4                   352.27         4.01
6                   261.17         5.56
8                   194.21         7.48
10                  160.92         9.03
12                  136.19         10.60
14                  117.30         12.40
16                  105.49         13.78

Mesh C
No. of processors   CPU time (s)   Speedup
1                   16814.01       1
2                   8103.79        2.07
4                   2930.33        5.73
6                   2151.25        7.81
8                   1638.91        10.25
10                  1356.37        12.39
12                  1169.18        14.38
14                  1026.68        16.37
16                  945.18         17.78

Mesh D
No. of processors   CPU time (s)   Speedup
1                   13235.29       1
2                   6484.43        2.04
4                   4777.07        2.77
6                   2464.61        5.37
8                   2414.19        5.48
10                  1447.18        9.14
12                  1279.71        10.34
14                  1130.21        11.71
16                  1054.46        12.55
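For context, the parallel efficiency implied by these MPI figures can be derived from the speedups in Table 9.1 (a worked example added here, not stated in the report):

\[
E_N = \frac{S_N}{N}, \qquad
E_{16}(\text{mesh A}) = \frac{11.52}{16} \approx 0.72, \qquad
E_{16}(\text{mesh D}) = \frac{12.55}{16} \approx 0.78.
\]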

Table 9.2 Simulations are carried out using MIKE 21 Flow Model FM with MPI parallelisation and using the higher-order scheme in time and space

Mesh A
No. of processors   CPU time (s)   Speedup
1                   167.00         1
2                   80.79          2.06
4                   42.73          3.90
6                   30.37          5.49
8                   23.64          7.06
10                  19.94          8.37
12                  16.48          10.13
14                  14.54          11.48
16                  13.84          12.06

Mesh B
No. of processors   CPU time (s)   Speedup
1                   4325.12        1
2                   1902.26        2.27
4                   986.41         4.38
6                   713.91         6.05
8                   512.33         8.44
10                  429.85         10.06
12                  361.79         11.95
14                  312.38         13.84
16                  284.05         15.22

Mesh C
No. of processors   CPU time (s)   Speedup
1                   46896.29       1
2                   23520.01       1.99
4                   9028.19        5.19
6                   6455.55        7.26
8                   4986.79        9.40
10                  4065.37        11.53
12                  3405.06        13.77
14                  2972.80        15.77
16                  2673.44        17.54

Mesh D
No. of processors   CPU time (s)   Speedup
1                   39764.76       1
2                   18573.54       2.14
4                   13268.76       2.99
6                   7026.04        5.65
8                   6167.24        6.44
10                  4318.39        9.20
12                  3661.66        10.88
14                  3220.27        12.34
16                  3069.51        12.95

Figure 9.1 Speedup using the first-order scheme in time and space. Black line: MPI parallelisation with mesh A; Blue line: MPI parallelisation with mesh B; Green line: MPI parallelisation with mesh C; Light blue line: MPI parallelisation with mesh D; Red line: Ideal speedup

Figure 9.2 Speedup using the higher-order scheme in time and space. Black line: MPI parallelisation with mesh A; Blue line: MPI parallelisation with mesh B; Green line: MPI parallelisation with mesh C; Light blue line: MPI parallelisation with mesh D; Red line: Ideal speedup

9.2 Ribe Polder

Table 9.3 Simulations are carried out using MIKE 21 Flow Model FM with MPI parallelisation and using the first-order scheme in time and space

No. of processors   CPU time (s)   Speedup
1                   26555.23       1
2                   11358.86       2.33
4                   7954.57        3.33
6                   6375.35        4.16
8                   5216.54        5.09
10                  4380.82        6.06
12                  4251.42        6.24
14                  3924.76        6.76
16                  3585.42        7.40

Table 9.4 Simulations are carried out using MIKE 21 Flow Model FM with MPI parallelisation and using the higher-order scheme in time and space

No. of processors   CPU time (s)   Speedup
1                   97717.09       1
2                   45005.15       2.17
4                   34872.54       2.80
6                   25202.68       3.87
8                   20629.71       4.73
10                  16966.85       5.75
12                  16669.84       5.86
14                  15261.20       6.40
16                  13159.84       7.42

Figure 9.3 Simulations are carried out using MIKE 21 Flow Model FM with MPI parallelisation. Black line: First-order scheme; Blue line: Higher-order scheme; Red line: Ideal speedup

9.3 EA2D Test 8B

Figure 9.4 Speedup for MIKE 21 Flow Model FM with MPI parallelisation. Black line: Mesh A; Blue line: Mesh B; Green line: Mesh C; Red line: Ideal speedup

Table 9.5 Simulations are carried out using MIKE 21 Flow Model FM with MPI parallelisation

Mesh A
No. of processors   CPU time (s)   Speedup
1                   810.08         1
2                   374.85         2.16
4                   218.05         3.71
6                   182.82         4.43
8                   141.95         5.70
10                  124.16         6.52
12                  108.74         7.44
14                  93.49          8.66
16                  96.53          8.39

Mesh B
No. of processors   CPU time (s)   Speedup
1                   6568.61        1
2                   4124.83        1.59
4                   1821.88        3.60
6                   1510.88        4.34
8                   1295.62        5.06
10                  1054.72        6.23
12                  1036.45        6.33
14                  1012.74        6.48
16                  969.25         6.77

Mesh C
No. of processors   CPU time (s)   Speedup
1                   52093.89       1
2                   26155.70       1.99
4                   20881.94       2.49
6                   15086.92       3.45
8                   14098.96       3.69
10                  11416.85       4.56
12                  9135.77        5.70
14                  8744.19        5.95
16                  8719.11        5.97

In the simulations using MIKE 21 Flow Model Classic, the time step is 1 s and 0.5 s, respectively, for A and B.

Table 9.6 Simulations are carried out using MIKE 21 Flow Model Classic with OpenMP parallelisation

A
No. of processors   CPU time (s)   Speedup
1                   147.38         1
2                   102.53         1.43
4                   94.76          1.55
6                   75.37          1.95
8                   77.36          1.90
10                  74.65          1.97
12                  72.59          2.03
14                  70.28          2.09
16                  73.45          2.00

B
No. of processors   CPU time (s)   Speedup
1                   1716.06        1
2                   1096.15        1.56
4                   805.41         2.13
6                   692.28         2.47
8                   632.84         2.71
10                  602.01         2.85
12                  565.35         3.03
14                  565.64         3.03
16                  561.69         3.05

Figure 9.5 Speedup for MIKE 21 Flow Model Classic with OpenMP parallelisation. Black line: A; Blue line: B; Red line: Ideal speedup

10 Comparison of Graphics Cards

For simulations with a small number of elements in the computational grid, the best performance is obtained with the Tesla K20c card, while for simulations with a large number of elements the best performance is obtained with the GeForce GTX Titan card. This is illustrated in Figures 10.1 and 10.2, where the speedup as a function of the number of elements for the Mediterranean Sea test case is compared for three graphics cards: GeForce GTX Titan, Quadro K5000 and Tesla K20c. All simulations were performed on hardware platform 1 specified in Table 2.2. Figure 10.1 compares the speedup using single precision and Figure 10.2 the speedup using double precision. For the mesh A case the Tesla K20c card is roughly a factor of 2 faster than the GeForce GTX Titan card.

As illustrated in Figures 10.1 and 10.2, the speedup for the Quadro K5000 card is not nearly as impressive as the speedup for the Tesla K20c and the GeForce GTX Titan cards. This is, however, expected, since the Quadro K5000 card has fewer CUDA cores and a lower floating point performance than the two other workstation cards.

Some graphics cards are not suitable for high performance computing, e.g. the NVS 4200M laptop card. Even though speedups are measured for the Mediterranean Sea test cases with this card, they are small compared to the speedups measured for the three workstation cards.

Figure 10.1 Speedup using single precision and using the higher-order scheme in time and space. Black line: Quadro K5000; Blue line: Tesla K20c; Green line: GeForce GTX Titan

Figure 10.2 Speedup using double precision and using the higher-order scheme in time and space. Black line: Quadro K5000; Blue line: Tesla K20c; Green line: GeForce GTX Titan

11 Conclusions

The overall conclusions of the benchmarks are:

- The numerical scheme and the implementation of the GPU version of the MIKE 21 Flow Model FM are identical to the CPU version of MIKE 21 Flow Model FM. Simulations without flooding and drying produce identical results using the two versions. Simulations with extensive flooding and drying produce results that may contain small differences.

- The performance of the new GPU version of MIKE 21 Flow Model FM depends highly on the graphics card and the model setup. When evaluating the performance by comparing with a single-core (no parallelisation) CPU simulation, the performance also depends highly on the specifications of the CPU.

- The speedup of simulations with no flooding and drying increases with an increasing number of elements in the computational mesh. When the number of elements becomes larger than approximately 400,000 there is only a very limited increase in the speedup for an increasing number of elements.

- In simulations with extensive flooding and drying the average number of wet elements can be small compared to the total number of elements in the mesh. In these cases a speedup of 5-15 is typically obtained.

- The single precision version of MIKE 21 Flow Model FM is generally faster than the double precision version. For simulations with no flooding and drying the single precision version is approximately a factor of 1.1-1.4 faster than the double precision version. For the Quadro K5000 card this factor is even higher, 1.4-2.6, but this is due to the card's relatively low double precision floating point performance, see Table 2.1. However, for simulations with extensive flooding and drying there is only a limited reduction in the computational time when using the single precision version compared to the double precision version.

- In the Mediterranean Sea case the maximum speedup using the first-order scheme was 76.07 and 54.97, respectively, with the single precision and the double precision version. Using the higher-order scheme the speedup was 61.67 and 47.59, respectively, with the single precision and the double precision version. These numbers are for the GeForce GTX Titan card.


12 References

/1/ Andersen, O.B., 1995. Global ocean tides from ERS-1 and TOPEX/POSEIDON altimetry. J. Geophys. Res. 100 (C12), 25,249-25,259.

/2/ Christensen, B.B., Drønen, N., Klagenberg, P., Jensen, J., Deigaard, R. and Sørensen, P., 2013. Multiscale modelling of coastal flooding. Coastal Dynamics 2013, paper no. 053.

/3/ Néelz, S. and Pender, G., 2010. Benchmarking of 2D Hydraulic Modelling Packages. Report published by the Environment Agency, www.environment-agency.gov.uk. Copies of this report are available from the publications catalogue: http://publications.environment-agency.gov.uk.
