MANUFACTURING WEATHER FORECASTING SIMULATIONS ON HPC INFRASTRUCTURES


MANUFACTURING WEATHER FORECASTING SIMULATIONS ON HPC INFRASTRUCTURES
Ladislav Hluchý, V. Šipková, M. Dobrucký, J. Bartok, B.M. Nguyen
Institute of Informatics, Slovak Academy of Sciences
ECW 2016 - Environmental Computing Workshop - eScience 2016

PARTNERS
- IISAS: Institute of Informatics, Slovak Academy of Sciences (academic sector)
- MicroStep-MIS: Monitoring and Information Systems (commercial sector); IMS Model Suite: a complex software system for meteorology and crisis management
This paper presents a part of manufacturing WRF on HPC infrastructures for the IMS Model Suite.

WRF - WEATHER RESEARCH AND FORECASTING
- Designed for research and operational purposes: numerical weather prediction and atmospheric simulation.
- Two dynamic solvers: ARW (Advanced Research WRF) and NMM (Non-hydrostatic Mesoscale Model).
- Flexible and portable code: sequential, or parallel (MPI) with or without multi-threading.
- Supports a two-level domain decomposition: the domain is first split into patches for distributed memory; then, within each patch, multi-threading is applied for shared memory (see the sketch below).
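To make the two-level decomposition concrete, here is a toy Python sketch (not WRF code; the strip-wise tiling and all function names are illustrative assumptions) that splits a horizontal grid first into per-process patches, then into per-thread tiles:

    # Toy sketch of WRF-style two-level decomposition: the grid is split
    # into patches (one per MPI process), each patch into tiles (one per
    # OpenMP thread). Not WRF code; WRF's actual tiling is configurable.

    def split(n_points, n_parts):
        """Split n_points into n_parts near-equal contiguous ranges."""
        base, rem = divmod(n_points, n_parts)
        ranges, start = [], 0
        for p in range(n_parts):
            size = base + (1 if p < rem else 0)
            ranges.append((start, start + size))
            start += size
        return ranges

    def decompose(nx, ny, procs_x, procs_y, omp_threads):
        """Yield (rank, thread, patch, tile) for an nx x ny grid."""
        for px, xr in enumerate(split(nx, procs_x)):
            for py, yr in enumerate(split(ny, procs_y)):
                rank = py * procs_x + px
                # Tiles: each thread gets a y-strip of its patch.
                for t, tyr in enumerate(split(yr[1] - yr[0], omp_threads)):
                    tile = (xr, (yr[0] + tyr[0], yr[0] + tyr[1]))
                    yield rank, t, (xr, yr), tile

    # Example: 100x100 grid, 2x2 MPI patches, 4 OpenMP tiles per patch.
    for rank, thread, patch, tile in decompose(100, 100, 2, 2, 4):
        print(f"rank {rank} thread {thread}: patch {patch} tile {tile}")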

OBJECTIVES
- Development of management tools to facilitate the execution of the WRF simulation process on HPC infrastructures: a local HPC cluster and the grid infrastructure (EGI).
- Performance investigation of parallel WRF models to find the most suitable configuration, with the given input scenario, for 3D meteorological modelling: MPI vs. MPI + OpenMP; the number of compute nodes, cores, MPI processes, and OpenMP threads.
- The management tools are also used for parameter tuning of the models (for IMS by MicroStep-MIS), which requires tens of evaluations of the parameterized model's accuracy. Each evaluation requires re-running hundreds of meteorological situations collected over the years and comparing the model output with the observed data (a sketch of this loop follows).
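A minimal sketch of that tuning loop, assuming hypothetical run_wrf() and rmse() helpers; the real runner, error metric, and data formats used by MicroStep-MIS are not given in the source:

    # Parameter tuning as described above: tens of evaluations, each
    # re-running hundreds of archived situations. run_wrf() and rmse()
    # are hypothetical placeholders, not tools named in the source.

    def evaluate(params, situations, observations, run_wrf, rmse):
        """Score one parameter set by re-running all archived cases."""
        total = 0.0
        for sit in situations:                  # hundreds of situations
            forecast = run_wrf(sit, params)     # one full WRF simulation
            total += rmse(forecast, observations[sit])
        return total / len(situations)

    def tune(candidate_params, situations, observations, run_wrf, rmse):
        """Tens of evaluations: keep the parameter set with lowest error."""
        return min(candidate_params,
                   key=lambda p: evaluate(p, situations, observations,
                                          run_wrf, rmse))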

3D METEOROLOGICAL MODELLING
- Domains - weather modelling: the horizontal, vertical, and time resolution are chosen so that the model can capture local conditions.
- Meteorological initial and boundary conditions come from the global model GFS (Global Forecasting System) of the US National Weather Service.
- This setting made it possible to model the Arabian Peninsula weather: the uppermost domain with a resolution of 50x50 km, the final domain with a resolution of 1.8 km, around Dubai and Abu Dhabi.
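The slides give only the outermost (50 km) and innermost (1.8 km) grid spacings. As a plausibility check, a telescoping nest with the common 3:1 parent-grid ratio bridges the two in three steps; the intermediate domains below are an assumption, not stated in the source:

    # Telescoping nests with an assumed 3:1 parent-grid ratio:
    dx = 50.0                       # km, uppermost domain (from the slides)
    for level in range(1, 4):
        dx /= 3.0                   # each nest refines its parent 3x
        print(f"domain {level + 1}: {dx:.2f} km")
    # domain 4 comes out at ~1.85 km, matching the reported 1.8 km domain.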

WRF SIMULATION
(Figure legend: Pi = MPI process, Tj = OpenMP thread.)
- A WRF simulation consists of many executable programs of various types and complexity, sequential and parallel, taking a different number of processor cores for execution.
- The WRF workflow is a DAG graph: (Job 1) WPS preprocessing, (Job 2) WRF modeling, (Job 3) UPP post-processing.

WRF WORKFLOW - MORE DETAILS
- Job 1 - WPS preprocessing: conversion of inputs from GRIB to NetCDF format using geogrid.exe (serial/MPI), ungrib.exe (serial), and metgrid.exe (serial/MPI).
- Job 2 - WRF modeling: numerical modeling using real.exe, the real-data initialization preprocessor (MPI / MPI+OpenMP), and wrf.exe, the numerical integration with the ARW solver (MPI / MPI+OpenMP).
- Job 3 - UPP post-processing: conversion of outputs from NetCDF to GRIB format using unipost.exe (serial/MPI), in a nested cycle over all hours of the predicted time period. There is no dependency between processing the data of individual hours, so the job can be structured as a parametric study (PS), where each sub-job handles a section of the time period (see the sketch below).
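Since the hours are independent, the parametric-study structure can be sketched as follows; run_unipost.sh is a hypothetical per-hour wrapper around unipost.exe, not a script shown in the source:

    # Parametric study over forecast hours: split the time period into
    # independent sections and post-process them concurrently.
    from concurrent.futures import ProcessPoolExecutor
    import subprocess

    def run_section(hours):
        """One sub-job: post-process a contiguous section of hours."""
        for h in hours:
            # Hypothetical wrapper around unipost.exe for hour h.
            subprocess.run(["./run_unipost.sh", str(h)], check=True)

    def parametric_study(total_hours, n_subjobs):
        hours = list(range(total_hours + 1))
        size = -(-len(hours) // n_subjobs)          # ceiling division
        sections = [hours[i:i + size] for i in range(0, len(hours), size)]
        with ProcessPoolExecutor(max_workers=n_subjobs) as pool:
            list(pool.map(run_section, sections))   # propagate failures

    # e.g. a 48-hour forecast post-processed by up to 8 sub-jobs:
    # parametric_study(48, 8)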

WRF WORKFLOW EXECUTION
- Starts on the UI machine through the invocation of the WRF workflow-manager, supplied with the needed input parameters.
- Is realized within the running-environment located in the shared address space, which has the following directory structure:
    GEOG        - geographical data, several geo-tables
    CFG         - configuration files for the input scenario and simulation options
    PARM        - UPP post-processing parameters
    BIN         - run-scripts and executables
    INPUT_ARCH  - input data files
    OUTPUT_ARCH - output data files
    WPS_RUN     - WPS preprocessing
    MODEL_RUN   - WRF modeling
    POSTPR_RUN  - UPP post-processing

IISAS HPC CLUSTER
Hardware configuration:
- 52x IBM dx360 M3 (2x Intel E5645 @2.4 GHz, 48 GB RAM, 2x 500 GB scratch disk)
- 2x IBM dx360 M3 (2x Intel E5645 @2.4 GHz, 48 GB RAM, 2x 500 GB scratch disk, NVIDIA Tesla M2070: 6 GB RAM + 448 CUDA cores)
- 2x x3650 M3 managing servers (2x Intel E5645 @2.4 GHz, 48 GB RAM, 6x 500 GB disks)
- 4x x3650 M3 data-managing servers (2x Intel E5645 @2.4 GHz, 48 GB RAM, 2x 500 GB disks, 2x 8 Gbps FC)
- 1x x3550 M4 server (1x Intel E5-2640 @2.5 GHz, 8 GB RAM, 2x 500 GB disks)
- InfiniBand 2x 40 Gbps (in 52+2+2+4 nodes), 2x DS3512 with 72 TB disks
Software installation:
- WRF package version 3.7.1 (WRF, WPS, terrestrial datasets), UPP version 3.0
- Libraries NetCDF 4, JasPer 1.7
- GNU compilers version 4.4.7 (gfortran, gcc, OpenMP library), Open MPI version 1.10.0

PERFORMANCE RESULTS - WRF MODEL: SEQUENTIAL ON THE LOCAL CLUSTER
- Prediction time period: 3 hours in this paper, used for scaling the WRF simulations for testing purposes with the given HW/SW configurations; 48 hours in real simulations (MicroStep-MIS) to model the Arabian Peninsula weather.
- Hence the need for HPC to accelerate the simulations.

    Stage                         Nodes   Cores per node   Execution time (hh:mm:ss)
    WPS                           1       1                00:39:54
    WRF                           1       1                15:57:53
    UPP (2 jobs)                  1       1                00:03:48
    Complete simulation process                            16:41:35

PERFORMANCE RESULTS - WRF MODEL: MPI ON THE LOCAL CLUSTER, FIXED NUMBER OF CORES PER NODE

    Stage                         Nodes   Cores per node   MPI processes   Execution time (hh:mm:ss)
    WPS                           1       10               10              00:04:22
    WRF                           1       8                8               02:36:33
    WRF                           2       8                16              01:27:01
    WRF                           4       8                32              00:49:03
    WRF                           8       8                64              00:30:13
    WRF                           16      8                128             00:20:47
    WRF                           32      8                256             00:13:57
    UPP (2 jobs)                  1       3                3               00:01:44
    Complete simulation process (best)                                     00:20:03
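As a quick check on the scaling, the times above can be converted into speedup and parallel efficiency relative to the sequential wrf.exe run (15:57:53); the short arithmetic below only restates the reported numbers:

    # Speedup and parallel efficiency of wrf.exe from the reported times.
    def seconds(hms):
        h, m, s = map(int, hms.split(":"))
        return 3600 * h + 60 * m + s

    serial = seconds("15:57:53")                 # sequential baseline
    for cores, t in [(8, "02:36:33"), (64, "00:30:13"), (256, "00:13:57")]:
        sp = serial / seconds(t)
        print(f"{cores:4d} cores: speedup {sp:6.1f}, efficiency {sp / cores:5.1%}")
    # 256 cores give a ~68.7x speedup, i.e. ~27% parallel efficiency:
    # adding nodes keeps helping, but with diminishing returns.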

PERFORMANCE RESULTS - WRF MODEL: MPI + OPENMP ON THE LOCAL CLUSTER, FIXED NUMBER OF MPI PROCESSES

    Stage   Nodes x cores   MPI processes (per node)   OpenMP threads   Execution time (hh:mm:ss)
    WRF     8x12            32 (4)                     2                00:31:31
    WRF     16x12           32 (2)                     4                00:20:52
    WRF     16x12           32 (2)                     6                00:17:21
    WRF     32x12           32 (1)                     8                00:15:47
    WRF     32x12           32 (1)                     10               00:15:15
    WRF     32x12           32 (1)                     12               00:15:20

PERFORMANCE RESULTS - WRF MODEL: MPI + OPENMP ON THE LOCAL CLUSTER, FIXED NUMBER OF OPENMP THREADS

    Stage   Nodes x cores   MPI processes (per node)   OpenMP threads   Execution time (hh:mm:ss)
    WRF     8x12            32 (4)                     3                00:24:44
    WRF     12x12           48 (4)                     3                00:19:28
    WRF     16x12           64 (4)                     3                00:17:27
    WRF     24x12           96 (4)                     3                00:13:49
    WRF     32x12           128 (4)                    3                00:12:24
    WRF     16x12           32 (2)                     6                00:17:21
    WRF     24x12           48 (2)                     6                00:14:31
    WRF     32x12           64 (2)                     6                00:12:09
    WRF     40x12           80 (2)                     6                00:12:01
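Comparing best configurations across modes (the helper is repeated to keep this sketch self-contained): on the same 32 nodes, the hybrid run that fills all 12 cores per node with ranks x threads beats the MPI-only run that used 8 cores per node, which supports the conclusion below that MPI + OpenMP fits today's multicore nodes:

    # Best runs per mode, using only numbers from the tables above
    # (serial wrf.exe baseline: 15:57:53).
    def seconds(hms):
        h, m, s = map(int, hms.split(":"))
        return 3600 * h + 60 * m + s

    serial = seconds("15:57:53")
    best = [
        ("MPI only, 32 nodes x 8 ranks",         256, "00:13:57"),
        ("Hybrid, 64 ranks x 6 threads (32 nodes)", 384, "00:12:09"),
        ("Hybrid, 80 ranks x 6 threads (40 nodes)", 480, "00:12:01"),
    ]
    for name, cores, t in best:
        sp = serial / seconds(t)
        print(f"{name}: speedup {sp:.1f} on {cores} cores ({sp / cores:.1%})")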

WRF MODEL: MPI ON THE GRID INFRASTRUCTURE EGI
- WRF running-environment: in its initial state, all executables and input files are stored in a grid Storage Element (SE), from which they are downloaded.
- Geographical datasets (174 GB) are located in the cluster's shared address space; they do not participate in the data transfer.
- The grid WRF workflow is designed as one grid job encapsulating all tasks: WPS + WRF + UPP.
- MPI programs are executed using MPI-Start.
- The output of the simulation is uploaded to the Storage Element (SE).
- Time overhead caused by data transfers between CE and SE: 2 minutes.
(Figure legend: Grid UI = Grid User Interface, WMS = Workload Management System, VO = Virtual Organization, CE = Computing Element, GG = Grid Gate, LRMS = Local Resource Management System, WN = Working Node, SE = Storage Element, PBS = Portable Batch System.)
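A minimal sketch of what such a single, self-contained grid job might do on the worker node; se-copy and the run scripts are placeholders, since the actual workflow-manager scripts and the site's storage-element tooling are not shown in the source (only MPI-Start is named):

    # One grid job: stage in from the SE, run all three workflow stages,
    # stage out. All command names below are illustrative placeholders.
    import subprocess

    def sh(*cmd):
        subprocess.run(cmd, check=True)

    def grid_wrf_job(se_input_url, se_output_url):
        sh("se-copy", se_input_url, ".")       # placeholder SE download
        sh("./run_wps.sh")                     # Job 1: WPS preprocessing
        sh("./run_wrf.sh")                     # Job 2: wrf.exe via MPI-Start
        sh("./run_upp.sh")                     # Job 3: UPP post-processing
        sh("se-copy", "output.tar.gz", se_output_url)  # placeholder upload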

CONCLUSION
- The management tools are built and fulfill their designed purposes: locating the optimal configuration for a given scenario and supporting IMS model parameter tuning (MicroStep-MIS).
- The hybrid programming model (MPI + OpenMP) seems a natural fit for the way most clusters are built today.
- The grid overhead is caused mainly by the transfer of big files between the SE and the CE.

FUTURE DIRECTIONS
- Grid: at the moment, only a few grid sites and Virtual Organizations (VO) in the European Grid Infrastructure (EGI) support MPI and OpenMP applications.
- Cloud: there is a performance overhead associated with virtualization of the interconnection network; WRF is reported to run on a virtualized InfiniBand interconnect with only 15% overhead, which makes fully virtualized HPC clusters a viable solution.
- Accelerators: parts of WRF have been ported to NVIDIA GPUs and the Intel Xeon Phi with promising results.

THANK YOU FOR YOUR ATTENTION
MANUFACTURING WEATHER FORECASTING SIMULATIONS ON HPC INFRASTRUCTURES
LADISLAV.HLUCHY@SAVBA.SK
Institute of Informatics, Slovak Academy of Sciences
ECW 2016 - Environmental Computing Workshop - eScience 2016