HPC Trends and Directions in the Earth Sciences
Per Nyberg, nyberg@cray.com
Sr. Director, Business Development





Cray Solutions for the Earth Sciences
Why Cray?
  Cray solutions support:
    Range of modeling capabilities
    Research and operational environments
    Shortened research to operations
  Experience in delivering and operating the world's largest and most complex systems
  Commitment to long-term partnerships
Market Presence
  Cray systems are used in a wide range of areas and configurations:
    NWP and Climate
    Used as model development platforms for extreme scale architectures
    Compute and storage sizes range from Terascale to Petascale
What's Next?
  Continued emphasis on a system-wide approach to scalability: hardware and software
  Impact of latency sensitive big and fast data
  Impact of analytics workloads on HPC architectures
Ultimately, integrated HPC environments are the capability that will turn data into insight and discovery.

Cray Announcements in Weather, Climate and Oceanography over the Last Two Years
[Figure: announcement dates shown: Jan 2013, June 2013 (two), Sept 2013, Feb 2014, April 2014 (two), June 2014 (two), Oct 2014]

Cray XC Design Areas for Weather and Climate
Tightly integrated hardware and software stack designed for:
Sustained Application Performance and Scalability
  Uniquely capable of both large scale high resolution deterministic and ensemble workloads
  Highly tuned software stack maximizes the cycles to the application
  Operational workloads and environments
Investment Protection: Upgradable by Design
  Key system elements (CPUs, memory) can be easily upgraded as faster and more capable components become available
  Systems can be easily expanded with additional blades / cabinets
  Programming Environment designed for application life and portability
User Productivity

Trends and Market Drivers

Data and Computational Drivers
Today's science is:
  Data-intensive
  Data-driven
  Compute-intensive
  Multi-disciplinary
  Multi-scale, multi-physics
Driven by multiple dimensions.
Data Tsunami is defying standard approaches to interpretation
  Volume and complexity of data are too much for either humans or current technologies to analyze effectively
  Creating the necessity to reexamine current applications and develop new application domains
[Figure labels: Fidelity / Resolution; Data and Compute Resources]
Requires a multi-dimensional technology stack: hardware, system software and applications
Figure adapted from Computing Issues for WCRP Weather and Climate Modeling, James Kinter and Michael Wehner

The Current Wave of Big Data
Complex, unstructured, created by sensors, ...
Data driven discovery and advanced analytics are rapidly becoming competitive differentiators, providing insight and predictive capabilities

"Simulation was the first big data market" -- IDC
[Figure: application roadmaps across Terascale, Petascale and Exascale]
  Isotropic/anisotropic FWI, elastic modeling/RTM
  Elastic FWI, visco-elastic modeling
  Visco-elastic FWI, petro-elastic inversion
  Identify all protein sequences using public resources and metagenomics data, and systematic modelling of proteins belonging to the family
  Improving the prediction of protein structure by coupling new bioinformatics algorithm and massive molecular dynamics simulation approaches
  Systematic identification of biological partners of proteins
  Aero Optimisation & CFD-CSM
  Full multidisciplinary design optimization
  CFD-based noise simulation
  Coupled AGCMs at 50 km atmosphere and 1 deg ocean
  Earth System Models at 10 km atmosphere and 1/10 deg ocean
  Full earth system models with carbon feedback cycle at 1 km atmosphere and 1/100 deg ocean
Major changes to applications and algorithms to address complexity and extend scalability

The Evolving HPC Workload: Simulation, data driven discovery and advanced analytics
  Largest ever hurricane simulation at an urban scale resolution of 500m
  Large-scale engine bay airflow optimization to improve vehicle fuel consumption
  Highest-definition sub-surface images of oil fields improving development and production efficiencies
  Healthcare fraud detection across disparate databases
  Cyberattack detection through the discovery of relationships from multiple sources in billion point graphs
  Viewer sentiment analysis to optimize television guest appearances
  Predictive models to determine optimal athlete performance and likelihood of injury
  Large, dynamic and complex analytics for portfolio risk analysis

The Evolving HPC Workload: Economics of emerging use-cases will take part in driving architectures across all areas
The Demand Side
Healthcare Fraud Detection (Source: IDC)
  5 separate databases for the big USG health care programs under Centers for Medicare and Medicaid Services (CMS)
  Estimated fraud: $150B ~ $450B (<$5B caught today)
  ORNL, SDSC have evaluation contracts to unify the databases and perform fraud detection on various architectures.
Sports Analytics (Source: Vince Gennaro)
  Player, field of play and consumer data has exploded
  The stakes have grown dramatically
  $50 to $100 million decisions are commonplace
  Winning Drives Profitability

Growth in Data Science: Monsanto Example
Background
  In Oct 2013 Monsanto acquired The Climate Corporation for ~$1B.
  The Climate Corporation's expertise is in data science.
What do they do?
  Use a wide range of information to provide insights and recommendations for farmers, such as planting and irrigation schedules.
  "offering [farmers] novel options in the way they manage risk on farm including weather, which is the single biggest risk farmers face on an annual basis."
  Based on a Hadoop implementation that creates weather projections for the next two years for every 2.5 by 2.5 kilometer grid cell across the U.S.
  Mapped out the most likely 10,000 outcomes per location using different variations of likely patterns to create a probabilistic view of weather.
Goal
  Help farmers understand the risks of their practices, and help them reduce those risks by changing their practices and by helping underwrite weather insurance against adverse effects.
Sources:
  http://www.monsanto.com/features/pages/monsanto-acquires-the-climate-corporation.aspx
  http://www.datamation.com/applications/hadoop-makes-a-big-data-splash-1.html
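The "probabilistic view of weather" built from 10,000 outcomes per location can be illustrated with a small sketch. This is not The Climate Corporation's method: the rainfall distribution, its parameters, the 350 mm threshold, and the function names `sample_outcomes` and `exceedance_probability` are all hypothetical, chosen only to show how a set of sampled outcomes for one grid cell turns into a probability a farmer could act on.

```python
import random


def exceedance_probability(outcomes, threshold):
    """Return the fraction of sampled outcomes at or above the threshold."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(1 for value in outcomes if value >= threshold) / len(outcomes)


def sample_outcomes(rng, n_outcomes=10_000, mean_mm=500.0, spread_mm=120.0):
    """Hypothetical stand-in for the weather scenarios of one grid cell.

    Draws seasonal rainfall totals (mm) from a normal distribution and clips
    them at zero. A real system would derive scenarios from weather models.
    """
    return [max(0.0, rng.gauss(mean_mm, spread_mm)) for _ in range(n_outcomes)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
cell_outcomes = sample_outcomes(rng)

# Probability that the season ends below an assumed 350 mm dry threshold.
p_dry = 1.0 - exceedance_probability(cell_outcomes, threshold=350.0)
print(f"Estimated chance of under 350 mm: {p_dry:.1%}")
```

Repeating the same calculation for every 2.5 by 2.5 kilometer cell is an independent task per cell, which is one reason a divide-and-conquer framework such as Hadoop fits this kind of workload.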

Emergence of Latency-Sensitive Analytics
Low-latency applications require performance optimizations
  Memory-storage hierarchies
  Fast interconnects
Response time frames
  <30ms: streaming data, tweets, event logs, IoT
  30ms to 10min: SQL/ad hoc queries, BI, visualization, exploration
  >10min: summarization, aggregation, indexing, ETL
Low-Latency Hadoop tomorrow: Streaming data / Data in motion
Batch Hadoop today: Stationary data
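The three response time frames on this slide can be read as a simple classification rule. The sketch below is an illustration only: the tier names ("streaming", "interactive", "batch") and the handling of values that fall exactly on the 30 ms and 10 min boundaries are assumptions, since the slide does not specify them.

```python
def latency_tier(response_seconds):
    """Map a required response time in seconds to one of the slide's tiers."""
    if response_seconds < 0:
        raise ValueError("response time cannot be negative")
    if response_seconds < 0.030:
        return "streaming"    # <30 ms: tweets, event logs, IoT
    if response_seconds <= 600:
        return "interactive"  # 30 ms to 10 min: SQL/ad hoc queries, BI
    return "batch"            # >10 min: summarization, indexing, ETL


examples = [("IoT event", 0.005), ("ad hoc SQL query", 12.0), ("nightly ETL", 3600.0)]
for label, seconds in examples:
    print(f"{label}: {latency_tier(seconds)}")
```

A rule like this could help decide which workloads need the memory-storage hierarchies and fast interconnects listed above, and which can stay on batch infrastructure.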

Enabling More Complexity & Capability: Big Data → Fast Data
[Figure labels: Global Memory + Fast Interconnects; Fast Data; Supercomputers & Analytic Appliances; SAN Interconnects; Big Data; LAN/WAN Interconnects; Public Clouds; Enterprise Data (structured); CLOUD; GRID]

System Architecture Differences
Supercomputing
  Scalable computing w/high BW, low-latency, global memory architectures
  Tightly integrated processor-memory-interconnect & network storage
  Minimize data movement: load the mesh into memory
  Move data for loading, check-pointing or archiving
  Basketball court sized systems
Large-Scale Data Analytics
  Distributed computing at largest scale
  Divide-and-conquer approaches on Service Oriented Architectures
  Maximize data movement: scan/sort/stream all the data all the time
  Lowest cost processor-memory-interconnect & local storage
  Warehouse sized private and public clouds

The Fusion of Supercomputing with Large Scale Data Analytics
Key Trends
  The cost per FLOP is dropping dramatically
  Cost of moving data dominates performance and energy consumption
  New levels of memory and storage hierarchy are emerging
  Effective Parallelism of O(Million ~ Billion)
  Analytic application needs for memory size and IO performance are insatiable!
  Future systems are being influenced by the converged infrastructure of compute & storage
  Converged infrastructure of Hyperscale demands management capability of O(10000) nodes

Changing Needs: The Fusion of Supercomputing and Big & Fast Data
Modeling The World: Cray Supercomputers solving grand challenges in science, engineering and analytics
Math Models: Modeling and simulation augmented with data to provide the highest fidelity virtual reality results
Data-Intensive Processing: High throughput event processing & data capture from sensors, data feeds and instruments
Data Models: Integration of datasets and math models for search, analysis, predictive modeling and knowledge discovery
COMPUTE, STORE, ANALYZE

Cray Integrated Environments for Simulation and Discovery
Compute: Supercomputers, Flexible Clusters, Hybrid Architectures
Store: Integrated Storage & Data Management, Tiered Storage, Archive
Analyze: Data Discovery, Advanced Analytics
10/30/14, ECMWF HPC Workshop 2014

Recently Announced: Urika-XA Platform
Turnkey Advanced Analytics Platform
Next-Generation System Architecture
Engineered for Performance
  Hadoop and Spark ecosystem
  Emerging analytic workloads
  Open platform for current and future frameworks
  Single pane of glass for system management
  Innovative use of storage technologies
  Battle-tested on cutting-edge government/scientific analytic applications
  Ready for the enterprise
  Dense footprint: over 1,500 cores, 6TB memory, 38TB SSD and 120TB POSIX-compliant high-performance storage
  InfiniBand
  Cray Adaptive Runtime for Hadoop
  Scale out to multi-rack configurations

Recent Cray XC40 Announcement
What is it?
  Follow-on to the highly successful Cray XC30
  Already received orders for over 70,000 sockets, much of which is already installed
What's New?
  DataWarp applications I/O accelerator
    New tier of high performance flash SSD directly connected to the Cray XC40 compute nodes
    Delivers a balanced and cohesive end-to-end system architecture
  Support for Intel Xeon processor E5-2600 v3 product family, formerly code named Haswell
Comes in small packages too: the Cray XC40-AC
  Cray's core technologies in a smaller package
  Same Powerful and Productive Software Ecosystem
  Same Aries Infrastructure
  Same DataWarp I/O Options
  Only the cooling is different.

Trends in the Memory/Storage Subsystem
[Figure: memory and storage tiers, each marked On Node or Off Node]
Today: CPU, Memory (DRAM), Storage (HDD), Distant Storage (WAN/Tape)
Future: CPU, Near Memory (HBM/HMC), Far Memory (DRAM), Near Storage (NVDIMM), Mid Storage (SSD), Far Storage (HDD), Distant Storage (Object/WAN/Tape)

Two Recent Large XC Orders that will Impact Future Technologies and Applications Throughout the Community
Los Alamos / Sandia Trinity
  42 Pflop system
  Heterogeneous System: Haswell + KNL
  3PB SSD DataWarp Capability
NERSC8
  Installation Date: 2Q 2016
  Self-hosted KNL System
  ~28 Pflops Peak
  Transitioning user base to many-core processing
  NERSC Exascale Science Applications Program (NESAP)

NERSC Exascale Science Applications Program (NESAP)
Collaboration with Cray and Intel to prepare for Cori, the Cray XC to be deployed at NERSC in 2016.
NESAP was launched to ensure that the highly diverse workloads of the DOE science community continue to be supported as over 5,000 users make the transition to Cori.
Application areas: Fusion Energy Sciences, High Energy Physics, Nuclear Physics, Biological and Environmental Research:
  ESM Global Climate Modeling, John Dennis (NCAR)
  High-Resolution Global Coupled Climate Simulation Using The Accelerated Climate Model for Energy (ACME), Hans Johansen (LBNL)
  Multi-Scale Ocean Simulation for Studying Global to Regional Climate Change, Todd Ringler (LANL)
A large number of community models are involved.

Moving Forwards: Adapting to Simulation and Data-Intensive Workloads by Adding Value at the Edge of the Network
Configurable Injection and Global Bandwidth
Data Processors
  x86
  Many-core / GPU
  Multi-threaded
  Flexible compute for specialized functions
System Software
  High throughput scheduling
  Adaptive runtimes / Adaptive routing
  Compilers, Auto-tuning, Programming models
  Middleware stack to support simulation and analytics workflows
Memory Pools
  DRAM
  Active NVRAM/NVDIMM
  HMC/HBM
Storage I/O
  Tiered Storage from SSDs to Disk to Tape
  Tightly coupled Software Stack for RAID
Network I/O
  Very Low Latency
  Intelligent I/O & Protocol Load Balancing
  Specialized Data Ingest
  Policy Based Data Movement/Software Defined Network
System-wide approach to workflow performance

Summary
Weather, climate and ocean modeling is a key community within Cray's customer base.
Data driven discovery and advanced analytics are rapidly becoming competitive differentiators in both traditional and non-traditional HPC areas.
The HPC workload is evolving and is driving the need for advanced architectures.
Cray's vision is that the fusion of Supercomputing and Big & Fast Data is the capability that will turn data into insight and discovery.

Thank you for your attention