Data Centric Systems (DCS)

Data Centric Systems (DCS)
Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics
High Performance Computing with Data Centric Systems

Data Centric Systems for a New Era

Powerful Information: Big Data and the New Era of Computing

Data volume is on the rise
[Chart: data volume in exabytes, 2010 to 2015, axis ticks shown from 3000 to 9000; sources: Sensors & Devices, Social Media, VoIP, Enterprise Data]

Dimensions of data growth
- Volume: terabytes to exabytes of existing data to process
- Velocity: streaming data, milliseconds to seconds to respond
- Variety: structured, unstructured, text, multimedia
- Veracity: uncertainty from inconsistency, ambiguities, etc.

Big Data and Exascale High Performance Computing are driving many similar computer systems requirements.

Data is Becoming the World's New Natural Resource
- 1 trillion connected objects and devices on the planet generating data by 2015
- 2.5 billion gigabytes of data generated every day
- Data is the new basis of competitive advantage
2014 International Business Machines Corporation

The Challenge is to move from Data to Insight to Decisions

Big Data Driving Common Requirements

High Performance Analytics
- Unstructured data
- Giga- to petascale data
- Primarily data mining algorithms
- Evolving requirements. Driver: enhanced context
  - Improves decision making
  - Incorporate modeling and simulation for better predictions
  - Incorporate sensor data

High Performance Computing
- Structured data
- Exascale data sets
- Primarily scientific calculations
- Evolving requirements. Driver: doing more with models
  - Real-time decision making
  - Uncertainty quantification
  - Sensitivity analysis
  - Metadata extraction

Both converge on Data Centric Systems.

Architecture

Big Data Requires a Data-Centric Architecture

Compute-Centric Model (diagram labels: input, output)
- Data lives on disk and tape
- Move data to CPU as needed
- Deep storage hierarchy

Data-Centric Model (diagram labels: Massive Parallelism, Persistent Memory)
- Data lives in persistent memory
- Many CPUs surround and use the data
- Shallow/flat storage hierarchy

Major impact on hardware, systems software, and application design.
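The difference between the two models can be illustrated by counting how many bytes cross the interconnect for a simple reduction. The following is a minimal sketch under stated assumptions, not part of the original slides: `Partition`, `compute_centric_sum`, and `data_centric_sum` are hypothetical names, and the 8-byte value size is an assumed default.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical slice of a data set held on one storage or memory node."""
    values: list


def compute_centric_sum(partitions, bytes_per_value=8):
    """Move every value to a single CPU, then reduce there.

    Returns (result, bytes_moved).
    """
    moved = 0
    gathered = []
    for p in partitions:
        gathered.extend(p.values)                  # raw data travels to the CPU
        moved += len(p.values) * bytes_per_value
    return sum(gathered), moved


def data_centric_sum(partitions, bytes_per_value=8):
    """Reduce next to the data and move only one partial result per partition.

    Returns (result, bytes_moved).
    """
    moved = 0
    partials = []
    for p in partitions:
        partials.append(sum(p.values))             # computed where the data lives
        moved += bytes_per_value                   # only the partial sum travels
    return sum(partials), moved


if __name__ == "__main__":
    parts = [Partition(list(range(1000))) for _ in range(4)]
    print(compute_centric_sum(parts))
    print(data_centric_sum(parts))
```

Both functions return the same sum; only the amount of data moved differs, which is the quantity the data-centric model tries to minimize.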

IBM Research

Data Centric Design Principles

Massive data requirements drive a composable architecture for big data, complex analytics, modeling and simulation. The DCS architecture will appeal to segments experiencing an explosion of data and the associated computational demands.

- Principle 1: Minimize data motion
- Principle 2: Compute everywhere
- Principle 3: Modularity
- Principle 4: Application-driven design
- Principle 5: Leverage OpenPOWER to accelerate innovation

OpenPOWER Foundation

MISSION: The OpenPOWER Consortium's mission is to create an open ecosystem, using the POWER Architecture to share expertise, investment and validated and compliant server-class IP to serve the evolving needs of customers.

- Opening the architecture to give the industry the ability to innovate across the full hardware and software stack
  - Includes SoC design, bus specifications, reference designs, and open-source firmware, OS and hypervisor
  - Example: POWER CPU + Tesla GPU
- Driving an expansion of the enterprise-class hardware and software stack for the data center
- Building a vibrant and mutually beneficial ecosystem for POWER

Building collaboration and innovation at all levels

Ecosystem layers shown on the slide:
- Implementation / HPC / Research
- System / Software / Services
- I/O / Storage / Acceleration
- Boards / Systems
- Chip / SoC

Welcoming new members in all areas of the ecosystem
- 100+ inquiries and numerous active dialogues underway
- 30th member just signed

DCS System Attributes

Commercially Viable System
- High Performance Analytics (HPA) and HPC markets
- Scale from sub-rack to 100+ rack systems
- Composed of cost-effective components
- Upgradeable and modular solutions

Holistic System Design
- Storage, network, memory, and processor architecture and design
- Market demands, customer input, and workflow analysis influence design

Extensive Software Stack
- Compiler, tool chain, and ecosystem support
- O/S, virtualization, system management
- Evolutionary approach to retain prior investments
- New Data Centric Model

System Quality
- Reliability, availability, serviceability

Software

Software Challenges Beyond 2017

[Figure label: Data Centric Model]

- Exascale systems must address science and analytics mainstreams
  - Broad range of users and skills
  - Range of computation requirements: dense compute, visualization, etc.
- Applications will be workflow driven
  - Workflows involve many different applications
  - Complex mix of algorithms, strong capability requirements
  - UQ, sensitivity analysis, validation and verification elements
  - Usability / consumability
- Unique large-scale attributes must also be addressed
  - Reliability
  - Energy efficiency
- Underlying data scales pose a significant challenge that complicates systems requirements
  - Convergence of analytics, modeling, simulation, visualization, and data management

Role of the Programming Model in Future Systems

- The focus of recent programming models is on node-level complexity
  - Computation and control are the primary drivers
  - Data and communication are secondary considerations
- Little progress on system-wide programming models
  - MPI, SHMEM, PGAS...
- Future data centric systems will be workflow driven, with computation occurring at different levels of the memory and storage hierarchy
  - New and challenging problems for software and application developers
- Programming models must evolve to encompass all aspects of data management and computation, requiring a data-centric programming abstraction where:
  - Compute, data and communication are equal partners
  - A high-level abstraction gives the user the means to reason about data and associated memory attributes
  - There will be co-existence with lower-level programming models

2012 IBM Corporation
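One way to picture an abstraction in which data and its memory attributes are visible to the runtime is a task that declares its inputs, so that a scheduler can choose where to run it. The sketch below is purely illustrative and is not an IBM API: `DataObject`, `Task`, `choose_place`, and the place labels are all hypothetical, and "cost" is assumed to be simply the number of bytes that would have to move.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    name: str
    size_bytes: int
    place: str          # hypothetical label for where the data currently lives


@dataclass(frozen=True)
class Task:
    name: str
    inputs: tuple       # names of the DataObjects the task reads


def choose_place(task, catalog, places):
    """Pick the place that minimizes the bytes that must move to run the task.

    Assumption: moving an input costs its full size; local inputs cost nothing.
    Ties go to the earliest entry in `places`.
    """
    def bytes_to_move(place):
        return sum(
            catalog[n].size_bytes
            for n in task.inputs
            if catalog[n].place != place
        )
    return min(places, key=bytes_to_move)


if __name__ == "__main__":
    catalog = {
        "seismic": DataObject("seismic", 10**12, "storage"),
        "params": DataObject("params", 10**3, "node0"),
    }
    task = Task("filter", ("seismic", "params"))
    print(choose_place(task, catalog, ["node0", "storage"]))
```

Here the task is placed next to the large data set and only the small parameter object moves, which treats data location as an input to scheduling rather than an afterthought.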

Some Strategic Directions With Pragmatic Choices

- Refine the node-level model and extend it to support system-wide elements
- Separate programming model concerns from language concerns
  - Revisit the holistic language approach: build on the endurance of HPCS languages and evolve the concepts developed there (Places, Asynchrony, ...)
- Investigate programming model concepts that are applicable across mainstream and HPC
- Focus on delivery through existing languages: libraries, runtimes, compilers and tools
- Strive for cross-vendor support
- Pursue open collaborative efforts, but in the context of real integrated system development

Workflows

Establishing Workload-Optimized Systems Requirements

[Figure: workflows placed by operation mix (Floating Point OPS vs. Integer OPS) and by spatial locality (Low to High). Regions shown: conceptual view of the Data Fusion/Uncertainty Quantification workflow, conceptual view of the Data Intensive-Integer workflow, conceptual view of the Data Intensive with Floating Point workflow, and the region defined by LINPACK. The figure is labeled Data Centric Applications at one end and Compute Centric Applications at the other.]

2010 IBM Corporation

DCS Workflows: mixed compute capabilities required

Workflow labels shown on the slide: Oil and Gas (Reservoir, Seismic), Graph Analytics, Financial Analytics, All Source Analytics, Science, Value At Risk, Image Analysis.
Rack ranges shown on the slide: 4-40 racks, 2-20 racks, 1-5+ racks, 1-100s of racks.

Analytics Capability
- Complex code
- Data-dependent code paths / computation
- Lots of indirection / pointer chasing
- Often memory system latency dependent
- C++ templated codes
- Limited opportunity for vectorization
- Limited scalability
- Limited threading opportunity

Massively Parallel Compute Capability
- Simple kernels, ops dominated (e.g. DGEMM, Linpack)
- Simple data access patterns
- Can be preplanned for high performance

Throughput Capability

12/9/13
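The two access patterns contrasted above can be shown in a few lines. This is a hedged sketch, not code from the slides: `make_chain`, `pointer_chase`, and `streaming_dot` are hypothetical names, and pure Python only illustrates the dependency structure, not real hardware performance.

```python
import random


def make_chain(n, seed=0):
    """Build a random single-cycle 'next' array: next_index[i] is the node after i."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    next_index = [0] * n
    for a, b in zip(order, order[1:] + order[:1]):
        next_index[a] = b
    return next_index


def pointer_chase(next_index, start=0):
    """Data-dependent access: each index is only known after the previous load.

    Returns the number of hops needed to return to `start`.
    """
    hops, i = 0, start
    while True:
        i = next_index[i]       # the next address depends on this load
        hops += 1
        if i == start:
            return hops


def streaming_dot(x, y):
    """Preplannable access: unit stride, and every iteration is independent."""
    return sum(a * b for a, b in zip(x, y))


if __name__ == "__main__":
    chain = make_chain(1000)
    print(pointer_chase(chain))
    print(streaming_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

In the first loop no load can be issued before the previous one completes, which is why such codes tend to be bound by memory latency; in the second, addresses are known in advance, so the work can in principle be vectorized, threaded, and prefetched.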

Degrees of Parallelism in global scope

[Figure: two axes, Increasing Parallelism and Increasing Address Scope.]

Increasing parallelism:
- Single Thread
- Single Thread with SIMD
- Vector SIMD with Predication
- Multi Thread
- SMP
- Cluster

Increasing address scope:
- Register / Vector File
- Local Store
- High Bandwidth Memory
- Socket DRAM
- SMP Memory
- System addressable Memory

Annotations:
- Global Names: S/W conventions for object naming (example: K/V stores)
- Global Address: H/W conventions, coherency mechanisms
- Distributed Computing: Active Memory, Active Communication, Active Storage
- Disjoint Address Spaces: I/O space, Accelerator space

Summary: Data Centric Systems

- The ability to generate, access, manage and operate on ever larger amounts and varieties of data is changing the nature of traditional technical computing
  - Technical Computing is evolving, the market is changing
  - Extracting meaningful insights from Big Data, enabling real-time and predictive decision making across a range of industrial, commercial, scientific and government domains, requires computation techniques similar to those that have traditionally been characteristic of Technical Computing
- The era of Cognitive Supercomputers
  - Convergence in many future workflow requirements: Big Data driven analytics, modeling, visualization, and simulation
  - A time of significant disruption: data is emerging as the critical natural resource of this century
- An optimized full system design and integration challenge
  - Innovation in multiple areas, building off of an open ecosystem and working with our OpenPOWER partners:
    - System architecture and design, modular building blocks
    - Hardware technology
    - Software enablement
    - Best-in-class resilience
- Development of DCS systems in close collaboration with technology providers and partners in multiple areas, with focus on:
  - Workload-driven co-design
  - New hardware and software technologies
  - Programming models and tools
  - Open-source software