The Fusion of Supercomputing and Big Data. Peter Ungaro President & CEO



Similar documents
The Fusion of Supercomputing and Big Data: The Role of Global Memory Architectures in Future Large Scale Data Analytics

HPC Trends and Directions in the Earth Sciences Per Nyberg Sr. Director, Business Development

Cray: Enabling Real-Time Discovery in Big Data

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

SW and HW System Architecture For Big Data CHPC 2013 Conference Cape Town, SA

Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy?

Data Centric Systems (DCS)

HPC and Large Scale Data Analytics. SOS17 Conference Jekyll Island, Georgia

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Digital Catapult. The impact of Big Data in a Connected Digital Economy Future of Healthcare. Mark Wall Big Data & Analytics Leader.

With DDN Big Data Storage

Hur hanterar vi utmaningar inom området - Big Data. Jan Östling Enterprise Technologies Intel Corporation, NER

The Future of Data Management

Oracle Big Data SQL Technical Update

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

Six Days in the Network Security Trenches at SC14. A Cray Graph Analytics Case Study

Building a Scalable Big Data Infrastructure for Dynamic Workflows

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

From Data to Foresight:

IBM System x reference architecture solutions for big data

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Dell* In-Memory Appliance for Cloudera* Enterprise

urika! Unlocking the Power of Big Data at PSC

REAL-TIME STREAMING ANALYTICS DATA IN, ACTION OUT

Deploying Big Data to the Cloud: Roadmap for Success

BUILDING A SCALABLE BIG DATA INFRASTRUCTURE FOR DYNAMIC WORKFLOWS

Unlocking the Intelligence in. Big Data. Ron Kasabian General Manager Big Data Solutions Intel Corporation

BSC vision on Big Data and extreme scale computing

Storage, Cloud, Web 2.0, Big Data Driving Growth

How to Leverage Big Data in the Cloud to Gain Competitive Advantage

HPC & Big Data THE TIME HAS COME FOR A SCALABLE FRAMEWORK

IBM Data Warehousing and Analytics Portfolio Summary

Flash Memory Arrays Enabling the Virtualized Data Center. July 2010

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

Teradata s Big Data Technology Strategy & Roadmap

HadoopTM Analytics DDN

Luncheon Webinar Series May 13, 2013

ANY SURVEILLANCE, ANYWHERE, ANYTIME

1 Performance Moves to the Forefront for Data Warehouse Initiatives. 2 Real-Time Data Gets Real

Technical White Paper. October Real-Time Discovery in Big Data Using the Urika-GD. Appliance G OVERN M ENT.

How To Handle Big Data With A Data Scientist

locuz.com Big Data Services

Big Workflow: More than Just Intelligent Workload Management for Big Data

InfiniBand Update Addressing new I/O challenges in HPC, Cloud, and Web 2.0 infrastructures. Brian Sparks IBTA Marketing Working Group Co-Chair

Introduction to urika. Multithreading. urika Appliance. SPARQL Database. Use Cases

ANY THREAT, ANYWHERE, ANYTIME Scalable.Infrastructure.to.Enable.the.Warfi.ghter

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, Viswa Sharma Solutions Architect Tata Consultancy Services

PACE Predictive Analytics Center of San Diego Supercomputer Center, UCSD. Natasha Balac, Ph.D.

Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory

Bricata Next Generation Intrusion Prevention System A New, Evolved Breed of Threat Mitigation

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload

Extend your analytic capabilities with SAP Predictive Analysis

How To Make Data Streaming A Real Time Intelligence

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

GC3 Use cases for the Cloud

SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON

Intel Xeon Processor E5-2600

Security Monitoring and Enforcement for the Cloud Model

Percipient StorAGe for Exascale Data Centric Computing

Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches

Big Data Processing: Past, Present and Future

Modernizing Hadoop Architecture for Superior Scalability, Efficiency & Productive Throughput. ddn.com

Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division

The Evolution of Microsoft SQL Server: The right time for Violin flash Memory Arrays

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Using an In-Memory Data Grid for Near Real-Time Data Analysis

Intel Platform and Big Data: Making big data work for you.

ONE platform for ALL YOUR DATA Radim Petrzela February 26 th, 2013

The Future of Data Management with Hadoop and the Enterprise Data Hub

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Advanced Metering Information Systems

Oracle Big Data Building A Big Data Management System

90% of your Big Data problem isn t Big Data.

Interactive data analytics drive insights

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect

Using In-Memory Computing to Simplify Big Data Analytics

Parallel Computing. Benson Muite. benson.

Unisys ClearPath Forward Fabric Based Platform to Power the Weather Enterprise

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

A Brief Introduction to Apache Tez

Integrating a Big Data Platform into Government:

A New Era Of Analytic

Big Data simplified. SAPSA Impuls, Stockholm Martin Faiss & Niklas Packendorff, SAP

Big Data and Your Data Warehouse Philip Russom

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

Seagate HPC /Big Data Business Tech Talk. December 2014

Big Data Performance Growth on the Rise

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

How To Create A Data Science System

Big Data - Infrastructure Considerations

Dell s SAP HANA Appliance

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA

Protecting Big Data Data Protection Solutions for the Business Data Lake

SMB Direct for SQL Server and Private Cloud

Data management challenges in todays Healthcare and Life Sciences ecosystems

YarcData urika Technical White Paper

Transcription:

The Fusion of Supercomputing and Big Data Peter Ungaro President & CEO

The Supercomputing Company Supercomputing Big Data

Because some great things never change

One other thing that hasn t changed. Cray s commitment to innovation: R&D is the core of our company & drives our different offerings

The first wave of big data

The current wave of Big Data.

Cray then Cray now and there is more to come.

Cray s Vision: The Fusion of Supercomputing and Big & Fast Data Modeling The World Cray Supercomputers solving grand challenges in science, engineering and analytics Data Models Integration of datasets and math models for search, analysis, predictive modeling and knowledge discovery Data- Intensive Processing High throughput event processing & data capture from sensors, data feeds and instruments Math Models Modeling and simulation augmented with data to provide the highest fidelity virtual reality results Compute Store Analyze

Data-Intensive Processing is driving the need for advanced architectures

Big Data s Methods for Analysis Take all these different data sources and put them together and then help me find something about the data that I don t already know

Solutions for Advanced Analytics Data Warehouses +Extensions NoSQL Databases Big Data Solutions Hadoop / MapReduce Graph Analytics These solutions can overlap, but also can be very complementary as each has strengths & weaknesses

System Architecture Differences Supercomputing Scalable computing w/high BW, low-latency, Global Mem Architectures Highly integrated processor-memory-interconnect & network storage Minimize data movement load the mesh into memory Move data for loading, check-pointing or archiving Basketball court sized systems Large-scale Data Analytics Distributed computing at largest scale Divide-and-conquer approaches on Service Orientated Architectures Maximize data movement-- Scan/Sort/Stream all the data all the time Lowest cost processor-memory-interconnect & local storage Warehouse sized clouds

Applications Differences Computational requirements far outweighed by requirements of memory hierarchy Working data set doesn t fit in memory (Capacity limited) Floating point or integer unit stalled for data access (Bandwidth limited) Highly irregular, cache-unfriendly, data access (Latency limited) On a given platform, applications are either Memory-bound (capacity, bandwidth, latency) Interconnect-bound (bandwidth, latency) I/O-bound (capacity, bandwidth, latency) Mirror opposite of compute-bound problems FLOPS matter less no easy path to higher peak computing rates Infrastructure requirements are quite different Must have specialized nodes and adaptive runtimes to handle differences

Can t The Cloud Do This?

Big Data Fast Data Global Memory + Fast Interconnects Fast Data Supercomputers & Analytic Appliances SAN Interconnects Big Data CLOUD LAN/WAN Interconnects Public & Private Clouds Enterprise Data (structured) GRID

Extending Adaptive Supercomputing to Big Data Workloads

Big Data Appliance for Real-Time Data Discovery Supervisor Intervention CSR Resolution Call Escalations AVR Failure Cabinet Failure Work Orders 3 rd Party Service Tech Truck Rolls Web Service Call Center Events Set-Top Box feeds Customers Residential Accounts Detecting Find Analyze new customer fraud cyber patterns threats churn Discover new drug re-purposing opportunities

Drug Discovery: Finding New Treatments for Cancer Goal Data sets Technical Challenges Use a Systems Biology approach to discover new Cancer treatments by understanding cancer formation & growth at a molecular level TCGA (The Cancer Genome Atlas), DrugBank, Pathway Commons, Human Protein Reference Database, Domine, Volume and velocity of data; probabilistic and temporal inter-relationships; extremely complex, ad-hoc queries frequently requiring the addition of new data sources Users Research Scientists Usage model In-silico analysis to identify biological pathways leading to formation and growth of cancer at a molecular level; identify compounds known to inhibit those mechanisms as possible drug candidates Augmenting Existing data warehouse, Statistical Analytics

Cycle of Discovery in Cancer Research Load or Augment Required Data from many diverse data sources Others Validate Hypothesis against entire dataset 5.3 billion relationships Formulate Partial Hypothesis Statistical analysis used to assist In the amount of time it takes to validate one hypothesis, we can now validate 1,000 hypotheses increasing our success rate significantly.

Flexibility in Workflow Hypotheses can be composed in SPARQL or using a Web based query composer GUI Results can be viewed in a number of ways

An Urika Moment Repurpose HIV drug for Cancer TP53 is frequently mutated in most tumor types ABCG2, also known as Breast Cancer Resistance Protein (BCRP), is associated with TP53 mutation in TCGA breast cancer data Nelfinavir, an HIV protease inhibitor, also binds ABCG2 and many other proteins High-throughput cell line screening of breast cancer cells recently identified Nelfinavir as a selective inhibitor. It can be brought to HER2-breast cancer treatment trials with the same dosage regimen as that used among HIV patients. [Shim et al. JNCI 2012] They discovered this in just 6 weeks!

Purpose-Built, Turnkey Hadoop Solutions Best Hadoop Distribution Performance of a Cray Turnkey Solution High Value Hadoop Security Comprehensive, and fast, encryption Proven HPC Cray technology & expertise Reliable Rapid ROI runs asadvertised Performance Power to accommodate current & future goals Performance Faster Hive, Cache acceleration, etc. Vast Scale Grow to meet any mission requirements Support One throat to choke, for the whole stack Reliability Will meet any challenge, without surprises Management Intel Manager for Hadoop Software Holistic Design Balanced compute, networking & storage Maintenance Update & evolve, without concerns Maintenance Easy to maintain & accommodate change

Adapting to Data-Intensive Computing: Adding Value at the Edge of the Network Data Processors x86 Many-core GPU Multi-threaded Adaptive Supercomputing Compilers, Auto-tuning Operating System High throughput scheduling, Adaptive Runtime, Network Memory Pools High Performance Memory DRAM Active NVRAM HMC The most capable interconnects will be key to analytic workloads Storage I/O Network I/O Storage SAS to PCIe Tightly coupled Software Stack for RAID Very Low Latency Intelligent I/O Protocol Load Balancing Specialized Data Ingest Policy Based Data Movement Software Defined Network

Integrated HPC Environments are the capability that will turn data in to insight and discovery Advanced Analytics Appliances Storage & Data Management Supercomputers

Our Cray s Vision: Roadmap Exascale++ Fusion Data Ingest (Cloud) Simulation Analytics Multiple Roadmap Steps: Combine workflows & data into a single system Aggressive local and global memory capabilities Tightly integrated software stack (& runtimes) across all 3 capabilities

Our Vision Build a world-class integrated supercomputing environment that enables transformational computing across a broad set of science, engineering and advanced analytics (big data) applications

11/27/2013 28