PACE Predictive Analytics Center of Excellence @ San Diego Supercomputer Center, UCSD Natasha Balac, Ph.D.

Brief History of SDSC

1985-1997: NSF national supercomputer center; managed by General Atomics
1997-2007: NSF PACI program leadership center; managed by UCSD (PACI: Partnerships for Advanced Computational Infrastructure)
2007-2009: Internal transition to support more diversified research computing; still an NSF national resource provider
2009-future: Multi-constituency cyberinfrastructure (CI) center providing data-intensive CI resources, services, and expertise for campus, state, and nation

Approaching $1B in lifetime contract and grant activity!

Mission: Transforming Science and Society Through Cyberinfrastructure

"The comprehensive infrastructure needed to capitalize on dramatic advances in information technology has been termed cyberinfrastructure." (D. Atkins, NSF Office of Cyberinfrastructure)

Today's Information Technologies Drive 21st Century Solutions

Means to an end: cyberinfrastructure is the foundation for modern research and education.

Cyberinfrastructure components: digital data, computers, wireless and wireline networks, personal digital devices, scientific instruments, storage, software, sensors, and people.

Example questions these technologies help answer:
- What therapies can be used to cure or control cancer?
- What plants work best for biofuels?
- Will California levees hold in a large-scale earthquake?
- Can financial data be used to accurately predict market outcomes?

[Figure: Simulation of SoCal Earthquake]

The TeraGrid: Providing High-End Resources to Academic Researchers

- 11 Resource Providers (RPs); Grid Infrastructure Group (GIG) led by UC/ANL
- Annual budget: RPs ~$50M total; GIG ~$15M
- Compute power: ~20 compute systems, >2 PF aggregate; most flops in TACC's Ranger and NICS's Kraken; new systems coming
- Additional HW resources: >4 PB disk storage, wide-area file systems, >60 PB tape storage


Gordon

- A data-intensive supercomputer based on SSD flash memory and virtual shared memory
- Emphasizes MEM and IO over FLOPS
- A custom-designed system built from COTS parts to accelerate access to massive databases in all fields of science, engineering, medicine, and social science
- The NSF's most recent Track 2 award to SDSC

Gordon is designed specifically for data-intensive HPC applications. Such applications involve very large data sets or very large input-output requirements. Two data-intensive application classes are important and growing:

- Data Mining: "the process of extracting hidden patterns from data; with the amount of data doubling every three years, data mining is becoming an increasingly important tool to transform this data into information" (Wikipedia). A minimal sketch of this workload class follows the list below.
- Data-Intensive Predictive Science: the solution of scientific problems via simulations that generate large amounts of data.
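As an illustration of the first class, here is a minimal, hypothetical sketch of pattern extraction: clustering a synthetic observation table with k-means. The data, feature count, and cluster count are placeholder assumptions, not anything from an actual Gordon workload.

```python
# A minimal sketch of the "data mining" workload class: extracting hidden
# patterns (here, clusters) from a large tabular data set. The synthetic
# data and all sizes below are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 8))   # stand-in for a large observation table

model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
print(model.cluster_centers_)       # the recovered "hidden patterns"
```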

Why Gordon? Exponential Data Growth

Growth of digital data is exponential, driven by advances in sensors, networking, and storage technologies.

[Chart: HDD Capacity vs. Random IOPS vs. Access Density Trends, 1980-2010; series: Capacity (MB), Max Random IOPS Rate, Ballpark Access Density]

Red shift: while processor speeds have been increasing with Moore's Law, DRAM and disk speeds have stayed almost constant. System utilization is cut in half every 18 months? It takes 2x the time to read your disk every 18 months? A back-of-the-envelope check of that last claim follows below.
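The arithmetic behind the claim: if drive capacity doubles every 18 months while sustained read bandwidth stays roughly flat, the time to read an entire disk doubles on the same schedule. The starting capacity and bandwidth here are illustrative assumptions.

```python
# Back-of-the-envelope check of the "red shift" claim: capacity doubles
# every 18 months, bandwidth stays flat, so full-disk read time doubles.
capacity_tb = 1.0        # assumed drive capacity today, in TB
bandwidth_mb_s = 150.0   # assumed sustained read bandwidth, roughly flat

for months in range(0, 73, 18):
    cap_mb = capacity_tb * 2 ** (months / 18) * 1e6   # capacity after growth
    hours = cap_mb / bandwidth_mb_s / 3600            # full sequential read
    print(f"+{months:2d} months: {hours:6.1f} hours to read the whole disk")
```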

Modern HPC systems have a latency gap.

Gordon's architecture fills that gap: flash offers O(TB) capacity at latencies on the order of 1,000 cycles, sitting between DRAM and spinning disk in the storage hierarchy. The worked example below puts those numbers side by side.
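A quick sanity check on the gap, converting access latencies into CPU cycles at an assumed 2.5 GHz clock. The DRAM and disk figures are typical ballpark assumptions, not numbers from the slides; only the flash figure is the slide's O(1,000)-cycle value.

```python
# Converting access latencies to CPU cycles at an assumed clock rate.
# DRAM and disk values are ballpark assumptions; flash uses the slide's
# ~1,000-cycle figure.
CLOCK_HZ = 2.5e9   # assumed core clock

latencies_s = {
    "DRAM (assumed ~100 ns)":        100e-9,
    "flash (slide's ~1,000 cycles)": 1_000 / CLOCK_HZ,
    "disk seek (assumed ~5 ms)":     5e-3,
}
for tier, seconds in latencies_s.items():
    print(f"{tier:32s} {seconds * CLOCK_HZ:14,.0f} cycles")
```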

High Performance Computing (HPC) vs. High Performance Data (HPD)

Attribute | HPC | HPD
Key HW metric | Peak FLOPS | Peak IOPS
Architectural features | Many small-memory multicore nodes | Fewer large-memory SMP nodes
Typical application | Numerical simulation | Database query; data mining
Concurrency | High concurrency | Low concurrency or serial
Data structures | Data easily partitioned (e.g. grid) | Data not easily partitioned (e.g. graph)
Typical disk I/O patterns | Large block sequential | Small block random
Typical usage mode | Batch process | Interactive
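To make the "typical disk I/O patterns" row concrete, here is a minimal, hypothetical micro-benchmark contrasting large-block sequential reads with small-block random reads over the same total volume. The scratch-file path and sizes are illustrative assumptions; on a cold cache and spinning disk the random pattern is typically orders of magnitude slower, which is exactly the gap flash narrows.

```python
# Contrast the two I/O patterns from the table: large-block sequential
# (HPC) vs. small-block random (HPD). Same total bytes are moved either
# way; only the access pattern differs. Path and sizes are assumptions,
# and a warm page cache will mask much of the real on-disk difference.
import os, random, time

PATH = "scratch.bin"                       # hypothetical scratch file
SIZE = 256 * 1024 * 1024                   # 256 MB test file
with open(PATH, "wb") as f:
    f.write(os.urandom(SIZE))

def timed_reads(label, block, offsets):
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(block)
    print(f"{label}: {time.perf_counter() - start:.3f} s")

BIG, SMALL = 4 * 1024 * 1024, 4 * 1024     # 4 MB vs. 4 KB blocks
timed_reads("large-block sequential", BIG, range(0, SIZE, BIG))
timed_reads("small-block random", SMALL,
            [random.randrange(0, SIZE - SMALL) for _ in range(SIZE // SMALL)])
os.remove(PATH)
```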

SDSC Wins SC09 Data Challenge with Dash Prototype System

- 5.2 Tflop/s prototype of Gordon with 4 TB of flash
- A new TeraGrid resource that has already sped up some scientific calculations by two orders of magnitude

[Chart: Exponential Growth of Research Data, Protein Data Bank @ SDSC]

SDSC's Triton Resource

A computational and data resource for UC researchers and a repository for valuable research databases, serving UCSD/UC research labs and sensor nets.

- Data Oasis: 2-4 petabytes of large-scale storage for research data sets; administered with the UCSD Libraries
- PDAF (Petascale Data Analysis Facility): designed for analysis of very large data sets; a unique computer for extreme-scale data analysis
- Triton Compute Cluster: a condo-style cluster starter to create expandable UCSD computing resources; the seed for UCSD's home-grown supercomputer

Questions? For further information, contact Natasha Balac (natashab@sdsc.edu) or visit www.sdsc.edu.