NERSC Data Efforts Update Prabhat Data and Analytics Group Lead February 23, 2015



Similar documents
Data and Analytics Strategy Prabhat Data and Analytics Group Lead February 23, 2015

Data Analytics at NERSC. Joaquin Correa NERSC Data and Analytics Services

Workflow Tools at NERSC. Debbie Bard NERSC Data and Analytics Services

Big Data and Scientific Discovery

So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell

Storage Systems: 2014 and beyond. Jason Hick! Storage Systems Group!! NERSC User Group Meeting! February 6, 2014

DNS Big Data

Trinity Advanced Technology System Overview

Wrangler: A New Generation of Data-intensive Supercomputing. Christopher Jordan, Siva Kulasekaran, Niall Gaffney

Workflow Tools at NERSC. Debbie Bard NERSC Data and Analytics Services

Ibis: Scaling Python Analy=cs on Hadoop and Impala

.nl ENTRADA. CENTR-tech 33. November 2015 Marco Davids, SIDN Labs. Klik om de s+jl te bewerken

Data Management in the Cloud: Limitations and Opportunities. Annies Ductan

Ali Ghodsi Head of PM and Engineering Databricks

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect

KBase and Globus Online Nexus. Shreyas Cholia NERSC/LBL

Big Data Research in the AMPLab: BDAS and Beyond

Cloud Computing through Virtualization and HPC technologies

Magellan A Test Bed to Explore Cloud Computing for Science Shane Canon and Lavanya Ramakrishnan Cray XE6 Training February 8, 2011

Scientific Computing Meets Big Data Technology: An Astronomy Use Case

Cloud Compu)ng. Yeow Wei CHOONG Anne LAURENT

Science Gateway Services for NERSC Users

Denis Caromel, CEO Ac.veEon. Orchestrate and Accelerate Applica.ons. Open Source Cloud Solu.ons Hybrid Cloud: Private with Burst Capacity

Archiving and Sharing Big Data Digital Repositories, Libraries, Cloud Storage

NERSC File Systems and How to Use Them

Performance of HPC Applications on the Amazon Web Services Cloud

Introduc8on to Apache Spark

Big Data and Clouds: Challenges and Opportuni5es

How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (

SUMMIT. November 2010

Streaming items through a cluster with Spark Streaming

Putting Eggs in Many Baskets: Data Considerations in the Cloud. Rob Futrick, CTO

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Return on Experience on Cloud Compu2ng Issues a stairway to clouds. Experts Workshop Nov. 21st, 2013

ECDF Infrastructure Refresh - Requirements Consultation Document

Berkeley Research Computing. Town Hall Meeting Savio Overview

HPC data becomes Big Data. Peter Braam

Optimizing Data Management at the Advanced Light Source with a Science DMZ

Mendel at NERSC: Multiple Workloads on a Single Linux Cluster

A GPU-Enabled HPC System for New Communities and Data Analytics

Pilot-Streaming: Design Considerations for a Stream Processing Framework for High- Performance Computing

Big Data and Cloud Computing for GHRSST

1 Bull, 2011 Bull Extreme Computing

SC09 Tutorial M06 Cluster Construc5on Tutorial

XSEDE Data Analytics Use Cases

Virident HGST Leading the Flash Pla6orm Transforma:on March 2014

Parallels Solu+ons for Business Keeping IT in Control of Mac in the Enterprise. Carlos Capó Sr. Manager, Global Business Solu6ons

Scaling Spark on HPC Systems

Cray XC30 Hadoop Platform Jonathan (Bill) Sparks Howard Pritchard Martha Dumler

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Engagement Strategies for Emerging Big Data Collaborations

EVALUATION OF ISSUE TRACKING AND PROJECT MANAGEMENT TOOLS FOR USE ACROSS ALL CSIRO RADIO TELESCOPES FACILITIES

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES

Big Data Challenges in Bioinformatics

Beyond Embarrassingly Parallel Big Data. William Gropp

Scaling Out With Apache Spark. DTL Meeting Slides based on

Portable, Scalable, and High-Performance I/O Forwarding on Massively Parallel Systems. Jason Cope

High Performance Big Data Analy5cs powered by Unique Web Accelera5on and NoSQL. The Big Data Engine

Kriterien für ein PetaFlop System

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

GRNET Cloud Compu7ng Services An Overview

Next-Gen Big Data Analytics using the Spark stack

Cloudian The Storage Evolution to the Cloud.. Cloudian Inc. Pre Sales Engineering

Working with HPC and HTC Apps. Abhinav Thota Research Technologies Indiana University

HPC Growing Pains. Lessons learned from building a Top500 supercomputer

Parallel Large-Scale Visualization

Architectures for massive data management

OS/Run'me and Execu'on Time Produc'vity

TLD Data Analysis. ICANN Tech Day, Dublin. October 19th 2015 Maarten Wullink, SIDN. Klik om de s+jl te bewerken

Overview of HPC Resources at Vanderbilt

Data management challenges in todays Healthcare and Life Sciences ecosystems

How To Create A Data Visualization With Apache Spark And Zeppelin

The 3 questions to ask yourself about BIG DATA

Lecture 2 Parallel Programming Platforms

Introduc)on of Pla/orm ISF. Weina Ma

Connecting Researchers, Data & HPC

An Open Dynamic Big Data Driven Applica3on System Toolkit

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

BENCHMARKING V ISUALIZATION TOOL

SPRACE Site Report. Guilherme Amadio SPRACE UNESP

Dell In-Memory Appliance for Cloudera Enterprise

The Fusion of Supercomputing and Big Data. Peter Ungaro President & CEO

A Seman(c Approach to Cloud Infrastructure Management at Freudenberg IT

Accelerating Application Performance on Virtual Machines

Cloud Compu)ng for Science. Keith R. Jackson Computa)onal Research Division

Alternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix

Cray s Storage History and Outlook Lustre+ Jason Goodman, Cray LUG Denver

Hybrid Software Architectures for Big

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Hadoop on the Gordon Data Intensive Cluster

One-Size-Fits-All: A DBMS Idea Whose Time has Come and Gone. Michael Stonebraker December, 2008

Private Cloud Website Solu2on

Percipient StorAGe for Exascale Data Centric Computing

Intel IT s Cloud Journey. Speaker: [speaker name], Intel IT

Data Requirements from NERSC Requirements Reviews

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Unlocking the True Value of Hadoop with Open Data Science

HPC Trends and Directions in the Earth Sciences Per Nyberg Sr. Director, Business Development

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN

Transcription:

NERSC Data Efforts Update Prabhat Data and Analytics Group Lead February 23, 2015-1 -

A little bit about myself Computer Scien.st Brown, IIT Delhi Real- 3me Graphics, Virtual Reality, HCI Computa3onal Research Division Scien3fic Visualiza3on, HPC Data Management, Parallel I/O Sta3s3cs, Machine Learning, Big Data Climate Scien.st Pursuing PhD in UC Berkeley w/ Bill Collins Broadly interested in: Physics (Par3cle Physics, Astronomy) Chemistry (Material Science) Biology (Neuroscience, BioImaging) - 2 -

Data and Analytics Services (DAS) Team DAS Team Member Shreyas Cholia Yushu Yao AnneUe Greiner Joaquin Correa Burlen Loring Jeff Porter Oliver Ruebel Dani Ushizima Technology Areas Gateways, Web, Grid Databases, Analy3cs UI, Web Imaging, Machine Learning Vis Data Management Vis, Analy3cs Imaging, R R. K. Owen NIM Michael Urashka Web - 3 -

DAS Team Goal: Enable Data-Centric Science at Scale Big Data So=ware Broad ecosystem of capabili3es and technologies Research, evaluate, deploy and maintain so\ware Customize and op3mize for NERSC/HPC pla^orms Engaging NERSC Users Broad user base support 1-1 in- depth engagement - 4 -

How to get help? Documenta.on: hup://www.nersc.gov/users/so\ware/data- visualiza3on- and- analy3cs/ Rou.ne startup/troubleshoo.ng ques.ons: Trouble 3cket system In- depth 1-1 collabora.ons: e- mail prabhat@lbl.gov - 5 -

DOE Facilities are Facing a Data Deluge Genomics Climate Astronomy Physics Light Sources 6

We currently deploy separate Compute Intensive and Data Intensive Systems Compute Intensive Data Intensive Carver Genepool - 7 - PDSF

Cori: Unified architecture for HPC and Big Data 64 Cabinets of Cray XC System 50 cabinets Knights Landing manycore compute nodes 10 cabinets Haswell compute nodes for data par44on ~4 cabinets of Burst Buffer 14 external login nodes Aries Interconnect (same as on Edison) Lustre File system 28 PB capacity, 432 GB/sec peak performance NVRAM Burst Buffer for I/O accelera.on Significant Intel and Cray applica.on transi.on support Delivery in mid- 2016; installa.on in new LBNL CRT - 8 -

Popular features of a data intensive system can be supported on Cori Data Intensive Workload Need Local Disk Large memory nodes Cori Solu.on NVRAM burst buffer 128 GB/node on Haswell; Op3on to purchase fat (1TB) login node Massive serial jobs Complex workflows Communicate with databases from compute nodes Stream Data from observa3onal facili3es Easy to customize environment Policy Flexibility NERSC serial queue prototype on Edison; MAMU More (14) external login nodes; CCM mode for now Proposed Compute Gateway Node COE Proposed Compute Gateway Node COE Proposed User Defined Images COE Improvements coming with Cori: Rolling upgrades, CCM, MAMU, above COEs would also contribute - 9 -

Big Data Software Portfolio Capabili.es Technology Areas Tools, Libraries Data Transfer + Access Globus, Grid Stack, Authen3ca3on Portals, Gateways, RESTful APIs Globus Online, Grid FTP Drupal/Django, NEWT Data Processing Workflows Swi\, Fireworks, Data Management Formats, Models Databases Storage, I/O, Movement HDF5, NetCDF SRM Data Analy3cs Sta3s3cs, Machine Learning python, R, ROOT Data Visualiza3on Backend Infrastructure Imaging SciVis InfoVis Analy3cs Stack Databases Virtualiza3on - 10 - OMERO, Fiji, VisIt, Paraview BDAS SciDB, MySQL, PostgreSQL, MongoDB Docker

Analytics Software Strategy Science Applica3ons Climate, Cosmology, Kbase, Materials, BioImaging, Analy3cs Capabili3es Sta3s3cs, Machine Learning Image Processing Graph Analy3cs Database Opera3ons Tools + Libraries R, python, MLBase MATLAB GraphX SQL Run3me Framework MPI Spark SciDB, PostGres Resource Management Filesystems (Lustre), Batch/Queue Systems Hardware SandyBridge/KNL chipset, Burst Buffers, Aries Interconnect

Top NERSC Production Workflows Advanced Light Source SPOT suite Real 3me reconstruc3on, experimental steering Materials Project Cosmology Supernovae/Transient classifica.on pipeline JGI (Genome Sequencing, Assembly) - 12 -

Concluding Remarks Vision: NERSC is the hub for Scien.fic Big Data Hardware So\ware, Services People (staff, vendors, researchers) We welcome your feedback! - 13 -

Ques3ons? Contact: prabhat@lbl.gov - 14 -