New Jersey Big Data Alliance


Rutgers Discovery Informatics Institute (RDI2)
New Jersey's Center for Advanced Computation
New Jersey Big Data Alliance

Manish Parashar
Director, Rutgers Discovery Informatics Institute (RDI2)
Professor, Department of Electrical and Computer Engineering
December 05, 2012

Modern Science & Society Transformed by Compute & Data
- New Paradigms & Practices
- End-to-end: Seamless access, aggregation, interactions
- Integrative, multi-scale, online
- Data-driven, Data/Compute-intensive
- Multi-disciplinary/scale collaborations
- Individuals, groups, teams, communities, networks
- Unprecedented opportunities, challenges

12/12/12

Many Challenges
- Computing
  - Multicore; large and increasing core counts, deep memory hierarchies
  - New programming models and concerns (fault tolerance, energy, etc.)
  - New models & technologies: clouds, grids, hybrid manycore, accelerators, deep storage hierarchies, ...
- Data
  - Generating more data than in all of human history: preserve, mine, share?
  - How do we create data scientists/engineers?
- Software
  - Complex applications on coupled compute-data-networked environments; tools needed
  - Modern apps: 10^6+ lines, many groups contribute, take decades
- People
  - Multidisciplinary expertise essential!
  - Appropriate academic programs, career tracks

Advanced Computing Infrastructure
Large scale, distributed, heterogeneous, multicore/manycore, accelerators, deep storage hierarchies, experimental systems.
- Titan (Cray XK7): 27 PF / 56 K cores; 16-core CPU + GPU; Gemini 3D torus; 710 TB memory
- Sequoia (IBM BG/Q): 20 PF / 1.6 M cores; 18-core processor; 5D torus; 1.5 PB memory
- Worldwide LHC Computing Grid: >140 sites; ~250k cores; ~100 PB disk
- XSEDE: World's Largest Grid; 11 Resource Providers
- Modern Datacenters: 1M servers; 50-100 MW
- Special Purpose HW (Anton): > 100 times acceleration of MD simulations

Data Crisis: Information Big Bang
Data generation == 4 x Moore's Law
- NSB Report: Long-Lived Digital Data Collections Enabling Research and Education in the 21st Century
- PCAST Digital Data
- Industry: Wired, Nature
- Storage Networking Industry Association (SNIA) 100 Year Archive Requirements Survey Report
- NSF Experts Study

"there is a pending crisis in archiving ... we have to create long-term methods for preserving information, for making it available for analysis in the future."
80% respondents: >50 yrs; 68% > 100 yrs

Big Data in Science & Engineering: Modern network/instruments/experiments/...
Ack. A. Blatecky
- Large Hadron Collider
- Blanco 4m on Cerro Tololo
- SKA project (above is proposed image)
Image credit: Valerio Mezzanotti for The New York Times
Image credit: Roger Smith/NOAO/AURA/NSF

Exploding Data Volumes in Lifescience
x10^7 in 14 years
Figure credit: Ian Foster, ANL

The New Big-Data/Big-Compute Reality
- Economic advantages depend on the data you have and your ability to transform that data into meaningful and timely insights
- Industries nimble enough to interpret & use the data in new ways to add value will be the leaders
- Traditional decision-making structures must be adapted to incorporate data scientists in business and research
- Integrate compute and data into reasoning/problem solving processes
- Need to move away from our fixation on data size: data quality & the ability to analyze it are more important
- Working harder is not enough. We need to work smarter!
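As a rough check on the life-science growth figure quoted above, the sketch below converts a total growth factor over a period into an equivalent annual growth factor and a doubling time. It assumes the figure reads as a 10^7-fold increase over 14 years and that growth is smoothly exponential; the function names are illustrative and do not come from the slides.

```python
import math


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)


def doubling_time_years(total_factor: float, years: float) -> float:
    """Time for the quantity to double, assuming steady exponential growth."""
    return years * math.log(2) / math.log(total_factor)


if __name__ == "__main__":
    # Assumed reading of the slide: data volume grew 10^7-fold in 14 years.
    total, span = 1e7, 14
    print(f"annual growth factor: {annual_growth_factor(total, span):.2f}x")
    print(f"doubling time: {doubling_time_years(total, span) * 12:.1f} months")
```

Under these assumptions the volume roughly triples each year, which corresponds to a doubling time of a little over seven months.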

Rutgers Discovery Informatics Institute: RDI2
- Fundamentally integrated research, education, ACI and industry partnerships to address core CDS&E / BigData challenges
- Broaden access to state-of-the-art computing technology; integrate multidisciplinary research with ACI and industry partnerships
- Enable large-scale data analytics, computational modeling, and simulations, all of which are playing an increasingly important role in both academic and commercial research and innovation.
- Only university-based advanced computation center in NJ, and one of about ten in US, with an industry partnership program

Rutgers University, IBM Open Supercomputer Center
By Heather Haddon, 3/26/12
Rutgers University and International Business Machines Corp. (IBM +1.11%) will cut the ribbon Tuesday on a technology center in New Jersey that houses a $3.3 million supercomputer: stacks of processors that can digest massive quantities of data in a fraction of the time that a desktop unit would take. Named "IBM Blue Gene/P," the machine, about the size of two refrigerators, will be one of the most powerful computers in the Northeast, with thousands of central processing units, or CPUs. IBM hopes in the coming year it will make the prestigious "TOP 500" list of the world's most powerful computers, determined by a group of academic

Key Programmatic Areas
- Research
  - Provide Rutgers researchers access to computational resources and technical expertise necessary to increase accuracy and scale of their research
  - Promote interdisciplinary collaborations to increase grant competitiveness
- Advanced Computing Infrastructure
  - Deploy a balanced cyberinfrastructure composed of advanced data, compute, and communication capabilities
  - Experimental platforms (e.g., accelerators, novel storage and network technologies)
  - Expertise (system and application facing)
- Education and Training
  - Variety of education and training programs for faculty, students and industry
  - Masters degrees, certificates, technical modules, industry-specific workshops
- Industry Engagement and Economic Development
  - RDI2's Industry Partnership Program will assist private firms in overcoming the cost and knowledge barriers associated with advanced computation
  - RDI2 will promote economic development by attracting new firms to New Jersey and encouraging existing firms to stay in-state

RDI2 Advanced Computing Infrastructure
The goal of the RDI2 ACI is to deploy a balanced cyberinfrastructure composed of advanced data, compute, and communication capabilities that can drive research, innovation, and economic development.
- Phase I: Excalibur, an IBM Blue Gene/P Supercomputer
  - 2,000 Nodes, 8,000 Cores
  - 24 Terabytes of RAM
  - 300-400 Terabytes of memory
  - Result of a Rutgers-IBM partnership
- Phase II Plans:
  - Acquisition of a 10-12 petabyte storage container with co-located analytics
  - Upgrade to Blue Gene/Q (or equivalent)
  - Connectivity to national cyberinfrastructure
- Phase III Plans:
  - Deploy one of the top academic supercomputers in the world

RDI2's Industry Partnership Programs
- NSF Cloud and Autonomic Computing (CAC) Center
- NSF Center for Dynamic Data Analytics (CDDA)
- CAC/CDDA are multidisciplinary NSF centers of excellence in cloud and autonomic computing research
- Foster long-term collaborative partnerships among industry, academia, and government
- Have a well-established Industry Partnership Program and many industry partners

RDI2's Industry Membership Program
- Tiered membership structure provides industry members with a range of benefits and resources including, but not limited to:
  - Access to leading experts in ACI and its applications
  - Access to limited-access technology reports
  - Access to top graduate students
  - First right to negotiate commercial licenses on certain technologies
  - Invitation to participate in bi-annual center-wide meetings for program evaluation

RDI2 Tiered Membership Model
Columns: Level 1 Advisory Board Member | Level 2 Advisory Board Member | Level 3 Associate | Level 4 Associate
- Steering Committee Member: Yes | No | No | No
- Advisory Board Member: Yes, 2 seats | Yes, 1 seat | No | No
- Biannual Meetings: Yes | Yes | Yes | Yes
- First right to negotiate commercial licenses: Yes | Yes | Yes
- Royalty-free access to institute technology: Yes | Yes | Yes | Yes, but only in industry thrust | Yes, but only in industry thrust
- Access to ACI Experts: Yes | Yes | Yes | Yes
- Membership Fee: 100K/yr cash only | 50K/yr cash/in-kind | 30K/yr cash/in-kind | 15K/yr cash/in-kind

Membership: A Pathway to Further Collaboration
- Collaborative Research Agreements
- RDI2 Industry Membership
- Sponsored Research Agreements
- Service Agreements
- Student Recruitment Opportunities

Summary
- Large scale compute and data are revolutionizing science, engineering and society
  - Advanced Computing Infrastructure + Big Data
  - New opportunities and challenges
  - Requires new thinking in research and education practices
  - Multidisciplinary collaborations are critical!
- Rutgers Discovery Informatics Institute (RDI2)
  - Integrating multidisciplinary research, ACI and education @ Rutgers
  - http://rdi2.rutgers.edu/
- Industry Partnership Programs
  - NSF Cloud and Autonomic Computing Center (CAC)
  - http://nsfcac.rutgers.edu/
- Some upcoming events (Spring 2013)
  - NJ Big Data Alliance
  - NJ Virtual Computational Institute

Thank You!
Manish Parashar, Ph.D.
Prof., Dept. of Electrical & Computer Engr.
Rutgers Discovery Informatics Institute (RDI2)
Cloud & Autonomic Computing Center (CAC)
Rutgers, The State University of New Jersey
Email: parashar@rutgers.edu
WWW: http://rdi2.rutgers.edu/ http://parashar.rutgers.edu