IDC Update on How Big Data Is Redefining High Performance Computing Earl Joseph ejoseph@idc.com Steve Conway sconway@idc.



Similar documents
High-Performance Data Analysis: HPC Meets Big Data

HPC Trends, Big Data And The Emerging Market For High Performance Data Analysis (HPDA)

HPC Market Update, HPC Trends In the Oil/Gas Sector and IDC's Top 10 Predictions for Earl Joseph, HPC Program Vice President

HPC & Big Data THE TIME HAS COME FOR A SCALABLE FRAMEWORK

How does Big Data disrupt the technology ecosystem of the public cloud?

IDC HPC Update at ISC 14. Earl Joseph Steve Conway Chirag Dekate Lloyd Cohen

Big Data jako součást našeho života. Zdenek Panec: June, 2015

The Fusion of Supercomputing and Big Data. Peter Ungaro President & CEO

Big Workflow: More than Just Intelligent Workload Management for Big Data

Data Centric Systems (DCS)

A New Era Of Analytic

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

BUILDING A SCALABLE BIG DATA INFRASTRUCTURE FOR DYNAMIC WORKFLOWS

IDC s Top 10 HPC Market Predictions for 2010

BIG DATA-AS-A-SERVICE

Make the Most of Big Data to Drive Innovation Through Reseach

Strategic Decisions Supported by SAP Big Data Solutions. Angélica Bedoya / Strategic Solutions GTM Mar /2014

Seagate HPC /Big Data Business Tech Talk. December 2014

Building a Scalable Big Data Infrastructure for Dynamic Workflows

FUJITSU Integrated System PRIMEFLEX for HPC

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect

High-Performance Analytics

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Regional Vision and Strategy The European HPC Strategy

Big Data. George O. Strawn NITRD

White Paper: SAS and Apache Hadoop For Government. Inside: Unlocking Higher Value From Business Analytics to Further the Mission

Cisco, Big Data and the Internet of Everything. Paul Davies, Big Data Sales Solution Leader, EMEAR Data Center

Here comes the flood Tools for Big Data analytics. Guy Chesnot -June, 2012

With DDN Big Data Storage

Big Data. Fast Forward. Putting data to productive use

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Scaling from Workstation to Cluster for Compute-Intensive Applications

Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers

Working Together to Promote Business Innovations with Grid Computing

THE REAL-TIME OPERATIONAL VALUE OF BIG DATA MATT DAVIES

I D C T E C H N O L O G Y S P O T L I G H T. B i g D a t a a n d E C M : Making Smarter Decisions

SAP Makes Big Data Real Real Time. Real Results.

Big Data lisää älyä tiedosta

NITRD and Big Data. George O. Strawn NITRD

APPROACHABLE ANALYTICS MAKING SENSE OF DATA

BMW11: Dealing with the Massive Data Generated by Many-Core Systems. Dr Don Grice IBM Corporation

Accelerating Innovation

PRIMERGY server-based High Performance Computing solutions

Data Centric Computing Revisited

Dr. John E. Kelly III Senior Vice President, Director of Research. Differentiating IBM: Research

Big Data Analytics: Today's Gold Rush November 20, 2013

Build Your Competitive Edge in Big Data with Cisco. Rick Speyer Senior Global Marketing Manager Big Data Cisco Systems 6/25/2015

Modern Financial Markets and Data Intensive Science: Leveraging 35 Years of Federal Research

Big Data Challenges in Bioinformatics

White Paper. Version 1.2 May 2015 RAID Incorporated

Innovation that Matters an IBM story

The Future of Data Management

Personalized Medicine and IT

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

How Big Data is Different

Big Data Technologies Compared June 2014

Hybrid Cluster Management: Reducing Stress, increasing productivity and preparing for the future

PDF PREVIEW EMERGING TECHNOLOGIES. Applying Technologies for Social Media Data Analysis

W o r l d w i d e B u s i n e s s A n a l y t i c s S o f t w a r e F o r e c a s t a n d V e n d o r S h a r e s

Busin i ess I n I t n e t ll l i l g i e g nce c T r T e r nds For 2013

Hur hanterar vi utmaningar inom området - Big Data. Jan Östling Enterprise Technologies Intel Corporation, NER

PBS Works: The Trusted Suite for HPC Workload Management

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

From Petabytes (10 15 ) to Yottabytes (10 24 ), & Before 2020! Steve Barber, CEO June 27, 2012

Are You Ready for Big Data?

GRID COMPUTING: A NEW DIMENSION OF THE INTERNET

Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010

I D C V E N D O R S P O T L I G H T. S t o r a g e Ar c h i t e c t u r e t o Better Manage B i g D a t a C hallenges

Scale-out NAS Unifies the Technical Enterprise

HPC Software in 2014 September Christopher G. Willard, Ph.D., Chief Research Officer

Top 10 Automotive Manufacturer Makes the Business Case for OpenStack

Big Data Executive Survey

The data explosion is transforming science

Government Technology Trends to Watch in 2014: Big Data

Optimized Hadoop for Enterprise

Technology and Trends for Smarter Business Analytics

Enterprise Data Integration

VIEWPOINT. High Performance Analytics. Industry Context and Trends

Introduction to Red Hat Storage. January, 2012

Global Headquarters: 5 Speen Street Framingham, MA USA P F

SGI HPC Systems Help Fuel Manufacturing Rebirth

Create and Drive Big Data Success Don t Get Left Behind

The Power of Pentaho and Hadoop in Action. Demonstrating MapReduce Performance at Scale

I D C T E C H N O L O G Y S P O T L I G H T

How To Handle Big Data With A Data Scientist

Leveraging Information For Smarter Business Outcomes With IBM Information Management Software

HPC on AWS. Hiroshi Kobayashi, Dev./Lab. IT System HGST Japan, Ltd. Jun 3, 2015

HPC Trends and Directions in the Earth Sciences Per Nyberg Sr. Director, Business Development

From Big Data to Smart Data Thomas Hahn

A Hurwitz white paper. Inventing the Future. Judith Hurwitz President and CEO. Sponsored by Hitachi

Accelerating Innovation with Self- Service HPC

Integrating a Big Data Platform into Government:

DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY

Are You Ready for Big Data?

Big Data at Cloud Scale

Information Visualization WS 2013/14 11 Visual Analytics

Transcription:

IDC Update on How Big Data Is Redefining High Performance Computing Earl Joseph ejoseph@idc.com Steve Conway sconway@idc.com Chirag Dekate cdekate@idc.com Bob Sorensen bsorensen@idc.com

IDC Has Over 1,000 Analysts In 62 Countries

Agenda A Short HPC Market Update Big Data Challenges and Short Comings The High End of Big Data Examples of Very Large Big Data Examples of How Big Data is Redefining High Performance Computing

HPC Market Update

What Is HPC? IDC uses these terms to cover all technical servers used by scientists, engineers, financial analysts and others: HPC HPTC Technical Servers Highly computational servers HPC covers all servers that are used for computational or data intensive tasks From a $5,000 deskside server up to over $550 million dollar supercomputer

Top Trends in HPC 2013 declined overall by $800 million For a total of $10.3 billion Mainly due to a few very large systems sales in 2012 that weren t repeated in 2013 We expect growth in 2015 to 2018 Software issues continue to grow The worldwide Petascale Race is at full speed GPUs and accelerators are hot new technologies Big data combined with HPC is creating new solutions in new areas

IDC HPC Competitive Segments: 2013 HPC Servers $10.3B Supercomputers (Over $500K) $4.0B Workgroup (under $100K) $1.6B Divisional ($250K - $500K) $1.4B Departmental ($250K - $100K) $3.4B

HPC WW Market Trends: A 17 Year Perspective C

HPC Market Forecasts

HPC Forecasts Forecasting a 7.4% yearly growth from 2013 to 2018 2018 should reach $14.7 billion

HPC Forecasts: By Industry/Applications

The Broader HPC Market

Big Data Challenges And Shortcomings

Defining Big Data

HPDA Market Drivers More input data (ingestion) More powerful scientific instruments/sensor networks More transactions/higher scrutiny (fraud, terrorism) More output data for integration/analysis More powerful computers More realism More iterations in available time The need to pose more intelligent questions Smarter mathematical models and algorithms Real time, near-real time requirements Catch fraud before it hits credit cards Catch terrorists before they strike Diagnose patients before they leave the office Provide insurance quotes before callers leave the phone

Top Drivers For Implementing Big Data

Organizational Challenges With Big Data: Government Compared To All Others

Big Data Meets HPC And Advanced Simulation

High Performance Data Analysis Needs HPC resources High complexity (algorithms) High time-criticality High variability (On premise or in cloud) Simulation & analytics Search, pattern discovery Iterative methods Established HPC users + new commercial users Data of all kinds The 4 V s: volume, variety, velocity, value Structured, unstructured Partitionable, non-partitionable Regular, irregular patterns

HPC Adoption Timeline (Examples) 1960 1970 1980 1990 2000 2012

Very Large Big Data Examples

NASA

Square Kilometre Arrary Radio Astronomy for Astrophysics

CERN LHC: the world s leading accelerator -- Multiple Nobel Prizes for particle physics work Innovation driven by the need to distribute massive data sets and the accompanying applications Altas, one of CERN s two detectors, generates 1PB of data per second when running! (Not all of this is distributed). Private cloud distribution to scientists in 20 EU member states plus observer states (single largest user is the U.S.) Today: only.0000005% of the data is used

NOAA

HPC Will Be Used More for Managing Mega-IT Infrastructures

Examples of Big Data Redefining HPC

Use Case: PayPal Fraud Detection The Problem Finding suspicious patterns that we don t even know exist in related data sets

What Kind of Volume? PayPal s Data Volumes And HPDA Requirements

Where Paypal Used HPC

The Results $710 million saved in fraud that they wouldn t have been able to detect before (in the first year)

There Are New Technologies That Will Likely Cause A Mass Explosion In Data Requiring HPDA Solutions

GEICO: Real-Time Insurance Quotes Problem: Need accurate automated phone quotes in 100ms. They couldn t do these calculations nearly fast enough on the fly. Solution: Each weekend, use a new HPC cluster to pre-calculate quotes for every American adult and household (60 hour run time)

Something To Think About -- GEICO: Changing The Way One Approaches Solving a Problem Instead of processing each event one-at-a-time, process it for everyone on a regular basis It can be dramatically cheaper, faster and offers additional ways to be more accurate But most of all it can create new and more powerful capabilities Examples: For home loan applications calculate for every adult in the US and every home in the US For health insurance fraud track every procedure done on every US person by every doctor and find patterns

Something To Think About -- GEICO: Changing The Way One Approaches Solving a Problem Future Examples (continued): If you add-in large scale data collection via sensors like GPS, drones and RFID tags: New car insurance rules The insurance company doesn t have to pay if you break the law -- like speeding and having an accident You could track every car at all times then charge $2 to see where the in-laws are in traffic if they are late for a wedding Google maps could show in real-time where every letter and package is located But crooks could also use it in many ways e.g. watching ATM machines, looking for when guards are on break,

U.S. Postal Service

U.S. Postal Service

U.S. Postal Service MCDB = memory-centric database

CMS: Government Health Care Fraud 5 separate databases for the big USG health care programs under Centers for Medicare and Medicaid Services (CMS) Estimated fraud: $150B-$450B <$5B caught today) ORNL, SDSC have evaluation contracts to unify the databases and perform fraud detection on various architectures

Schrödinger: Cloud-based Lead Discovery for Drug Design NOVARTIS/SCHROEDINGER: Pharmaceutical company Novartis increased resolution of drug discovery algorithm 10x and wanted to use it to test 21 million small molecules as drug candidates Novartis used the Schroedinger drug discovery app in AWS public cloud, with the help of Cycle Computing Initial run used 51,000 AWS cores and took $14,000 and <4 hours and its getting cheaper Later run used 156,000 AWS cores with comparable costs and time

Schrödinger: Cloud-based Lead Discovery for Drug Design

Global Financial Services: Company X One of the most respected firms in the global financial services industry updates detailed information daily on several million companies around the world. Clients use the firm's credit ratings and other company information in making lending decisions and for other planning, marketing, and business decision making. The firm uses statistical models to develop a company's scores and ratings, and for years, the ratings have been prepared and analyzed locally in near real time by the firm's personnel around the world. This practice is a major competitive advantage but resulted in the creation of hundreds of distinct databases and more than a dozen scoring environments. Several years ago, the company established a goal of centralizing these resources and chose SAS as the centralization mechanism, including SAS Grid Manager as part of the software stack.

Global Financial Services: Company X The centralized IT infrastructure created using SAS preserves the advantages of the company's locally created ratings and reports. The new infrastructure provides an effective environment for analytics development and accommodates multiple testing, debt, and production environments in a single stack. It is flexible enough to allow dynamic prioritization among these environments, according to a company executive. With help from SAS Grid Manager, the company can maximize the use of its computing resources. The software automatically assigns jobs to server nodes with available capacity, instead of having users wait in queue for time on fully utilized nodes. The company executive estimates that it might cost 30% more to purchase servers with enough capacity to handle these peak workloads on their own.

Global Financial Services: Company X Several million clients use the firm s credit ratings to help make lending decisions Goal: increase efficiency for updating ratings Result: HPC multi-cluster grid boosted efficiency 30% -- no need to buy additional clusters yet.

Real Estate Worldwide vacation exchange & rental leader Goal: Update property valuations several times per day (not possible with enterprise servers) Results: HPC technology enabled all updates in 8-9 hours Avoided move to heuristics Allowed company to focus on rental side

Outcomes-Based Medical Diagnosis and Treatment Planning Enter the patient s history and symptomology. While the patient is still in the office, sift through millions of archived patient records for relevant outcomes. Provider considers the efficacies of various treatments for similar patients (but is not bound by the findings). Ergo, this functions as a powerful decision-support tool. Benefits: better outcomes + rein in costly outlier practices.

Digital Television Services A global leader with 30 million subscribers Goal: maximize revenue & customer satisfaction during high-growth period Result: HPC has added 7.5 million in annual revenue while increasing satisfaction

IDC HPDA Server Forecast Fast growth from a small starting point: $933 M HPDA ecosystem >$2.6B in 2018

IDC HPDA Storage Forecast Storage is the fastest-growing HPC market (9% CAGR) And HPDA storage will grow even faster (26.5% CAGR)

In Summary

Summary: HPDA Market Opportunity HPDA: simulation + newer highperformance analytics IDC predicts fast growth from a small starting point HPC and high-end commercial analytics are converging Algorithmic complexity is the common denominator Economically important use cases are emerging No single HPC solution is best for all problems

HPDA User Talks: HPC User Forums, UK, Germany, France, China and U.S. HPC in Evolutionary Biology, Andrew Meade, University of Reading HPC in Pharmaceutical Research: From Virtual Screening to All-Atom Simulations of Biomolecules, Jan Kriegl, Boehringer- Ingelheim European Exascale Software Initiative, Jean-Yves Berthou, Electricite de France Real-time Rendering in the Automotive Industry, Cornelia Denk, RTT-Munich Data Analysis and Visualization for the DoD HPCMP, Paul Adams, ERDC Why HPCs Hate Biologists, and What We're Doing About It, Titus Brown, Michigan State University Scalable Data Mining and Archiving in the Era of the Square Kilometre Array, the Square Kilometre Array Telescope Project, Chris Mattmann, NASA/JPL Big Data and Analytics in HPC: Leveraging HPC and Enterprise Architectures for Large Scale Inline Transactional Analytics in Fraud Detection at PayPal, Arno Kolster, PayPal, an ebay Company Big Data and Analytics Vendor Panel: How Vendors See Big Data Impacting the Markets and Their Products/Services, Panel Moderator: Chirag Dekate, IDC Data Analysis and Visualization of Very Large Data, David Pugmire, ORNL The Impact of HPC and Data-Centric Computing in Cancer Research, Jack Collins, National Cancer Institute Urban Analytics: Big Cities and Big Data, Paul Muzio, City University of New York Stampede: Intel MIC And Data-Intensive Computing, Jay Boisseau, Texas Advanced Computing Center Big Data Approaches at Convey, John Leidel Cray Technical Perspective On Data-Intensive Computing, Amar Shan Data-intensive Computing Research At PNNL, John Feo, Pacific Northwest National Laboratory Trends in High Performance Analytics, David Pope, SAS Processing Large Volumes of Experimental Data, Shane Canon, LBNL SGI Technical Perspective On Data-Intensive Computing, Eng Lim Goh, SGI Big Data and PLFS: A Checkpoint File System For Parallel Applications, John Bent, EMC HPC Data-intensive Computing Technologies, Scott Campbell, Platform/IBM The CEA-GENCI-Intel-UVSQ Exascale Computing Research Centre, Marie-Christine Sawley, Intel

Questions? ejoseph@idc.com sconway@idc.com cdekate@idc.com

Big Data Software Shortcomings -- Today

Big Data Software:

Big Data Software Technology Stack