ICRI-CI Retreat, Architecture track
Uri Weiser, June 5th, 2015
- Funnel: Memory Traffic Reduction for Big Data & Machine Learning (Uri)
- Accelerators for Big Data & Machine Learning (Ran)
- Machine Learning for Architecture: Context-based Prefetching (Yoav)
- Memory Intensive Architecture (Avinoam)
ICRI-CI Architecture track: Theme
Past theme: Develop new Heterogeneous Architecture concepts and Architecture for Machine Learning, and employ Machine Learning to develop architecture.
Next phase Capstone: Optimized IA for Big Data & Machine Learning Workloads
- Funnel (new)
- Accelerators: Architecture for Machine Learning (continuation)
- Machine Learning for Architecture (continuation)
- Memory Intensive Architecture (continuation)
ICRI-CI Architecture track research activities: Past and Future
- Hetero: Provide an energy tool to be used for future SoC energy partition
- Power management: next step in Heterogeneous computing
- Funnel: Proof of Concept and potential (collaboration with Intel Labs - Debbie Marr's group)
- Machine Learning for Architecture: Context-Aware prediction
- Accelerators: Associative processors
- Memory Intensive Architecture
The Funnel Research
"A Funnel is a pipe with a wide, often conical mouth and a narrow stem."
May 2015
Environment: the Era of Big Data
Data centers:
- Size: ~1 million m^2
- Power: ~100 MWatts
- Power cost (US, 2014): $10B (1)
- Power Usage Efficiency: PUE = 1.2 to 2.0
1 Joule saved in computing saves around 1.5 Joules of data center energy.
(1) http://www.computerworld.com/article/2598562/data-center/data-centers-are-the-new-polluters.html
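The 1.5x figure on that last line follows directly from the PUE definition (total facility energy divided by IT energy); a worked form of the relation, assuming a mid-range PUE of about 1.5:

```latex
E_{\mathrm{facility}} = \mathrm{PUE} \cdot E_{\mathrm{IT}}
\;\Rightarrow\;
\Delta E_{\mathrm{facility}} = \mathrm{PUE} \cdot \Delta E_{\mathrm{IT}}
\approx 1.5 \times 1\,\mathrm{J} = 1.5\,\mathrm{J}
```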
Datacenter Power
[Figure: hardware subsystem power breakdown]
Source: Luiz Barroso's talk, http://www.cs.berkeley.edu/~rxin/db-papers/warehousescalecomputing.pdf
Energy:
[Figures]
From: Bill Dally (NVIDIA and Stanford), "Efficiency and Parallelism, the challenges of future computing"
From: Mark Horowitz (Stanford), "Computing's Energy Problems"
Energy in mind (@28nm technology)
- ~20 pJ per arithmetic op
- ~200 pJ per instruction (~10X the op itself)
- ~20,000 pJ per 256-bit DRAM access (~100X an instruction)
From: Bill Dally (NVIDIA and Stanford), "Efficiency and Parallelism, the challenges of future computing"
Energy: DRAM
[Figure: DRAM access energy]
Data movements
[Diagram: data flows from the data source (SSD/NIC) through the MC and Front End into DRAM, and is then copied up through the cache ($) to the CPU for operations]
Data movements: Read Once
[Same diagram, with read-once data highlighted]
Cache/Memory are not effective if:
- Cache related: reuse distance > ~1M accesses
- Memory related: reuse distance > ~1G accesses
Read-once data should NOT reside in either the standard cache or DRAM.
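To make the reuse-distance criterion concrete, here is a minimal Python sketch (our illustration, not from the slides; it counts raw accesses between touches, not unique addresses) that measures reuse distances in an access trace. Addresses that never recur are exactly the read-once data the slide says should bypass the cache and DRAM:

```python
def reuse_distances(trace):
    """Map each address to the distances between its repeated accesses."""
    last_seen = {}   # address -> index of its previous access
    distances = {}   # address -> list of reuse distances
    for i, addr in enumerate(trace):
        if addr in last_seen:
            distances.setdefault(addr, []).append(i - last_seen[addr])
        last_seen[addr] = i
    return distances

# Addresses touched exactly once never appear in the result:
# they are "read once" and are candidates to bypass cache and DRAM.
trace = ["A", "B", "C", "A", "D", "B", "E"]
print(reuse_distances(trace))   # {'A': [3], 'B': [4]}; C, D, E are read-once
```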
Data Movement reduction: Read Once
Reduce data movement by computing as close as possible to the data source.
- In Big Data processing (especially in the ETL* stage) a huge amount of data is read only once.
- Why direct such data to DRAM? Use a HW cyclic buffer instead (e.g. DDIO/DCA).
Funnel idea - where should the Funnel reside?
- At the DISK/SSD/NIC
- At the front end, bypassing DRAM via DDIO/DCA
*ETL = Extract, Transform, Load
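As a software analogue of the cyclic-buffer idea, here is a hedged sketch (the class and its API are illustrative assumptions, not DDIO's actual interface): read-once chunks are staged in a small fixed-size ring and consumed in place, rather than landing in DRAM-resident application buffers:

```python
from collections import deque

class CyclicBuffer:
    """Toy model of a HW cyclic buffer for read-once streaming data."""
    def __init__(self, slots):
        self.buf = deque(maxlen=slots)   # old slots are recycled, as in HW

    def produce(self, chunk):
        self.buf.append(chunk)           # device (SSD/NIC) writes a chunk

    def consume(self):
        return self.buf.popleft() if self.buf else None  # CPU reads it once

ring = CyclicBuffer(slots=4)
for chunk in ("rec1", "rec2", "rec3"):
    ring.produce(chunk)                  # data lands in the ring, not the DRAM heap
while (c := ring.consume()) is not None:
    pass                                 # transform once, emit the reduced result
```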
- Dedicated ETL server clusters: performance limited by I/O, disk, network, database, HDFS, etc. A lot of data gets moved around. For some customers more time is spent in ETL than in ML training.
- Dedicated ML training server clusters: for deep learning, this is where people use accelerators to make training faster. Not very scalable; wall-clock time (latency) oriented.
- Application servers: ML inference/classification/prediction/scoring is embedded in the larger context of whole-application services. Usually no accelerators. Very scalable; throughput-oriented.
ETL - Extract, Transform, Load
- A lot of data moving in -> data movements
- Simple transformations -> simple computation
Bottleneck:
- Performance limited by I/O, disk, network, database, HDFS
- Data is not accessed frequently, but a huge amount of data is accessed
- Energy bottleneck? ETL more than ML training
Big Data system flow
[Diagram: Data In -> ETL -> ML -> Client]
The Funnel
N_f = Funnel ratio; BW_in = BW_out * N_f
Move computing closer to the data source.
[Diagram: data enters at BW_in, passes through cascaded funnel stages with ratios N_3, N_2, N_1 (intermediate bandwidths BW_1, BW_2), and reaches the Computing Engine at BW_out - bandwidth decreases at each stage]
What should the INPUT BW be to fully utilize the computing engine?
BW_in = BW_out * N_3 * N_2 * N_1
The Funnel: balance the system!
If data is to be consumed at the highest rate by the computing engines:
[Diagram: DISK/SSD/NIC at 0.4 GB/s per SATA link -> PCIe at 15 GB/s -> DDR4 at 50 GB/s -> Computing Engines, with per-stage funnel ratios N_1 = 1]
If you use a Funnel, remember the system should be balanced: BW_in = BW_out * N_3 * N_2 * N_1
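A small Python sketch of the balance rule: given a set of per-stage funnel ratios (the N_i values below are illustrative assumptions, not measurements) and the slide's link speeds, it computes the source bandwidth, and the number of 0.4 GB/s SATA links, needed to keep the computing engine fully utilized:

```python
import math

def required_input_bw(bw_out_gbs, funnel_ratios):
    """Balance rule from the slide: BW_in = BW_out * product of N_i."""
    bw_in = bw_out_gbs
    for n in funnel_ratios:
        bw_in *= n
    return bw_in

bw_out = 50.0                  # GB/s consumed by the computing engine (DDR4-class, from the slide)
ratios = [1, 2, 5]             # assumed N_1, N_2, N_3 data-reduction factors per funnel stage
bw_in = required_input_bw(bw_out, ratios)
sata_links = math.ceil(bw_in / 0.4)   # 0.4 GB/s per SATA disk (from the slide)
print(f"BW_in = {bw_in} GB/s -> {sata_links} SATA links to balance the system")
```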
Process data - at which location?
Read once; predefined filter vs. dynamic filter.
[Diagram: filter F1 at the data source (SSD/NIC), filter F2 at the Front End, ahead of the cache ($), CPU operations, and DRAM]
Read-once data should NOT reside in either the standard cache or DRAM.
- Save energy?
- Save DRAM space
- Save data movements
Research Plan
Identify access patterns related to buckets (a classifier sketch follows below):
- Write once - Read once
- Write many - Read once
- Write once - Read many
- Write many - Read many
Provide a solution for the ETL stage to reduce energy and improve performance:
- Funnel I at the disk level
- Funnel II at the front end
- Proof of Concept via DDIO
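A minimal sketch (our illustration, not part of the plan's tooling) of how observed per-object write/read counts could be sorted into the four buckets above; the example counters are assumed, not measured:

```python
def bucket(writes, reads):
    """Classify an object into one of the four access-pattern buckets."""
    w = "Write once" if writes <= 1 else "Write many"
    r = "Read once" if reads <= 1 else "Read many"
    return f"{w} - {r}"

# Example: counters gathered from an (assumed) access trace.
counts = {"log_block": (1, 1), "index_page": (1, 9), "scratch": (7, 6)}
for name, (w, r) in counts.items():
    print(name, "->", bucket(w, r))
# log_block -> Write once - Read once   (Funnel candidate: bypass cache/DRAM)
```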
Moving data in ETL wastes energy in big data infrastructure and applications.
Our Research:
Comprehend data flow and access patterns in big data applications
- Data flow IO: disk / network
- Flow through: SATA / QPI / Chipset / DDR / DRAM / caches
- Data read/write patterns
Apply energy-efficient solutions for each data class
- Funnel: move computation to data when possible
- Funnel: aggregate data early on to reduce communication
- Store data in optimized memory structures based on usage
Open issues and future research
- SW and OS
- Co-processor or heterogeneous system
- Compatibility
- Application awareness of the feature
Summary
The Funnel: functions execute close to the data source
- Reduction of data movement
- Frees up the system's memory resources (re-spark)
- Simple, energy-efficient engines at the front end
Issues
- Compatibility: apps, OS
- Amount of energy saving
Thanks