Capstone Overview Architecture for Big Data & Machine Learning. Debbie Marr ICRI-CI 2015 Retreat, May 5, 2015



Similar documents
ICRI-CI Retreat Architecture track

Xeon+FPGA Platform for the Data Center

Architectures and Platforms

Intel Xeon +FPGA Platform for the Data Center

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Intel Xeon Processor E5-2600

Exascale Challenges and General Purpose Processors. Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation

Infrastructure Matters: POWER8 vs. Xeon x86

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

MOC 20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

IMCM: A Flexible Fine-Grained Adaptive Framework for Parallel Mobile Hybrid Cloud Applications

Data Center and Cloud Computing Market Landscape and Challenges

A Brief Introduction to Apache Tez

Cray: Enabling Real-Time Discovery in Big Data

FPGAs for Trusted Cloud Computing

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

Parallel Computing. Benson Muite. benson.

NVIDIA Tools For Profiling And Monitoring. David Goodwin

The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage

Application of Predictive Analytics for Better Alignment of Business and IT

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

HP Smart Array Controllers and basic RAID performance factors

Solid State Storage in Massive Data Environments Erik Eyberg

Accelerating High-Speed Networking with Intel I/O Acceleration Technology

Introducing EEMBC Cloud and Big Data Server Benchmarks

VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS

BSC vision on Big Data and extreme scale computing

Architectures for Big Data Analytics A database perspective

In-Memory Data Management for Enterprise Applications

POWER8 Performance Analysis

Scaling up to Production

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Sequential Performance Analysis with Callgrind and KCachegrind

Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory. The Nation s Premier Laboratory for Land Forces UNCLASSIFIED

Intel DPDK Boosts Server Appliance Performance White Paper

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

High-Density Network Flow Monitoring

Securing the Intelligent Network

Understanding Data Locality in VMware Virtual SAN

RevoScaleR Speed and Scalability

Accelerating Web-Based SQL Server Applications with SafePeak Plug and Play Dynamic Database Caching

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju

HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

Data Centric Systems (DCS)

HPC data becomes Big Data. Peter Braam

A Novel Cloud Based Elastic Framework for Big Data Preprocessing

Software-defined Storage Architecture for Analytics Computing

MS EXCHANGE SERVER ACCELERATION IN VMWARE ENVIRONMENTS WITH SANRAD VXL

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Parallel Algorithm Engineering

Lecture 36: Chapter 6

Diablo and VMware TM powering SQL Server TM in Virtual SAN TM. A Diablo Technologies Whitepaper. May 2015

Evaluation Report: Supporting Microsoft Exchange on the Lenovo S3200 Hybrid Array

How To Build A Cloud Computer

Cloud Data Center Acceleration 2015

Research Statement. Hung-Wei Tseng

Microsoft Private Cloud Fast Track

SGI High Performance Computing

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012

HP Moonshot: An Accelerator for Hyperscale Workloads

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Design Cycle for Microprocessors

Networking Virtualization Using FPGAs

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

Rapid System Prototyping with FPGAs

Virtuoso and Database Scalability

The Methodology Behind the Dell SQL Server Advisor Tool

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Parametric Analysis of Mobile Cloud Computing using Simulation Modeling

A Close Look at PCI Express SSDs. Shirish Jamthe Director of System Engineering Virident Systems, Inc. August 2011

Numerix CrossAsset XL and Windows HPC Server 2008 R2

Building a High Performance Deduplication System Fanglu Guo and Petros Efstathopoulos

Intel Data Direct I/O Technology (Intel DDIO): A Primer >

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

What's New in SAS Data Management

PERFORMANCE MODELS FOR APACHE ACCUMULO:

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA

Part I Courses Syllabus

bigdata Managing Scale in Ontological Systems

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Linux Performance Optimizations for Big Data Environments

Lab Evaluation of NetApp Hybrid Array with Flash Pool Technology

Parallel Programming Survey

YarcData urika Technical White Paper

Performance monitoring at CERN openlab. July 20 th 2012 Andrzej Nowak, CERN openlab

EMC XtremSF: Delivering Next Generation Performance for Oracle Database

Reference Architecture, Requirements, Gaps, Roles

Desktop Virtualization and Storage Infrastructure Optimization

Transcription:

Capstone Overview Architecture for Big Data & Machine Learning Debbie Marr ICRI-CI 2015 Retreat, May 5, 2015

Accelerators Memory Traffic Reduction Memory Intensive Arch. Context-based Prefetching Deep Learning SimNets Distributed Methods for Deep Learning Scene Understanding Saliency Estimation Statistics of Depth Images Arguments for Persuasive Discussion Universal Semantics Transcript Quality Inference for NLP Relations and Events Extraction Knowledge Graphs Hybrid Models Syntactic & Semantic Reranking Language Modeling 2 nd -order Embedding Mental Phenotyping Reinforcement Learning

Distributed Deep Learning Library Accelerators for Big Data and Machine Learning Conversational Speech Understanding

Big Data / Machine Learning Hardware Infrastructure View

AAL & ICRI-CI Accelerator Investments Extract / Transform / Load (ETL) Key Metrics: I/O, network, storage: latency, capacity, bandwidth Training Key Metrics: Wall-clock time to train model Capstone: Accelerators for Big IL/Accelerator Data Architecture and Lab Machine Learning Inference (Predict/Score/Classify) Key Metrics: Throughput Latency Power (thermals) Energy (battery-life)

Example: Text Analytics Pipeline

Capstone Optimized IA for Big Data & Machine Learning Goal: Break-through performance and energy-efficiency for a big data analytics platform 1. Data movement within/across nodes, where and when to (not) store 2. Computation placed in the storage & network hierarchy 3. New accelerators for big data 4. Applications and usage of new memory technologies (e.g. memristors) 5. Leveraging ML algorithms for new microarchitectures NETWORK OF MUCH MORE DATA Current NETWORK OF MUCH MORE DATA Future P: Processor $: Cache M: Memory PA: Processor Accelerator TA: Traditional Accelerator MA: Memory Accelerator NA: New Accelerator ALL DATA BIG DATA Small DATA SSD NA SSD ALL DATA Small DATA TA M $ MA M $ old new BIG data Med. Data Small Data PA P PA P

Plan and Timeline Goal: Break-through performance and energy-efficiency for a big data analytics platform Time Plan Status Q1'15 Education, background, select target & workloads. Bi-weekly meetings since Q4'14: * Education & background: target area and algorithms * Looked at Hi-Bench and search for ETL workloads * High-level analysis of workloads. Q2'15 Q3'15 Q4'15 Q1'16 Q2'16 Q3'16 -Q2'17 Broad-stroke microarchitecture, performance, and workloads Next-level of detail for microarchitecture, performance, and workloads High-level simulations and models of workloads on microarchitecture Detailed simulations and models of workloads on microarchitecture Bring simulator and workloads in-house for further analysis and in-house assessment Parallel work on accelerators for target workloads and microarchitecture. 2 F2F meetings setup: * 5/3 at the Technion * 6/11 at Intel Jones Farm

Example: Algorithms -> Accelerators Step 1: Study algorithm Study algorithm, usage, alternatives, benefits, trade-offs Get multiple code & datasets MovieLens Analyze code on several datasets Online Dating

Performance Example: Algorithms -> Accelerators Step 2: Optimize algorithm Low-level optimizations Vectorizing Threading (better) Software prefetches Branch prediction Higher-level optimizations Algorithm & Data format changes Compression Re-ordering/pipelining phases Identify bottlenecks & pain points 1-25x 3-9x Higher-level opt Low-level opt

Performance Example: Algorithms -> Accelerators Step 3: Accelerators Compute characteristics Serial Parallel Operations Bandwidth characteristics Caching vs. streaming Gathers/scatters Working set size Communication 4-100x 1-25x 3-9x Accelerators Higher-level opt Low-level opt

Performance Example: Algorithms -> Accelerators Step 4: Prototype FPGA Modify program to offload work to FPGA Accelerator functionality Synthesis Power, Area, Frequency Simulations Put it all together inc. overhead Programmability New instruction? PCIe device? Competitive Analysis Transfer to product 4-100x 1-25x 3-9x Accelerators Higher-level opt Low-level opt

ICRI-CI Architecture Track Year 4-5 Reduction of Memory traffic for Big Data Funnel: Reduction of Data Movement in Big Data system Prof. Uri Weiser Prof. Avinoam Kolodny Accelerators for Big Data & Machine Learning Novel Accelerators Prof. Ran Ginosar Prof. Oded Schwartz Machine Learning for Architecture Adaptive Systems Prof. Yoav Etsion Prof. Uri Weiser Prof. Shie Mannor Memory Intensive Architecture: New memory based machine New Technology based Architecture Dr. Shahar Kvatinsky Prof. Avinoam Kolodny Prof. Eby Friedman (Technion/Rochester) Prof Yuval Cassuto