Xeon+FPGA Platform for the Data Center



Similar documents
Intel Xeon +FPGA Platform for the Data Center

Data Center and Cloud Computing Market Landscape and Challenges

Networking Virtualization Using FPGAs

Cloud Data Center Acceleration 2015

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab

Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking

High-performance vswitch of the user, by the user, for the user

Exascale Challenges and General Purpose Processors. Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25

Embedded Systems: map to FPGA, GPU, CPU?

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Memory Channel Storage ( M C S ) Demystified. Jerome McFarland

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Research Report: The Arista 7124FX Switch as a High Performance Trade Execution Platform

High-performance vnic framework for hypervisor-based NFV with userspace vswitch Yoshihiro Nakajima, Hitoshi Masutani, Hirokazu Takahashi NTT Labs.

High-Density Network Flow Monitoring

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Programmable Networking with Open vswitch

Full and Para Virtualization

Infrastructure Matters: POWER8 vs. Xeon x86

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

White Paper. Innovate Telecom Services with NFV and SDN

CFD Implementation with In-Socket FPGA Accelerators

Advanced Computer Networks. Network I/O Virtualization

Intel DPDK Boosts Server Appliance Performance White Paper

How Linux kernel enables MidoNet s overlay networks for virtualized environments. LinuxTag Berlin, May 2014

ICRI-CI Retreat Architecture track

Accelerating the Data Plane With the TILE-Mx Manycore Processor

Enabling Technologies for Distributed Computing

Network Virtualization Technologies and their Effect on Performance

Enabling Technologies for Distributed and Cloud Computing

LS DYNA Performance Benchmarks and Profiling. January 2009

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads

SOFTWARE-DEFINED: MAKING CLOUDS MORE EFFICIENT. Julian Chesterfield, Director of Emerging Technologies

Intel Network Builders: Lanner and Intel Building the Best Network Security Platforms

A Low Latency Library in FPGA Hardware for High Frequency Trading (HFT)

ECLIPSE Performance Benchmarks and Profiling. January 2009

The virtualization of SAP environments to accommodate standardization and easier management is gaining momentum in data centers.

Enhance Service Delivery and Accelerate Financial Applications with Consolidated Market Data

Router Architectures

2015 Global Technology conference. Diane Bryant Senior Vice President & General Manager Data Center Group Intel Corporation

LSI SAS inside 60% of servers. 21 million LSI SAS & MegaRAID solutions shipped over last 3 years. 9 out of 10 top server vendors use MegaRAID

Parallel Computing. Benson Muite. benson.

Driving force. What future software needs. Potential research topics

Virtualization. Dr. Yingwu Zhu

Developing High-Performance, Flexible SDN & NFV Solutions with Intel Open Network Platform Server Reference Architecture

Implementing a unified Datacenter architecture for service providers

Intel Data Direct I/O Technology (Intel DDIO): A Primer >

Big Data Performance Growth on the Rise

Virtualization. Types of Interfaces

Can High-Performance Interconnects Benefit Memcached and Hadoop?

Intel Open Network Platform Release 2.1: Driving Network Transformation

FPGA-based MapReduce Framework for Machine Learning

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Rackspace Cloud Databases and Container-based Virtualization

I/O Virtualization Using Mellanox InfiniBand And Channel I/O Virtualization (CIOV) Technology

Operating System Support for Multiprocessor Systems-on-Chip

VNF & Performance: A practical approach

Cluster, Grid, Cloud Concepts

Capstone Overview Architecture for Big Data & Machine Learning. Debbie Marr ICRI-CI 2015 Retreat, May 5, 2015

Scalable Architecture for Accelerating IA Designs. SYSTEM ON A CHIP (SoC) 1-2 Gbps

QoS & Traffic Management

Hardware Based Virtualization Technologies. Elsie Wahlig Platform Software Architect

The search engine you can see. Connects people to information and services

SDN software switch Lagopus and NFV enabled software node

Multi-Threading Performance on Commodity Multi-Core Processors

BSC vision on Big Data and extreme scale computing

Accelerate OpenStack* Together. * OpenStack is a registered trademark of the OpenStack Foundation 1

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

An Oracle Technical White Paper November Oracle Solaris 11 Network Virtualization and Network Resource Management

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis

Moving Beyond CPUs in the Cloud: Will FPGAs Sink or Swim?

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology

7a. System-on-chip design and prototyping platforms

Next Generation Operating Systems

COMPUTING. Centellis Virtualization Platform An open hardware and software platform for implementing virtualized applications

Ericsson Introduces a Hyperscale Cloud Solution

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Going Linux on Massive Multicore

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck

How To Build A Cisco Ukcsob420 M3 Blade Server

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

HP Moonshot: An Accelerator for Hyperscale Workloads

Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting

PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE

Performance Counter. Non-Uniform Memory Access Seminar Karsten Tausche

Introduction to grid technologies, parallel and cloud computing. Alaa Osama Allam Saida Saad Mohamed Mohamed Ibrahim Gaber

Telecom - The technology behind

Hardware- and Network-Enhanced Software Systems for Cloud Computing

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

High Performance or Cycle Accuracy?

QLogic 16Gb Gen 5 Fibre Channel in IBM System x Deployments

Accelerate Cloud Computing with the Xilinx Zynq SoC

HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief

WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression

Parallel Algorithm Engineering

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid

New Cluster-Ready FAS3200 Models

Transcription:

Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG

Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system 2

Exponential growth in mobile. 3

is driving Data Center growth 4

Leading to search for greater performance efficiencies 5

Across Data Center Workloads Diverse workloads: Cloud Services: Search, Web Servers,.. Analytics: Big Data, Machine Learning, Scientific: Genomics, Security, Communication: Packet Processing, Virtual Switching, Storage: Compression, Deduplication, Changing dynamics: No single killer app Emerging new apps drive changes in workloads 6

A homogenous compute platform for the Data Center? 7

Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system 8

Motivation for Accelerators Enhanced Performance: Accelerators compliment CPU cores to meet market needs for performance of diverse workloads in the Data Center: Enhance single thread performance with tightly coupled accelerators or compliment multi-core performance with loosely coupled accelerators via PCIe or QPI attach Move to Heterogeneous Computing: Moore s Law continues but demands radical changes in architecture and software.. Architectures will go beyond homogeneous parallelism, embrace heterogeneity, and exploit the bounty of transistors to incorporate application-customized hardware.

Accelerator Architecture General Purpose Cores Cost (Area, Power) Ease of Programming/ Development Reconfigurable Fixedfunction Flexibility Performance Efficiency: Performance/Watt, Performance/$ Programming Complexity : Effort, Cost

Accelerator Attach PCIe attach Cost (Latency, Granularity) QPI attach On-core On-Chip On-Package Distance from Core Best attach technology might be application or even algorithm dependent

Coherency and Programming Model Data Movement In-line Accelerator processes data fully or partially from direct I/O Shared Virtual Memory : Virtual addressing eliminates need for pinning memory buffers Zero-copy data buffers Interaction between Core and Accelerator Off-load Hybrid : algorithm implemented on host and accelerator

Proposed Platform for the Data Center FPGA with coherent low-latency interconnect: Simplified programming model Support for virtual addressing Data Caching Enables new classes of algorithms for acceleration with: Full access to system memory Support for efficient irregular data pattern access Remapping of algorithms from off-load model to hybrid processing model Fine grained interactions 13

IVB+FPGA Software Development Platform Software Development for Accelerating Workloads using Xeon and coherently attached FPGA in-socket DDR3 Processor FPGA Module Intel Xeon E5-26xx v2 Processor Altera Stratix V DDR3 DDR3 DDR3 Intel Xeon E5-2600 v2 Product Family QPI FPGA DDR3 DDR3 QPI Speed Memory to FPGA Module Expansion connector to FPGA Module 6.4 GT/s full width (target 8.0 GT/s at full width) 2 channels of DDR3 (up to 64 GB) PCIe 3.0 x8 lanes - maybe used for direct I/O e.g. Ethernet DMI2 PCIe* 3.0 x8 PCIe* 3.0 x8 PCIe* 3.0 x8 PCIe* 3.0 x8 PCIe* 3.0 x8 PCIe* 3.0 x8 Features Software Configuration Agent, Caching Agent,, (optional) Memory Controller Accelerator Abstraction Layer (AAL) runtime, drivers, sample applications Heterogeneous architecture with homogenous platform support 14

Programming Interfaces CPU FPGA Host Application Accelerator Function Units (AFU) Accelerator Abstraction Layer Service API Virtual Memory API Physical Memory API Uncore Addr Translation QPI/KTI Link, Protocol, & PHY CCI extended CCI standard QPI/KTI Programming interfaces will be forward compatible from SDP to future MCP solutions Simulation Environment available for development of SW and RTL 15

Programming Interfaces : OpenCL CPU OpenCL Host Code OpenCL Application FPGA OpenCL Kernel Code OpenCL RunTime C F G OpenCL Kernels Accelerator Abstraction Layer Service API Virtual Memory API Physical Memory API VirtMem Physical Memory API CCI Extended CCI Standard QPI/UPI/P CIe System Memory Unified application code abstracted from the hardware environment Portable across generations and families of CPUs and FPGAs 16

Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system 17

XEON+FPGA in the Cloud : integration with SDI and RSA Cloud Users Workload IA Optimized Software SDI (Exposing IA capabilities) Cloud Service Provider Cloud Controller Placing workload Intel HAAS dynamic node composition IA Logical Node with Acceleration Intel Rackscale 18

XEON+FPGA in the Cloud: IP Store Concept Cloud Users Intel Accelerators 3rd party IP developers Workload IA Optimized Software 3rd party Accelerators 3rd party Accelerators SDI (Exposing IA capabilities) -IP Catalogue Cloud Service Provider Cloud Controller Placing workload Intel HAAS Intel Store Client static/dynamic FPGA programming IA Node XEON+FPGA 19

Example Usage : Deep Learning Framework for Visual Understanding node cluster CCI Interface SRAM Controller device Read Write DMA Reg Access IP Regist ers Processing Tile n Processing Tile 1 Processing Tile 0 primitives Control State Machine Weights P E Inputs P E Outputs P E 20

Example Usage: Accelerating Open VSwitch w/dpdk VM0 VirtIO QEMU VM1 VirtIO QEMU Userspace Offload DMA Engine to FPGA : Frees up CPU cycles to perform more useful work ovs-vswitchd DPDK Libraries IVSHEM VHOST Reduce cache pollution. FPGA DMA Userspace forwarding netdev TAP Open vswitch Kernel Module ovsdv-server Kernel space Add support for Packet Classification, ACL, and other functions including Direct I/O in FPGA Physical Switch/NIC 21

Example Usage: High Frequency Trading Accelerator CPU Host Application Feed Parser Trading Logic / Statistics Ethernet PHY & MAC Order Generation FPGA 22

Academic Research Call for Proposals: Intel-Altera Heterogeneous Architecture Research Platform Program Submitted by Nicholas Carter Intel-Altera Heterogeneous Architecture Research Platform (HARP) Program Intel Corporation and Altera Corporation are pleased to announce the Heterogeneous Architecture Research Platform (HARP) program, which will provide faculty with computer systems containing Intel microprocessors and an Altera Stratix V FPGA module that incorporates Intel QuickAssist Technology in order to spur research in programming tools, operating systems, and innovative applications for accelerator-based computing systems. 23

Q & A 24