FPGA Accelerator Virtualization in an OpenPOWER Cloud
Fei Chen, Yonghua Lin
IBM China Research Lab
Trend of Acceleration Technology
- Acceleration in cloud is taking off:
  - Microsoft used FPGAs to accelerate Bing search on 1,632 servers, with a 6x8 2D-torus network topology for high throughput.
  - Storage >2000 PB, processing 10~100 PB/day, logs 100 TB~1 PB/day; FPGAs used for the storage controller, GPUs used for deep learning.
- Appliance (TB-scale problems): acceleration architecture for a single node; dedicated acceleration resources; closed innovation model.
- Acceleration in cloud (PB-scale problems): architecture for thousands of nodes; shared acceleration resources; open innovation model through an ecosystem.
- Acceleration programming is becoming a hot topic: OpenCL, Sumatra (Oracle), LiMe (IBM), proprietary accelerator frameworks.
- Innovations required:
  - Open framework for accelerator integration and sharing
  - Scalable acceleration fabric
  - Accelerator resource abstraction, reconfiguration and scheduling in cloud
  - Modeling and advisory tools for dynamic acceleration system composition
Resources on FPGA Are Huge
- Programmable resources (Xilinx Virtex as an example):
  - Logic cells (LCs)
  - DSP slices: fixed/floating point
  - On-chip memory blocks
  - Clock resources
  - Miscellaneous peripherals: DDR3 controllers, PCIe Gen3 interfaces, 10G Ethernet controllers...
- Hard processor cores:
  - PowerPC: Xilinx Virtex-5 FXT
  - ARM: Xilinx Zynq-7000
  - Atom: Intel + Altera E600C
- FPGA capacity trends: the Xilinx Virtex UltraScale 440, the largest-scale FPGA in the world when delivered in 2014, consists of more than 4 million logic cells (LCs). Using this chip, we can build up to 250 AES crypto accelerators, or 520 ARM7 processor cores.
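A quick back-of-the-envelope check makes the capacity claim concrete. The per-core logic-cell budgets below are derived from the slide's own totals, not from vendor specifications:

```python
# Back-of-the-envelope capacity check for the Virtex UltraScale 440 figures.
# The per-core LC counts are derived from the slide's totals and are
# illustrative only, not vendor data.
TOTAL_LCS = 4_000_000          # "more than 4 million logic cells"

aes_cores = 250                # AES crypto accelerators claimed to fit
arm7_cores = 520               # ARM7 processor cores claimed to fit

lcs_per_aes = TOTAL_LCS // aes_cores
lcs_per_arm7 = TOTAL_LCS // arm7_cores

print(f"~{lcs_per_aes} LCs per AES core")    # ~16000
print(f"~{lcs_per_arm7} LCs per ARM7 core")  # ~7692
```

So the claim implies roughly 16K LCs per AES core and under 8K LCs per ARM7 core, consistent with small soft cores sharing one large device.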
FPGA on Cloud: a Double Win
- Cloud benefits from FPGA:
  - Performance
  - Power consumption
- FPGA benefits from cloud:
  - Lower cost: tenants need not purchase and maintain FPGAs, and pay for accelerators only when using them.
  - More applications, high FPGA utilization
  - Ecosystem: grow with the cloud ecosystem
Motivation for Accelerator/FPGA as a Service in Cloud
- Enable manageability: can an FPGA (pool) be managed in a data center? ID, location, reconfiguration, performance, etc.
- Reduce deployment complexity: how can FPGA/accelerator resources be orchestrated easily with VM, network and storage resources, according to the needs of the application?
- Reduce system cost: how can system cost be reduced by sharing FPGA resources among applications, VMs and containers?
- Bring high value to the cloud infrastructure: could we generate new value for IaaS? Dynamic, flexible, priority-controllable acceleration shared across hosts, VMs, containers, network and storage.
FPGA Ecosystem in Cloud
- Accelerator marketplace: companies or individual developers can upload and sell their accelerators through a marketplace (e.g. on OpenPOWER).
- Accelerator cloudify tool (planned): the marketplace "cloudifies" an accelerator by integrating the service layer with the accelerator and compiling it; all integration, compilation, test, verification and certification are done automatically.
- Cloud tenants: pay for accelerator usage rather than licenses and hardware; get the accelerator service in a self-service way; use the single HEAT orchestrator to deploy workloads with accelerators together with compute, network and storage.
- Accelerator developers and cloud service providers: the provider buys cloudified accelerators on the marketplace, creates a service category for FPGA accelerators, and sells it on the cloud as a service.
- Stack: HEAT orchestrator; OpenStack extension for the accelerator service; POWER8/PowerKVM; FPGA cards carrying the service logic for the accelerator service.
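The "single HEAT orchestrator" idea can be sketched as a template that declares an accelerator next to compute resources. Note the resource type `IBM::FPGA::Accelerator` and its properties are illustrative assumptions, not a real OpenStack extension:

```python
# Hypothetical sketch of a HEAT-style template that attaches an FPGA
# accelerator to a VM alongside compute resources. The resource type
# "IBM::FPGA::Accelerator" and its properties are assumptions for
# illustration only.
template = {
    "heat_template_version": "2014-10-16",
    "resources": {
        "app_server": {
            "type": "OS::Nova::Server",
            "properties": {
                "image": "ubuntu-ppc64le",
                "flavor": "m1.medium",
            },
        },
        "aes_accelerator": {                     # assumed resource type
            "type": "IBM::FPGA::Accelerator",
            "properties": {
                "accelerator_image": "aes-128",  # bitfile from marketplace
                "attach_to": {"get_resource": "app_server"},
            },
        },
    },
}

def accelerator_resources(tpl):
    """Return the names of accelerator resources declared in a template."""
    return [name for name, res in tpl["resources"].items()
            if res["type"].endswith("::Accelerator")]

print(accelerator_resources(template))  # ['aes_accelerator']
```

The point of the sketch is that the tenant describes the accelerator declaratively, in the same document as compute, network and storage, and the orchestrator resolves the attachment.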
Accelerator as a Service on SuperVessel
- Accelerator marketplace for developers to upload and compile accelerators for the SuperVessel POWER cloud (Fig. 1: Accelerator MarketPlace for SuperVessel Cloud).
- Users can request clusters of different sizes.
- Cloud users can request an accelerator when creating a VM (Fig. 2).
HW Modules Enabling FPGA Virtualization in an OpenStack Cloud
- OpenStack-based cloud control node: a scheduler dispatching tenants to enhanced OpenStack FPGA compute nodes.
- KVM-based compute node: guest processes in the virtual machine call accelerator utilities/APIs through a guest library; a guest control module and guest driver, together with an OpenStack guest agent, connect through the hypervisor to the host control module and host driver; a bitfile library holds the accelerator images.
- Docker-based compute node: tenant applications call the accelerator library/APIs and driver directly on the host, with an OpenStack agent managing the images.
- Hardware: the FPGA is attached via CAPI and carries the FPGA kernels (service logic), a hardware control module/driver, and DRAM.
- Legend: the highlighted components are the additions for the FPGA framework.
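One concrete piece of the enhanced control node is FPGA-aware scheduling. The sketch below shows the filtering step only; the node/request shapes and names are assumptions (a real Nova filter would subclass `BaseHostFilter` from `nova.scheduler.filters`):

```python
# Minimal sketch of an FPGA-aware scheduling filter in the spirit of the
# enhanced OpenStack scheduler above. Data shapes and the slot model are
# illustrative assumptions, not the actual implementation.
from dataclasses import dataclass

@dataclass
class ComputeNode:
    name: str
    fpga_total_slots: int   # accelerator slots available on the node's FPGA
    fpga_used_slots: int

    @property
    def fpga_free_slots(self) -> int:
        return self.fpga_total_slots - self.fpga_used_slots

def filter_fpga_hosts(nodes, requested_slots):
    """Keep only compute nodes with enough free FPGA accelerator slots."""
    return [n for n in nodes if n.fpga_free_slots >= requested_slots]

nodes = [
    ComputeNode("p8-node-1", fpga_total_slots=4, fpga_used_slots=4),
    ComputeNode("p8-node-2", fpga_total_slots=4, fpga_used_slots=1),
]
print([n.name for n in filter_fpga_hosts(nodes, 2)])  # ['p8-node-2']
```

Filtering by free accelerator slots is the same pattern OpenStack already uses for RAM and vCPUs, which is why the framework can slot into the existing scheduler.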
FPGA Accelerator as a Service Online on SuperVessel Cloud
Try it here: www.ptopenlab.com
- Super Marketplace (online)
- SuperVessel Cloud Service: 1. VM and container service; 2. Storage service; 3. Network service; 4. Accelerator as a service; 5. Image service
- SuperVessel Big Data and HPC Service (preparing): 1. Big data: MapReduce (Symphony), SPARK; 2. Performance tuning service
- OpenPOWER Enablement Service (online): 1. X-to-P migration; 2. AutoPort tool; 3. OpenPOWER new system test service
- Super Class Service (preparing): 1. Online video courses; 2. Teacher course management; 3. User contribution management
- Super Project Team Service: 1. Project management service; 2. DevOps automation
- SuperVessel cloud infrastructure: Docker, storage, IBM POWER servers, OpenPOWER servers, FPGA/GPU
Thanks!
FPGA Implementation
- The FPGA subsystem is designed like a computer system: accelerators are the "applications", the service sublayer plays the role of the "OS", and the platform sublayer plays the role of the "hardware".
- User sublayer: shared FPGA accelerator resources A, B, C, D.
- Service sublayer: job queue, job scheduler, switch, context controller, security controller, DMA engine, registers.
- Platform sublayer: DRAM, PCIe/CAPI, Ethernet, reconfiguration controller (ICAP), high-bandwidth I/O.
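The job queue and job scheduler of the service sublayer can be modeled in software to show how jobs from multiple tenants share the accelerator slots. Everything here is an illustrative host-side model (names and the round-robin policy are assumptions); on the real FPGA this logic is implemented in hardware:

```python
# Software model of the service sublayer's job queue and job scheduler:
# jobs submitted by multiple tenants are queued and dispatched to the
# shared accelerator slots A-D. The round-robin policy is an assumption
# for illustration; the actual scheduler is a hardware block.
from collections import deque

class FpgaJobScheduler:
    def __init__(self, accelerators):
        self.accelerators = accelerators       # e.g. ["A", "B", "C", "D"]
        self.queue = deque()

    def submit(self, tenant, payload):
        """Enqueue a job (tenant context + payload) into the job queue."""
        self.queue.append((tenant, payload))

    def dispatch(self):
        """Drain the queue, assigning jobs to accelerators round-robin."""
        assignments = []
        i = 0
        while self.queue:
            tenant, payload = self.queue.popleft()
            acc = self.accelerators[i % len(self.accelerators)]
            assignments.append((tenant, acc, payload))
            i += 1
        return assignments

sched = FpgaJobScheduler(["A", "B", "C", "D"])
for tenant in ["vm0", "vm1", "docker0"]:
    sched.submit(tenant, b"data")
print(sched.dispatch())
# [('vm0', 'A', b'data'), ('vm1', 'B', b'data'), ('docker0', 'C', b'data')]
```

A context controller and security controller (as on the slide) would sit around this loop, tagging each job with its tenant context and isolating tenants from each other.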
System Implementation
Workflow:
1. Accelerator source code package submitted to the compiler.
2. Compiled image_file registered in Glance.
3. VM request (with accelerator) issued from the dashboard.
4. acc_file delivered to the compute node.
5. VM launched.
Components:
- Control node: Nova, Glance, Horizon, Neutron, Swift
- Compute node: Nova Compute
- Compiler: FPGA incremental compilation environment
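The five-step flow above can be sketched end to end. Every function body below is a stand-in assumption for the real compiler, Glance, and Nova services; only the ordering of the steps follows the slide:

```python
# End-to-end sketch of the five-step deployment flow. All function bodies
# are illustrative stand-ins (assumptions) for the real compiler, Glance,
# and Nova services; only the step ordering follows the slide.

def compile_accelerator(source_pkg):           # step 1: source -> acc_file
    return {"acc_file": source_pkg + ".bit"}

def register_image(glance, image_file):        # step 2: image_file -> Glance
    glance[image_file] = "registered"
    return image_file

def request_vm(dashboard, accelerator):        # step 3: VM request (+accel)
    return {"vm": "vm-001", "accelerator": accelerator}

def deliver_acc_file(compute_node, acc_file):  # step 4: acc_file -> compute
    compute_node["acc_file"] = acc_file

def launch_vm(compute_node, vm_req):           # step 5: launch the VM
    compute_node["vm"] = vm_req["vm"]
    return compute_node

glance, compute_node = {}, {}
artifact = compile_accelerator("aes_src")
register_image(glance, "aes_image")
req = request_vm({}, "aes")
deliver_acc_file(compute_node, artifact["acc_file"])
node = launch_vm(compute_node, req)
print(node)  # {'acc_file': 'aes_src.bit', 'vm': 'vm-001'}
```

The key design point is that compilation (steps 1-2) is decoupled from deployment (steps 3-5): the incremental compiler produces the acc_file once, and the cloud can then deliver it to any compute node at VM launch time.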
Evaluation
- (1) Accelerator sharing: average latency (ms) and total bandwidth (MB/s) measured for 1-8 concurrent processes under four configurations: Host (all processes run in the host environment), One VM (all processes run in one VM), VMs (each process runs in its own VM), and AESs (each VM uses one independent AES accelerator). Total bandwidth stays in the 1100-1300 MB/s range across configurations.
- (2) Management, bandwidth control: two processes share the accelerator; management can reduce the VM's bandwidth and raise or lower process 0's bandwidth at runtime, e.g. throttling from 1194 MB/s down to 25 MB/s.
- (3) Management, priority control: process 0 sends a 256 KB payload 100 times per second; process 1 sends 4 MB payloads best-effort. With equal priority (seconds 1-38), process 0's latency is about 2.3 ms; after raising process 0's priority at second 38, its latency drops to about 0.21-0.22 ms (latency coefficient of variation shown alongside).