Moving Beyond CPUs in the Cloud: Will FPGAs Sink or Swim?




Successful FPGA datacenter usage at scale will require differentiated capability, programming ease, and scalable implementation models

Executive Summary

General-purpose server processors are reaching the point of diminishing returns, as performance-per-watt improvements slow and workloads become more specialized. Certain workload classes are open to acceleration by compute offload or alternative (non-CPU) architectures, including digital signal processors (DSPs), graphics processing units (GPUs), field programmable gate arrays (FPGAs), and custom logic. While these accelerators historically have been attached to CPUs via offload interconnects, they increasingly are being integrated onto system-on-chip (SoC) designs. As these technologies mature, Moor Insights & Strategy believes that datacenter workloads deployed at scale will use application-specific acceleration models.

FPGAs are gaining momentum in product prototyping and could address the long tail of high-value/low-volume production applications where custom logic is too expensive but other accelerator options are insufficient. Industry leaders (Microsoft, Baidu, Intel, leading global server OEMs such as HP and Dell, etc.) are driving usage models and future technology around FPGA optimization for datacenter workloads.

To drive mainstream FPGA adoption in the datacenter, technology providers must develop robust, production-quality implementations that are not performance-constrained by system architecture. They must also provide intuitive, integrated development environments and tools to make FPGA programming accessible to mainstream application programmers.

Application Acceleration in the Datacenter

Software defined infrastructure (SDI) models provide large scale datacenters with opportunities to deploy custom hardware solutions optimized for each workload.
Moor Insights & Strategy believes that workloads deployed at scale are moving to an application-specific acceleration model. Where multiple racks are dedicated to specific workloads, the initial purchasing efficiency (capital expense) of buying generic IT infrastructure is outweighed by the lifetime operating efficiency (operating expense) of buying fewer, more expensive resources that perform the same task with lower power consumption and less floor space. Some datacenter operators will pay more up-front for equipment that runs specific workloads faster and more efficiently.

Certain workload classes are open to acceleration by a range of non-CPU architectures. DSPs and specialized vector processing units were used in high-performance computing in the 1980s and 1990s. DSPs are still favored by legacy telecommunications workloads for their signal processing capabilities. GPUs pulled ahead in the 2000s, as PC gaming drove vendors to evolve their rendering pipelines toward more general-purpose flow control, with a programming model also capable of compute offload. GPUs continue to work well for processing a large number of small tasks running in parallel (SIMD instructions).

Page 1 Moving Beyond CPUs in the Cloud: Will FPGAs Sink or Swim? 2 December 2014

FPGA offload acceleration for server workloads is now emerging as a potential solution for complex CPU-like tasks that GPUs cannot handle and for non-standard functions that create CPU and DSP bottlenecks. Moore's Law as applied to FPGA technology has allowed FPGAs to move from glue logic and quick fixes to a more complex set of general-purpose logic addressing broader use cases. As technology advances continue, more workloads will open up to FPGA acceleration. Advances in FPGA technology allow for a more powerful class of autonomous, reconfigurable processors with high-speed interfaces that eventually could replace standard general-purpose servers for specific workloads.

A growing number of emerging acceleration alternatives are available, and analytics algorithms, application packages, and environments are evolving rapidly. For example, the Spark analytics framework is gaining momentum but did not exist three years ago. Moor Insights & Strategy believes that several acceleration architecture winners will emerge over the next five to ten years based on the wide range of workload-specific requirements. Figure 1 illustrates potential models for application acceleration implementation. This framework is directional and workload-dependent.

Figure 1: Application Acceleration Implementation Models

Are FPGAs in the Datacenter Ready for Prime Time?

Key benefits of an FPGA-based solution over other workload acceleration solutions are flexibility for use with multiple functions and reprogrammability. A programmer can program an FPGA to perform one set of functions (e.g., graphics) and then reprogram it for something entirely different (e.g., pattern recognition).
While not as fast or efficient as custom application-specific integrated circuits (ASICs), FPGAs can offer order-of-magnitude performance gains for specific workloads without requiring an expensive custom design, plus the added benefit of reconfigurability.
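The trade-off between an expensive custom ASIC design and a reconfigurable FPGA can be framed as a simple break-even calculation. The sketch below is illustrative only: the function name, the cost model (a one-time ASIC design cost plus per-unit costs, with no design cost charged to the FPGA), and all dollar figures are assumptions, not data from this brief.

```python
import math


def break_even_units(asic_design_cost, asic_unit_cost, fpga_unit_cost):
    """Smallest unit count at which the ASIC becomes cheaper overall.

    Assumed cost model (hypothetical, for illustration):
      ASIC total = asic_design_cost + units * asic_unit_cost
      FPGA total = units * fpga_unit_cost
    """
    if fpga_unit_cost <= asic_unit_cost:
        raise ValueError("no break-even: FPGA unit cost must exceed ASIC unit cost")
    # ASIC is strictly cheaper once units > design_cost / (unit cost difference).
    return math.floor(asic_design_cost / (fpga_unit_cost - asic_unit_cost)) + 1


# Hypothetical figures: $2M one-time ASIC design, $50 per ASIC, $450 per FPGA.
volume = break_even_units(2_000_000, 50, 450)
print(f"ASIC only pays off above {volume - 1} units")
```

Under these assumed figures, any deployment below roughly 5,000 units favors the FPGA, which is the "high-value/low-volume" region the brief describes.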

Like anything else subject to Moore's Law, FPGA manufacturing densities and costs are improving over time. FPGAs now provide enough gates for pattern recognition and analytics as part of a server SoC or in a peer compute environment with a CPU (rather than in architectures that require PCIe add-in cards, which may limit performance due to system latency). Until now, the primary FPGA use case has been to accelerate emulation for advanced product prototype, design, simulation, and test environments. However, these systems were cost-prohibitive and inaccessible to all but the most advanced product development organizations. Now, as mainstream servers with FPGA acceleration come to market, access to the power of FPGAs will be democratized, enabling a broad base of consumer and industrial Internet of Things (IoT) product organizations to build specialized logic for close to real-time simulation.

Further, specific workloads and segments are being identified as mass deployment candidates for FPGA-based solutions, including textual search, machine learning, image processing, cryptology, seismic analysis, signal processing, Monte Carlo analysis, MapReduce, and Memcached.

A primary inhibitor to adopting FPGA-based computing solutions at scale in the datacenter is programmability. Successful FPGA programming used to depend on C/C++ language programming skills, low-level hand-coding in RTL, and manual tuning, all combined with deep insight into compute and application architectures. Supporting a deployment of reconfigurable hardware at scale will require a software stack capable of detecting failures while providing a seamless interface to software applications.
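The requirement that a software stack detect failures while presenting a seamless interface to applications can be pictured as a thin wrapper that tries the accelerator first and falls back to the CPU. The following is a minimal sketch under stated assumptions: `AcceleratorError`, `AcceleratedFunction`, and the simulated FPGA function are hypothetical names invented for illustration and do not correspond to any vendor's API.

```python
class AcceleratorError(RuntimeError):
    """Raised by the (hypothetical) accelerator backend when the device misbehaves."""


class AcceleratedFunction:
    """Present one callable that prefers the accelerator and falls back to the CPU."""

    def __init__(self, cpu_impl, accel_impl=None, max_failures=3):
        self.cpu_impl = cpu_impl
        self.accel_impl = accel_impl
        self.max_failures = max_failures
        self.failures = 0  # number of accelerator faults observed so far

    @property
    def accelerator_enabled(self):
        # Stop using the device once it has failed too many times.
        return self.accel_impl is not None and self.failures < self.max_failures

    def __call__(self, *args, **kwargs):
        if self.accelerator_enabled:
            try:
                return self.accel_impl(*args, **kwargs)
            except AcceleratorError:
                self.failures += 1  # record the fault, then fall through to the CPU
        return self.cpu_impl(*args, **kwargs)


def count_matches_cpu(text, pattern):
    """Reference CPU implementation of a simple text search."""
    return text.count(pattern)


def count_matches_fpga(text, pattern):
    """Stand-in for an FPGA offload call; here it always simulates a device fault."""
    raise AcceleratorError("simulated device fault")


count_matches = AcceleratedFunction(count_matches_cpu, count_matches_fpga, max_failures=2)
print(count_matches("abcabc", "bc"))  # served by the CPU path after the simulated fault
```

The application calls `count_matches` the same way whether or not the device is healthy; a production stack would add health checks, reconfiguration, and logging that this sketch omits.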
Key industry leaders experimenting with FPGAs today believe that incorporating domain-specific languages (such as Scala or OpenCL), FPGA-targeted C-to-gates tools (such as AutoESL or Impulse C), and libraries of reusable components and design patterns will allow FPGAs to target high-value workloads in the near term. These tools, along with more integrated development environments, are beginning to provide FPGA programming capability to mainstream application programmers.

Moor Insights & Strategy believes FPGAs are quickly reaching a point of adoption for datacenter workloads with improved performance/power efficiencies for CPU-like tasks, lower costs, and easier programmability for application programmers.

Key Players to Watch

A number of leaders across the industry, including large service providers, silicon providers, and system manufacturers, are driving usage models and future technology around optimizing reconfigurable processors for datacenter workloads.

Microsoft conducted a large scale pilot deployment of 1,600+ Open Compute Servers with FPGAs (code-named Catapult) for the company's Bing page-ranking service. The results were a 2x improvement in search throughput and a 29% reduction in search processing latency. Microsoft expects to deploy this solution to all Bing servers in one datacenter in 2015.

Baidu, a leading search engine provider in China, has worked with both Altera and Xilinx on FPGA solutions to accelerate deep neural networks for machine learning applications. Under various workloads, Baidu found that the FPGA boards were several times more efficient than either a CPU or GPU.

Altera plans to integrate FPGA technology with 64-bit ARMv8 cores. Altera demonstrated an FPGA chip with Intel's QuickPath Interconnect (QPI) connecting to an Intel Xeon processor.

Xilinx has FPGAs integrated with 32-bit ARM cores in-market, as well as FPGA accelerators for Intel Xeon systems based on the QPI interface.

Intel announced it intends to integrate FPGA capability with a Xeon E5 class product via QPI in a single package that will fit into a standard E5 socket.

SRC Computers offers a line of reconfigurable processing engines designed to accelerate performance on high-performance computing and hyperscale workloads. SRC's FPGA solution is autonomous, dynamically reconfigurable, and peer-based (not a CPU-dependent offload engine).

HP intends to productize solutions with offload accelerators and alternative compute technologies (e.g., ARMv8-based servers) in the Moonshot product family. Moonshot is focused on optimized application performance to improve datacenter economics over traditional servers for scale-out workloads. Moonshot cartridges with DSPs and GPUs are already in-market, and HP has expressed interest in adding FPGA-based cartridges in the future.

Dell has partnered with Convey Systems on an x86/FPGA server appliance to accelerate image processing for hyperscale customers.

IBM is expected to introduce an optimized data analytics solution stack in the coming months based on POWER8 servers and FPGA offload acceleration technology. Altera and Xilinx are both members of the OpenPOWER Foundation.

Call to Action

Scale-out datacenter customers should determine which workloads could benefit from acceleration technologies versus general purpose processors.
FPGAs may be a good match to address the long tail of high-value/low-volume applications where a custom ASIC design is too expensive but other accelerator options are insufficient. Moore's Law will allow for continued improvement in FPGA gate capacity and sophistication, making it possible for FPGAs to address additional scale-out datacenter workloads over time.

Server solutions that include FPGA acceleration can democratize algorithm acceleration by providing low cost, accessible, reconfigurable acceleration platforms to enable advanced product simulation and analytics for developing IoT devices and for deploying back-end services. Those who determine FPGAs to be an option for their datacenter workloads or product development environments should evaluate the available use cases and workload studies from leading end users and work with the leading vendors on prototypes of their FPGA-based solutions.

Technology providers of FPGA-based server solutions must prepare users for mainstream adoption over the next several years. They must provide robust, production-quality implementations that are not performance-constrained by system architecture, coupled with intuitive, integrated development environments and tools to make FPGA programming accessible to mainstream application programmers.
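When deciding which workloads merit acceleration, the capital-versus-operating-expense argument made earlier in this brief can be checked with back-of-the-envelope arithmetic. The sketch below is a rough illustration, not a cost model from Moor Insights & Strategy: the function name, prices, wattages, electricity rate, and the assumption that each accelerated server does the work of two generic servers are all hypothetical, and floor space is not modeled.

```python
def lifetime_cost(servers, price_per_server, watts_per_server,
                  years, dollars_per_kwh, pue=1.0):
    """Purchase cost plus energy cost over the deployment lifetime.

    pue is the facility's power usage effectiveness multiplier (1.0 = no overhead).
    """
    capex = servers * price_per_server
    hours = years * 365 * 24
    kwh = servers * watts_per_server * pue * hours / 1000
    return capex + kwh * dollars_per_kwh


# Hypothetical rack-scale comparison over four years at $0.10 per kWh.
generic = lifetime_cost(servers=100, price_per_server=5_000,
                        watts_per_server=400, years=4, dollars_per_kwh=0.10)
# Assumes each accelerated server replaces two generic servers.
accelerated = lifetime_cost(servers=50, price_per_server=11_000,
                            watts_per_server=450, years=4, dollars_per_kwh=0.10)

print(f"generic:     ${generic:,.0f}")
print(f"accelerated: ${accelerated:,.0f}")
```

With these assumed inputs the accelerated option costs more up-front ($550,000 versus $500,000) but less over four years (about $628,840 versus $640,160). The result is sensitive to every input, so the sketch is best used as a template for plugging in measured values.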

Important Information About This Brief

Inquiries
Please contact us if you would like to discuss this report, and Moor Insights & Strategy will promptly respond.

Citations
This note or paper can be cited by accredited press and analysts, but must be cited in context, displaying author's name, author's title, and Moor Insights & Strategy. Non-press and non-analysts must receive prior written permission from Moor Insights & Strategy for any citations.

Licensing
This document, including any supporting materials, is owned by Moor Insights & Strategy. This publication may not be reproduced, distributed, or shared in any form without Moor Insights & Strategy's prior written permission.

Disclosures
Moor Insights & Strategy provides research, analysis, advising, and consulting to many high-tech companies mentioned in this paper. No employees at the firm hold any equity positions with any companies cited in this document.

DISCLAIMER
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. Moor Insights & Strategy disclaims all warranties as to the accuracy, completeness, or adequacy of such information and shall have no liability for errors, omissions, or inadequacies in such information. This document consists of the opinions of Moor Insights & Strategy and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Moor Insights & Strategy provides forecasts and forward-looking statements as directional indicators and not as precise predictions of future events. While our forecasts and forward-looking statements represent our current judgment on what the future holds, they are subject to risks and uncertainties that could cause actual results to differ materially.
You are cautioned not to place undue reliance on these forecasts and forward-looking statements, which reflect our opinions only as of the date of publication for this document. Please keep in mind that we are not obligating ourselves to revise or publicly release the results of any revision to these forecasts and forward-looking statements in light of new information or future events.

© 2014 Moor Insights & Strategy. Company and product names are used for informational purposes only and may be trademarks of their respective owners.