Moving Beyond CPUs in the Cloud: Will FPGAs Sink or Swim?

Successful FPGA datacenter usage at scale will require differentiated capability, programming ease, and scalable implementation models

Executive Summary

General-purpose server processors are reaching the limits of diminishing returns, as performance-per-watt improvements slow and workloads become more specialized. Certain workload classes are open to acceleration by compute offload or alternative (non-CPU) architectures, including digital signal processors (DSPs), graphics processing units (GPUs), field programmable gate arrays (FPGAs), and custom logic. While these accelerators historically have been attached to CPUs via offload interconnects, they are increasingly being integrated into system-on-chip (SoC) designs. As these technologies mature, Moor Insights & Strategy believes that datacenter workloads deployed at scale will use application-specific acceleration models.

FPGAs are gaining momentum in product prototyping and could address the long tail of high-value/low-volume production applications where custom logic is too expensive but other accelerator options are insufficient. Industry leaders (Microsoft, Baidu, Intel, and leading global server OEMs such as HP and Dell) are driving usage models and future technology around FPGA optimization for datacenter workloads.

To drive mainstream FPGA adoption in the datacenter, technology providers must develop robust, production-quality implementations that are not performance-constrained by system architecture. And they must provide intuitive, integrated development environments and tools that make FPGA programming accessible to mainstream application programmers.

Application Acceleration in the Datacenter

Software-defined infrastructure (SDI) models provide large-scale datacenters with opportunities to deploy custom hardware solutions optimized for each workload.
Moor Insights & Strategy believes that workloads deployed at scale are moving to an application-specific acceleration model. Where multiple racks are dedicated to specific workloads, the initial purchasing efficiency (capital expense) of buying generic IT infrastructure is outweighed by the lifetime operating efficiency (operating expense) of buying fewer, more expensive resources that perform the same task with lower power consumption and less floor space. Some datacenter operators will pay more up-front for equipment that runs specific workloads faster and more efficiently.

Certain workload classes are open to acceleration by a range of non-CPU architectures. DSPs and specialized vector processing units were used in high-performance computing in the 1980s and 1990s. DSPs are still favored by legacy telecommunications workloads for their signal processing capabilities. GPUs pulled ahead in the 2000s, as PC gaming drove vendors to evolve their rendering pipelines toward more general-purpose flow control, with a programming model also capable of

Page 1 Moving Beyond CPUs in the Cloud: Will FPGAs Sink or Swim? 2 December 2014
compute offload. GPUs continue to work well for processing a large number of small tasks running in parallel (SIMD-style parallelism). FPGA offload acceleration for server workloads is now emerging as a potential solution for complex CPU-like tasks that GPUs cannot handle and for non-standard functions that create CPU and DSP bottlenecks.

Moore's Law as applied to FPGA technology has allowed FPGAs to move from glue logic and quick fixes to a more complex set of general-purpose logic addressing broader use cases. As technology advances continue, more workloads will open up to FPGA acceleration. Advances in FPGA technology allow for a more powerful class of autonomous, reconfigurable processors with high-speed interfaces that eventually could replace standard general-purpose servers for specific workloads.

A growing number of emerging acceleration alternatives are available, and analytics algorithms, application packages, and environments are evolving rapidly. For example, the Spark analytics framework is gaining momentum but did not exist three years ago. Moor Insights & Strategy believes that several acceleration architecture winners will emerge over the next five to ten years based on the wide range of workload-specific requirements. Figure 1 illustrates potential models for application acceleration implementation. This framework is directional and workload-dependent.

Figure 1: Application Acceleration Implementation Models

Are FPGAs in the Datacenter Ready for Prime Time?

Key benefits of an FPGA-based solution over other workload acceleration solutions are flexibility for use with multiple functions and reprogrammability. A programmer can program an FPGA to perform one set of functions (e.g., graphics) and then reprogram it for something entirely different (e.g., pattern recognition).
While not as fast or efficient as custom application-specific integrated circuits (ASICs), FPGAs can offer order-of-magnitude performance gains for specific workloads without requiring an expensive custom design, plus the added benefit of reconfigurability.
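The capex/opex trade-off described earlier, paying more up-front for equipment that runs a specific workload more efficiently, reduces to simple break-even arithmetic. The sketch below uses entirely hypothetical prices, server counts, and power costs for illustration; none of these figures come from the vendors discussed in this paper.

```python
# Hypothetical total-cost-of-ownership comparison between generic servers
# and fewer, pricier accelerated servers performing the same work.
# All numbers are illustrative assumptions, not measurements.

def tco(unit_price, units, watts_per_unit, years,
        usd_per_watt_year=2):  # assumed blended power + cooling cost
    """Capital expense plus lifetime power-related operating expense."""
    capex = unit_price * units
    opex = watts_per_unit * units * usd_per_watt_year * years
    return capex + opex

# Assume 10 generic servers vs. 4 FPGA-accelerated servers for equal
# throughput over a 4-year service life.
generic = tco(unit_price=5_000, units=10, watts_per_unit=400, years=4)
accelerated = tco(unit_price=9_000, units=4, watts_per_unit=450, years=4)

print(generic)      # 50000 capex + 32000 opex = 82000
print(accelerated)  # 36000 capex + 14400 opex = 50400
```

Under these assumed numbers the accelerated rack wins on lifetime cost despite an 80% higher unit price, which is the argument operators of dedicated, at-scale workloads are making.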
Like anything else subject to Moore's Law, FPGA manufacturing densities and costs are improving over time. FPGAs now provide enough gates for pattern recognition and analytics as part of a server SoC or in a peer compute environment with a CPU (rather than architectures that require PCIe add-in cards, which may limit performance due to system latency).

Until now, the primary FPGA use case has been to accelerate emulation for advanced product prototype, design, simulation, and test environments. However, these systems were cost-prohibitive and inaccessible to all but the most advanced product development organizations. Now, as mainstream servers with FPGA acceleration come to market, access to the power of FPGAs will be democratized, giving a broad base of consumer and industrial Internet of Things (IoT) product organizations the ability to build specialized logic for close to real-time simulation. Further, specific workloads and segments are being identified as mass deployment candidates for FPGA-based solutions, including textual search, machine learning, image processing, cryptography, seismic analysis, signal processing, Monte Carlo analysis, MapReduce, and Memcached.

A primary inhibitor to adopting FPGA-based computing solutions at scale in the datacenter is programmability. Successful FPGA programming has historically depended on C/C++ programming skills, low-level hand-coding at the register-transfer level (RTL), and manual tuning, all combined with deep insight into compute and application architectures. Supporting a deployment of reconfigurable hardware at scale will require a software stack capable of detecting failures while providing a seamless interface to software applications.
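The shape of such a software stack can be outlined in a few lines. The Python sketch below is a minimal, hypothetical dispatcher (every name in it is invented for illustration and does not correspond to any real FPGA vendor API): the application calls one function, the stack attempts the accelerated path, detects a failure, and falls back transparently to a CPU implementation.

```python
# Minimal sketch of a "seamless" offload interface with failure detection.
# All names are hypothetical; real FPGA runtimes and vendor SDKs differ.
# The point is the pattern: one entry point, with the stack deciding
# whether the accelerated path is healthy.

class AcceleratorError(Exception):
    """Raised when the (hypothetical) FPGA path fails or is unavailable."""

def fpga_search(corpus, term):
    # Stand-in for an offloaded kernel; here it simulates a device that
    # is not present, so the fallback path is exercised.
    raise AcceleratorError("no FPGA device found")

def cpu_search(corpus, term):
    # Reference software implementation; always available.
    return [i for i, doc in enumerate(corpus) if term in doc]

def search(corpus, term):
    """Single entry point: prefer the accelerator, fall back on failure."""
    try:
        return fpga_search(corpus, term)
    except AcceleratorError:
        # A production stack would also log the fault and flag the
        # device for reconfiguration or repair.
        return cpu_search(corpus, term)

docs = ["fpga offload", "gpu rendering", "fpga in the datacenter"]
print(search(docs, "fpga"))  # falls back to the CPU path: [0, 2]
```

The calling application never needs to know which path ran, which is the "seamless interface" property the paragraph above describes.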
Key industry leaders experimenting with FPGAs today believe that incorporating domain-specific languages (such as Scala or OpenCL), FPGA-targeted C-to-gates tools (such as AutoESL or Impulse C), and libraries of reusable components and design patterns will allow FPGAs to target high-value workloads in the near term. These tools, along with more integrated development environments, are beginning to provide FPGA programming capability to mainstream application programmers. Moor Insights & Strategy believes FPGAs are quickly reaching a point of adoption for datacenter workloads, with improved performance/power efficiency for CPU-like tasks, lower costs, and easier programmability for application programmers.

Key Players to Watch

A number of leaders across the industry, including large service providers, silicon providers, and system manufacturers, are driving usage models and future technology around optimizing reconfigurable processors for datacenter workloads.

Microsoft conducted a large-scale pilot deployment of 1,600+ Open Compute servers with FPGAs (code-named Catapult) for the company's Bing page-ranking service. The results were a 2x improvement in search throughput and a 29% reduction in search processing latency. Microsoft expects to deploy this solution to all Bing servers in one datacenter in 2015.

Baidu, a leading search engine provider in China, has worked with both Altera and Xilinx on FPGA solutions to accelerate deep neural networks for machine learning
applications. Under various workloads, Baidu found that the FPGA boards were several times more efficient than either a CPU or GPU.

Altera plans to integrate FPGA technology with 64-bit ARMv8 cores. Altera has also demonstrated an FPGA chip connecting to an Intel Xeon processor over Intel's QuickPath Interconnect (QPI).

Xilinx has FPGAs integrated with 32-bit ARM cores in-market, as well as FPGA accelerators for Intel Xeon systems based on the QPI interface.

Intel announced it intends to integrate FPGA capability with a Xeon E5 class product via QPI in a single package that will fit into a standard E5 socket.

SRC Computers offers a line of reconfigurable processing engines designed to accelerate performance on high-performance computing and hyperscale workloads. SRC's FPGA solution is autonomous, dynamically reconfigurable, and peer-based (not a CPU-dependent offload engine).

HP intends to productize solutions with offload accelerators and alternative compute technologies (e.g., ARMv8-based servers) in the Moonshot product family. Moonshot is focused on optimizing application performance to improve datacenter economics over traditional servers for scale-out workloads. Moonshot cartridges with DSPs and GPUs are already in-market, and HP has expressed interest in adding FPGA-based cartridges in the future.

Dell has partnered with Convey Systems on an x86/FPGA server appliance to accelerate image processing for hyperscale customers.

IBM is expected to introduce an optimized data analytics solution stack in the coming months based on POWER8 servers and FPGA offload acceleration technology. Altera and Xilinx are both members of the OpenPOWER Foundation.

Call to Action

Scale-out datacenter customers should determine which workloads could benefit from acceleration technologies versus general-purpose processors.
FPGAs may be a good match to address the long tail of high-value/low-volume applications where a custom ASIC design is too expensive but other accelerator options are insufficient. Moore's Law will allow for continued improvement in FPGA gate capacity and sophistication, making it possible for FPGAs to address additional scale-out datacenter workloads over time. Server solutions that include FPGA acceleration can democratize algorithm acceleration by providing low-cost, accessible, reconfigurable acceleration platforms to enable advanced product simulation and analytics, both for developing IoT devices and for deploying back-end services.

Those who determine FPGAs to be an option for their datacenter workloads or product development environments should evaluate the available use cases and workload studies from leading end users and work with the leading vendors on prototypes of their FPGA-based solutions.

Technology providers of FPGA-based server solutions must prepare users for mainstream adoption over the next several years. They must provide robust, production-quality implementations that are not performance-constrained by system architecture, coupled with intuitive, integrated development environments and tools to make FPGA programming accessible to mainstream application programmers.
Important Information About This Brief

Inquiries

Please contact us if you would like to discuss this report, and Moor Insights & Strategy will promptly respond.

Citations

This note or paper can be cited by accredited press and analysts but must be cited in-context, displaying the author's name, the author's title, and "Moor Insights & Strategy". Non-press and non-analysts must receive prior written permission from Moor Insights & Strategy for any citations.

Licensing

This document, including any supporting materials, is owned by Moor Insights & Strategy. This publication may not be reproduced, distributed, or shared in any form without Moor Insights & Strategy's prior written permission.

Disclosures

Moor Insights & Strategy provides research, analysis, advising, and consulting to many high-tech companies mentioned in this paper. No employees at the firm hold any equity positions with any companies cited in this document.

DISCLAIMER

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. Moor Insights & Strategy disclaims all warranties as to the accuracy, completeness, or adequacy of such information and shall have no liability for errors, omissions, or inadequacies in such information. This document consists of the opinions of Moor Insights & Strategy and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Moor Insights & Strategy provides forecasts and forward-looking statements as directional indicators and not as precise predictions of future events. While our forecasts and forward-looking statements represent our current judgment on what the future holds, they are subject to risks and uncertainties that could cause actual results to differ materially.
You are cautioned not to place undue reliance on these forecasts and forward-looking statements, which reflect our opinions only as of the date of publication of this document. Please keep in mind that we are not obligating ourselves to revise or publicly release the results of any revision to these forecasts and forward-looking statements in light of new information or future events.

©2014 Moor Insights & Strategy. Company and product names are used for informational purposes only and may be trademarks of their respective owners.