The All Programmable SoC FPGA for Networking and Computing in Big Data Infrastructure Ivo Bolsens, Senior Vice President & CTO Page 1
Moore s Law: The Technology Pipeline Page 2
Industry Debates on Transistor Cost Page 3
Design Cost Estimated Chip Design Cost, by Process Node, Worldwide, 2011 Design cost ($M) Mask cost ($M) 28/22-nm 32-nm Embedded software ($M) Yield ramp-up cost ($M) 45-nm 65-nm 90-nm 130-nm 28nm = 2x 45nm cost > $170 M 180-nm ($ Million) 0 45 90 135 180 Page 4
Growing Problems for ASIC & ASSP Offerings >50% of Top 16 ASSP Vendors Losing Money Growing ASSP Gaps Eroding customer confidence in vendors High cost burden from over design for diverse needs No ability to differentiate or customize Source Public reports, Xilinx estimates Page 5
Trend Mobile Infrastructure: Scalable Platforms Capacity Indoor Outdoor Macrocell Macrocell + Active Antennas Microcell Picocell Residential Femto 4-16 Users <100mW Home Enterprise Femto 30-60 Users <250mW Office 30-200 Users <1W Dense Indoor (Malls, Transport Hubs) Urban Infill ~200 Users <1-10W Single Sector Wide Area ~200 Users/ sector 20-100W Multi-Sector Wide Area 2-3x Data Capacity of RRU Coverage Page 6
Trend Services : Different Figures of Merit Latency (ms) Computing GB, Latency (s) Storing GB, BW (mean, var) Streaming GB, BW (mean, var) latency (ms) Gaming Communicating Latency (ms), BW (mean)
Trend Wired Infrastructure: Software Defined Networks Slide credit: From Virtualizing the Net by Jon Turner (2004) Page 8
Software Defined Networking gains industry mindshare The best thing about OpenFlow or SDN, is that it s brought back a new hope to networking Networking is cool again- Jayshree, CEO - Arista Networks
Trend Data Center Infrastructure: Cloud Computing Big Data Increasing Volume, Velocity, and Variety Low power Reduce operation and cooling costs Security Both outside and inside Page 10
Impact of trends (1) Networking New network fabrics Faster, Fatter, and Flatter Software defined networking Software control plane Hardware data plane Content-aware networking Deep packet inspection Enhanced security Page 11
Impact of trends (2) Compute ARM-based microservers Improved performance per watt Hybrid SoC CPU+accelerators+fabric Cost and power reduction Larger memory Hybrid NVRAM and DRAM Latency reduction Page 12
Impact of trends (3) Storage Specialized functions Compression, encryption, memcached Custom SSD controllers Higher performance Reduced latency Data-aware storage Integrated database support Offload from processor Page 13
Future Data Centre Architecture Internet Intelligent Appliances (router, firewall) Intelligence in the network Further convergence Fewer tiers FibreChannel new topologies (torus NIC) Rack TOR TOR Convergence SAN Server Server Server Server Low latency NICs Network attached storage Server Server Direct-attached Direct-attached storage storage Page 14
Core Network Coherent Memory Bus I/O Bus and Fabrics Core Network Generic Data Center QPI AXI ACE PCIe Fabrics DRAM DRAM x86 40/80G NIC SOC Ethernet Switch 100/400G I/O Attach Accel Direct FPGA Attach Storage Storage Ctrl NVM NVM H D H D Network FPGA Attach Storage Storage Ctrl NVM NVM H D H D Memory Bus Coherency Lowest Latency I/O Bus PCIe Core Network Public and Private Ethernet Page 15
Core Network Coherent Memory Bus I/O Bus and Fabrics Core Network The New Data Center DRAM DRAM x86 QPI AXI ACE PCIe Fabrics FPGA NIC 40/80G Networking Security, DPI Low latency switch/bridge Custom fabrics SDN control plane DRAM Zynq SOC ARM Fabric Bridge Node Ctrl DRAM DRAM NVM FPGA FPGA Acc FPGA Acc Ethernet FPGA Switch 100/400G NVM Compute Transcode Search / Database DSP Filters Large Memory Hybrid NVM/DRAM ARM FPGA Storage FPGA Ctrl NVM NVM NVM NVM ARM FPGA NVM NVM H D H D Storage Custom SSD Low latency Memcache appliance Data aware storage Encrypt, Compress Page 16
Trend : More Intelligence in Embedded Systems SMART Data Center Revolution New Opportunities to Control Costs and Increase Strategic Advantage MACHINES THAT UNDERSTAND Smart wireless networks to the rescue Carriers are turning toward more intelligent network management Smart Factories For factory management in the future, it will become essential to strive to implement smart capabilities The Next Big, Digital Economy; Smart Energy The energy market is undergoing a major transformation Page 17 Page 17
Programmable & Smart Across All Markets Wireless Comms Wired Comms Data Center Embedded All Programmable Multiple Spectrums Multiple Standards (LTE, 3G) Multiple Levels of QoS Network Function Virtualization (NFV) Multiple Stds (400Gb etc) Dynamic QoS Provisioning Software Defined Networks (SDN) Multiple Stds (FCoE, iscsi ) Config Storage (SAN, NAS, SSD ) Changing Resolutions (MPixel, Fps) Emerging Video Stds (UHD, 8K/4K) Evolving Video Processing Algorithms Smarter Self Organizing Networks (SON) Cognitive Radio Smart Antenna Context Aware Network Services Self-Healing Networks Video Caching at the Edge Data Pre-Processing & Analytics Virtualized Resource Optimization Intelligent Appliances Object Detection & Analytics Automotive Collision Avoidance Industrial Machine Vision Page 18
Industry Mandates Programmable Imperative Programmable Systems Integration Insatiable Intelligent Bandwidth Page 19
The All Programmable Platform Security : Bit level operations Packet Processing : Wide Datapaths DSP Processing : Pipelined Datapaths Graphics Processing : Parallel Micro-Engines System Management : Finite State machines
The Heterogeneous MPSoC Heterogeneous Connected Scalable Parallel Configurable uengine uengine uengine ARM multicore Accelerator Accelerator Accelerator MicroBlaze micro engines X86 or ARM Platform Host Accelerator Accelerator Accelerator DSP engines Dedicated Accelerators
The Era of Heterogeneous Processing Unit Page 22
The UltraSCALE FPGA SoC Page 23
Programming the Heterogeous SoC Page 24
Virtual Cache Coherency Virtual CPU + FPGA Use Models FPGA 0 Pipelined datapath HDL programmed Control Processor FPGA 1 Pipelined datapath with SW control CPU sets register values CPU Memory 2 CPU + FPGA co-processing FPGA part of explicit address space FPGA CPU $ Memory 3 CPU + FPGA peer processing Cache Coherency FPGA $ Page 25
CPU + FPGA Evolution IO-Connected Coherent Integrated QPI ARM AXI PCIe Page 26
Complexity of building MPSoC systems High-Speed Analog Design High-Speed Interface Design SOC System Assembly DSP Hardware Design Software Development Requires distinct design skills!
Platform IP Integrator Radio pipeline Radio pipeline CPU Build/ Re-Use IP Subsystems Link/Assemble Subsystems Generate SW Drivers
Data Movement Interconnect HW/SW Design Flow CPU FPGA C-compiler Concurrent SW C-synthesis Application SW-drivers Middleware Libraries Platform AXI Hardware Wires CPU Video Codec Encryption Memory LTE Modem Page 29
High Level Synthesis (HLS) Create IP from C/C++/System C algorithm specification Abstract algorithm verification to the specification level Traditional FPGA design experience not required Page 30
Quality From of Results C Algorithm to FPGA Implementation Video frames/second 200 196 150 FPGA resources 100 6 4 50 0 51 DSP C2FPGA 2 0 RTL C2FPGA FPGA: >38 times better performance than DSP video processor QOR: C2FPGA equal to or better than RTL synthesis Ease-of-use: C2FPGA 2x fewer lines of C code than DSP processor Page 31
Programming Heterogeneous Multi-core OpenCL Hardware / Software partitioning & interfacing C Compile / Debug ARM Processor HS-SW Interfacing Domain Specific API C-HLS Accelerator synth FPGA A9 A9 Commercial Software Ecosystem Video codec Encryption Packet Processing FFT Application-Specific Search Page 32
Data Movement Interconnect HW/SW Design Flow: SW Programmer View CPU FPGA C-compiler Concurrent SW C-synthesis Application SW-drivers Middleware Libraries AXI Hardware Wires CPU Video Codec Encryption Memory LTE Modem Page 33 Application Programming
Programmable Platform: CPU + FPGA Peer Processing Core Core Accel Accel Logic L1 Cache L1 Cache L1 Cache L1 Cache Shared L2 Cache Coherency Engine Capabilities Coherent Caches for HW Coherency Engine Over NOC Interconnect Coherent Caches for SW Coherency Management DDR MemCon SRAM I/O Device DMA Coherency Benefits: Peer Processing: Direct Cache-2-Cache data movement Latency: Very low latency access to CPU (FPGA) data Usability: No SW cache flush needed Page 34
Domain Specific Abstractions IP Abstraction IPI Automation
ZED Board ZED Board Zynq Evaluation and Development Kit Low cost Zynq based community board (XC7Z020) Partnership between Avnet, Digilent, Xilinx Digilent will fulfill academic market for Xilinx University Program wwwzedboardorg Open source SW and IP Linux Eclipse based IDE Vivado HLS: C to FPGA Reference designs Page 36
Conclusions New Markets Require Heterogeneous Multi-Core SoC Modern FPGA are All Programmable SoC Software Centric Design Flow Becoming Possible Democratizing SoC Design : Targeted Teaching Platform Page 37