VP/GM, Data Center Processing Group
Trends Disrupting the Server Industry
- Public & private clouds
- Compute, network & storage virtualization
- Application-specific servers
- Large end users designing server HW optimized for their applications
- ODM direct model
Legacy Server Applications
- Single-threaded or limited multi-threaded program(s)
- Workload performance primarily dependent on CPU/memory performance
- 1000s of applications used by 1000s of users
- Virtualization used to improve server utilization
- Managed by traditional IT
- Measured with traditional benchmarks
Cloud Applications = New Paradigm of Computing
- Highly distributed: the system(s) are the computer
- Shared-nothing architectures: distributed data and distributed computation
- Many-node environments
- Highly parallel: add more threads, go faster
- Multiple OS instances; fault tolerance in SW
- Big data: large and highly distributed data sets
- Nodes are often special purpose
Cloud applications can benefit from a new class of servers, and a new class of servers requires a new class of benchmarks.
Copyright 2014 Cavium. Confidential.
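In a shared-nothing architecture, each node owns a disjoint slice of the data, so the key-to-node mapping is what makes the system "the computer". A minimal sketch of one common mapping technique, consistent hashing (class and names are hypothetical, not from any product in this deck):

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Map keys to nodes so that adding or removing a node
    relocates only about 1/N of the keys."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets several virtual positions on the ring
        # to spread load more evenly.
        self.ring = sorted(
            (self._hash(f"{n}#{v}"), n) for n in nodes for v in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First ring position at or after the key's hash, wrapping around.
        i = bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["node0", "node1", "node2"])
owner = ring.node_for("user:42")  # the same key always maps to the same node
```

Systems such as Cassandra (listed under "Data Serving" later in the deck) use a ring partitioning of this general shape to keep both data and computation local to each node.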
Need for Workload Optimized Servers
- ONE APPLICATION used by 10M+ USERS
- Multiple applications consolidated in a MULTI-TENANT SERVER FARM: ERP server, MySQL, mail + FTP, web service, media streaming, CRM server, SharePoint, video media, SQL Server, Office365
Example Cloud Workloads

Workload             Example/Use Case
Graph Search         Social media data analysis (e.g. GraphLab, Giraph)
Web Caching          Memcached
Media Serving        Video servers, e.g. DASH servers
Web Serving          LAMP + Java/Tomcat/Ruby
Data Analytics       Hadoop (Mahout, Nutch)
Distributed Search   Elastic Search
Distributed Storage  Ceph (Object/Block) and HDFS (File)
Data Serving         NoSQL-type databases (e.g. Cassandra, HBase)
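Several of these workloads (web caching via Memcached, data serving) are at heart key-value get/set services in front of a slower backing store. A minimal LRU cache sketch of that pattern (class name and API are illustrative, not Memcached's actual protocol):

```python
from collections import OrderedDict

class LRUCache:
    """Memcached-style get/set cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order tracks recency

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)          # mark as most recently used
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

    def get(self, key):
        if key not in self.data:
            return None  # cache miss: caller fetches from the backing store
        self.data.move_to_end(key)
        return self.data[key]
```

The workload character matters for server design: the hot loop is hash lookups and pointer chasing over a huge working set, which is memory- and network-bound rather than compute-bound.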
Cloud Workloads are Different, Example #1: Very Different Instruction Miss Rates
[Chart: instruction misses per kilo-instruction (MPKI), y-axis 0-160, for scale-out workloads (data caching, data serving, MapReduce, media streaming, web front end, web search) vs. traditional benchmarks (SPECint2006, TPC-C, TPC-E)]
Source data from: "A Case for Specialized Processors for Scale-Out Workloads", Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, Babak Falsafi, in IEEE Micro's Top Picks, 2014
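MPKI here means misses per kilo-instruction: a normalized miss rate that lets workloads with different instruction counts be compared on one axis. A one-line sketch of the metric, with illustrative counter values rather than measurements from the cited paper:

```python
def mpki(misses, instructions):
    """Misses per kilo-instruction: misses / (instructions / 1000)."""
    return misses * 1000.0 / instructions

# Illustrative numbers only, not data from the Ferdman et al. study:
rate = mpki(misses=3_000_000, instructions=50_000_000)
print(rate)  # 60.0
```

The same normalization applies whatever event the counter tracks (instruction-cache misses in this chart), which is why MPKI shows up throughout architecture studies of scale-out workloads.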
Cloud Workloads are Different, Example #2: IPC
[Chart: instructions per cycle (IPC), y-axis 0-1.6, comparing scale-out workloads with traditional CPU-intensive benchmarks]
Source data from: "A Case for Specialized Processors for Scale-Out Workloads", Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, Babak Falsafi, in IEEE Micro's Top Picks, 2014
Cloud Workloads are Different, Example #3: Performance Sensitivity to LLC & L2 Caches
Source data from: "A Case for Specialized Processors for Scale-Out Workloads", Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, Babak Falsafi, in IEEE Micro's Top Picks, 2014
Implications for Processor Design
- The optimum choice and size of caches are different for scale-out workloads
- Lower IPC means less instruction-level parallelism is available, so there is less benefit from aggressive, out-of-order, wide-issue machines
- The highly parallel nature of scale-out workloads favors more independent processing cores
- A large number of simpler, more efficient cores provides lower power and more performance for scale-out workloads
How Efficient Can It Be?
Complex single core vs. multiple simple cores: several simple cores fit in the area of one complex core.
For scale-out workloads, multiple simple cores provide the best performance per unit area and per watt.
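The area trade-off on this slide can be made concrete with back-of-the-envelope arithmetic (all numbers below are illustrative assumptions, not Cavium data): if a simple core delivers half the single-thread performance of a complex core in a quarter of its area, then four simple cores in the same silicon budget deliver twice the aggregate throughput on embarrassingly parallel scale-out work.

```python
def aggregate_throughput(per_core_perf, core_area, total_area):
    """Throughput = (cores that fit in the area budget) x per-core performance.
    Assumes perfectly parallel work, i.e. the scale-out case."""
    cores = int(total_area // core_area)
    return cores * per_core_perf

# Assumed, illustrative numbers: one complex core = 4 area units, perf 1.0;
# one simple core = 1 area unit, perf 0.5.
complex_design = aggregate_throughput(per_core_perf=1.0, core_area=4.0, total_area=4.0)
simple_design = aggregate_throughput(per_core_perf=0.5, core_area=1.0, total_area=4.0)
print(simple_design / complex_design)  # 2.0 under these assumed numbers
```

The same arithmetic cuts the other way for serial, latency-sensitive code, which is why the deck frames this as a win specifically for scale-out workloads.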
Cloud Workloads Demand Optimized Integration
[Diagram: four integration levels — traditional CPU vendor, SoC vendor, system vendor, cloud vendor — each stack spanning CPU, memory, storage, network I/O, and data center I/O]
- Cloud benchmarks need to address more than CPU and memory
- They need to include the efficiency of storage and network functions and I/O
- The challenge remains to benchmark at scale
Introducing a Family of Workload Optimized Processors
- Up to 48 custom ARMv8 cores @ 2.5 GHz, in 1S and 2S configurations
- Up to 4x 72-bit DDR3/4 memory controllers
- 16 MB cache subsystem
- Family-specific I/Os: 10/40/100 GbE, PCIe Gen3, SATAv3, other I/O
- Standards-based low-latency Ethernet fabric
- virtsoc: low-latency end-to-end virtualization
- Family-specific workload accelerators
- Cavium Coherent Processor Interconnect (CCPI)
Four workload-optimized families:
- ThunderX_CP: private/public cloud, web search, web serving, web caching
- ThunderX_ST: cloud storage, analytics, distributed databases
- ThunderX_NT: telco servers, NFV apps
- ThunderX_SC: secure cloud servers
Processors for Next Gen Data Centers
- Public & private clouds: highest VM density and highest VM performance; high core count, high memory bandwidth and low latency; virtsoc core-to-I/O low-latency virtualization
- Compute, network and storage virtualization: integrated high-bandwidth, low-latency network and storage I/O; virtsoc full virtualization of core, network and storage I/O
- Application-specific servers: custom network and storage I/O for each target workload; custom hardware accelerators for compute, networking, storage and security
Summary
- The cloud is revolutionizing the next generation data center
- Most cloud applications are open source, and Java and PHP are key programming environments; this eliminates barriers for alternative architectures
- Large-core-count multi-core SoCs with integrated network and storage I/O and integrated purpose-built cloud accelerators benefit cloud applications