SeaMicro SM10000-64 Server



Similar documents
How To Build A Cloud Server For A Large Company

UCS M-Series Modular Servers

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

AMD SEAMICRO OPENSTACK BLUEPRINTS CLOUD- IN- A- BOX OCTOBER 2013

Michael Kagan.

Reference Design: Scalable Object Storage with Seagate Kinetic, Supermicro, and SwiftStack

Support a New Class of Applications with Cisco UCS M-Series Modular Servers

Data Sheet FUJITSU Server PRIMERGY CX272 S1 Dual socket server node for PRIMERGY CX420 cluster server

Broadcom Ethernet Network Controller Enhanced Virtualization Functionality

Cisco s Massively Scalable Data Center. Network Fabric for Warehouse Scale Computer

The Future of Computing Cisco Unified Computing System. Markus Kunstmann Channels Systems Engineer

Enabling Technologies for Distributed and Cloud Computing

Towards Rack-scale Computing Challenges and Opportunities

PCI Express Impact on Storage Architectures and Future Data Centers. Ron Emerick, Oracle Corporation

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Virtualised MikroTik

APACHE HADOOP PLATFORM HARDWARE INFRASTRUCTURE SOLUTIONS

Cost Efficient VDI. XenDesktop 7 on Commodity Hardware

Cray Gemini Interconnect. Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak

Enabling Technologies for Distributed Computing

Storing Big Data The Rise of the Storage Cloud

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

FPO. Expanding Intel Architecture Flexibility in the Data Center. Markus Leberecht Data Center Solutions Architect, Intel EMEA March 20, 2013

The Future of Cloud Networking. Idris T. Vasi

PCI Express Impact on Storage Architectures and Future Data Centers. Ron Emerick, Oracle Corporation

Brocade and EMC Solution for Microsoft Hyper-V and SharePoint Clusters

Why 25GE is the Best Choice for Data Centers

Brocade Solution for EMC VSPEX Server Virtualization

Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family

Oracle Exadata: The World s Fastest Database Machine Exadata Database Machine Architecture

White Paper Solarflare High-Performance Computing (HPC) Applications

A Platform Built for Server Virtualization: Cisco Unified Computing System

Where IT perceptions are reality. Test Report. OCe14000 Performance. Featuring Emulex OCe14102 Network Adapters Emulex XE100 Offload Engine

Network Functions Virtualization Using Intel Ethernet Multi-host Controller FM10000 Family

Scaling from Datacenter to Client

PSAM, NEC PCIe SSD Appliance for Microsoft SQL Server (Reference Architecture) September 11 th, 2014 NEC Corporation

MESOS CB220. Cluster-in-a-Box. Network Storage Appliance. A Simple and Smart Way to Converged Storage with QCT MESOS CB220

Private cloud computing advances

PCI Express Impact on Storage Architectures. Ron Emerick, Sun Microsystems

Pioneer DreamMicro. Blade Server S75 Series

Block based, file-based, combination. Component based, solution based

Data Center and Cloud Computing Market Landscape and Challenges

Building All-Flash Software Defined Storages for Datacenters. Ji Hyuck Yun Storage Tech. Lab SK Telecom

HPC Update: Engagement Model

Cisco for SAP HANA Scale-Out Solution on Cisco UCS with NetApp Storage

VMware Virtual SAN 6.2 Network Design Guide

High-performance vswitch of the user, by the user, for the user

David Lawler Vice President Server, Access & Virtualization Group

I/O Performance of Cisco UCS M-Series Modular Servers with Cisco UCS M142 Compute Cartridges

Cisco Datacenter 3.0. Datacenter Trends. David Gonzalez Consulting Systems Engineer Cisco

Intel Cluster Ready Appro Xtreme-X Computers with Mellanox QDR Infiniband

Get More Scalability and Flexibility for Big Data

Hadoop Size does Hadoop Summit 2013

Evaluation of Dell PowerEdge VRTX Shared PERC8 in Failover Scenario

How To Run Apa Hadoop 1.0 On Vsphere Tmt On A Hyperconverged Network On A Virtualized Cluster On A Vspplace Tmter (Vmware) Vspheon Tm (

Storage Architectures. Ron Emerick, Oracle Corporation

IBM System x family brochure

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

Data Communications Hardware Data Communications Storage and Backup Data Communications Servers

CONFIGURATION CONCEPTS SUN SPARC ENTERPRISE M-SERIES SERVERS. James Hsieh, Sun Systems Group. Sun BluePrints Online

Cloud Optimize Your IT

Hadoop on the Gordon Data Intensive Cluster

How To Build A Cisco Ukcsob420 M3 Blade Server

VTrak SATA RAID Storage System

PCI Express Overview. And, by the way, they need to do it in less time.

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database

Simplifying the Data Center Network to Reduce Complexity and Improve Performance

Deploying Ceph with High Performance Networks, Architectures and benchmarks for Block Storage Solutions

Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers

HP Moonshot System. Table of contents. A new style of IT accelerating innovation at scale. Technical white paper

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014

PCI Express Impact on Storage Architectures and Future Data Centers

Platfora Big Data Analytics

Pivot3 Reference Architecture for VMware View Version 1.03

50. DFN Betriebstagung

Networking Virtualization Using FPGAs

Architectural Comparison: Cisco UCS and the Dell FX2 Platform

Introduction to Cloud Design Four Design Principals For IaaS

Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 SMB Direct

Performance characterization report for Microsoft Hyper-V R2 on HP StorageWorks P4500 SAN storage

Netvisor Software Defined Fabric Architecture

PCI Express and Storage. Ron Emerick, Sun Microsystems

IVA & UCS. Frank Stott UCS Sales Specialist frstott@cisco.com Cisco and/or its affiliates. All rights reserved.

Ericsson Introduces a Hyperscale Cloud Solution

Intel Ethernet Switch Converged Enhanced Ethernet (CEE) and Datacenter Bridging (DCB) Using Intel Ethernet Switch Family Switches

EDUCATION. PCI Express, InfiniBand and Storage Ron Emerick, Sun Microsystems Paul Millard, Xyratex Corporation

QuickSpecs. Models HP MSR Open Application Platform (OAP) with VMware vsphere MIM Module

Pluribus Netvisor Solution Brief

Ethernet Fabric Requirements for FCoE in the Data Center

Xeon+FPGA Platform for the Data Center

Big Data Performance Growth on the Rise

OPTIMIZING SERVER VIRTUALIZATION

Evaluation Report: HP Blade Server and HP MSA 16GFC Storage Evaluation

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

CUTTING-EDGE SOLUTIONS FOR TODAY AND TOMORROW. Dell PowerEdge M-Series Blade Servers

Transcription:

SeaMicro SM10000-64 Server Building Datacenter Servers Using Cell Phone Chips Ashutosh Dhodapkar, Gary Lauterbach, Sean Lie, Ashutosh Dhodapkar, Gary Lauterbach, Sean Lie, Dhiraj Mallick, Jim Bauman, Sundar Kanthadai, Toru Kuzuhara, Gene Shen, Min Xu, Chris Zhang

Overview Power in the Datacenter Application Trends SeaMicro Architecture CPU Selection Interconnect Fabric I/O Virtualization ti Management Software Application Performance Summary Hot Chips 2011 2

Power: The Issue in the Datacenter Power is the largest Op-Ex item for an Internet company; >30% of Op Ex Volume servers consume 1% of the electricity in the US More than $3 Billion dollars per year * Datacenters reaching power limits Reducing power will extend life of existing datacenters saving 100 s of millions of dollars in CapEx * 2007 EPA Report to Congress on Server and Data Center Efficiency, Public Law 109-43 Power Provisioning for a Warehoused Sized Compute Hot Chips 2011 3

Cloud s Killer Apps Compute moving to server side Clients primarily for display: smart phones, tablets, etc. Free to users - cost amortized over large user base Optimizing datacenter TCO is critical to profitability Costs: bandwidth, power, servers, switching, storage, firewalls, load balancers, buildings, management Growing exponentially Datacenters facing new challenges around logistics of super-size Killer apps are collections of small, simple, and bursty workloads have big data sets, high concurrency, lots of communication Hot Chips 2011 4

SeaMicro Architecture Cluster in a box: integrated compute, storage, and network Large number of inexpensive, energy-efficient Cell Phone CPUs interconnected using a low-latency, high-bandwidth fabric Unlike multi-core, provides natural scaling of all system resources: O/S threads, networking stack Memory bandwidth and capacity NICs and network bandwidth Disk controllers, disk bandwidth/capacity Purpose built for Cloud s Killer Apps: architecture maps well to internet workloads Hot Chips 2011 5

SeaMicro Architecture: CPU Purpose built for Cloud s Killer Apps Collection of small, simple, and bursty workloads Big data sets, high concurrency, lots of communication CPU: Intel Atom, a Cell Phone CPU Better match the workload Leverage efficiencies of Scale Down: operate at a more efficient point on the energy-performance curve * Derive cost advantage from smaller silicon area and higher volumes Superior Performance/Watt/$ compared to server class CPUs * Adapted from Azizi et al. Energy-Performance Tradeoffs in Processor Architecture and Circuit Design: A Marginal Cost Analysis, ISCA 2010 Hot Chips 2011 6

SeaMicro Architecture: Fabric Purpose built for Cloud s Killer Apps Collection of small small, simple simple, and b bursty rst workloads orkloads Big data sets, high concurrency, lots of communication 512 CPUs interconnected using a high bandwidth fabric in a 3D torus topology High scalability: distributed architecture based on low-power ASICs High bandwidth: 1.28Tbps Low-latency: < 5us between any two nodes High resiliency: multitude of paths between two nodes allowing g easy y routing g around failures Hot Chips 2011 7

SeaMicro Architecture: I/O Virtualization Virtualized I/O devices Network/storage shared and amortized over large number of CPUs Improves utilization and enables optimizations, e.g. shared partitions Reduces components in the system, thus improving cost/power SeaMicro Ethernet FPGA SeaMicro ASIC CPU + Chipset SeaMicro Storage FPGA Ethernet Cards (Up to 8 per system each with 8x1 GbE or 2x10 GbE) Storage Cards (Upto8per per system each with 8xSATA HDD/SSD) Hot Chips 2011 8

SeaMicro ASIC 90nm G TSMC technology 15 mm 2 per node 289 pin plastic BGA package Low power: <1W per node Key Features: Fabric switch I/O virtualization Clock generation Node management Hot Chips 2011 9

SeaMicro Fabric 512 logical fabric nodes ASIC contains 2 fabric nodes CPU Y Z CPU Y Z Each node has 6 fabric links (2.5Gb/s SERDES) to neighboring nodes (2X, 2Y, 2Z) 1 PCIe link to a CPU Crossbar to switch between 7 links X Z Y ASIC Z CPU Y X Nodes connected in an 8x8x8 3D Torus Each dimension is an 8-node loop Total bandwidth is 1.28 Tbps High path diversity for redundancy PCI e Node 1G NIC SATA HBA LPC BIOS UART Fabric is cut-through, loss-less, deadlock dl free, has multiple l QOS levels l Fabric Crossbar Fabric links Hot Chips 2011 10

I/O Virtualization Node presents 4 virtual devices to CPU PCIe: Ethernet adapter, 4 SATA disks LPC (ISA): BIOS, UART CPU Node packetizes data from CPU and routes itthrough hthe fabric to multiple l shared I/O cards Ethernet traffic is routed to other nodes or network I/O cards with uplink ports Each node is a port on a distributed switch. Internal MAC addresses are hidden behind I/O card Table at ingress port on I/O card provides fabric destination Nodes keeps track of destination ports for external MACs. Fabric address encoded in Internal MAC PCI e Ethernet adapter Node SATA disks BIOS Fabric Crossbar LPC UART Disk, BIOS, and Console requests are routed to storage I/O cards which hold up to 8 SATA disks Fabric links (X,Y,Z) Hot Chips 2011 11

I/O Aggregation g Cards Bridge between the fabric and I/O Connected to the fabric in the Y-dimension Terminate fabric protocol on one side and talk to I/O devices on the other 8x1Gb/2x10Gb Ethernet Ports E-Card Two types of I/O cards E-Card: 1G/10G network connectivity S-Card: SATA storage, BIOS, Console Nodes Any node can talk to any I/O card 1 E-Card and 1 S-Card per Z-plane S-Card 8xHDD/SSD Hot Chips 2011 12

I/O Card Architecture I/O FPGA PCIe Control Plane Processor 8x Fabric Links Co-designed architecture HW/SW boundary flexible: optimized for performance High speed dt datapathsth implemented in FPGA Control plane implemented in microprocessor FPGA enables rapid feature enhancement based on customer feedback. Power/cost amortized over 100 s of nodes Hot Chips 2011 13

SM10000-64 Server 10 RU chassis Midplane Ethernet Dual mid-plane fabric interconnect Interconnect Card 64 compute cards 4 ASICs/card: 512 fabric nodes Storage Card 4-6 Dual-core CPUs/card: 512-768 cores 4GB DRAM/CPU: 1-1.5TB DRAM 1-8 shared Ethernet cards: Up to 160 Gb/s external connectivity 1-8 shared Storage cards: Up to 64 SATA/SSD drives Shared infrastructure: t N+1 redundant Power supplies, fans, management Ethernet and console HDD/SSD Compute Card Hot Chips 2011 14

Management Software Implements key real-time services Fabric routing: fault isolation and failover Ethernet control plane (MAC/IP learning, IGMP) Layer4 load balancer management Terminal server Power supply and fan control Configuration, Management, and Monitoring i Integrated DHCP server CLI and SNMP interfaces for network/storage/server management Performance/Fault monitoring tools Hot Chips 2011 15

Benchmark: Apache Bench SM10000-64 consumes 1/4 th the power, for equivalent performance Apache 2.2 with Apache Bench. CPUs running CentOS 5.4 Retrieve 16KB files over 10 min. System Configuration SeaMicro SM10000-64 256 x Dual Core 1.66GHz Atom processors 1U Xeon Server Industry Standard Dual Socket Quad Core Xeon L5630 2.13GHz Systems Under Test 1 45 Apache Throughput/Sec 1,005,056 056 1,005,120 120 Apache Request File Size 16KB 16KB System Power 2,490W 10,090W Space consumed in Racks 10 RU 45 RU Hot Chips 2011 16

Summary: SM10000-64 Internet workloads are increasingly moving compute to the server side. Minimizing Power and thus TCO of datacenters is critical to internet businesses. The SM10000-64 is a major step forward to address the challenges of the datacenter Provides a 4x reduction in power/space for equivalent performance, compared to traditional 1RU servers. Shipping in volume 768 Intel Atom cores, 1.5 TB DRAM, 1.28Tbps fabric 64 SATA/SSD disks, 160Gbps uplink network bandwidth Integrated load balancer and management SW Hot Chips 2011 17

Thank You!