Datacenter architectures
Paolo Giaccone
Notes for the class on Router and Switch Architectures
Politecnico di Torino
January 2015
Outline
1. What is a data center
2. Datacenter traffic
3. Routing and addressing basic schemes
4. Interconnection topologies
5. Hot issues
Giaccone (Politecnico di Torino) Datacenter architectures Jan. 2015
Section 1: What is a data center
Scenarios
Applications: cloud computing, cloud storage, web services
Consolidation of computation and network resources
Very large data centers: 1,000 - 10,000 - 100,000 servers
Cloud computing
USA National Institute of Standards and Technology (NIST) definition:
"Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."
http://csrc.nist.gov/publications/nistpubs/800-145/sp800-145.pdf
Cloud computing services
Application as a Service (AaaS)
- provides on-demand applications over the Internet
- no control over network, servers, OS, individual application capabilities, etc.
- e.g., online sales applications
Platform as a Service (PaaS)
- provides platform-layer resources, e.g., operating system support and software development frameworks
- e.g., Google App Engine (Go, Java, Python, PHP), Microsoft Windows Azure (C#, Visual Basic, C++)
Infrastructure as a Service (IaaS)
- provides on-demand infrastructural resources, usually in terms of Virtual Machines
- e.g., Amazon Elastic Compute Cloud (EC2), Microsoft Windows Azure
Virtual machines (VM)
Possible scenarios
- OS image (e.g., a Linux distribution)
- LAMP image (Linux + Apache + MySQL + PHP)
Implementation
- 10-100 VMs on the same server, each with its own IP and MAC addresses
VM migration: migrate the entire VM state to
- achieve load balancing / statistical multiplexing
- route requests to servers with better bandwidth towards the clients
- avoid heat hot-spots
- adapt to different power availability/costs
Section 2: Datacenter traffic
Data traffic
[Figure: traffic between the Internet and the data center is North-South traffic; traffic inside the data center is East-West traffic]
Intra-DC traffic (East-West)
- storage replication (few flows, lots of data): in Hadoop, at least 3 copies of the same data, usually two in the same rack and one in another rack
- VM migration
- Network Function Virtualization (NFV): data is processed through a sequence of VMs (e.g., firewall, web server, parental control, accounting server)
East-West traffic is usually larger than North-South traffic: on average, a 1-byte transaction in North-South traffic generates a 100-byte transaction in East-West traffic.
Cisco's Global Cloud Index reports that, unlike in campus networks, the dominant volume of traffic in the DC flows in the East-West direction (76%), followed by North-South traffic (17%) and finally inter-DC traffic, which currently comprises only 7% but is gradually growing. In campus networks, traffic is primarily (90+%) North-South. (http://blogs.cisco.com/security/trends-in-data-center-security-part-1-traffic-trends, May 2014)
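A minimal Python sketch of the replica-placement rule quoted above (three copies: two in one rack, one in another rack). The rack layout and the function name `place_replicas` are hypothetical, and this is a simplification of the rule on the slide, not the actual HDFS placement code.

```python
import random

def place_replicas(racks, main_rack, copies=3, rng=None):
    """Choose servers for `copies` replicas: two in `main_rack`, the others in distinct other racks.

    racks: hypothetical layout {rack name: [server names]}.
    Simplified version of the rule on the slide, not the HDFS implementation.
    """
    if copies < 2:
        raise ValueError("the rule needs at least two copies")
    rng = rng or random.Random(0)                      # seeded for reproducible placements
    placement = rng.sample(racks[main_rack], 2)        # two copies inside the same rack
    other_racks = [r for r in racks if r != main_rack]
    for rack in rng.sample(other_racks, copies - 2):   # each remaining copy in a different rack
        placement.append(rng.choice(racks[rack]))
    return placement

racks = {"rack1": ["s1", "s2", "s3"], "rack2": ["s4", "s5"], "rack3": ["s6", "s7"]}
print(place_replicas(racks, "rack1"))
```

Keeping two copies in one rack limits cross-rack East-West traffic, while the copy in another rack survives the loss of a whole rack.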
East-West traffic patterns
- Unicast: point-to-point communication, e.g., VM migration, data backup, stream data processing
- Multicast: one-to-many communication, e.g., software updates, data replication (at least 3 copies per content) for reliability, OS image provisioning for VMs
- Incast: many-to-one communication, e.g., the reduce phase in MapReduce, merging tables in databases
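The three patterns differ only in the number of senders and receivers, as the hypothetical helper `pattern` below makes explicit. The second helper, `incast_overload`, illustrates why incast is delicate: many senders converge on a single receiver port. The port speeds used in the example are illustrative assumptions.

```python
def pattern(senders, receivers):
    """Classify a transfer by the number of distinct senders and receivers."""
    if not senders or not receivers:
        raise ValueError("need at least one sender and one receiver")
    if len(senders) == 1 and len(receivers) == 1:
        return "unicast"      # point-to-point
    if len(senders) == 1:
        return "multicast"    # one-to-many
    if len(receivers) == 1:
        return "incast"       # many-to-one
    return "many-to-many"     # not one of the three patterns on the slide

def incast_overload(n_senders, sender_gbps, port_gbps):
    """Offered load on the receiver's port divided by its capacity (> 1 means queues build up)."""
    return n_senders * sender_gbps / port_gbps

print(pattern({"vm1"}, {"vm2"}))                       # unicast
print(pattern({"mapper1", "mapper2"}, {"reducer"}))    # incast
print(incast_overload(40, 1, 10))                      # 4.0
```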
Section 3: Routing and addressing basic schemes
Addressing and routing in a data center
Scenario: 100,000 servers, 32 VMs each, i.e., about 3 × 10^6 MAC and IP addresses
Addressing and routing schemes: two possible (opposite) options
1. layer-3 data center
2. layer-2 data center
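A back-of-the-envelope check of the scenario figures. The last step, the size of a single flat IPv4 block able to number every VM, is an illustrative addition not on the slide; it ignores network, broadcast and switch addresses.

```python
import math

servers = 100_000
vms_per_server = 32

endpoints = servers * vms_per_server            # one MAC and one IP address per VM
host_bits = math.ceil(math.log2(endpoints))     # bits needed to number every endpoint
prefix_len = 32 - host_bits                     # size of a single flat IPv4 block

print(f"{endpoints:,} MAC/IP pairs")            # 3,200,000 MAC/IP pairs
print(f"at least a /{prefix_len} IPv4 block")   # at least a /10 IPv4 block
```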
Layer-2 data center
One single LAN
How to deal with:
- very large forwarding tables in switches
- lots of broadcast traffic (e.g., ARP)
- loops: (unfortunately) there is no TTL in Ethernet frames
Layer-3 data center
One subnet per VLAN
How to deal with:
- many DHCP servers and VLANs
- a very large number of switches/routers (around 10,000)
- OSPF in each router
- manual administrator configuration and oversight
- routing loops: (fortunately) IP packets carry a TTL
- VM migration: when changing LAN, a new IP address is required and existing TCP connections break
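The TTL difference between the two options can be sketched with a packet trapped in a forwarding loop. This is a simplified model under stated assumptions: the TTL is decremented once per hop and the packet is discarded when it reaches zero, while the exact point where a real router discards it is abstracted away. The function name and the `max_hops` cut-off are hypothetical.

```python
def hops_before_drop(ttl=None, max_hops=1_000):
    """Count forwarding hops of a packet caught in a forwarding loop.

    ttl=None models an Ethernet frame (no TTL field); an integer models an IP packet
    whose TTL is decremented at every hop. The simulation stops after max_hops.
    """
    hops = 0
    while hops < max_hops:
        if ttl is not None:
            if ttl == 0:
                return hops          # the packet is discarded
            ttl -= 1
        hops += 1                    # forwarded to the next node of the loop
    return hops                      # still looping when the simulation stops

print(hops_before_drop(ttl=64))     # 64
print(hops_before_drop())           # 1000
```

In a layer-2 loop the situation is worse than this single-frame model suggests: broadcast frames are flooded on every port, so copies multiply at each switch.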
Practical solutions
- VXLAN: scalable tunneling scheme similar to VLAN
- LISP: provides IP address mobility across layer-3 subnets
- but many other solutions: FabricPath, TRILL, NVGRE, OTV, Shortest Path Bridging (SPB), etc.
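What makes VXLAN "scalable" is the size of its segment identifier: the 8-byte VXLAN header of RFC 7348 carries a 24-bit VXLAN Network Identifier (VNI), while an IEEE 802.1Q tag carries a 12-bit VLAN ID. The sketch below builds and parses only that 8-byte header with Python's `struct`; the outer Ethernet/IP/UDP headers are omitted and the function names are hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned UDP destination port for VXLAN (RFC 7348)

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # I flag set: the VNI field is valid; the other bits are reserved
    # word 1: flags (8 bits) + reserved (24 bits); word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", flags << 24, vni << 8)

def vni_of(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    _, word2 = struct.unpack("!II", header[:8])
    return word2 >> 8

print(vxlan_header(5000).hex())                            # 0800000000138800
print(2**12 - 2, "usable VLAN IDs vs", 2**24, "VNIs")      # 4094 usable VLAN IDs vs 16777216 VNIs
```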
Software Defined Networking (SDN)
In SDN, the control plane is centralized and the network can be easily programmed.
SDN is the common approach to control and manage a datacenter:
- centralized control gives a coherent network view, very useful for network control and management
- custom applications to enhance network functions
- network automation and traffic engineering (e.g., load balancing, resource consolidation policies, restoration, etc.)
- traffic monitoring and steering
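A toy sketch of what the coherent, centralized network view enables: the controller computes a path on the whole topology graph and derives one forwarding rule per switch along it. The topology, the rule format `(match, next_hop)` and the function names are hypothetical; they do not follow OpenFlow or the API of any specific controller.

```python
from collections import deque

def shortest_path(topology, src, dst):
    """Breadth-first search over the controller's global view (adjacency lists)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in topology[node]:
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    if dst not in prev:
        return None
    path, node = [], dst
    while node is not None:          # walk back from dst to src
        path.append(node)
        node = prev[node]
    return path[::-1]

def flow_rules(topology, src, dst, match):
    """Rules {switch: (match, next_hop)} that the controller would push (hypothetical format)."""
    path = shortest_path(topology, src, dst)
    if path is None:
        return {}
    return {sw: (match, nxt) for sw, nxt in zip(path, path[1:])}

topology = {"leaf1": ["spine1", "spine2"], "leaf2": ["spine1", "spine2"],
            "spine1": ["leaf1", "leaf2"], "spine2": ["leaf1", "leaf2"]}
print(flow_rules(topology, "leaf1", "leaf2", "dst=10.0.2.5"))
# {'leaf1': ('dst=10.0.2.5', 'spine1'), 'spine1': ('dst=10.0.2.5', 'leaf2')}
```

A real controller would also balance flows over spine2 (traffic engineering), which a global view makes straightforward.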
Section 4: Interconnection topologies
Server packing in a rack
Standard 19-inch rack: 42 EIA units ("pizza boxes")
40 server blades, possibly in a single /26 subnet
ToR (Top of Rack) switch: n ports at b Gbit/s to the server blades, N ports at B Gbit/s to the other switches; usually NB = nb
Example: b = 1 Gbit/s per server blade
- n = 40 ports @ 1 Gbit/s to the servers
- N = 4 ports @ 10 Gbit/s to the other switches
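The condition NB = nb says that the ToR switch is not oversubscribed. The sketch below computes the ratio nb/NB for the example values and checks that a /26 subnet is large enough for the rack; the prefix 10.1.2.0/26 and the function name are hypothetical.

```python
import ipaddress

def oversubscription(n, b_gbps, N, B_gbps):
    """Server-facing over switch-facing capacity of a ToR switch (1.0 means NB = nb)."""
    return (n * b_gbps) / (N * B_gbps)

print(oversubscription(n=40, b_gbps=1, N=4, B_gbps=10))   # 1.0

rack_subnet = ipaddress.ip_network("10.1.2.0/26")          # hypothetical prefix for one rack
usable = rack_subnet.num_addresses - 2                     # minus network and broadcast addresses
print(usable)                                              # 62, enough for 40 servers and the ToR
```

With only N = 2 uplinks the ratio would be 2.0, i.e., the servers could offer twice the traffic the uplinks can carry.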
Interconnection among racks
Leaf and spine: two-stage interconnection
- Leaf: ToR switch
- Spine: dedicated switches
[Figure: spine switches connected to the leaves, i.e., the ToR switches]
From Clos to Leaf and Spine topology
[Figure: folding a Clos network into a leaf-and-spine topology; modules labeled 2k ports and k ports]
Clos networks
- each switching module is unidirectional
- each path traverses 3 modules
Leaf and spine
- each switching module is bidirectional
- each path traverses either 1 or 3 modules
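The "1 or 3 modules" property can be checked on a small leaf-and-spine graph: a breadth-first search gives both the number of modules traversed (switch hops + 1) and the number of equal-cost shortest paths, which equals the number of spines when the two leaves differ. The fabric sizes and names below are hypothetical.

```python
from collections import deque

def leaf_spine(num_leaves, num_spines):
    """Adjacency lists of a leaf-and-spine fabric: every leaf links to every spine."""
    adj = {f"L{i}": [f"S{j}" for j in range(num_spines)] for i in range(num_leaves)}
    adj.update({f"S{j}": [f"L{i}" for i in range(num_leaves)] for j in range(num_spines)})
    return adj

def shortest_paths(adj, src, dst):
    """Return (modules traversed, number of shortest paths) between two switches."""
    dist, count = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:                 # first time v is reached
                dist[v] = dist[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:      # another shortest path into v
                count[v] += count[u]
    return dist[dst] + 1, count[dst]

adj = leaf_spine(num_leaves=4, num_spines=2)
print(shortest_paths(adj, "L0", "L0"))   # (1, 1): servers on the same leaf
print(shortest_paths(adj, "L0", "L3"))   # (3, 2): leaf -> spine -> leaf, one path per spine
```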
From unidirectional to bidirectional networks
[Figure: unidirectional Banyan (butterfly) network; bidirectional butterfly network]
Pictures taken from Interconnection Networks: An Engineering Approach, by Duato, Yalamanchili and Ni, 2003
Examples of designs
3072 servers: 3072 ports at 10 Gbit/s, i.e., 30.72 Tbit/s
Alternative designs:
- 96 switches with 64 ports and 32 switches with 96 ports
- 96 switches with 64 ports and 8 switches with 384 ports
[Figure: the two designs, with 64-port leaves (32 ports at 10 Gbit/s to the servers, 32 uplinks) connected either to 32 spines with 96 ports (1 link per leaf-spine pair) or to 8 spines with 384 ports (4 links per leaf-spine pair)]
Example taken from Cisco's Massively Scalable Data Center, 2009
Examples of designs
6144 servers: 6144 ports at 10 Gbit/s, i.e., 61.44 Tbit/s
Alternative designs:
- 192 switches with 64 ports and 32 switches with 192 ports
- 192 switches with 64 ports and 16 switches with 384 ports
[Figure: the two designs, with 64-port leaves (32 ports at 10 Gbit/s to the servers, 32 uplinks) connected either to 32 spines with 192 ports (1 link per leaf-spine pair) or to 16 spines with 384 ports (2 links per leaf-spine pair)]
Example taken from Cisco's Massively Scalable Data Center, 2009
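The switch counts of both examples follow from a simple sizing rule, sketched below under the assumption that the fabric is non-blocking: every leaf uses half of its ports for servers and half for uplinks, and all links run at 10 Gbit/s. The function name is hypothetical.

```python
def leaf_spine_design(servers, leaf_ports, spine_ports):
    """Size a two-stage leaf-and-spine in which every leaf uses half of its ports for servers.

    Returns (number of leaves, number of spines, parallel links between each leaf and each spine).
    """
    down = leaf_ports // 2                    # server-facing ports per leaf (= uplinks per leaf)
    if servers % down:
        raise ValueError("servers must fill an integer number of leaves")
    leaves = servers // down
    uplinks = leaves * down                   # total leaf-to-spine links
    if uplinks % spine_ports:
        raise ValueError("spine ports must divide the number of uplinks")
    spines = uplinks // spine_ports
    if down % spines:
        raise ValueError("uplinks of a leaf must spread evenly over the spines")
    return leaves, spines, down // spines

for servers, spine_ports in [(3072, 96), (3072, 384), (6144, 192), (6144, 384)]:
    print(servers, spine_ports, leaf_spine_design(servers, 64, spine_ports))
# 3072 96 (96, 32, 1)
# 3072 384 (96, 8, 4)
# 6144 192 (192, 32, 1)
# 6144 384 (192, 16, 2)
```

Using larger spines reduces the number of spine switches but bundles several parallel links between each leaf-spine pair.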
Recursive Leaf and Spine
[Figure: recursive leaf-and-spine construction; modules labeled 2k ports]
Leaf with 2k² bidirectional ports
2k modules of size k
Physical infrastructure
POD (Point of Delivery): "A module or group of network, compute, storage, and application components that work together to deliver a network service. The PoD is a repeatable pattern, and its components increase the modularity, scalability, and manageability of data centers." (taken from Wikipedia)
Definition taken from http://www.cisco.com/c/en/us/products/collateral/switches/nexus-2000-series-fabric-extenders/data_sheet_c78-507093.html#_ftn
Intra-pod and inter-pod communications
Design:
- k²P servers
- 2kP switches with 2k ports
- k² switches with P ports
Choose P = 2k:
- 2k³ servers
- 5k² switches with 2k ports
[Figure: P pods (Pod 1 ... Pod P), each built from switches with 2k ports, interconnected through switches with P ports]
Example of designs
Data center with 65,536 servers in 64 pods
65,536 ports at 10 Gbit/s, i.e., about 655 Tbit/s
P = 64 pods, k = 32
In total, 5,120 switches with 64 ports
[Figure: pod-based topology with P pods interconnected through switches with P ports]
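The numbers of the 64-pod example follow from the design formulas (k²P servers, 2kP switches with 2k ports, k² switches with P ports). The sketch below recomputes them; the internal pod layout stated in the docstring (k leaves plus k spines per pod) is our reading of the figure, not something the slide states explicitly.

```python
def pod_design(k, pods):
    """Counts for the pod-based design: k^2*P servers, 2kP pod switches, k^2 core switches.

    Assumed pod layout: k leaves with k server ports each, plus k spines,
    each spine sending k uplinks to the core switches (which have P ports, one per pod).
    """
    servers = k * k * pods           # k leaves per pod, k servers per leaf
    pod_switches = 2 * k * pods      # k leaves + k spines in every pod, 2k ports each
    core_switches = k * k            # one core switch per pod uplink index
    return servers, pod_switches, core_switches

k = 32
P = 2 * k                            # with P = 2k, every switch has 2k = 64 ports
servers, pod_sw, core_sw = pod_design(k, P)
print(servers, pod_sw + core_sw)     # 65536 5120
print(servers * 10 / 1000, "Tbit/s") # 655.36 Tbit/s
```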
Other topologies
Many other topologies have been devised. See for example:
- A. Hammadi, L. Mhamdi, "A survey on architectures and energy efficiency in Data Center Networks," Computer Communications, March 2014, http://www.sciencedirect.com/science/article/pii/s0140366413002727
- M.F. Bari, R. Boutaba, R. Esteves, L.Z. Granville, M. Podlesny, M.G. Rabbani, Q. Zhang, M.F. Zhani, "Data Center Network Virtualization: A Survey," IEEE Communications Surveys & Tutorials, 2013
Section 5: Hot issues
Energy consumption
Datacenters are among the largest and fastest-growing consumers of electricity.
In 2010, datacenters collectively consumed around 1.5% of the total electricity used worldwide (J. Koomey, Growth in Data Center Electricity Use 2005 to 2010, Analytics Press, 2011).
Green data centers: datacenters partially or completely powered by renewables (e.g., solar, wind energy)
- self-generation: use their own renewable energy
- co-location: use the energy from existing nearby plants
Hybrid optical-electronic datacenter
Optical networks
- slow switching (only at flow level, on a millisecond scale or slower)
- very high bandwidth and low energy consumption
Electronic networks
- fast switching (at packet level, on a nanosecond scale)
- high bandwidth but high energy consumption
Main idea: deploy two interconnection networks in parallel
- optical network for elephant flows (i.e., circuit switching)
- electronic network for mice flows (packet switching)
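A minimal sketch of the flow split, assuming flow sizes are known in advance and using a hypothetical 10 MiB threshold between mice and elephants; real hybrid designs must detect elephant flows while they are running, so both the threshold and the known-size assumption are simplifications.

```python
ELEPHANT_BYTES = 10 * 2**20   # hypothetical threshold: flows larger than 10 MiB are elephants

def assign_network(flow_bytes, threshold=ELEPHANT_BYTES):
    """Elephant flows go to the optical circuit network, mice to the electronic packet network."""
    return "optical" if flow_bytes > threshold else "electronic"

flows = {"backup": 5 * 2**30, "rpc": 2_000, "vm-migration": 800 * 2**20}
print({name: assign_network(size) for name, size in flows.items()})
# {'backup': 'optical', 'rpc': 'electronic', 'vm-migration': 'optical'}
```

Long-lived elephant flows amortize the slow (millisecond-scale) reconfiguration of the optical circuits, while short mice flows stay on the packet-switched network.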
Open compute project
http://www.opencompute.org/
Open datacenter architecture, sponsored by Facebook:
- mechanical specifications (connectors)
- electric powering, cooling methodology
- storage and server implementation
- leaf-and-spine architecture
[Excerpts from the Design Guide for Photonic Architecture]
Figure 3.1: Open Rack with Optical Interconnect. In this architectural concept the green lines represent optical fiber cables terminated with the New Photonic Connector. They connect the various compute systems within the rack to the Top of Rack (TOR) switch. The optical cables could contain up to 64 fibers and still support the described New Photonic Connector mechanical guidelines.
3.4 Interconnect to Compute and IO Nodes
The Distributed Switch functionality described in the previous section relies upon a collection of switching nodes interconnected through a high-bandwidth, reduced form factor cable to reduce the impact of the cabling and interconnects on the system. Figure 3.4 shows one particular implementation of this scheme envisioned as part of this Photonically Enabled Architecture. In this case three compute trays are connected with a lower speed electrical interconnect, based on either PCIe or Ethernet, to a mezzanine board where the network traffic is aggregated. In this aggregation step, various signal conditioning or switching functions may be enabled in order to route the signals to and from the correct destinations. The non-local network traffic is then sent through a Silicon Photonics module through a New Photonic Connector cable solution to the final destination, which could consist of a ToR switch, a spine switch, or an adjacent node in a distributed switching architecture.
Figure 3.4: An example of a Photonically Enabled Architecture in an Open Compute mechanical arrangement using a Mezzanine Fiber. In this concept the New Photonic Connector cable concept is used to enable a reduced cable burden, and front panel access, through the use of silicon photonics modules and the modular architectural concepts which were discussed earlier.